Search and Pattern Recognition
– Steve Wooledge, Vice President, Product Marketing, MapR Technologies, says:
Interest in search and pattern recognition has exploded recently, driven by applications as varied as financial forecasting, data mining, document classification, and biometrics (fingerprints, facial recognition), to name just a few. What’s really driving this growth? Big data tools such as Apache™ Hadoop® have evolved to the point where real-time analysis, search and pattern recognition are combining forces to deliver entirely new intelligence capabilities. Enterprises are realizing that a search technology for unlocking their large data stores should be a major component of their big data strategy.
Notably, the integration of search capabilities into Hadoop was an important milestone for those involved in data mining, and it gives customers the opportunity to find new insights much more quickly. In recent years, search has evolved beyond keyword matching into a more broadly applicable information discovery tool by using principles of reflected intelligence. Hadoop is used to search the entire data “haystack”, and it scales to handle larger and larger haystacks. For example, the enterprise-grade search capability in the MapR Distribution for Hadoop works directly on any data or files without having to perform any conversion or transformation. These new search capabilities are reshaping fields such as finance, e-commerce, social networking, and scientific research.
New search capabilities are an enormous step forward particularly for time-sensitive processes such as fraud detection, where big data must be searched as it streams into the enterprise. Security companies, financial services institutions, and research institutions are just a few examples of the types of organizations that are using the combination of an enterprise-grade Hadoop distribution and tools such as LucidWorks Search, Solr and Elasticsearch to quickly perform searches across a tremendous amount of information. MapR has partnered with Lucid Imagination (the commercial company for Apache Lucene and Apache Solr search technology) to integrate the MapR Distribution with the LucidWorks search platform, giving users a compelling alternative to existing search solutions.
Search technologies can also serve as part of a real-time, full-featured recommendation engine, performing collaborative filtering (“users who liked this song also liked…”), categorical classification and hierarchical-based recommendations, as well as related-concept extraction and concept-based recommendations.
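To make the “users who liked this song also liked…” idea concrete, here is a minimal sketch of co-occurrence-based collaborative filtering in Python. It is only an illustration, not how any particular search product implements recommendations; the function names (`build_cooccurrence`, `recommend`) and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(likes):
    """Count how often each pair of items is liked by the same user."""
    co = defaultdict(Counter)
    for items in likes.values():
        # Each unordered pair of distinct items a user liked counts once.
        for a, b in combinations(sorted(set(items)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, item, top_n=3):
    """Return the items most often liked together with `item`."""
    # Sort by descending count, breaking ties alphabetically.
    ranked = sorted(co[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical sample data: user -> songs they liked.
likes = {
    "ann": ["song_a", "song_b", "song_c"],
    "bob": ["song_a", "song_b"],
    "cat": ["song_b", "song_c"],
    "dan": ["song_a", "song_d"],
}

co = build_cooccurrence(likes)
print(recommend(co, "song_a"))
```

In a production setting the co-occurrence counts would typically be computed in batch over the full data set and the resulting item lists indexed in the search engine, so that a recommendation becomes an ordinary query.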
In addition to search and recommendations, Hadoop is an excellent platform for extremely fast pattern recognition. Search technology, in combination with big data analytics, can be used to mine raw data to find useful patterns of behavior. Simply put, pattern recognition can be thought of as an area of machine learning that aims to recognize patterns within sets of data, relying on programmed as well as learned knowledge. Pattern recognition can be particularly useful for helping researchers and BI users quickly sort data to discover meaningful information, eliminating the need to read through data manually to find patterns. It can also uncover patterns buried in the data that people might otherwise miss, such as evidence of insurance fraud revealed by irregularities across thousands of insurance claims.
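As a small illustration of the “programmed knowledge” end of that spectrum, the sketch below flags claim amounts that deviate sharply from the rest, using a robust score based on the median absolute deviation. It is a simplified, rule-based example rather than a learned model or the approach of any company mentioned here; the function name, the threshold of 3.5 and the claim amounts are assumptions chosen for illustration.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return the indexes of amounts whose robust score exceeds threshold.

    The score is 0.6745 * |x - median| / MAD, where MAD is the median
    absolute deviation. Unlike a mean-based z-score, it is not distorted
    by the very outliers it is trying to find.
    """
    med = median(amounts)
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        # At least half the values equal the median; the score is undefined.
        return []
    return [
        i for i, x in enumerate(amounts)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Hypothetical claim amounts in dollars; one is far out of line.
claims = [1200, 1350, 1100, 1280, 1420, 1190, 9800, 1310]
print(flag_outliers(claims))
```

Real fraud detection combines many more signals than a single amount, but the principle is the same: define what “regular” looks like and surface the records that do not fit.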
According to Aite Group, the costs of insurance fraud are estimated to approach $80 billion annually. One leading insurance company is using an enterprise-grade Hadoop platform to analyze larger volumes and varieties of data for nuanced pattern recognition in order to detect sophisticated fraud patterns. By drawing on more data sources in Hadoop, the company is able to identify and combat fraudulent claims on a daily basis.
Solutionary, the leading pure-play managed security service provider in North America, is applying big data-driven pattern recognition to threat analysis. Solutionary deployed the MapR Distribution for Hadoop to pull both structured and unstructured data into a single platform for analysis. By using Hadoop, Solutionary can now analyze and process trillions of messages every year, greatly improving its clients’ security.
What else can companies do with pattern recognition? Here are just a few applications:
• Manufacturing: quality control, process control, object recognition
• Weather: forecasting based on satellite data, earthquake pattern analysis
• Transportation: pedestrian detection, car malfunctions
• High-energy physics: particle physics experiments, event reconstruction
• Astronomy: data-intensive digital sky surveys
• Biological and human behavioral pattern recognition: speech recognition, fingerprint and iris recognition, facial recognition, neurosciences, DNA mapping
• Medicine: X-ray image pattern recognition, ECG signal analysis, identifying microcalcifications in mammograms, gait analysis
• Financial: fraud prediction, financial forecasting, customer risk modeling, customer churn analysis, trading behavior anomalies
• Retail: point of sale transaction analysis, demand prediction
• Security/Military: satellite terrain image analysis, security screening, radar signal classification
• Web, e-commerce: web browsing pattern analysis across different categories
Hadoop is well positioned to help companies extract the most value from their data assets. The new challenge, however, will not be simply uncovering the elusive needle in the data haystack, but finding the hidden patterns between the haystacks themselves. Finding patterns in the connections among points of data will be the key to many exciting new insights and discoveries.
Steve brings over 12 years of experience in product marketing and business development to MapR. As Vice President of Product Marketing, he is in charge of increasing awareness, driving demand, and identifying new market opportunities for MapR solutions for big data management and analytics. Steve was previously Vice President of Marketing for Teradata Unified Data Architecture, where he drove big data strategy and market awareness across the product line, including Apache Hadoop. Steve also held various roles in product and corporate marketing at Aster Data, an innovator in big data analytics, prior to its acquisition by Teradata. Earlier in his career, Steve held product marketing positions at Interwoven and Business Objects, as well as sales and engineering roles at Business Objects, Dow Chemical and Occidental Petroleum.
Steve holds an MBA from the Kellogg School of Management at Northwestern University, and a BS in Chemical Engineering from the University of Akron.