Enterprise Search: The Data Discovery Solution for Unstructured Data
A serious but somewhat hidden obstacle stands between the vision of a data-driven business and its realization. To execute a meaningful data analytics plan, it is first necessary to have access to the data itself.
This may seem like a non-issue. Isn’t all the data one needs for analytics in various databases, “data lakes” and the like? Not really.
A great deal of data, including some of the most critical information for analytics, is locked away in unstructured forms such as PDF files and other documents. Data discovery provides an answer. This process is about finding the data that is essential for effective analytics and transforming it into meaningful insights.
What is data discovery?
The term “data discovery” refers to a multi-stage process that starts with accessing as much relevant data as possible. It then proceeds to uncovering data insights and then making sure that they get to stakeholders who can use them.
Data discovery is necessary in today’s enterprise for a variety of reasons. In addition to the tendency for important information to be resting out of reach in unstructured data formats, there is also a people problem. Not everyone in an organization knows how to prepare data for analysis. In fact, it’s likely that the majority of employees lack such skills.
This is not a criticism. Gathering data and preparing it for analytics is a specialized skill. The results are clear, though.
A lot of data is left untouched. Industry research reveals that between 60% and 73% of content is never analyzed. According to VentureBeat, missing out on insights contained in this data translates into a global opportunity cost of $3.3 trillion.
As Rita Sallam, the Gartner Vice-President, explained, “Data preparation is one of the most difficult and time-consuming challenges facing business users of BI and data discovery tools, as well as advanced analytics platforms.” She is credited with coining the term “data discovery.”
Data discovery tools address this issue. They enable non-technical people to get hold of complex data sets and extract the information they need. Sallam added, “Data preparation capabilities are emerging that will provide business users and analysts the ability to extend the scope of self-service to include information management, and extract, transform and load (ETL) functions.”
In her view, the tools enable users to prepare data for Business Intelligence (BI) and data analytics tools. Specific actions now possible include data profiling, integration, curation, modelling and enrichment.
Transform data into information
The middle step of data discovery, the uncovering of data insights, deals with an intangible but extremely important aspect of getting data to live up to its potential. Even if one can assemble all the data required for analysis, that alone may not suffice for getting to real insights. That takes the transformation of data into useable information.
In some cases, the transformation is a matter of pattern analysis and correlation. For example, a company might have three data sets that appear unrelated: customer complaints, delivery appointment schedules and employee sick leave records. With an “insight engine” or comparable analytical tools, the company might come to understand that customer complaints rise when a certain percentage of employees are out sick, resulting in late deliveries.
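The kind of correlation described above can be illustrated with a minimal sketch. It assumes the three data sets have already been aggregated into weekly figures. The numbers and the names (`sick_pct`, `late_deliveries`, `complaints`) are invented for illustration and do not come from any real company or product.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly figures (made up for this example).
sick_pct = [2.0, 3.0, 8.0, 4.0, 10.0, 6.0]   # % of employees out sick
late_deliveries = [3, 4, 11, 5, 14, 8]       # late delivery appointments
complaints = [5, 6, 15, 8, 19, 11]           # customer complaints received

sick_vs_late = pearson(sick_pct, late_deliveries)
late_vs_complaints = pearson(late_deliveries, complaints)

print(f"sick leave vs late deliveries: {sick_vs_late:.2f}")
print(f"late deliveries vs complaints: {late_vs_complaints:.2f}")
```

A coefficient close to 1 on both pairs would suggest the chain the article describes, from sick leave to late deliveries to complaints. Correlation alone does not prove cause, so a result like this is a prompt for further investigation rather than a conclusion.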
An insight engine is the driver of data discovery. It might take the form of an enterprise search platform that can gather data from multiple sources, including unstructured data, and enable employees to arrive at insights based on what they can discover.
Without such an intelligent search tool, people could spend a lot of time searching for information and trying to integrate and understand what they find, only to discover that it is incomplete or out of date. Or they may not find the information that is relevant to their needs at all.
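The idea of gathering content from several sources into one searchable index can be sketched with a toy inverted index. This is a simplified illustration only, not how any particular enterprise search product works. The class `UnifiedIndex`, the source labels and the sample documents are all hypothetical, and the text is assumed to have been extracted from its original format already.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class UnifiedIndex:
    """Toy inverted index over documents that come from different sources."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of document ids
        self._sources = {}                 # document id -> source label

    def add(self, doc_id, source, text):
        """Index one document's extracted text under its source label."""
        self._sources[doc_id] = source
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return sorted (doc_id, source) pairs containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get avoids creating empty entries for unknown terms.
        matches = set.intersection(
            *(self._postings.get(term, set()) for term in terms)
        )
        return sorted((doc_id, self._sources[doc_id]) for doc_id in matches)


index = UnifiedIndex()
index.add("pdf-001", "pdf_archive", "Delivery delays reported by customers in March")
index.add("crm-042", "crm", "Customer complaint: late delivery of order")
index.add("hr-007", "hr_system", "Sick leave summary for the delivery team")

for doc_id, source in index.search("delivery"):
    print(doc_id, source)
```

Because every document lands in the same index regardless of origin, one query surfaces related material from the PDF archive, the CRM and the HR system together. A real platform would add format conversion, ranking, access control and language processing on top of this.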
Information is the key to success
Sinequa offers a good example of how search data can become the basis for information, a core element of a data discovery process. First, Sinequa brings together virtually any kind of data required for data discovery. The tool uses 200 out-of-the-box connectors, as well as built-in converters, to access, enrich and unify content. It can do this for over 350 document formats, making all of it available via a single search index.
Sinequa then applies natural language processing (NLP), which gives the text meaning beyond simple keyword search. By reacting to natural language, Sinequa performs the important data discovery task of placing data analysis within the reach of non-technical users.
Information is key to success. This is the essence of data discovery.
For example, consider how a business strategy can benefit when employees are able to access all historical and current performance data, along with market trends and data streams related to vendors and competitors. They can create a report that predicts where the company is heading. The organization grows more intelligent.
Sinequa is able to take the data discovery process even further with machine learning (ML) algorithms that simplify and speed up data analytics. In support of the goal of empowering non-technical people to engage in data discovery, ML helps the search tool and insight engine teach itself to handle its workload better. Over time, employees gain the ability to do more data discovery more quickly, making greater contributions to the business through insights garnered from meaningful information.
Conclusion
Becoming a data-driven organization is not an elusive goal. It may seem that way, because data analytics seems to be the preserve of hard-to-find and highly skilled people. Or, the data simply doesn’t seem to be there. The data available for analysis appears incomplete.
However, by adding unstructured data, such as what is found in corporate content and PDFs, and putting an enterprise search-based insight engine to work, all of a sudden the data comes to life. It can be transformed into information. With information, people can work with meaningful insights into the business. This is the power of data discovery, as enabled and fortified by enterprise search.