Omar Gadir, Ph. D., Founder/CEO, Iteru Systems
Foreword
Data analytics uses one algorithm or a combination of several: decision trees, regression, clustering, Bayesian methods, support vector machines, enterprise search, etc. Enterprise search by itself is not enough to provide analytics; it has to be combined with other technologies. At Iteru, we use enterprise search in addition to other technologies to perform data analytics.
Overview Of Google Search And Enterprise Search
Because people are familiar with search, a common question is how enterprise search differs from Google search. Before answering this question, let us examine the nature of Google search. Google deals with Web search. It indexes mostly homogeneous HTML pages that contain meta tags. This kind of metadata helps describe a page and allows it to be found again by browsing or searching. The strength of Google’s search engine is based, in large part, on an algorithm that tracks how sites link to one another. In a simplified case, the more incoming links a page or site has, the higher it ranks in search results.
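The simplified case described above, ranking by the number of incoming links, can be sketched in a few lines. This is only an illustration of in-link counting, not Google's actual algorithm; the function name and the example link graph are hypothetical.

```python
from collections import Counter


def rank_by_incoming_links(links):
    """Rank pages by how many distinct other pages link to them.

    `links` maps each page to the list of pages it links out to.
    """
    incoming = Counter()
    for source, targets in links.items():
        for target in set(targets):      # count each linking page once
            if target != source:         # ignore self-links
                incoming[target] += 1
    # Include pages that nobody links to, so every page gets a rank.
    pages = set(links) | set(incoming)
    # Most incoming links first; ties broken alphabetically.
    return sorted(pages, key=lambda page: (-incoming[page], page))


# Hypothetical link graph, used only for illustration.
example_links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}
ranking = rank_by_incoming_links(example_links)
print(ranking)
```

Here "c.html" ranks first because three other pages link to it, while "d.html" ranks last because nothing links to it.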
Enterprise search primarily focuses on documents. Documents contain both textual data and hundreds of important metadata fields, which are not the same as meta tags. For instance, metadata for PDF files, MS Word files, and emails includes the date the document was created, owner, department, date modified, email sender, receiver, subject, email submit time, etc. Google search does not cover document metadata.
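To make the distinction between text and document metadata concrete, here is a minimal sketch that separates the two for an email, using Python's standard `email` package. The sample message and the field names in the returned dictionary are assumptions chosen for illustration; a real enterprise pipeline would handle many more formats and fields.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical message, used only for illustration.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Everest status
Date: Mon, 06 Jan 2020 09:30:00 +0000

The Everest design review is scheduled for Friday.
"""


def extract_email_metadata(raw):
    """Split an email into searchable metadata fields and body text."""
    msg = message_from_string(raw)
    return {
        "sender": msg["From"],
        "receiver": msg["To"],
        "subject": msg["Subject"],
        "submit_time": parsedate_to_datetime(msg["Date"]),
        "body": msg.get_payload().strip(),
    }


metadata = extract_email_metadata(RAW_EMAIL)
print(metadata["sender"], metadata["submit_time"].isoformat())
```

Once fields such as sender or submit time are stored separately from the body, they can be queried on their own, which a term-only search over the text cannot do.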
Summary of the Differences Between Google Search And Enterprise Search
The main differences between Google search and Enterprise search are as follows:
- Google focuses on pages; enterprise search focuses on documents. Documents contain both textual data and hundreds of important metadata fields. Google search does not cover document metadata.
- Google search is mainly term and phrase search. Enterprise search also includes proximity search (finding words that are within a specific distance of each other), range search (identifying documents whose field values lie between a lower and an upper bound), fuzzy search (for instance, searching for terms similar in spelling to “roam” identifies words like “foam” and “roams”), etc.
- Ranking score is vital for Google. In enterprise search, a single document or a small number of documents may hold vital information or be a big liability, for instance a single document containing sensitive personal information, intellectual property, a design, etc.
- People who want their information discovered on Google have to optimize it for indexing by adding titles, descriptions, and keywords, and by feeding pages and URLs to the search engine. In the enterprise, users often don’t do that.
- Google covers a limited number of document types (Microsoft Excel, MS Word, PowerPoint, and Adobe PDF). Enterprise information resides in more than 400 types, for instance Corel WordPerfect, PPT, etc.
- Google accesses documents that have public (anonymous) access. All enterprises employ authentication and permission-based access to documents. In other words, enterprise documents are not accessible to Google search.
- On Bing, Google, and other public search engines, you will get millions of results for almost any word. Within the enterprise, the vocabulary is much more limited and focuses on an organization’s terminology. For instance, within a company you may get hundreds of documents related to a project called “everest”, but 104 million results if you search for it on Google.
- Google search does not produce clean data.
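The proximity, range, and fuzzy search types named in the list above can be sketched as follows. This is a toy illustration over in-memory data, not how any particular search engine implements them; all function names are hypothetical, and the fuzzy match uses the similarity ratio from Python's standard `difflib` module as a stand-in for an engine's own edit-distance scoring.

```python
import difflib
import re
from datetime import date


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def proximity_match(text, word_a, word_b, max_distance):
    """True if the two words occur within max_distance tokens of each other."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


def range_match(documents, field, lower, upper):
    """Return documents whose field value lies in [lower, upper]."""
    return [d for d in documents if field in d and lower <= d[field] <= upper]


def fuzzy_match(term, vocabulary, cutoff=0.7):
    """Return vocabulary words whose spelling is similar to term."""
    if not vocabulary:
        return []
    return difflib.get_close_matches(
        term, vocabulary, n=len(vocabulary), cutoff=cutoff
    )


# Hypothetical data, used only for illustration.
docs = [
    {"name": "plan.doc", "modified": date(2019, 3, 1)},
    {"name": "budget.xls", "modified": date(2020, 6, 15)},
]
near = proximity_match("The Everest project budget was approved",
                       "everest", "budget", max_distance=2)
recent = range_match(docs, "modified", date(2020, 1, 1), date(2020, 12, 31))
similar = fuzzy_match("roam", ["foam", "roams", "table"])
print(near, [d["name"] for d in recent], similar)
```

In this example the two words are two tokens apart, so the proximity query matches; only "budget.xls" falls inside the date range; and "foam" and "roams" are close enough in spelling to "roam" while "table" is not.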
Google Search Appliance
The Google Search Appliance is a rack-mounted device providing document indexing functionality. It allows enterprise customers to take advantage of Google’s searching and indexing abilities for web services and intranet resources. Two of the points mentioned in the section above deserve comment in relation to the appliance. First, the appliance covers more data types (MS Office files, Lotus Notes, CMS, and ERP), but still far fewer than the more than 400 types of unstructured data. Second, the search is still term and phrase search and does not include metadata.
Iteru Search
Iteru’s search has the following features that do not exist in Google or enterprise search:
- Ingests and indexes targeted documents instead of all documents. This takes fewer computing resources and less storage, and produces cleaner data for analytics and faster analysis.
- Classification and data cleansing can be done while documents are ingested.
- An elegant schema abstraction allows hundreds of document types and metadata fields to be an integral part of search.
- Customers can define their own metadata.
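The general idea behind these features (ingest only targeted documents, classify and cleanse during ingestion, and attach customer-defined metadata) can be sketched as below. This is a hypothetical illustration and not Iteru's implementation; every name here (`IndexedDocument`, `ingest_targeted`, the callback parameters, and the sample data) is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDocument:
    path: str
    text: str
    metadata: dict = field(default_factory=dict)


def ingest_targeted(documents, is_target, classify, custom_metadata):
    """Index only targeted documents, cleansing and tagging them on ingest.

    `documents` is an iterable of (path, text, metadata) tuples.
    """
    index = []
    for path, text, metadata in documents:
        if not is_target(path, metadata):
            continue  # non-targeted documents are never stored or indexed
        cleaned = " ".join(text.split())  # minimal cleansing: normalize whitespace
        enriched = dict(metadata)
        enriched["category"] = classify(cleaned)          # classify during ingest
        enriched.update(custom_metadata(path, enriched))  # customer-defined fields
        index.append(IndexedDocument(path, cleaned, enriched))
    return index


# Hypothetical documents and rules, used only for illustration.
raw = [
    ("hr/offer.docx", "Offer  letter\nfor candidate", {"department": "HR"}),
    ("eng/design.pdf", "Everest   design notes", {"department": "Engineering"}),
]
index = ingest_targeted(
    raw,
    is_target=lambda path, meta: meta.get("department") == "Engineering",
    classify=lambda text: "design" if "design" in text.lower() else "other",
    custom_metadata=lambda path, meta: {"project": "everest"},
)
print(index)
```

Because the filter runs before anything is stored, the resulting index holds only the engineering document, already cleansed, classified, and tagged with the customer-defined "project" field.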