Below we illustrate 10 points that define semantic search, using our online demo, in which we compared hakia's enterprise search system with PubMed's search engine while QDEXing 20 million PubMed documents.
1- Handling morphological variations
A semantic search engine is expected to handle all morphological variations (tenses, plurals, etc.) consistently. In other words, the results should not change whether you type "improve", "improves", "improving", "improved", or "improvement". The example query "improving quality of life" illustrates that hakia results contain morphological variations of the query terms.
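As a minimal sketch of the idea (not hakia's actual method, which is not described here), the variations above can be collapsed onto one key with a crude suffix stripper. The suffix list and the names `crude_stem` and `normalize` are illustrative assumptions; a production system would use a proper stemmer or lemmatizer.

```python
# Toy suffix list, longest first; an illustrative assumption, not a real stemmer.
SUFFIXES = ("ement", "ment", "ing", "ed", "es", "e", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(query: str) -> list[str]:
    """Reduce every whitespace-separated word of the query to its crude stem."""
    return [crude_stem(word) for word in query.split()]


variants = ["improve", "improves", "improving", "improved", "improvement"]
stems = {crude_stem(v) for v in variants}  # all five collapse to one stem
```

Because queries and documents would both be normalized the same way, "improving quality of life" and "improved quality of life" end up with identical keys.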
2- Handling synonyms with correct senses
A semantic search engine is expected to handle synonyms (cure, heal, treat, etc.) in the right context and with the correct word senses. For example, the word "treat" can refer to a gift or favor, as in "trick or treat", a sense that would be wrong in a medical context. The example query "is there a cure for ALS" shows that hakia brings results containing synonyms in the correct senses. The level of sense disambiguation a semantic search engine achieves is a sign of its progress.
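One way to picture sense-aware synonym handling is a lexicon keyed by word and by domain, so that expansion only draws on the sense that fits the context. The sketch below assumes a tiny hand-made lexicon (`SENSES`) and a hypothetical `expand` helper; how a real engine picks the domain is outside its scope.

```python
# Hypothetical sense inventory: word -> domain -> synonyms valid in that sense.
SENSES = {
    "treat": {
        "medical": {"treat", "cure", "heal"},
        "social": {"treat", "reward", "indulge"},
    },
}


def expand(word: str, domain: str) -> set[str]:
    """Return synonyms of `word` for the given domain, or just the word itself."""
    return SENSES.get(word.lower(), {}).get(domain, {word.lower()})


medical_terms = expand("treat", "medical")
social_terms = expand("treat", "social")
```

Here `medical_terms` contains "cure" and "heal" but not "reward", which is the behavior the medical example calls for.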
3- Handling generalizations
A semantic search engine is expected to handle generalizations (disease = GERD, ALS, AIDS, etc.) where the user's query is expressed in generalized form and the result is expected to be specific. The example query "Which disease has the symptom of coughing?" brings a result set in hakia such that GERD is recognized by the system as the specific answer.
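To make the generalization step concrete, the following sketch resolves a general term to its specific instances and then filters them by a property named in the query. The `IS_A` and `SYMPTOMS` tables are toy data invented for illustration (only the GERD and coughing pairing comes from the example above), and the function names are assumptions.

```python
# Toy is-a table: specific term -> general term (illustrative data only).
IS_A = {"GERD": "disease", "ALS": "disease", "AIDS": "disease"}

# Toy symptom table; entries other than GERD/coughing are placeholders.
SYMPTOMS = {
    "GERD": {"coughing", "heartburn"},
    "ALS": {"muscle weakness"},
    "AIDS": {"weight loss"},
}


def instances_of(general: str) -> set[str]:
    """All specific terms recorded as instances of the general term."""
    return {specific for specific, parent in IS_A.items() if parent == general}


def with_symptom(general: str, symptom: str) -> list[str]:
    """Instances of `general` whose symptom set contains `symptom`."""
    return sorted(
        term for term in instances_of(general)
        if symptom in SYMPTOMS.get(term, set())
    )


answer = with_symptom("disease", "coughing")
```

With this toy data, the generalized query about a disease with the symptom of coughing narrows to the single specific term GERD.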
4- Handling concept matching
Perhaps the most challenging capability of all is concept matching: a semantic search engine is expected to recognize concepts and bring relevant results. The depth of this capability usually increases within specific verticals, and it would be unrealistic to expect coverage of every subject under the sun. The example query "what treats headache" brings a result set in hakia that includes concept matches; for example, migraine belongs to the concept of headache in the medical sense.
5- Handling knowledge matching
As with the previous item, a semantic search engine is expected to have embedded knowledge and use it to bring relevant results (swine flu = H1N1, flu = influenza). Knowledge matching and concept matching are similar in principle, yet differ in practice in how the capability is acquired. The example query "swine flu virus" brings a result set in hakia where these kinds of matches are visible.
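Knowledge matches of this kind can be sketched as rewriting known names to a canonical form before comparison, so that "swine flu" and "H1N1" meet on the same token. The alias table below holds only the two equivalences mentioned above; the `canonicalize` function and its longest-match-first strategy are assumptions made for illustration.

```python
# Alias table built from the two equivalences in the text; keys are lowercase.
ALIASES = {"swine flu": "h1n1", "flu": "influenza"}
MAX_ALIAS_WORDS = max(len(alias.split()) for alias in ALIASES)


def canonicalize(text: str) -> list[str]:
    """Replace known aliases with canonical names, longest phrase first."""
    tokens = text.lower().split()
    result = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase that still fits in the remaining tokens.
        for n in range(min(MAX_ALIAS_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in ALIASES:
                result.append(ALIASES[phrase])
                i += n
                break
        else:
            # No alias starts here; keep the token unchanged.
            result.append(tokens[i])
            i += 1
    return result


query_key = canonicalize("swine flu virus")
document_key = canonicalize("H1N1 virus")
```

Trying the longest phrase first keeps "swine flu" from being split into "swine" plus the shorter alias "flu".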
6- Handling natural language queries and questions
A semantic search engine is expected to respond sensibly when the query is in question form (what, where, how, why, etc.). Note that a "search engine" is different from a "question answering" system. A search engine's main task is to rank search results in the most logical and relevant manner, whereas a question answering system may produce a single extracted entity. The example query "how fast is swine flu spreading?" brings a result set in hakia that sheds light on this capability.
7- Ability to point to the uninterrupted paragraph and the most relevant sentence
Conventional search engines map a query to documents; semantic search is expected to do much better. A query must point not only to documents but also to the relevant sections within them. This eliminates the second search, in which the user has to open each document to find the relevant sections. The previous example query "how fast is swine flu spreading?" shows this capability in the hakia results.
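The idea of pointing into a document can be sketched as scoring each sentence against the query and returning the best one together with its paragraph. The example below uses plain word overlap as a stand-in for meaning match, which is a simplifying assumption; the stopword list, the sample document, and the name `best_passage` are all made up for illustration.

```python
import re

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"how", "is", "the", "a", "of", "in", "to", "are"}


def terms(text: str) -> set[str]:
    """Lowercased alphanumeric words of the text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def best_passage(document: str, query: str) -> tuple[int, int, str]:
    """Return (score, paragraph index, sentence) for the best-matching sentence."""
    query_terms = terms(query)
    best = (0, -1, "")
    # Paragraphs are separated by blank lines, sentences by end punctuation.
    for p_idx, paragraph in enumerate(document.split("\n\n")):
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            score = len(query_terms & terms(sentence))
            if score > best[0]:
                best = (score, p_idx, sentence)
    return best


sample = (
    "Influenza viruses circulate every year. Vaccines are updated annually.\n\n"
    "Swine flu is spreading fast in several regions. Officials urge hand washing."
)
score, paragraph_index, sentence = best_passage(sample, "how fast is swine flu spreading?")
```

The returned paragraph index lets a result page show the whole uninterrupted paragraph while highlighting the single best sentence inside it.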
8- Ability to customize and progress organically
Every search application tied to a specific business objective has requirements that a general-purpose search approach does not cover. The conventional "one size fits all" approach limits performance because it offers few options for progress. Semantic search allows customization at various stages by both the owners and the users of the system (for example, through semantic tagging), so that search becomes part of a social network formed around a business.
9- Ability to operate without relying on statistics, user behavior, and other artificial means
A semantic search engine is expected to bring relevant results by analyzing the content of a page (or document), its source, its authors, and the credibility of the results in response to a query. Relying on link referrals, user behavior or tagging, and other artificial means may produce good results when such data is available, but these techniques are outside the realm of semantic search. By not relying on artificial input, semantic search technology is more universal and applicable to any situation, especially enterprise documents and real-time content, where such data does not exist.
10- Ability to detect its own performance
When there is no semantic content analysis in a search algorithm, relevancy scores refer to artificial measurements, such as how popular a page is. A semantic search engine is expected to produce a relevancy score that reflects the degree of meaning match. This capability gives developers the flexibility to apply meaning thresholds. Accordingly, the search engine can detect its own poor performance and automatically flag areas that need improvement.
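A meaning threshold of this kind might be applied as in the sketch below, which assumes each result already carries a meaning-match score between 0 and 1. The score scale, the default threshold of 0.5, and the name `filter_and_flag` are hypothetical choices, not values taken from hakia.

```python
def filter_and_flag(
    scored_results: list[tuple[str, float]], threshold: float = 0.5
) -> tuple[list[tuple[str, float]], bool]:
    """Keep results at or above the threshold; flag the query if none qualify."""
    kept = [(doc, score) for doc, score in scored_results if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    needs_review = not kept  # no result cleared the bar: mark for improvement
    return kept, needs_review


# Made-up scores for illustration only.
good_kept, good_flag = filter_and_flag([("doc-a", 0.82), ("doc-b", 0.31), ("doc-c", 0.64)])
poor_kept, poor_flag = filter_and_flag([("doc-d", 0.12), ("doc-e", 0.08)])
```

Queries for which the flag is raised could then be collected as candidate areas for improvement.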