Converting Content to a Chat-Bot
One of the best ways to learn or understand information is by asking questions. A Q&A format is often more effective than reading long-form content, especially when time is critical. Moreover, searching for information and reading multiple documents just to find one fact is inefficient. Instead, we can ask a question about the problem directly and let the system retrieve the answer for us, which saves time.
Question answering is a field of information retrieval concerned with systems that automatically answer questions posed by humans. Such a system accepts, analyses and decomposes factual questions from a fixed domain, asked in natural language. It is based on machine learning algorithms combined with natural language processing (NLP). It can be of significant value in many domains, such as health care remedies. Text cleaning, NLP, a similarity matrix and latent semantic analysis are the main building blocks of the system. The main challenges in question answering are the lexical gap, ambiguity and multilingualism: in natural language the same question can be expressed in different ways, the same phrase can have different meanings, and the system is expected to recognise the language of a question and still return results.
In this article we describe the results from two techniques: 1. Similarity distance using words and 2. Latent Semantic Analysis (LSA).
Similarity distance
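In statistics, a similarity measure is a real-valued function that quantifies the similarity between two objects. It is often derived from a distance metric. The simple matching coefficient counts both mutual presences (an attribute is present in both objects) and mutual absences (an attribute is absent from both objects) as matches, and compares the number of matches to the total number of attributes. The Jaccard coefficient counts only mutual presences as matches and compares them to the number of attributes present in at least one of the two sets, which makes it a good similarity metric for sparse data such as the words in a text.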
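The two coefficients can be sketched in a few lines of Python. This is only an illustration, not the code behind the results in this article: the tokenizer, the helper name best_title and the sample titles in the usage note are assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and return its words as a set (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Mutual presences divided by attributes present in at least one set."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def simple_matching(a, b, vocabulary):
    """Mutual presences plus mutual absences, divided by all attributes."""
    if not vocabulary:
        return 0.0
    matches = sum(1 for word in vocabulary if (word in a) == (word in b))
    return matches / len(vocabulary)


def best_title(question, titles):
    """Hypothetical helper: return the title whose words overlap most with the question."""
    question_words = tokenize(question)
    return max(titles, key=lambda title: jaccard(question_words, tokenize(title)))
```

For example, best_title("How to reduce stress?", ["Ways to reduce stress", "Healthy eating tips"]) picks the first title, because it shares the words "to", "reduce" and "stress" with the question. A title that expresses the same idea in different words would score zero, which is the lexical gap mentioned earlier.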
Latent Semantic Analysis
LSA is a method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text. It uses a bag-of-words model, which results in a term-document matrix (the occurrences of terms in each document). LSA learns latent topics by performing a matrix decomposition on the term-document matrix using singular value decomposition (SVD). LSA is typically used as a dimensionality reduction technique.
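The steps above (bag of words, term-document matrix, truncated SVD) can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the implementation used for the results in this article: it uses raw term counts, a simple regex tokenizer, the standard "fold-in" projection of the query into the latent space, and cosine similarity for ranking. All function names are hypothetical.

```python
import re

import numpy as np


def tokenize(text):
    """Lowercase the text and return its words as a list (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_document_matrix(docs):
    """Build a terms x documents count matrix and a word -> row index map."""
    vocab = sorted({word for doc in docs for word in tokenize(doc)})
    index = {word: i for i, word in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in tokenize(doc):
            X[index[word], j] += 1
    return X, index


def lsa_fit(X, k):
    """Truncated SVD: keep at most k non-zero singular values and their vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, int(np.sum(s > 1e-10)))
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in(query, index, U_k, s_k):
    """Project a query into the latent space: q^T U_k S_k^-1."""
    q = np.zeros(U_k.shape[0])
    for word in tokenize(query):
        if word in index:  # words outside the corpus vocabulary are ignored
            q[index[word]] += 1
    return (q @ U_k) / s_k


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def rank_documents(query, docs, k=2):
    """Return document indices sorted by latent-space similarity, plus the scores."""
    X, index = term_document_matrix(docs)
    U_k, s_k, Vt_k = lsa_fit(X, k)
    q_k = fold_in(query, index, U_k, s_k)
    scores = [cosine(q_k, doc_vec) for doc_vec in Vt_k.T]
    order = sorted(range(len(docs)), key=lambda j: scores[j], reverse=True)
    return order, scores
```

Because documents that share vocabulary end up close together in the reduced space, a query can score well against a document even when it matches only some of that document's words. In practice the counts are often reweighted (for example with TF-IDF) before the decomposition; that step is left out here to keep the sketch short.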
An Example
Question: How to reduce stress?
We retrieved the answer to this question from a medical website using the two techniques above. Below are the answers:
1. Similarity Distance Method: The similarity distance method gives the following answer. As you can see, the technique searched for exact words in the title.
Patricia Farrell, PhD: Well, we’ve had experiments that have shown things like guided imagery certainly have a great effect on people. We have seen people who have cardiac conditions, people who have major physical illnesses to use these kinds of techniques and it has been very effective to really help them with the stress, bring down their blood pressure, bring down so many things, enable them to get restful sleep which is very important, and don’t forget, stress is directly related to your immune system. Patricia Farrell, PhD (cont.): If you allow the stress to build to a point where it is truly high, it is truly going to be damaging on your immune system. It is going to leave you vulnerable to illness. It actually affects the ability of the body to fight off, you’ve heard people say oh, you’ve got a cold, well remember last week what happened, you really, that was a terrible situation you were in.
2. Latent Semantic Analysis: The LSA method gives the following answer. As you can see, the technique searched for the underlying meaning (semantics) in the title.