Tuesday, November 27, 2018

QUESTION ANSWER BOT

People often confuse a chat bot with a question answer bot. They assume the two are the same, or that both use the same methodology, but they are different.

A chat bot is an agent or interface that lets a user interact with a system. It can be driven either by rules or by Artificial Intelligence (AI).

A QA bot, or Question Answer bot, retrieves the answer to a user's question from a set of predefined responses. This process can be based on Information Retrieval (IR) and natural language processing (NLP).


Just as real humans have senses, virtual humans are given senses through advances in text analysis. Much of our behaviour is determined by what happens around us, and our senses are the means by which we receive this information. The exact relation between these observations and the behaviour we display is a complex problem, especially when subtle differences have an influence. Humans are trained through experience to pick up on those subtle signals and to display the appropriate behaviour automatically, without much thought. It makes us wonder whether a computer can do the same.
One example of this is a “QA bot”. The idea behind a QA bot is to return the most relevant available solution to a technical question. The main aim is to help customers get answers to their technical questions in one place and in an easy manner.

To get the correct solution to a problem, the work can be divided into the following modules.

1.  Information Retrieval & Dataset Creation
                        
  To identify the correct solution to a query, a dataset needs to be built. This is a very important step: if the dataset is not correct, precision will suffer. Before the models are trained and later applied to the QA data, the data must be cleaned and prepared.

  • To create an effective dataset, we can use the Stack Overflow API. With the help of this API, raw data is collected in the form of questions and their answers. This gives a dataset with 2 columns: the first column holds the question and the second column holds its answer.

  • Next, the raw data needs to be cleaned. Several normalization steps are performed, such as removing capitalization and punctuation. The text is then tokenized, i.e., split into tokens, and each token is stemmed and lemmatized. Finally, the tokens are combined into bigrams, which form the vocabulary of the dataset.



  • While extracting questions and answers from Stack Overflow, any question ID that does not exist is skipped.
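
The dataset and cleaning steps above can be sketched roughly as follows. This is only an illustration using the Python standard library, not the code behind this bot: the function names, the dictionaries mapping a question ID to its text, and the optional stem argument are all assumptions made for the example. Stemming and lemmatization are left as a pluggable function because they usually come from an NLP library.

```python
import string

# Hypothetical input shape: dicts mapping a question ID to its text.
def build_dataset(question_ids, questions, answers):
    """Return (question, answer) rows, skipping IDs that do not exist."""
    rows = []
    for qid in question_ids:
        if qid not in questions or qid not in answers:
            continue  # this ID was not found, so it is not considered
        rows.append((questions[qid], answers[qid]))
    return rows

# Translation table that turns every punctuation character into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def clean_text(text, stem=None):
    """Lowercase, strip punctuation and extra white space, then tokenize."""
    text = text.lower().translate(_PUNCT_TO_SPACE)
    tokens = text.split()  # split() also collapses repeated white space
    if stem is not None:   # assumed hook for a stemmer or lemmatizer
        tokens = [stem(token) for token in tokens]
    return tokens

def bigrams(tokens):
    """Join each pair of neighbouring tokens into one bigram string."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

if __name__ == "__main__":
    questions = {1: "How to reverse a List, in Python?"}
    answers = {1: "Use reversed() or slicing."}
    for question, answer in build_dataset([1, 99], questions, answers):
        print(bigrams(clean_text(question)), "->", answer)
```

Here ID 99 is requested but missing, so only one row is produced. The same clean_text and bigrams functions can later be reused on the user's query so that the query and the dataset are prepared in the same way.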

2. Training and Testing
                    
  After creating and cleaning the dataset, the next phase is training on the dataset.
  • The model is fitted on the first column of the data, i.e., the questions.
  • A TF-IDF matrix of the whole corpus is generated using the formula tfidf(t, d, D) = tf(t, d) × idf(t, D), where t denotes a term, d denotes a document, and D denotes the collection of documents.
  • Then the TF-IDF vectors of the corpus, i.e., of all questions, are built using sklearn's TfidfVectorizer with the vocabulary of our dataset.
  • Whenever the user asks a new question, that question/query is cleaned in the same way as the dataset.
    • All the steps, i.e., capitalization removal, punctuation removal, white space removal, tokenization, stemming, lemmatization and bigram creation, are applied to the query. After that, its TF-IDF vector is generated.
  • Cosine similarity is then computed between the query vector and the question vectors using the sklearn package.
  • The documents are ranked by their cosine scores. The top 2 documents, i.e., those with the highest cosine scores, are returned as the output.
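
The training and ranking steps above can be sketched with sklearn roughly as follows. This is a minimal illustration, not the code behind this bot: the function name top_answers, its parameters and the sample data are assumptions. It uses ngram_range=(2, 2) so that the vocabulary consists of bigrams, and it relies on TfidfVectorizer's built-in lowercasing and tokenization instead of the full cleaning pipeline. Note that sklearn applies a smoothed idf and L2 normalization by default, so the weights are not exactly the plain tf × idf product.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def top_answers(query, questions, answers, k=2):
    """Return the k answers whose questions are most similar to the query."""
    # Fit TF-IDF on the questions only, with bigrams as the vocabulary.
    vectorizer = TfidfVectorizer(ngram_range=(2, 2))
    question_vectors = vectorizer.fit_transform(questions)

    # Vectorize the query with the vocabulary learned from the questions.
    query_vector = vectorizer.transform([query])

    # One cosine score per question, then take the k highest-scoring ones.
    scores = cosine_similarity(query_vector, question_vectors)[0]
    ranked = scores.argsort()[::-1][:k]
    return [(answers[i], float(scores[i])) for i in ranked]

if __name__ == "__main__":
    # Hypothetical sample data standing in for the Stack Overflow dataset.
    questions = [
        "how to reverse a list in python",
        "how to parse json in java",
        "how to sort a list in python",
    ]
    answers = ["Use reversed().", "Use a JSON library.", "Use sorted()."]
    for answer, score in top_answers("reverse a list in python", questions, answers):
        print(round(score, 3), answer)
```

A query that shares no bigram with any question gets a score of 0 everywhere, so in practice it may help to check the top score before showing an answer.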


Here are some screenshots:

Screen to enter Query by User

User can write any technical Query

Top 2 relevant answers according to user's Query can be retrieved like this
