A text corpus of job interview dialogues containing general questions and answers, created with Python libraries, along with some examples of statistical analysis of the corpus.
To use the corpus, first download or clone the repository to your device:
https://github.com/1dipesh/Job-interview-corpus.git
You need to install the pandas and NLTK Python libraries to use this corpus:
pip install pandas
pip install nltk
Open CodeExample.ipynb, which provides the 10 functions described below:
- raw_text: returns the raw text of the data in the corpus.
- raw_question_answer: returns a DataFrame of the raw corpus text in which each row holds a question and its answer.
- raw: returns the unprocessed contents of the corpus.
- words: returns every word in the corpus as a list.
- sents: returns the sentences in the corpus as a list.
- tagged_words: returns each word of the corpus along with its POS tag as a tuple, i.e. (word, POS tag).
- tagged_sents: returns the words of each sentence in the corpus along with their POS tags, as (word, POS tag) tuples.
- parsed_sents: returns the parsed sentences of the corpus.
- dependency_parsed_sents: returns the dependency-parsed sentences of the corpus.
- visualize_parsed_tree: returns the parse trees of the parsed sentences.
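The notebook's own implementations are not reproduced here. As a rough illustration of what raw_question_answer, sents and words produce, the sketch below assumes a hypothetical plain-text layout in which each question line starts with "Q:" and each answer line with "A:". The sample text, the helper _body and the regex-based tokenization are assumptions for illustration, not the repository's code. It uses pandas and regular expressions rather than NLTK's tokenizers so that it runs without downloading any NLTK data.

```python
import re

import pandas as pd

# Hypothetical sample in an assumed "Q:"/"A:" line format; the actual
# corpus files in the repository may be structured differently.
SAMPLE = """Q: Tell me about yourself.
A: I am a software developer. I enjoy solving problems.
Q: Why do you want this job?
A: I want to grow my skills. Your team works on interesting products.
"""


def raw_question_answer(text: str) -> pd.DataFrame:
    """Pair each 'Q:' line with the next 'A:' line, one row per pair."""
    rows = []
    question = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            rows.append({"question": question, "answer": line[2:].strip()})
            question = None
    return pd.DataFrame(rows, columns=["question", "answer"])


def _body(text: str) -> str:
    """Strip the assumed 'Q:'/'A:' labels and join all non-empty lines."""
    lines = (re.sub(r"^[QA]:\s*", "", line.strip()) for line in text.splitlines())
    return " ".join(line for line in lines if line)


def sents(text: str) -> list[str]:
    """Naive split into sentence strings after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", _body(text)) if s]


def words(text: str) -> list[str]:
    """Runs of word characters or of punctuation, like nltk.wordpunct_tokenize."""
    return re.findall(r"\w+|[^\w\s]+", _body(text))


qa = raw_question_answer(SAMPLE)
print(qa)
print(len(sents(SAMPLE)), "sentences,", len(words(SAMPLE)), "tokens")
```

Here sents returns plain sentence strings; NLTK-style corpus readers usually return each sentence as a list of word tokens instead. With NLTK installed, nltk.sent_tokenize and nltk.word_tokenize are more robust than these regex splits, but they require the Punkt tokenizer models to be downloaded first.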
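As an example of the kind of statistical analysis the corpus supports, the sketch below computes token and type counts, the type-token ratio, the mean answer length and the most frequent words over a small question/answer table. The three pairs are made-up sample data, and the "question"/"answer" column names are an assumption about the DataFrame returned by raw_question_answer. Counting uses collections.Counter; NLTK's FreqDist is a subclass of Counter and could be used in its place.

```python
import re
from collections import Counter

import pandas as pd

# Made-up sample pairs; the "question"/"answer" column names are an
# assumption about the DataFrame produced by raw_question_answer.
qa = pd.DataFrame(
    {
        "question": [
            "Tell me about yourself.",
            "Why do you want this job?",
            "What are your strengths?",
        ],
        "answer": [
            "I am a software developer and I enjoy solving problems.",
            "I want to grow my skills and I like your products.",
            "I am patient and I learn quickly.",
        ],
    }
)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens only (punctuation dropped)."""
    return re.findall(r"[a-z]+", text.lower())


# Length in tokens of every question and answer.
qa["question_len"] = qa["question"].map(lambda s: len(tokenize(s)))
qa["answer_len"] = qa["answer"].map(lambda s: len(tokenize(s)))

# Pool all tokens (questions first, then answers) and count them.
all_tokens = [
    tok for text in pd.concat([qa["question"], qa["answer"]]) for tok in tokenize(text)
]
freq = Counter(all_tokens)

stats = {
    "pairs": len(qa),
    "tokens": len(all_tokens),
    "types": len(freq),  # number of distinct tokens
    "type_token_ratio": len(freq) / len(all_tokens),
    "mean_answer_len": float(qa["answer_len"].mean()),
}
print(stats)
print(freq.most_common(5))
```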