Cosmology-Based Question & Answer System
Before running the system, install Elasticsearch (https://www.elastic.co/pt/elasticsearch/) and run it as administrator. Version used: 7.11.2.

The system performs the following steps:

1. Reads and processes all the PDF files inside RawPapers. For each PDF file, outputs a TXT file with its text content into ProcessedData.
2. Reads and processes all the HTML pages inside RawWebPages. For each web page, outputs a TXT file with its text content into ProcessedData.
3. Ensures that a connection with Elasticsearch is established.
4. Reads the saved content in ProcessedData and, for each passage, creates a JSON document that is used to populate the DocumentStore.
5. Deploys three off-the-shelf models:
   Pegasus (question reformulator)
   BM25Retriever (document retriever)
   RoBERTa (document reader)
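
The step that turns the saved text in ProcessedData into JSON documents could look roughly like the sketch below. It is only an illustration under stated assumptions, not the project's actual code: the function name build_documents, the "content" and "meta" field names, and the rule that passages are separated by blank lines are all hypothetical, since the README does not specify them. It uses only the Python standard library and does not talk to Elasticsearch.

```python
import json
import tempfile
from pathlib import Path


def build_documents(processed_dir):
    """Turn every TXT file in processed_dir into a list of passage dicts.

    Hypothetical helper: the field names and the passage-splitting rule
    are assumptions made for this example.
    """
    documents = []
    for txt_path in sorted(Path(processed_dir).glob("*.txt")):
        text = txt_path.read_text(encoding="utf-8")
        # Assumption: passages are separated by blank lines.
        # Empty fragments left over from repeated blank lines are dropped.
        passages = [p.strip() for p in text.split("\n\n") if p.strip()]
        for index, passage in enumerate(passages):
            documents.append({
                "content": passage,
                "meta": {"source": txt_path.name, "passage_index": index},
            })
    return documents


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of the real ProcessedData.
    with tempfile.TemporaryDirectory() as tmp:
        sample = "First passage.\n\nSecond passage.\n"
        Path(tmp, "example.txt").write_text(sample, encoding="utf-8")
        print(json.dumps(build_documents(tmp), indent=2))
```

Keeping the source file name and passage index alongside each passage makes it possible to trace an answer back to the paper or web page it came from, whatever document store the dictionaries are later written to.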