A machine-learning-based tool for extracting and analyzing references in scientific papers. It parses the references in a bibliography (.bib) file and lets you search the referenced sentences found in the introduction of any article in the bibliography.
How it works:
- parses bibliography for article DOIs
- looks up the PMC identifiers for each of the DOIs
- parses the PMC HTML for the Introduction/Main/Background section
- matches each sentence to its reference for every statement in the article's introduction section
- rewords sentence fragments that are incomplete clauses, using Ollama, into standalone sentences that best represent what the article sentence states
- computes a semantic embedding of each sentence with SentenceTransformers and compares it to the user's search term by cosine similarity between the embeddings
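The final ranking step can be illustrated with a minimal sketch of cosine similarity. This is not FactYou's own code: the function names `cosine_similarity` and `rank_sentences` are hypothetical, and the three-dimensional vectors are made-up stand-ins for the much longer embeddings a SentenceTransformers model would produce.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def rank_sentences(query_vec, sentence_vecs):
    """Score (text, vector) pairs against the query and sort best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in sentence_vecs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy "embeddings", invented for illustration only.
toy_sentences = [
    ("Sentence about gene expression", [0.9, 0.1, 0.0]),
    ("Sentence about protein folding", [0.1, 0.9, 0.2]),
]
toy_query = [1.0, 0.0, 0.0]

ranked = rank_sentences(toy_query, toy_sentences)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

Because cosine similarity depends only on the angle between vectors, sentences of different lengths can be compared on an equal footing once they are embedded.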
FactYou uses a few machine learning libraries, most of which can be installed with pip from the requirements.txt. The exception is Ollama, which must be installed by the user before FactYou can be run. Installation instructions for Ollama on desktop are available on the Ollama website.
# Clone the repository
git clone https://github.com/seanlaidlaw/FactYou.git
cd FactYou
# Install dependencies
pip install -e .
To launch the application, run the module with Python:
python -m factyu.main
FactYou keeps a persistent database in your user data directory, where it stores the information extracted from the bibliography (.bib) files passed to it.
The application listens on 127.0.0.1:5000 by default. If that port is already in use, a different host and port can be set with command-line arguments:
python -m factyu.main --host 0.0.0.0 --port 80
The application stores data in a SQLite database. The default location is:
- Linux:
~/.local/share/FactYou/references.db
- macOS:
~/Library/Application Support/FactYou/references.db
- Windows:
C:\Users\<Username>\AppData\Local\FactYouApp\FactYou\references.db
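To inspect the database outside the application, the default locations above can be resolved in Python. The following is a rough sketch that only mirrors the paths listed here. The helper names `default_db_path` and `list_tables` are hypothetical and not part of FactYou, and no table names are assumed, since the schema is not described in this README.

```python
import sqlite3
from pathlib import PurePosixPath, PureWindowsPath


def default_db_path(system, home):
    """Build the default database path for a platform name
    ("Linux", "Darwin" for macOS, or "Windows") and a home directory."""
    if system == "Linux":
        return PurePosixPath(home) / ".local/share/FactYou/references.db"
    if system == "Darwin":
        return PurePosixPath(home) / "Library/Application Support/FactYou/references.db"
    if system == "Windows":
        return PureWindowsPath(home) / "AppData/Local/FactYouApp/FactYou/references.db"
    raise ValueError(f"unsupported platform: {system}")


def list_tables(db_file):
    """Return the table names in a SQLite file, sorted alphabetically.

    Note: sqlite3.connect creates an empty file if db_file does not exist.
    """
    connection = sqlite3.connect(db_file)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

On the current machine, `default_db_path(platform.system(), str(pathlib.Path.home()))` gives the expected location, which can then be passed to `list_tables` as a string once FactYou has created the database.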