This project is a Python-based CLI tool that allows users to search for research papers from the PubMed database based on a user-specified query. It filters the results to include only papers with at least one author affiliated with a pharmaceutical or biotech company and saves the output to a CSV file.
- Fetches data from the PubMed API using `Entrez`.
- Supports the command-line options `--debug`, `--help`, and `--file`.
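The affiliation filter described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual logic: the keyword lists and the function names `is_company_affiliation` and `has_company_author` are assumptions made for this example.

```python
from typing import Iterable

# Hypothetical keyword lists; the real script may use different heuristics.
ACADEMIC_HINTS = ("university", "college", "institute", "hospital", "school of", "faculty")
COMPANY_HINTS = ("pharma", "biotech", "therapeutics", "inc.", "ltd", "gmbh", "laboratories")


def is_company_affiliation(affiliation: str) -> bool:
    """Return True if the affiliation string looks like a pharma/biotech company."""
    text = affiliation.lower()
    # Academic hints take priority so "Department of Pharmacology, X University" is rejected.
    if any(hint in text for hint in ACADEMIC_HINTS):
        return False
    return any(hint in text for hint in COMPANY_HINTS)


def has_company_author(affiliations: Iterable[str]) -> bool:
    """Return True if at least one affiliation in the paper looks non-academic."""
    return any(is_company_affiliation(a) for a in affiliations)
```

Substring matching like this is coarse and will misclassify some affiliations, so treat the keyword lists as a starting point to tune against real results.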
- Python 3.9+ – Download
- Biopython – For fetching data from the PubMed API using `Entrez` → Biopython Documentation
- Pandas – For handling and processing data → Pandas Documentation
- Poetry – For managing project dependencies → Poetry Documentation
- argparse – For handling command-line arguments → argparse Documentation
- ChatGPT – Used for generating code, debugging, and improving documentation → ChatGPT
- Git – For version control → Git Documentation
Open your terminal and run:
git clone https://github.com/your-username/research-paper-by-using-pubmed-api.git
cd research-paper-by-using-pubmed-api
Install Poetry if it is not already available:
pip install poetry
Set up the environment using Poetry:
poetry install
If you have multiple Python versions installed, tell Poetry which interpreter to use:
poetry env use python3.10
## 🧑🔧 Configuration

Set your email in the script to comply with PubMed API guidelines:
Entrez.email = "your-email@example.com"
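If you would rather not hard-code the address, one option is to read it from an environment variable. The sketch below is an assumption, not part of the current script: the helper `resolve_entrez_email` and the variable name `ENTREZ_EMAIL` are hypothetical.

```python
import os


def resolve_entrez_email(env_var: str = "ENTREZ_EMAIL") -> str:
    """Read the contact email from the environment and fail early if it is missing."""
    email = os.environ.get(env_var, "").strip()
    # Only a very loose sanity check; it does not validate the address fully.
    if "@" not in email:
        raise ValueError(f"Set {env_var} to a valid contact email before querying PubMed.")
    return email
```

In the script you could then write `Entrez.email = resolve_entrez_email()` instead of a literal string.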
To run a basic search:
poetry run get-papers-list "cancer"
To save the output to a CSV file:
poetry run get-papers-list "AI in medicine" -m 20 -f results.csv
To debug and print detailed output:
poetry run get-papers-list "drug discovery" -d
To display usage instructions:
poetry run get-papers-list -h    # or --help
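For reference, the options used in the commands above could be declared with `argparse` along these lines. This is a sketch under assumptions: the long name `--max-results`, its default of 10, and the help texts are not taken from the script and may differ.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="get-papers-list",
        description="Search PubMed and keep papers with pharma/biotech authors.",
    )
    parser.add_argument("query", help="PubMed search query")
    # Assumed meaning of -m: maximum number of results (long name and default are hypothetical).
    parser.add_argument("-m", "--max-results", type=int, default=10,
                        help="maximum number of papers to fetch")
    parser.add_argument("-f", "--file", default=None,
                        help="CSV file to write results to")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="print detailed debug output")
    return parser


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv when None); -h/--help is added by argparse itself."""
    return build_parser().parse_args(argv)
```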
├── fetch_pubmed.py # Main script for fetching and processing papers
├── pyproject.toml # Poetry project configuration file
├── README.md # Project documentation
├── .gitignore # Ignore unnecessary files
└── results.csv # Output file (if specified)
| Pubmed ID | Title | Publication Date | Non-academic Authors | Company Affiliations | Corresponding Author Email |
|---|---|---|---|---|---|
| 12345678 | AI in Drug Discovery | 2025 | John Doe, Jane Smith | XYZ Pharma, ABC Biotech | mkaif0262@gmail.com |
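A row like the one above could be written to CSV roughly as follows. The project uses Pandas for data handling; this sketch swaps in the standard-library `csv` module to stay dependency-free, and the function name `write_rows` and the sample email are placeholders.

```python
import csv
import io
from typing import Iterable, Mapping, TextIO

# Column names taken from the output table above.
COLUMNS = [
    "Pubmed ID",
    "Title",
    "Publication Date",
    "Non-academic Authors",
    "Company Affiliations",
    "Corresponding Author Email",
]


def write_rows(rows: Iterable[Mapping[str, str]], out: TextIO) -> None:
    """Write a header plus one CSV line per paper; fields containing commas are quoted."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Example with an in-memory buffer; pass an open file to write results.csv instead.
buffer = io.StringIO()
write_rows(
    [{
        "Pubmed ID": "12345678",
        "Title": "AI in Drug Discovery",
        "Publication Date": "2025",
        "Non-academic Authors": "John Doe, Jane Smith",
        "Company Affiliations": "XYZ Pharma, ABC Biotech",
        "Corresponding Author Email": "author@example.com",
    }],
    buffer,
)
```

When writing to a real file, open it with `newline=""` so the `csv` module controls line endings.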
If you get permission errors, try running commands with sudo:
sudo poetry install
If you get encoding errors, convert files to UTF-8:
iconv -f WINDOWS-1252 -t UTF-8 README.md > README-converted.md
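If `iconv` is not available (for example on Windows), the same conversion can be approximated in Python. This is a minimal sketch: the helper name `convert_to_utf8` is hypothetical, and it assumes the source file really is Windows-1252 encoded.

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Decode src with the given encoding and write it back out as UTF-8."""
    # Work on bytes so line endings are preserved exactly as in the original file.
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

For example, `convert_to_utf8("README.md", "README-converted.md")` mirrors the `iconv` command above.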
- Adherence to the problem statement – The program should strictly follow the problem statement and deliver the expected output.
- Ability to fetch and filter results correctly – The program should accurately fetch research papers and filter them based on pharma/biotech affiliations.
- Typed Python – Use type hints consistently for all functions and variables.
- Performance – Optimize API calls and processing to minimize execution time.
- Readability – Ensure clear and maintainable code with meaningful variable names, comments, and docstrings.
- Organization – Maintain logical separation of concerns using modular functions and classes.
- Robustness – Include error handling for invalid queries, API failures, and missing data to prevent crashes and improve user experience.