PubMed Research Paper Fetcher

Fetch research papers from the PubMed API and identify authors affiliated with pharmaceutical or biotech companies.


📌 Features

✅ Fetches research papers using the PubMed API
✅ Filters authors affiliated with pharma/biotech companies
✅ Saves results as a CSV file
✅ Supports CLI arguments for flexibility
✅ Uses Poetry for dependency management
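
The fetching step can be sketched as follows. This is a minimal illustration, not the code in pubmedapi.py: it assumes the public NCBI E-utilities endpoints (ESearch to find IDs, EFetch to download records) and uses the standard library's urllib so the snippet stays self-contained, whereas the project itself lists Requests. The function names and the max_results parameter are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the public NCBI E-utilities service (assumed to be what the project calls).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_search_url(query: str, max_results: int = 20) -> str:
    """Build an ESearch URL that returns matching PubMed IDs as JSON."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": max_results}
    return f"{EUTILS}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def build_fetch_url(pubmed_ids: list) -> str:
    """Build an EFetch URL that returns the full records for the given IDs as XML."""
    params = {"db": "pubmed", "id": ",".join(pubmed_ids), "retmode": "xml"}
    return f"{EUTILS}/efetch.fcgi?{urllib.parse.urlencode(params)}"


def fetch_papers_xml(query: str, max_results: int = 20) -> str:
    """Search PubMed, then download the matching records as one XML string."""
    with urllib.request.urlopen(build_search_url(query, max_results), timeout=30) as resp:
        ids = json.load(resp)["esearchresult"]["idlist"]
    if not ids:
        return ""  # nothing matched the query
    with urllib.request.urlopen(build_fetch_url(ids), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Calling fetch_papers_xml("cancer immunotherapy") needs network access; the two URL builders can be checked offline.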


📦 Project Structure

pubmedapitask/
├── researchpapers/
│   ├── __init__.py
│   ├── main.py              # CLI script for fetching and processing papers
│   ├── pubmedapi.py         # Fetches data from PubMed API
│   └── data_processing.py   # Extracts and filters authors from XML data
├── .gitignore
├── pyproject.toml           # Poetry dependencies and setup
└── README.md                # Documentation
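
As a rough sketch of what the extraction and filtering step in data_processing.py might look like, the snippet below parses PubMed XML with the standard library and keeps authors whose affiliation looks like a company. The keyword lists, the function names, and the sample record are all made up for illustration; the project's actual matching rules may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword heuristic; not taken from the project.
COMPANY_KEYWORDS = ("pharma", "biotech", "therapeutics", "inc.", "ltd")
ACADEMIC_KEYWORDS = ("university", "college", "hospital", "institute")


def is_company_affiliation(affiliation: str) -> bool:
    """Return True if the affiliation text looks like a pharma/biotech company."""
    text = affiliation.lower()
    if any(word in text for word in ACADEMIC_KEYWORDS):
        return False
    return any(word in text for word in COMPANY_KEYWORDS)


def extract_company_authors(xml_text: str) -> list:
    """Return one dict per article that has at least one company-affiliated author."""
    rows = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        authors = []
        for author in article.iter("Author"):
            affiliations = [a.text or "" for a in author.iter("Affiliation")]
            if any(is_company_affiliation(a) for a in affiliations):
                fore = author.findtext("ForeName", "")
                last = author.findtext("LastName", "")
                authors.append(f"{fore} {last}".strip())
        if authors:
            rows.append({
                "PubmedID": article.findtext(".//PMID", ""),
                "Title": article.findtext(".//ArticleTitle", ""),
                "Authors": authors,
            })
    return rows


# Made-up record in the shape of an EFetch response, for demonstration only.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>123456</PMID>
      <Article>
        <ArticleTitle>Example title</ArticleTitle>
        <AuthorList>
          <Author>
            <LastName>Doe</LastName><ForeName>John</ForeName>
            <AffiliationInfo><Affiliation>Example Pharma Inc., Boston, USA</Affiliation></AffiliationInfo>
          </Author>
          <Author>
            <LastName>Smith</LastName><ForeName>Emily</ForeName>
            <AffiliationInfo><Affiliation>Department of Biology, Example University</Affiliation></AffiliationInfo>
          </Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

print(extract_company_authors(SAMPLE_XML))
```

A keyword heuristic like this is easy to read but will misclassify some affiliations, so treat it as a starting point only.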

🛠 Installation & Setup

1️⃣ Clone the Repository

git clone https://github.com/ShreyasDankhade/pubmedapitask.git
cd pubmedapitask

2️⃣ Install Poetry (If Not Installed)

pip install poetry

3️⃣ Install Dependencies

poetry install

4️⃣ Install Required Modules

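The poetry install step above already installs every dependency declared in pyproject.toml. If you are not using Poetry and the repository provides a requirements.txt (it is not listed in the project structure above), install the dependencies with pip instead: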

pip install -r requirements.txt

🚀 Usage

Run the script via Poetry:

Fetch papers and print output

poetry run get-papers-list "cancer immunotherapy"

Fetch papers and save as CSV

poetry run get-papers-list "diabetes research" -f research_results.csv

Enable Debug Mode

poetry run get-papers-list "genetic engineering" -d

🔧 Command-Line Arguments

Option         Description
query          (Required) Search term for fetching research papers
-f, --file     Specify filename to save results as a CSV
-d, --debug    Enable debug mode for detailed logs
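
The options above map naturally onto argparse. The following is a minimal sketch of such a parser, assuming the CLI is built with argparse (the README does not say); build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Create a parser with the three options documented above."""
    parser = argparse.ArgumentParser(
        prog="get-papers-list",
        description="Fetch PubMed papers with pharma/biotech-affiliated authors.",
    )
    parser.add_argument("query", help="Search term for fetching research papers")
    parser.add_argument("-f", "--file", help="Filename to save results as a CSV")
    parser.add_argument(
        "-d", "--debug", action="store_true", help="Enable debug mode for detailed logs"
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the example runs anywhere.
args = build_parser().parse_args(["diabetes research", "-f", "research_results.csv"])
print(args.query, args.file, args.debug)
```

When -f is omitted, args.file is None, which a script can use to decide between printing results and writing them to disk.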

⚙️ Running Inside Poetry Shell

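If poetry run doesn't work, enter the Poetry virtual environment first and call the command directly. Poetry 2.0 and later no longer include poetry shell by default; there it is provided by the poetry-plugin-shell plugin.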

poetry shell
get-papers-list "cancer immunotherapy" -f output.csv

📜 Example Output

PubmedID,Title,Publication Date,Authors with Pharma/Biotech Affiliations
123456,"Breakthrough in Cancer Research",2024-03,"Dr. John Doe, Dr. Emily Smith"
789101,"Genetic Engineering in Medicine",2023-11,"Dr. Alex Brown"
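
To show how rows in this shape could be produced, here is a small sketch that serializes results with the standard library's csv module. The project lists Pandas for data processing, so this is a stand-in and not the actual implementation; rows_to_csv is a hypothetical helper, and the row reuses the placeholder values from the example above.

```python
import csv
import io

COLUMNS = [
    "PubmedID",
    "Title",
    "Publication Date",
    "Authors with Pharma/Biotech Affiliations",
]


def rows_to_csv(rows: list) -> str:
    """Serialize result rows to CSV text, joining each author list into one field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        authors = ", ".join(row[COLUMNS[3]])
        writer.writerow({**row, COLUMNS[3]: authors})
    return buffer.getvalue()


rows = [
    {
        "PubmedID": "123456",
        "Title": "Breakthrough in Cancer Research",
        "Publication Date": "2024-03",
        "Authors with Pharma/Biotech Affiliations": ["Dr. John Doe", "Dr. Emily Smith"],
    }
]
print(rows_to_csv(rows))
```

With its default settings the csv module quotes only fields that contain a comma, quote, or newline, so the author list is quoted here while the title is not.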

🔄 Updating the Project

To update dependencies:

poetry update

To pull the latest changes from GitHub and reinstall dependencies:

git pull origin main
poetry install

🔍 Troubleshooting

1️⃣ "ModuleNotFoundError"

If Python cannot find the project's modules, reinstall the dependencies and run the command again:

poetry install
poetry run get-papers-list "cancer research"

2️⃣ "get-papers-list: command not found"

Run the script directly through Python instead:

poetry run python researchpapers/main.py "cancer research"

📌 Technologies Used

  • Python 3.9+
  • Poetry (Dependency Management)
  • Requests (HTTP Requests)
  • Pandas (Data Processing)
  • XML Parsing (PubMed Data Extraction)

📧 Contact

For questions or support, contact Shreyas Dankhade at shreyasdankhade75@gmail.com.

