bhklab/query_papers

Simple PubMed Query Runner

Run one or many PubMed queries and get one EndNote XML file per query plus a summary CSV: nothing more, nothing less. Configure a few settings, point the tool at your queries, and run.

Quickstart

  • Create env (uv): uv venv && . .venv/bin/activate && uv pip install biopython
  • Optional linting: uv pip install ruff

Configure (config-only)

Create inputs/config.toml. Use the example at inputs/example/config.example.toml as a template.

Example inputs/config.toml:

email = "your.email@example.com"
retmax = 100
# date_start = "2015"         # YYYY or YYYY/MM/DD
# date_end = "2025/08/21"
# queries = [
#   "BRCA1[Title/Abstract] AND breast cancer",
#   "EGFR AND (lung[Title/Abstract])",
# ]
queries_dir = "inputs/queries"  # or point to any folder of .txt queries

Run

  • Normal: python main.py (requires inputs/config.toml)
  • Demo: python main.py --demo (runs queries in inputs/example/queries/)

No other CLI flags are supported; all settings come from inputs/config.toml. If no config is found, the tool will suggest running with --demo.
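The command-line contract above (a single --demo flag, otherwise a required inputs/config.toml) can be illustrated with the standard argparse module. This is a hypothetical sketch, not the project's main.py. The function names and the exact error message are assumptions.

```python
import argparse
from pathlib import Path

CONFIG_PATH = Path("inputs/config.toml")


def parse_args(argv=None) -> argparse.Namespace:
    """Accept only --demo; argparse exits with a usage error on any other flag."""
    parser = argparse.ArgumentParser(description="Simple PubMed Query Runner")
    parser.add_argument(
        "--demo",
        action="store_true",
        help="run the example queries in inputs/example/queries/",
    )
    return parser.parse_args(argv)


def resolve_mode(args: argparse.Namespace, config_path: Path = CONFIG_PATH) -> str:
    """Return "demo" or "config"; stop with a hint when no config exists."""
    if args.demo:
        return "demo"
    if not config_path.is_file():
        raise SystemExit(f"No config at {config_path}. Try: python main.py --demo")
    return "config"
```

Because argparse rejects unknown options, an invocation such as python main.py --retmax 5 stops with a usage error in this sketch, which fits the rule that settings like retmax belong in the config.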

Query Syntax

  • Write each query as plain text using PubMed’s standard search syntax, the same syntax accepted by the PubMed website and the E-utilities API.
  • You can use Boolean operators (AND, OR, NOT), parentheses for grouping, phrases in quotes, and field tags like [Title/Abstract], [MeSH Terms], [Author], etc.
  • Examples:
    • EGFR AND lung
    • "BRCA1"[Title/Abstract] AND breast cancer
    • (melanoma[Title/Abstract]) AND (checkpoint inhibitors[MeSH Terms])
  • Learn more: PubMed User Guide — Search (search tags, fields, examples): https://pubmed.ncbi.nlm.nih.gov/help/#search-tags

Quick tip: to see how PubMed translated a search you ran on the website, open Advanced Search, find the search under History and Search Details, and expand its Details dropdown.
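The E-utilities ESearch endpoint also returns this translation, in its QueryTranslation field, so you can check it from a script. The sketch below uses Biopython's Bio.Entrez module, which the Quickstart installs. The helper names and the open-ended date defaults ("1800" and "3000") are assumptions, not this tool's code. ESearch only honours mindate and maxdate as a pair, so the sketch fills in whichever side of the config's date range is missing.

```python
def build_esearch_params(query, retmax=100, date_start=None, date_end=None):
    """Build keyword arguments for Bio.Entrez.esearch (hypothetical helper)."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    if date_start or date_end:
        # mindate/maxdate must be sent together; the fallbacks are assumptions.
        params.update(
            datetype="pdat",  # filter on publication date
            mindate=date_start or "1800",
            maxdate=date_end or "3000",
        )
    return params


def run_search(query, email, **kwargs):
    """Run one query; return (hit count, PMIDs, PubMed's query translation)."""
    from Bio import Entrez  # imported here so the params helper works without Biopython

    Entrez.email = email  # NCBI asks every E-utilities client to identify itself
    handle = Entrez.esearch(**build_esearch_params(query, **kwargs))
    try:
        result = Entrez.read(handle)
    finally:
        handle.close()
    return int(result["Count"]), list(result["IdList"]), result["QueryTranslation"]
```

For example, run_search("EGFR AND lung", email="your.email@example.com", retmax=5) makes a live request. It returns the total hit count, up to five PMIDs, and the expanded query string PubMed actually ran.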

Outputs

All outputs go under outputs/<timestamp>/:

  • results_<name>.xml: EndNote XML for each query
  • summary.csv: columns name,count,query
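Here is a minimal sketch of producing this layout with the standard library. It assumes a YYYYMMDD_HHMMSS timestamp (the real format may differ) and uses hypothetical helper names. csv.DictWriter quotes any query that contains commas or double quotes, so phrase searches stay in a single column.

```python
import csv
from datetime import datetime
from pathlib import Path


def make_run_dir(root="outputs", now=None) -> Path:
    """Create outputs/<timestamp>/; the timestamp format is an assumption."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    run_dir = Path(root) / stamp
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


def result_path(run_dir: Path, name: str) -> Path:
    """Path of the per-query EndNote XML file, results_<name>.xml."""
    return run_dir / f"results_{name}.xml"


def write_summary(run_dir: Path, rows) -> Path:
    """Write summary.csv with the columns name,count,query."""
    path = run_dir / "summary.csv"
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["name", "count", "query"])
        writer.writeheader()
        writer.writerows(rows)
    return path
```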

Notes

  • Set a valid email (email in the config, or the ENTREZ_EMAIL environment variable) to comply with NCBI usage policy.
  • Use a small retmax in the config for smoke tests; large values create big XML files.

Dev: Lint/Format and Tests

  • Lint/format: ruff format and ruff check --fix
  • Run tests (stdlib discovery): python -m unittest discover -s tests -p 'test*.py'

Project Layout

  • main.py: simple entrypoint
  • src/query_papers/: minimal CLI and PubMed helpers
  • inputs/: your config and queries (user-editable)
  • outputs/: all results per run, timestamped
  • tests/: isolated dev-only tests (safe to ignore if not developing)
