Run one or many PubMed queries and get EndNote XML files plus a summary CSV — nothing more, nothing less. Configure a few settings, point to your queries, and run.
- Create env (uv): uv venv && . .venv/bin/activate && uv pip install biopython
- Optional linting: uv pip install ruff
Create inputs/config.toml. Use the example at inputs/example/config.example.toml as a template.
Example inputs/config.toml:
email = "your.email@example.com"
retmax = 100
# date_start = "2015" # YYYY or YYYY/MM/DD
# date_end = "2025/08/21"
# queries = [
# "BRCA1[Title/Abstract] AND breast cancer",
# "EGFR AND (lung[Title/Abstract])",
# ]
queries_dir = "inputs/queries" # or point to any folder of .txt queries
- Normal: python main.py (requires inputs/config.toml)
- Demo: python main.py --demo (runs queries in inputs/example/queries/)
No other CLI flags are supported; all settings come from inputs/config.toml. If no config is found, the tool will suggest running with --demo.
- Write each query as plain text using PubMed’s standard search syntax — the same syntax used on the PubMed website and E-utilities.
- You can use Boolean operators (AND, OR, NOT), parentheses for grouping, phrases in quotes, and field tags like [Title/Abstract], [MeSH Terms], [Author], etc.
- Examples:
  - EGFR AND lung
  - "BRCA1"[Title/Abstract] AND breast cancer
  - (melanoma[Title/Abstract]) AND (checkpoint inhibitors[MeSH Terms])
- Learn more: PubMed User Guide — Search (search tags, fields, examples): https://pubmed.ncbi.nlm.nih.gov/help/#search-tags
Quick tip: If you run a search on PubMed manually, you can see what query your search translated to by going to Advanced Search History, and opening the Details dropdown on your search.
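As a rough illustration of how a folder of .txt queries might be read, the sketch below collects one query per file. It is not the tool's actual code: the load_queries helper, the one-query-per-file layout, the use of the file stem as the query name, and the whitespace collapsing are all assumptions for this example.

```python
import tempfile
from pathlib import Path


def load_queries(queries_dir: str) -> dict[str, str]:
    """Map each .txt file's stem to its query text (hypothetical helper)."""
    queries = {}
    for path in sorted(Path(queries_dir).glob("*.txt")):
        # Assumption: a query may span several lines; join them with spaces.
        text = " ".join(path.read_text(encoding="utf-8").split())
        if text:  # skip empty files
            queries[path.stem] = text
    return queries


# Demonstrate with a throwaway folder containing two query files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "egfr_lung.txt").write_text("EGFR AND lung\n", encoding="utf-8")
    Path(tmp, "brca1.txt").write_text(
        '"BRCA1"[Title/Abstract]\n  AND breast cancer\n', encoding="utf-8"
    )
    demo_queries = load_queries(tmp)

print(demo_queries)
```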
All outputs go under outputs/<timestamp>/:
- results_<name>.xml: EndNote XML for each query
- summary.csv: columns name,count,query
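The summary CSV can be inspected with the standard csv module. This is a minimal sketch assuming the file has a header row with the three columns listed above; the sample rows and counts are made up for illustration, and read_summary is a hypothetical helper.

```python
import csv
import io

# Illustrative data only; real counts come from your own runs.
SAMPLE = (
    "name,count,query\n"
    "egfr_lung,120,EGFR AND lung\n"
    'brca1,45,"""BRCA1""[Title/Abstract] AND breast cancer"\n'
)


def read_summary(handle) -> list[dict]:
    """Read summary rows and convert count to int (hypothetical helper)."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append(
            {"name": row["name"], "count": int(row["count"]), "query": row["query"]}
        )
    return rows


summary_rows = read_summary(io.StringIO(SAMPLE))
total_records = sum(row["count"] for row in summary_rows)
for row in summary_rows:
    print(f'{row["name"]}: {row["count"]} records')
print("total:", total_records)
```

To read a real run, pass an open file such as open("outputs/<timestamp>/summary.csv", newline="", encoding="utf-8") with your own timestamp folder in place of the placeholder.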
- Set a valid email (config or ENTREZ_EMAIL) to comply with NCBI usage policy.
- Use a small retmax (set in inputs/config.toml) for smoke tests; large values create big XML files.
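One way the email lookup could work is sketched below. The order of precedence (config value first, then the ENTREZ_EMAIL environment variable) is an assumption, since it is not stated here, and resolve_email is a hypothetical helper rather than part of the tool.

```python
import os
from collections.abc import Mapping


def resolve_email(config: Mapping, env: Mapping = os.environ) -> str:
    """Pick the contact email from config or environment (hypothetical helper)."""
    # Assumption: the config value wins over ENTREZ_EMAIL when both are set.
    email = config.get("email") or env.get("ENTREZ_EMAIL", "")
    if "@" not in email:
        raise ValueError("set email in the config or the ENTREZ_EMAIL variable")
    return email


print(resolve_email({"email": "your.email@example.com"}, {}))
print(resolve_email({}, {"ENTREZ_EMAIL": "lab@example.org"}))
```

With Biopython, the resolved address would typically be assigned to Bio.Entrez.email before any request is made.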
- Lint/format: ruff format and ruff check --fix
- Run tests (stdlib discovery): python -m unittest discover -s tests -p 'test*.py'
- main.py: simple entrypoint
- src/query_papers/: minimal CLI and PubMed helpers
- inputs/: your config and queries (user-editable)
- outputs/: all results per run, timestamped
- tests/: isolated dev-only tests (safe to ignore if not developing)