
Extract disease data from PubTator Central for a given query (in this case, 'heart failure'). Link each disease with its PMID and MeSH ID for use in later projects.

pinglab-intern/PTC_Data_Extraction

PTC_Data_Extraction


This Python script does the following:

  1. Obtains a list of PMIDs from PubMed for a specific search query.
  2. Converts that list into a format that can be searched in PubTator Central (PTC).
  3. Uses the PTC API to retrieve annotated articles for the PMID list. Output is in PubTator format. The script currently retrieves only 100 articles, though the API should be able to handle up to 1000.
  4. Writes the data to a CSV file, then reads it back in and parses it into a data frame.
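Steps 1 to 3 could be sketched roughly as follows. This is not the repository's code: it assumes Biopython's Entrez module for the PubMed search, assumes the "searchable format" of step 2 is a comma-separated pmids parameter, and assumes the PubTator3 export endpoint stored in PTC_EXPORT_URL (the 2019 script may have called an older PubTator Central URL). The helper names search_pubmed, batch_pmids, build_ptc_url, and fetch_pubtator are hypothetical. PMIDs are sent in batches of 100 to mirror the current limit in step 3.

```python
"""Sketch of steps 1-3: PubMed search -> PMID batches -> PubTator Central request."""
from typing import Iterator, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: PubTator3 export endpoint; the original script may use a different URL.
PTC_EXPORT_URL = "https://www.ncbi.nlm.nih.gov/research/pubtator3-api/publications/export/pubtator"


def search_pubmed(query: str, email: str, retmax: int = 1000) -> List[str]:
    """Step 1: return PubMed IDs matching `query` via the Entrez E-utilities."""
    from Bio import Entrez  # imported here so the pure helpers below work without Biopython

    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.esearch(db="pubmed", term=query, retmax=retmax)
    try:
        record = Entrez.read(handle)
    finally:
        handle.close()
    return [str(pmid) for pmid in record["IdList"]]


def batch_pmids(pmids: List[str], size: int = 100) -> Iterator[str]:
    """Step 2: yield comma-separated PMID strings of at most `size` IDs each."""
    for start in range(0, len(pmids), size):
        yield ",".join(pmids[start:start + size])


def build_ptc_url(pmid_batch: str) -> str:
    """Build the request URL for one comma-separated batch of PMIDs."""
    return f"{PTC_EXPORT_URL}?{urlencode({'pmids': pmid_batch})}"


def fetch_pubtator(pmid_batch: str, timeout: float = 60.0) -> str:
    """Step 3: download PubTator-format annotations for one batch as text."""
    with urlopen(build_ptc_url(pmid_batch), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, calling search_pubmed("heart failure", email="you@example.org") (a placeholder address) and then fetch_pubtator(batch) for each batch in batch_pmids(pmids) would collect the raw PubTator text for every matching article. Batching keeps each request within whatever per-request limit the API enforces.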

Requirements

This script requires:

  • Biopython
  • pandas

Both can be installed with pip, e.g., pip install biopython pandas.

Running the script

Run the script with python '.\PMID to BioC Retrieval Using PubMed and PTC APIs.py'

By default, the script searches for all documents matching the query "heart failure"; to use a different query, change the search string in the script.
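Instead of editing the string in the file, the query could be read from the command line. This is a hypothetical change, not something the repository provides: the --query and --output flags and the parse_args helper are illustrative names, with defaults matching the script's documented behavior (the query "heart failure" and the file output.csv).

```python
import argparse
from typing import List, Optional

DEFAULT_QUERY = "heart failure"  # the script's documented default query


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Read the PubMed query and output path; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(
        description="Extract disease annotations from PubTator Central."
    )
    parser.add_argument("--query", default=DEFAULT_QUERY, help="PubMed search query")
    parser.add_argument("--output", default="output.csv", help="CSV file to write")
    return parser.parse_args(argv)
```

If the script called parse_args() at startup, adding --query "atrial fibrillation" to the run command would change the search without touching the code, while running it with no flags would keep the current behavior.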

Output is written to "output.csv".

The parsed data frame itself is not saved, but it can be passed to another function for further processing.
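Step 4, together with the goal of linking each disease to its PMID and MeSH ID, could look like the minimal sketch below. It assumes the usual PubTator layout, in which title and abstract lines look like PMID|t|... and PMID|a|..., and annotation lines are tab-separated as PMID, start offset, end offset, mention text, concept type, and identifier (Disease identifiers typically carry a MESH: prefix). The column names and the helpers parse_pubtator_diseases, write_rows, and load_frame are hypothetical, and the repository's own parsing may differ; in particular, this version parses before writing, so output.csv holds one disease mention per row.

```python
import csv
from typing import Dict, List

FIELDNAMES = ["pmid", "mention", "mesh_id"]


def parse_pubtator_diseases(text: str) -> List[Dict[str, str]]:
    """Return one row per Disease annotation in PubTator-format text."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        # Title/abstract lines ("PMID|t|..." / "PMID|a|...") have "|" in the first field.
        if "|" in fields[0]:
            continue
        # Annotation lines: PMID, start, end, mention, concept type, identifier.
        if len(fields) >= 6 and fields[4] == "Disease":
            rows.append({"pmid": fields[0], "mention": fields[3], "mesh_id": fields[5]})
    return rows


def write_rows(rows: List[Dict[str, str]], path: str = "output.csv") -> None:
    """Write the disease rows to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


def load_frame(path: str = "output.csv"):
    """Read the CSV back into a pandas DataFrame, keeping every column as text."""
    import pandas as pd  # lazy import: parsing and writing work without pandas

    return pd.read_csv(path, dtype=str)
```

A typical flow would be rows = parse_pubtator_diseases(text), then write_rows(rows), then df = load_frame(), after which df can be handed to a downstream function. Reading with dtype=str keeps PMIDs and identifiers from being coerced to numbers.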

Credits

Developed by Marlee Zinsser in the Ping Lab at UCLA while working with Harry Caufield in Fall 2019.
