Skip to content

cidm-ph/jentre

Repository files navigation

jentre

R-CMD-check

jentre is a client for the NCBI’s Entrez API.

The Entrez API has many quirks. jentre attempts to deal with those while presenting a convenient interface. It is designed for bulk metadata fetching and link traversal, though also provides full access to other parts of Entrez, albeit with fewer helpers.

Features of jentre:

  • Provides objects representing sets of Entrez identifiers to avoid mixing them up
  • Batches requests behind the scenes when needed
  • Based on {httr2} and {xml2}

The {rentrez} package is more mature and might suit a broader set of applications.

Installation

You can install the development version of jentre like so:

# development version
install.packages('jentre', repos = c('https://cidm-ph.r-universe.dev', 'https://cloud.r-project.org'))

Example

library(jentre)

# Searches by default use the Entrez history server for efficiency:
results <- esearch("Corynebacterium diphtheriae[orgn]", "biosample")
#> ℹ eSearch query "\"Corynebacterium diphtheriae\"[Organism]" has 4124 results
results
#> <entrez@/biosample[1]>
#> [1] MCID_68491.#1[4124]

# The returned object keeps track of which database the UIDs belong to,
# and whether the UIDs are local or on the history server. This makes
# them easier and less error-prone to use:
links <- elink(results, "sra")
links
#> # A tibble: 1 × 3
#>   from            linkname      to         
#>   <list>          <chr>         <list>     
#> 1 <idlst [4,124]> biosample_sra <wbhst [1]>

# You can pull a list of UIDs from the history server:
ids <- as_id_list(links$to[[1]])

# This is a vector with some extra metadata attached, but can be
# subsetted normally:
head(ids, n = 10)
#> <entrez/sra[10]>
#>  [1] 38889263 38768719 38768704 38641044 38501020 38428896 38428225 38427762
#>  [9] 38401944 38401943
as.character(ids[4:8])
#> [1] "38641044" "38501020" "38428896" "38428225" "38427762"

# For endpoints with richer data, you can provide an function to
# extract data you care about from the XML document. The output is
# combined if multiple API requests are needed, so you can end up
# with a single combined data frame, list, or vector:
esummary(
  ids[1:20],
  .process = function(doc) {
    xml2::xml_find_all(doc, "//DocumentSummary/CreateDate") |> xml2::xml_text()
  }
)
#>  [1] "2025/05/29" "2025/05/21" "2025/05/21" "2025/05/14" "2025/05/05"
#>  [6] "2025/04/30" "2025/04/30" "2025/04/30" "2025/04/29" "2025/04/29"
#> [11] "2025/04/29" "2025/04/29" "2025/04/29" "2025/04/29" "2025/04/29"
#> [16] "2025/04/29" "2025/04/29" "2025/04/21" "2025/04/21" "2025/04/21"

# If needed, you can construct an arbitrary request:
req <- entrez_request("esearch.fcgi", db = "nucleotide", term = "biomol+trna[prop]")
# You'll need to execute and parse it yourself:
httr2::req_perform(req) |> httr2::resp_body_xml()
#> {xml_document}
#> <eSearchResult>
#> [1] <Count>1012</Count>
#> [2] <RetMax>20</RetMax>
#> [3] <RetStart>0</RetStart>
#> [4] <IdList>\n  <Id>2737963026</Id>\n  <Id>2586967820</Id>\n  <Id>2274792564< ...
#> [5] <TranslationSet/>
#> [6] <TranslationStack>\n  <TermSet>\n    <Term>biomol+trna[prop]</Term>\n     ...
#> [7] <QueryTranslation>biomol+trna[prop]</QueryTranslation>

About

Entrez API client in R

Topics

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Releases

No releases published

Languages