Open
Description
DuckDB can read zipped TSV files directly from the PatentsView bulk data download URLs using the new zipfs community extension.
For example, using the DuckDB CLI:
INSTALL zipfs FROM community;
LOAD zipfs;
SELECT * FROM read_csv('zip://https://s3.amazonaws.com/data.patentsview.org/download/g_patent.tsv.zip/g_patent.tsv') LIMIT 5;
┌───────────┬─────────────┬─────────────┬───┬────────────┬───────────┬───────────────┐
│ patent_id │ patent_type │ patent_date │ … │ num_claims │ withdrawn │   filename    │
│   int64   │   varchar   │    date     │   │   int64    │   int64   │    varchar    │
├───────────┼─────────────┼─────────────┼───┼────────────┼───────────┼───────────────┤
│  10000000 │ utility     │ 2018-06-19  │ … │         20 │         0 │ ipg180619.xml │
│  10000001 │ utility     │ 2018-06-19  │ … │         12 │         0 │ ipg180619.xml │
│  10000002 │ utility     │ 2018-06-19  │ … │          9 │         0 │ ipg180619.xml │
│  10000003 │ utility     │ 2018-06-19  │ … │         18 │         0 │ ipg180619.xml │
│  10000004 │ utility     │ 2018-06-19  │ … │          6 │         0 │ ipg180619.xml │
├───────────┴─────────────┴─────────────┴───┴────────────┴───────────┴───────────────┤
│ 5 rows                                                         8 columns (6 shown) │
└────────────────────────────────────────────────────────────────────────────────────┘
We could use this to simplify the DuckDB read-in example.
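As a rough sketch of what a simplified read-in could look like from Python, the helpers below build the `zip://` path and the preview query for a given table. The names `zipfs_path`, `preview_query`, and `preview` are hypothetical, and the code assumes every archive follows the `g_patent` pattern (`<table>.tsv.zip` containing a single `<table>.tsv`), which has only been checked for the example above.

```python
"""Sketch: build zipfs paths for PatentsView bulk-download tables."""

# Base URL taken from the g_patent example in this issue.
BASE_URL = "https://s3.amazonaws.com/data.patentsview.org/download"


def zipfs_path(table: str, base_url: str = BASE_URL) -> str:
    """Return a zip:// path for `table`, e.g. "g_patent".

    Assumption: the archive is <table>.tsv.zip and holds one member
    named <table>.tsv, as in the g_patent example.
    """
    member = f"{table}.tsv"
    return f"zip://{base_url}/{member}.zip/{member}"


def preview_query(table: str, limit: int = 5) -> str:
    """Build the SELECT statement from the issue for a given table."""
    return f"SELECT * FROM read_csv('{zipfs_path(table)}') LIMIT {int(limit)};"


def preview(table: str, limit: int = 5) -> None:
    """Run the preview; needs the `duckdb` package and network access."""
    import duckdb  # imported lazily so the helpers above work without it

    con = duckdb.connect()
    con.execute("INSTALL zipfs FROM community;")
    con.execute("LOAD zipfs;")
    con.sql(preview_query(table, limit)).show()


if __name__ == "__main__":
    # Only prints the query; call preview("g_patent") to actually run it.
    print(preview_query("g_patent"))
```

Calling `preview("g_patent")` should issue the same statements as the CLI session above, provided the zipfs extension is available for the installed DuckDB version.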