pprzetacznik/patent-parsing-tools

patent-parsing-tools

USPTO patents dataset generator.

Documentation

Read the docs

System requirements

sudo yum install python-devel libxslt-devel libxml2-devel

Installation:

pip install patent-parsing-tools

Examples:

Downloading dataset:

python -m patent_parsing_tools.downloader \
  --directory dataset \
  --year-from 2010 \
  --year-to 2010

Collecting and serializing data:

python -m patent_parsing_tools.supervisor \
  --working-directory patents/working_directory \
  --train-destination patents/train_destination \
  --test-destination patents/test_destination \
  --year-from 2014 \
  --year-to 2015

Generating a dictionary from the train set:

python -m patent_parsing_tools.bow.dictionary_maker \
  --train-directory patents/train_destination \
  --max-patents 1000000000 \
  --dictionary dictionary.txt \
  --dict-max-size 4096
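
To make the role of `--dict-max-size` and `--max-patents` more concrete, here is a minimal sketch of one common way to build a size-capped dictionary: count word frequencies over the train texts and keep the most frequent words. This is an illustration only, not the package's implementation. The names `tokenize` and `make_dictionary`, the regex tokenizer, and the tie-breaking rule are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumption: a token is a run of lowercase ASCII letters.
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def make_dictionary(documents, dict_max_size, max_patents=None):
    """Return the most frequent words, at most dict_max_size of them.

    Hypothetical helper: only the first max_patents documents are read
    when max_patents is given.
    """
    counts = Counter()
    for index, text in enumerate(documents):
        if max_patents is not None and index >= max_patents:
            break
        counts.update(tokenize(text))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:dict_max_size]]


# Toy stand-ins for patent texts from the train set.
train_texts = ["engine valve engine", "valve engine piston", "engine gasket"]
dictionary = make_dictionary(train_texts, dict_max_size=3)
print(dictionary)
```

Capping the dictionary size fixes the length of every bag-of-words vector produced later, which is why the same `dictionary.txt` is passed to both the train and test runs.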

Generating bags of words for the train set and test set:

python -m patent_parsing_tools.bow.bag_of_words \
  --serialized-patents patents/train_destination \
  --destination-directory patents/final_dataset_train \
  --dictionary dictionary.txt \
  --batch-size 1048576
python -m patent_parsing_tools.bow.bag_of_words \
  --serialized-patents patents/test_destination \
  --destination-directory patents/final_dataset_test \
  --dictionary dictionary.txt \
  --batch-size 1048576
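
The two commands above can be pictured with a small sketch of the bag-of-words idea: each text becomes a fixed-length vector of counts, one entry per dictionary word, and the vectors are grouped into batches. This is a conceptual illustration under stated assumptions, not the package's code. The helpers `bag_of_words` and `batches` are hypothetical, the tokenizer is a guess, and reading `--batch-size` as "documents per batch" is an assumption.

```python
import re
from collections import Counter

# Assumption: a token is a run of lowercase ASCII letters.
TOKEN_RE = re.compile(r"[a-z]+")


def bag_of_words(text, dictionary):
    """Count how often each dictionary word occurs in the text."""
    counts = Counter(TOKEN_RE.findall(text.lower()))
    # Words outside the dictionary are ignored, so the vector length is fixed.
    return [counts[word] for word in dictionary]


def batches(items, batch_size):
    """Yield consecutive slices holding at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical dictionary contents and toy texts standing in for patents.
dictionary = ["engine", "valve", "gasket"]
test_texts = ["valve and engine valve", "piston ring", "gasket"]

vectors = [bag_of_words(text, dictionary) for text in test_texts]
for batch in batches(vectors, batch_size=2):
    print(batch)
```

Because the train and test sets share one dictionary, column `i` means the same word in both outputs, which is what lets a model trained on one be evaluated on the other.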

Testing

pytest

Contributing and development

$ mkvirtualenv ppt
$ workon ppt
(ppt) $ pip install -r requirements.txt

Publish new release

$ git tag v1.0
$ git push origin v1.0

Building documentation

(ppt) $ sphinx-build -M html docs docs_build

References

Usage:

License

The MIT License (MIT). Copyright (c) 2014 Michał Dul, Piotr Przetacznik, Krzysztof Strojny. Check LICENSE files for more information.
