
Darthholi/similarity-models


Similarity models

Similarity and data extraction

Source code accompanying the following publications:

The dataset is hosted on Kaggle Datasets:

All code and datasets are published under the attached LICENSE (GNU Affero General Public License). Any derivative work or service should be published under the same license. For other licensing options, feel free to reach out.

We kindly ask you to cite the above-mentioned papers in your research, or to mention them on your "Thank You" page together with the author's name (https://www.linkedin.com/in/martin-holecek/) and a link to https://rossum.ai/.

Command for a baseline test run (note the --limit and --n_epochs parameters):

experiments_ft.py --verbose=1 --sqlite_source="article_anon_a.sqlite" --neighbours=1 --debug=True \
--cls_extract_types="['amount_total', 'amount_total_base', 'amount_total_tax', 'amount_rounding', 'amount_paid', 'amount_due', 'tax_detail_base', 'tax_detail_rate', 'tax_detail_tax', 'tax_detail_total', 'account_num', 'bank_num', 'iban', 'bic', 'const_sym', 'spec_sym', 'var_sym', 'invoice_id', 'order_id', 'customer_id', 'date_issue', 'date_uzp', 'date_due', 'terms', 'sender_ic', 'sender_dic', 'recipient_ic', 'recipient_dic', 'sender_name', 'sender_addrline', 'recipient_name', 'recipient_addrline', 'page_current', 'page_total', 'phone_num']" \
--weights_separate --key_metric=custom --key_metric_mode=max --n_epochs=2 --limit=400
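The --cls_extract_types value is a bracketed list full of spaces and quotes, so it is easy to break when retyping the command in a shell. The following is a minimal sketch (a hypothetical helper, not part of the repository) that rebuilds the same test-run command as an argument list. That list can then be passed to subprocess.run without any shell quoting. It assumes experiments_ft.py is launched with the current Python interpreter from the repository root. The names build_baseline_argv and EXTRACT_TYPES are illustrative only.

```python
import shlex
import sys

# Field types passed to --cls_extract_types in the baseline command above.
EXTRACT_TYPES = [
    "amount_total", "amount_total_base", "amount_total_tax", "amount_rounding",
    "amount_paid", "amount_due",
    "tax_detail_base", "tax_detail_rate", "tax_detail_tax", "tax_detail_total",
    "account_num", "bank_num", "iban", "bic",
    "const_sym", "spec_sym", "var_sym",
    "invoice_id", "order_id", "customer_id",
    "date_issue", "date_uzp", "date_due", "terms",
    "sender_ic", "sender_dic", "recipient_ic", "recipient_dic",
    "sender_name", "sender_addrline", "recipient_name", "recipient_addrline",
    "page_current", "page_total", "phone_num",
]


def build_baseline_argv(limit=400, n_epochs=2, python=sys.executable):
    """Return the test-run command as an argv list (hypothetical helper).

    Assumption: the script is started via the given Python interpreter.
    """
    return [
        python, "experiments_ft.py",
        "--verbose=1",
        "--sqlite_source=article_anon_a.sqlite",
        "--neighbours=1",
        "--debug=True",
        # repr() of a list of str gives the same "['a', 'b', ...]" literal as the README.
        f"--cls_extract_types={EXTRACT_TYPES!r}",
        "--weights_separate",
        "--key_metric=custom",
        "--key_metric_mode=max",
        f"--n_epochs={n_epochs}",
        f"--limit={limit}",
    ]


if __name__ == "__main__":
    argv = build_baseline_argv()
    # Print a shell-safe rendering of the command. To actually run it from the
    # repository root, call: subprocess.run(argv, check=True)
    print(shlex.join(argv))
```

Changing limit or n_epochs only alters those two flags. The commands used for the full article experiments are the ones listed in experiments.txt.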

To reproduce the experiments from the article, use the commands in experiments.txt.
