Fact Extraction

As part of GSoC 2019 we worked on enhancing the NLP capabilities of the project. To that end we trained a new NER model using spaCy (described in a dedicated segment of the wiki) and added new fact extraction algorithms based on regular expressions.

For the regular expressions, we expanded the entities.py file with several expressions that aim to extract information useful to a wide range of users. The new module can detect monetary and non-monetary amounts, scales, conditions, constraints, dates, exact times, durations, and various specific codes and numbers such as:

URLs, e-mails, phone numbers, IBAN numbers, zip codes
CPC and CPV codes
IDs and Military IDs
Natura regions, Wildlife sanctuaries, NUTS regions
EU directives, regulations and decisions
Ship tonnage, hull numbers and ship flags
ISO and ELOT protocols
OPS, ADAs, AFM, KAEK numbers
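To give a feel for this kind of regex-based extraction, here is a minimal sketch covering three of the entity types listed above. The patterns and the `find_entities` helper are illustrative assumptions, not the expressions actually defined in entities.py, and they are deliberately simplified (for example, the IBAN pattern checks only the general shape, not the checksum).

```python
import re

# Hypothetical, simplified patterns for illustration only;
# the expressions in the project's entities.py may differ.
PATTERNS = {
    "Urls": re.compile(r"https?://[^\s<>\"']+"),
    "E-mails": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Country code, two check digits, then groups of four characters
    # optionally separated by single spaces, with a short optional tail.
    "IBANs": re.compile(
        r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"
    ),
}


def find_entities(text):
    """Return {entity type: [matched strings]} for a single string."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


sample = (
    "Write to info@example.gr, see https://example.gr/law "
    "and pay to GR16 0110 1250 0000 0001 2300 695."
)
for name, found in find_entities(sample).items():
    print(name, found)
```

All groups in the patterns are non-capturing, so `findall` returns the full matched strings rather than sub-groups.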

Entity extraction is currently performed in the parser module. Each LawCodifier object contains an entities dictionary of the form:

{'Urls': [], 'CPC Codes': [], 'CPV Codes': [], 'IBANs': [], 'E-mails': [], 'Id Numbers': [], 'Military Personel': [], 'Natura 2000 Regions': [], 'Scales': [], 'EU Directives': [], 'EU Regulations': [], 'EU Decisions': [], 'Phone Numbers': [], 'Protocols': [], 'AFM numbers': [], 'NUTS Region Codes': [], 'Exact times': [], 'Ship Tonnage': [], 'KAEK Codes': [], 'Hull': [], 'Flags': [], 'Monetary Amounts': [], 'Metrics': [], 'Conditions': [], 'Contraints': [], 'Durations': []}

Each extracted entity is accompanied by the number of the line on which it is found in the LawCodifier object.
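The pairing of entities with line numbers could be sketched roughly as follows. This is an assumption-laden illustration: the `extract_with_line_numbers` function, the `(value, line_number)` tuple layout, the zero-based numbering, and the two patterns are all hypothetical and may not match how the parser module stores its results.

```python
import re

# Hypothetical patterns; only the dictionary keys come from the wiki text.
PATTERNS = {
    # CPV codes: eight digits, a hyphen and one check digit.
    "CPV Codes": re.compile(r"\b\d{8}-\d\b"),
    "E-mails": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def extract_with_line_numbers(lines, patterns):
    """Scan each line and record every match together with its line index.

    Returns {entity type: [(matched text, zero-based line number), ...]}.
    The tuple layout is an assumption made for this sketch.
    """
    entities = {name: [] for name in patterns}
    for line_number, line in enumerate(lines):
        for name, pattern in patterns.items():
            for match in pattern.finditer(line):
                entities[name].append((match.group(0), line_number))
    return entities


lines = [
    "Supplies under CPV 45000000-7 are covered.",
    "No entities here.",
    "Contact: info@example.gr",
]
print(extract_with_line_numbers(lines, PATTERNS))
```

Starting from a dictionary with one empty list per entity type mirrors the structure shown above, where types with no matches still appear as keys.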

You can access these dictionaries through the singleton variable for the codifier:

>>> codifier.codifier.laws[identifier].entities

Here identifier is the string corresponding to the law, for example 'ν. 4513/2018'.
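Since most entity lists are empty for any given law, it can be convenient to keep only the types that were actually found. The helper below is a hypothetical sketch and not part of the project; it runs on a stand-in dictionary, and the `(value, line_number)` shape of the entries is an assumption.

```python
def non_empty_entities(entities):
    """Return only the entity types for which at least one match exists."""
    return {kind: found for kind, found in entities.items() if found}


# Stand-in for codifier.codifier.laws[identifier].entities;
# the entry format is assumed for illustration.
sample_entities = {
    "Urls": [],
    "CPV Codes": [("45000000-7", 12)],
    "IBANs": [],
}

for kind, found in non_empty_entities(sample_entities).items():
    print(kind, found)
```

With a real codifier instance, the same function could be applied to the entities dictionary of any law in place of the stand-in.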

