Program SYnthesis for NLP
PsyNLP is a Python library that aims to handle morphological inflections for any language in the form of an interpretable program. 🎉
- Installation Guidelines
- Running the scripts
- Visualizing a formal concept
- Repository structure
- Running the tests
- Contribution Guidelines
- License
Installation Guidelines
- Installing from PIP
$ pip3 install psynlp
- Setting up locally
  - Clone the repository
  $ git clone git@github.com:Demfier/PsyNLP.git
  - Go to the cloned repository
  $ cd PsyNLP
  - Install the dependencies
  $ pip3 install -r requirements.txt
Alternatively, you can install the module from pip directly using the command:
$ pip3 install psynlp
Running the scripts
main.py is built on argparse and acts as the central script for running any of the pipelines, for any language and training data quality.
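As a rough illustration of how such an entry point could be wired up, here is a minimal sketch. The flag names, defaults and help strings are copied from the help menu in this section; the function names `build_parser` and `parse_args`, the use of `action="count"` for `-v`, and the cap at three verbosity levels are assumptions and may not match what main.py actually does.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented main.py flags (a sketch)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Runs one of the pipeline scripts, "
                    "for a given language and quality.",
    )
    parser.add_argument("-p", "--pipeline", default="deterministic",
                        help="Name of the pipeline file (Default: deterministic)")
    parser.add_argument("-l", "--language", default="english",
                        help="Name of the language (Default: english)")
    parser.add_argument("-q", "--quality", default="low",
                        help="Size of the training data (Default: low)")
    # Assumption: -v is a counting flag, so -vv yields 2 and -vvv yields 3.
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="Prints verbose output if specified")
    return parser


def parse_args(argv=None):
    """Parse argv and clamp verbosity to the three documented levels."""
    args = build_parser().parse_args(argv)
    args.verbose = min(args.verbose, 3)
    return args
```

With this sketch, `parse_args(["-p", "ostia", "-l", "polish", "-q", "high", "-vv"])` gives a namespace with `pipeline="ostia"`, `language="polish"`, `quality="high"` and `verbose=2`.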
- Help menu, for more details:
$ python3 main.py -h
usage: main.py [-h] [-p PIPELINE] [-l LANGUAGE] [-q QUALITY] [-v]
Runs one of the pipeline scripts, for a given language and quality.
optional arguments:
-h, --help show this help message and exit
-p PIPELINE, --pipeline PIPELINE
Name of the pipeline file (Default: deterministic)
-l LANGUAGE, --language LANGUAGE
Name of the language (Default: english)
-q QUALITY, --quality QUALITY
Size of the training data (Default: low)
-v, --verbose Prints verbose output if specified
- Running a pipeline (say, ostia) for a language (say, polish) and training data quality (say, high):
$ python3 main.py -p ostia -l polish -q high
- Get more debug-like output details with verbose flags (max. 3):
# No verbose, just print the exact word-match accuracy
$ python3 main.py
# Verbose 1, print the expected and actual words
$ python3 main.py -v
# Verbose 2, print the paths responsible for computing an inflection
$ python3 main.py -vv
# Verbose 3, print debug details for PAC and OSTIA
$ python3 main.py -vvv
Visualizing a formal concept
The cytoscape library is used to visualize a formal concept. A sample notebook demonstrates the visualization.
# Show files in the visual/ directory
$ ls visual/
|
|_ cytoscape.tmpl (The html template file)
|_ style.cycss (The cytoscape css file)
- Before running the notebook:
# Go to visual/ directory
$ cd visual/
# Start the HTTP server on port 8000, from the visual/ directory
$ python3 -m http.server 8000
- Now, run the notebook from the root directory (do a cd .. if required):
# Open the jupyter notebook
$ jupyter notebook
An interactive plot with zoom, search and filter features should appear in your visualize.ipynb notebook. If you'd like an HTML file, a sample.html and a sample.json are also generated in the visual/ directory.
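If you'd rather not keep a separate terminal open for the HTTP server, the same step can be done from Python. The following is a minimal sketch using only the standard library; the function name `serve_directory` is hypothetical and not part of PsyNLP, and it binds to 127.0.0.1 only, whereas the command-line server listens on all interfaces by default.

```python
import functools
import http.server
import threading


def serve_directory(directory="visual", port=8000):
    """Serve `directory` over HTTP in a background thread (a sketch).

    Returns the server object; call .shutdown() on it to stop serving.
    """
    # SimpleHTTPRequestHandler accepts a `directory` argument (Python 3.7+),
    # so there is no need to cd into visual/ first.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling `server = serve_directory("visual", 8000)` from the root directory before opening the notebook plays the same role as the shell command, and `server.shutdown()` stops it.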
Repository structure
- Base classes:
The code for base classes can be found in the psynlp/core directory.
  - fca.py: Contains implementations of PAC and other methods related to Formal Concept Analysis
  - fst.py: Contains generic Transducer methods, like states and arcs
  - oracle.py: Contains the oracles that are used while computing the PAC basis in fca.py
  - ostia.py: Implementation of the well-known OSTIA algorithm, which uses fst.py
- Pipelines:
The code for the different pipelines can be found in the psynlp/pipelines directory.
  - deterministic.py: Prediction based on Pandas' groupby (deterministic clustering) and OSTIA RegExp matching
  - ostia.py: Prediction based on just the input-output tapes of OSTIA
  - pac_ostia.py: Prediction based on PAC clusters and OSTIA RegExp matching
- Helpers:
The code for the different helpers can be found in the psynlp/helpers directory.
  - builtins.py: Monkey-patches some required verbose-related builtin functions
  - importers.py: Includes functions that import training and testing data into different structures
  - misc.py: Miscellaneous functions
  - text.py: Text-related functions such as inflecting, prefix, suffix, edit distance, etc.
- Data:
The psynlp/data directory contains all the training and testing data. The files are of the form:
  - {language}-train-{quality}
  - {language}-dev
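To make the naming scheme concrete, here is a small sketch that builds the two file paths for a given language and quality. The helper name `data_files` is hypothetical and is not taken from the PsyNLP code base; only the directory and the two name patterns come from the description above.

```python
from pathlib import PurePosixPath

# Directory that holds the training and testing data, as described above.
DATA_DIR = PurePosixPath("psynlp/data")


def data_files(language, quality):
    """Return (train_path, dev_path) following the documented naming scheme."""
    train = DATA_DIR / f"{language}-train-{quality}"
    dev = DATA_DIR / f"{language}-dev"
    return train, dev
```

For example, `data_files("polish", "high")` yields the paths `psynlp/data/polish-train-high` and `psynlp/data/polish-dev`.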
Running the tests
- Basic run to check the results:
$ py.test
- For debugging:
$ py.test -s --fulltrace
Contribution Guidelines
Your contributions are always welcome! Please have a look at the contribution guidelines first. 🎉
License
MIT License 2018 - Gaurav Sahu and Athitya Kumar. For more information, please have a look at the LICENSE file.
