A refined open-source Python library originally developed by Johnson & Johnson's Advanced Analytics team. It utilizes existing libraries to perform common Natural Language Processing tasks.
# LookOutGap-NLPAssistant: Natural Language Processing Tool

LookOutGap-NLPAssistant is a Python library that combines several existing open-source libraries, such as pandas, spaCy, and scikit-learn, into a pipeline ready to process text data. Many parameters are user-defined to fit your type of project, such as the choice between stemming and lemmatization, or exactly which value to substitute for NaN text fields. Overall, it is a way to get you started on your NLP task, whatever you need.

A tutorial on how to use this package can be found [here](tutorial.ipynb).

## Installation Instructions

- Using pip with Python version 3.7 or higher:

  ```shell
  pip install LookOutGap-NLPAssistant
  ```

- For more information on installing packages using pip, click [here](https://pip.pypa.io/en/stable/reference/pip_install/).

## Contributing

- To help develop this package, install the conda virtual environment defined by our dev_environment.yml file using the command below:

  ```shell
  conda env create -f dev_environment.yml
  ```

- Then activate the environment whenever you develop or run tests:

  ```shell
  conda activate nlp_env
  ```

- When you are done developing or testing, deactivate the environment:

  ```shell
  conda deactivate
  ```

## Docker Configuration

- This codebase is dockerized to build, run all of the unit tests using `pytest`, and perform pip packaging.
- To run the docker container, ensure you have [Docker](https://www.docker.com/products/docker-desktop) installed and running on your local machine.
- To start the docker container locally, navigate to the root of the project directory and type:

  ```shell
  docker-compose up --build
  ```

- Note: `docker-compose` is included in the Docker Desktop installation linked above for macOS and Windows systems. If you have issues executing `docker-compose`, [navigate here](https://docs.docker.com/compose/install/) to ensure docker-compose is supported on your system.
- Tip: you can use `docker-compose up --build` during development to quickly run the tests after code changes without setting up or running a local conda environment.

## GitHub Action CI Configuration

- Every commit to this repository triggers a build in GitHub Actions, following the .github/workflows/pythonapp.yml located in the root of this project.
- GitHub Actions is used to build and lint the LookOutGap-NLPAssistant package, run the tests, and perform pip packaging.
- If the environment name or version changes, the pythonapp.yml file will need to be updated to follow the new pattern.

## Our Workflow

- Our Methods and Tools
  - Style Guide - [PEP8 / pycodestyle](https://www.python.org/dev/peps/pep-0008/)
  - Git Strategy - [Git Flow](https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow)

## Upcoming Features

Here is a roadmap of features to be implemented in this package. If you have any ideas for additional features, please let us know!

* Preprocessing
  * Ability to use custom stop words
  * Incorporation of bi-grams
  * Ability for user to choose which language detection package to use
* Vectorization
  * spaCy pre-trained models
  * spaCy custom models
* Similarity Metrics
  * Additional pairwise distances
  * Levenshtein Distance
  * Word Mover's Distance
* Visualizations
  * TF-IDF
  * Jaccard
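To give a feel for the kind of pipeline this package assembles, here is a minimal sketch built directly from two of the underlying libraries it wraps, pandas and scikit-learn. This is **not** the LookOutGap-NLPAssistant API; the DataFrame, the `"unknown"` NaN placeholder, and the column name `text` are illustrative assumptions. It shows the same steps the library parameterizes: substituting a user-defined value for NaN text fields, vectorizing with TF-IDF, and computing a pairwise similarity metric.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical input: a text column that may contain missing values.
df = pd.DataFrame({"text": ["The cat sat on the mat.", None, "A dog sat on the log."]})

# Substitute a user-defined placeholder for NaN text fields.
df["text"] = df["text"].fillna("unknown")

# Vectorize with TF-IDF (dropping English stop words) and compute
# pairwise cosine similarity between all documents.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(df["text"])
similarity = cosine_similarity(tfidf)

print(similarity.shape)  # (3, 3)
```

Swapping in spaCy lemmatization before vectorization, or a different pairwise distance afterwards, is exactly the kind of user-defined choice the package exposes as parameters.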
