machinelearningZH/simply-simplify-language-glossarizer

Simply Create Glossaries in Plain Language

Create Plain Language Glossaries from texts, URLs, and files using LLMs.

Features

  • Automated glossary creation: Generate glossaries that explain complex terms and concepts in German Plain Language.
  • Multiple input sources: Process text from direct text input, URLs, or .txt file uploads.

Installation

To install the project and its dependencies:

git clone https://github.com/machinelearningZH/simply-simplify-language-glossarizer.git
cd simply-simplify-language-glossarizer

pip3 install uv
uv venv
source .venv/bin/activate
uv sync

The app uses the OpenAI LLM API. Create an API key and save it to an environment file. Note that the app currently expects the key in a file named .env_example rather than .env.
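To check that the key file is readable before starting the app, a minimal sketch like the following may help. It is not the app's own loading code. The helper names `load_env_file` and `get_api_key` are hypothetical, and the variable name `OPENAI_API_KEY` is an assumption (it is the name the OpenAI Python client reads by default, but the app may use a different one).

```python
import os
from pathlib import Path


def load_env_file(path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from an env file into a dict."""
    values: dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(env_path: str = ".env_example", name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the process environment, else from the env file.

    Both defaults are assumptions for illustration: the file name follows
    this README, the variable name follows the OpenAI client's convention.
    """
    key = os.environ.get(name)
    if not key and Path(env_path).exists():
        key = load_env_file(env_path).get(name)
    if not key:
        raise RuntimeError(f"{name} not found in environment or {env_path}")
    return key
```

Calling `get_api_key()` from the repository root raises a `RuntimeError` if the key is missing, which is easier to diagnose than an authentication error from the API.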

Running the App

Start the Streamlit app:

cd _streamlit_glossarizer
streamlit run home.py

The app will be available at http://localhost:8501/.

Warning

Do not enter sensitive data. Use the app only for public, non-sensitive information, because all text is processed on third-party servers. The app displays the same warning.

Note

Currently, only .txt files are supported by the file upload feature.
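The .txt restriction can be illustrated with a small validation helper. This is a sketch under stated assumptions, not the app's actual upload handling: the function name `read_txt_upload` is hypothetical, and UTF-8 decoding with replacement of invalid bytes is an assumed choice.

```python
from pathlib import Path


def read_txt_upload(filename: str, data: bytes) -> str:
    """Decode an uploaded file, accepting only the .txt extension."""
    # Compare the extension case-insensitively so "NOTES.TXT" is accepted.
    if Path(filename).suffix.lower() != ".txt":
        raise ValueError(
            f"Unsupported file type: {filename!r} (only .txt is supported)"
        )
    # Assumption: uploads are UTF-8; invalid bytes are replaced, not fatal.
    return data.decode("utf-8", errors="replace")
```

Rejecting other extensions early gives the user a clear message instead of a glossary built from undecodable binary content.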

Project Information

Government agencies and other organisations in Switzerland are increasingly recognising the need to make complex documents and specialised terminology more accessible to the general public. This tool helps address that need by generating glossaries that explain difficult terms in Plain Language.

  • Users can create simplified language glossaries from various sources.
  • The tool uses large language models (LLMs) to identify complex terms and generate clear, accessible explanations.
  • It produces a draft glossary that should always be reviewed and refined by users before publication.
  • For each term, the app provides two types of explanations. Contextual explanations are typically more accurate and relevant, but general explanations may still be useful in some cases.
    • One with context, based on the surrounding source text.
    • One without context, based solely on the LLM’s general knowledge.
  • This tool contributes to the broader goal of making government communication more understandable and inclusive for all citizens.
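The difference between the two explanation types can be sketched as two prompts for the same term. This is an illustration only and not the prompts the app actually sends: the function `build_prompts`, the prompt wording, and the `window` parameter are all hypothetical.

```python
def build_prompts(term: str, source_text: str, window: int = 300) -> dict[str, str]:
    """Return one prompt with surrounding context and one without."""
    task = (
        f"Explain the term '{term}' in plain German, "
        "in one or two short sentences."
    )
    position = source_text.find(term)
    if position == -1:
        # Term not found verbatim: fall back to the start of the text.
        context = source_text[: 2 * window]
    else:
        # Take up to `window` characters on each side of the first match.
        start = max(0, position - window)
        end = position + len(term) + window
        context = source_text[start:end]
    return {
        # Grounded in the source text around the term.
        "with_context": f"{task}\nUse this excerpt from the source text:\n{context}",
        # Relies only on the model's general knowledge.
        "without_context": task,
    }
```

Each prompt would then be sent to the LLM separately, and both answers shown side by side so that a reviewer can choose or merge them.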

Project Team

Feedback and Contributing

We welcome feedback and contributions. Email us or open an issue or pull request.

We use Ruff for linting and code formatting.

Install pre-commit hooks for automatic checks before opening a pull request:

pre-commit install

License

This project is licensed under the MIT License. See LICENSE for details.

Disclaimer

This software (the Software) has been developed according to and with the intent to be used under Swiss law. Please be aware that the EU Artificial Intelligence Act (EU AI Act) may, under certain circumstances, be applicable to your use of the Software. You are solely responsible for ensuring that your use of the Software complies with all applicable local, national and international laws and regulations. By using this Software, you acknowledge and agree (a) that it is your responsibility to assess which laws and regulations, in particular regarding the use of AI technologies, are applicable to your intended use and to comply therewith, and (b) that you will hold us harmless from any action, claims, liability or loss in respect of your use of the Software.
