Create Plain Language Glossaries from texts, URLs, and files using LLMs.
Contents
- Automated glossary creation: Generate glossaries that simplify complex terms and concepts to Plain Language in German.
- Multiple input sources: Process text from direct text input, URLs, or .txt file uploads.
To install the project and its dependencies:
git clone https://github.com/machinelearningZH/simply-simplify-language-glossarizer.git
cd simply-simplify-language-glossarizer
pip3 install uv
uv venv
source .venv/bin/activate
uv sync
The app uses the OpenAI LLM API. Create an API key and save it to an .env
file. At the moment, the app assumes that your key is saved to .env_example
.
Start the Streamlit app:
cd _streamlit_glossarizer
streamlit run home.py
The app will be available at http://localhost:8501/
.
Warning
Be cautious with sensitive data as mentioned in the app warning. Use only for public, non-sensitive information as processing occurs on third-party servers.
Note
Currently, only .txt files are supported by the file upload feature.
Government agencies and other organisations in Switzerland are increasingly recognising the need to make complex documents and specialized terminology more accessible to the general public. This tool helps address that need by generating glossaries that explain difficult terms in Plain Language.
- Users can create simplified language glossaries from various sources.
- The tool uses large language models (LLMs) to identify complex terms and generate clear, accessible explanations.
- It produces a draft glossary that should always be reviewed and refined by users before publication.
- For each term, the app provides two types of explanations:
- One with context, based on the surrounding source text.
- One without context, based solely on the LLM’s general knowledge. Contextual explanations are typically more accurate and relevant, but general explanations may still be useful in some cases.
- This tool contributes to the broader goal of making government communication more understandable and inclusive for all citizens.
- Simone Luchetta — Staatskanzlei Zürich: Team Informationszugang & Dialog
- Chantal Amrhein, Patrick Arnecke — Statistisches Amt Zürich: Team Data
We welcome feedback and contributions. Email us or open an issue or pull request.
We use Ruff for linting and code formatting.
Install pre-commit hooks for automatic checks before opening a pull request:
pre-commit install
This project is licensed under the MIT License. See LICENSE for details.
This software (the Software) has been developed according to and with the intent to be used under Swiss law. Please be aware that the EU Artificial Intelligence Act (EU AI Act) may, under certain circumstances, be applicable to your use of the Software. You are solely responsible for ensuring that your use of the Software complies with all applicable local, national and international laws and regulations. By using this Software, you acknowledge and agree (a) that it is your responsibility to assess which laws and regulations, in particular regarding the use of AI technologies, are applicable to your intended use and to comply therewith, and (b) that you will hold us harmless from any action, claims, liability or loss in respect of your use of the Software.