HERMES Pipeline

The HERMES tool was developed to improve the state of the art in topic modelling. HERMES integrates several text processing stages, including acronym detection and expansion, equivalence detection, and topic labelling. The pipeline combines Large Language Models (LLMs) with linguistic preprocessing to produce semantically enriched outputs.

Key Features

Acronym Detection and Expansion: Uses LLMs to detect acronyms within text and expand them to their full forms.
Preprocessing: Applies linguistic preprocessing steps such as lemmatization and optional embedding generation. This prepares the textual data for subsequent semantic analysis.
Equivalence Detection: Identifies semantically equivalent words that have been previously clustered and standardizes them to a canonical form. This process ensures a cleaner and more consistent vocabulary.
Topic Modeling Training: Trains one of several topic models (e.g., MalletLda, Ctm, BERTopic) on the processed data. The trained models expose the topics present in the corpus for further semantic analysis.
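
To make the equivalence-detection step more concrete, the following is a minimal sketch of how clustered terms might be standardized to a canonical form. It is not the HERMES implementation: the function names build_canonical_map and standardize are hypothetical, the sample clusters are made up, and choosing the first term of each cluster as the canonical form is an assumption made only for illustration.

```python
import re


def build_canonical_map(clusters):
    """Map every term in each cluster to that cluster's canonical form.

    Assumption: the first term of a cluster is its canonical form.
    """
    mapping = {}
    for cluster in clusters:
        canonical = cluster[0]
        for term in cluster:
            mapping[term.lower()] = canonical
    return mapping


def standardize(text, mapping):
    """Replace each word found in the mapping with its canonical form."""
    def swap(match):
        token = match.group(0)
        # Lookup is case-insensitive; unknown tokens are left unchanged.
        return mapping.get(token.lower(), token)

    return re.sub(r"\w+", swap, text)


# Hypothetical clusters of semantically equivalent words.
clusters = [["procurement", "purchasing", "acquisition"], ["tender", "bid"]]
mapping = build_canonical_map(clusters)
result = standardize("Purchasing rules for each bid", mapping)
print(result)
```

Replacing whole-word tokens, rather than substrings, avoids rewriting words that merely contain an equivalent term.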

Command-line Arguments of Script hermes_pipeline.py

--llm_type:
The type of large language model to use, e.g., "llama", "openai", or "mistral".
--data_path:
Path to the input data file.
--save_path:
Path to save intermediate and final output files.
--mode:
Pipeline mode ("optimized" or "non-optimized").
--do_train:
Indicates whether DSPy modules (e.g., for acronym detection/expansion and equivalences) should be trained.
--train_data_path:
Path to training data for DSPy modules.
--context_window, --max_windows, --window_overlap:
Parameters for windowing contexts in acronym detection/expansion.
--preproc_source, --lang, --spacy_model:
Parameters for preprocessing steps (e.g., language, spaCy model).
--source_eq, --times_equiv:
Parameters for equivalence detection (data source and number of iterations).
--num_topics, --num_iters, --model_type, --sample:
Parameters for training topic models (number of topics, iterations, model type, and sample size).
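
As an illustration of how the options above fit together, here is a hedged argparse sketch that declares the same flag names. Only the flag names and the two values of --mode come from this README. The types, the defaults, the boolean form of --do_train, and the example paths are assumptions and may differ from what hermes_pipeline.py actually does.

```python
import argparse


def build_parser():
    """Declare the documented flags; types and defaults are assumed."""
    parser = argparse.ArgumentParser(description="HERMES pipeline (illustrative sketch)")
    parser.add_argument("--llm_type", type=str, default="llama")
    parser.add_argument("--data_path", type=str, default=None)
    parser.add_argument("--save_path", type=str, default=None)
    parser.add_argument("--mode", choices=["optimized", "non-optimized"], default="optimized")
    # Assumption: --do_train is a boolean switch.
    parser.add_argument("--do_train", action="store_true")
    parser.add_argument("--train_data_path", type=str, default=None)
    # Windowing for acronym detection/expansion (integer types assumed).
    parser.add_argument("--context_window", type=int, default=None)
    parser.add_argument("--max_windows", type=int, default=None)
    parser.add_argument("--window_overlap", type=int, default=None)
    # Preprocessing.
    parser.add_argument("--preproc_source", type=str, default=None)
    parser.add_argument("--lang", type=str, default=None)
    parser.add_argument("--spacy_model", type=str, default=None)
    # Equivalence detection.
    parser.add_argument("--source_eq", type=str, default=None)
    parser.add_argument("--times_equiv", type=int, default=1)
    # Topic model training.
    parser.add_argument("--num_topics", type=int, default=None)
    parser.add_argument("--num_iters", type=int, default=None)
    parser.add_argument("--model_type", type=str, default=None)
    parser.add_argument("--sample", type=float, default=None)
    return parser


# Parse a hypothetical command line; the paths are placeholders.
args = build_parser().parse_args([
    "--llm_type", "llama",
    "--data_path", "data/corpus.parquet",
    "--save_path", "output",
    "--mode", "optimized",
    "--num_topics", "20",
    "--model_type", "MalletLda",
])
print(args.mode, args.num_topics, args.model_type)
```

Under these assumptions, the equivalent shell invocation would pass the same flags to hermes_pipeline.py, for example --data_path data/corpus.parquet --save_path output --mode optimized --num_topics 20 --model_type MalletLda.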

nextprocurement/RAG_tool

Folders and files

Latest commit

History

Repository files navigation

HERMES Pipeline

Key Features

Command-line Arguments of Script hermes_pipeline.py

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages