Concept Extraction (ConExion)

Overview

This repository supports the research presented in the paper:

ConExion: Concept Extraction with Large Language Models
Ebrahim Norouzi, Sven Hertling, Harald Sack (2025)

If you use this code or ideas from this work, please consider citing our paper (see below).

This project contains code and resources for extracting concepts using unsupervised methods and large language models (LLMs). It includes setup instructions, scripts for running the models, and a brief guide on how to get started.

Set Up the Environment

To set up the environment, follow these steps:

Create and activate the conda environment:

conda env create -f environment.yml
conda activate conexion

Install the required Python packages:
```
pip install -r requirements.txt
```

Running the Scripts

To run the provided scripts in the background, with both standard output and standard error written to logs/master_log.log, use the following command:

nohup ./run_scripts.sh > logs/master_log.log 2>&1 &

Keyword Extraction Methods

Unsupervised Methods

Unsupervised keyword extraction methods rely on statistical and linguistic features of the text and require no labeled data. Common techniques include:

TF-IDF: Weights a term by its frequency within a document, scaled down by the number of documents in the collection that contain it.
TextRank: A graph-based algorithm inspired by PageRank, where words are nodes and edges connect words that co-occur within a fixed window. Key phrases are ranked by the importance of their words in the graph.
LDA (Latent Dirichlet Allocation): A generative statistical model that infers topics from a set of documents; the most probable words of each topic can then be used as keywords.
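
As an illustration of the TF-IDF idea described above, here is a minimal, dependency-free sketch. It is not the implementation used in this repository: the function name `tfidf_keywords`, the whitespace tokenizer, and the toy documents are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Rank the terms of each document by TF-IDF and return the top_k per document.

    Hypothetical helper for illustration; tokenization is a plain whitespace split.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Relative term frequency times the log inverse document frequency.
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


# Toy corpus (made up for this example).
docs = [
    "steel alloy hardness steel",
    "alloy corrosion test corrosion",
    "test protocol corrosion test",
]
print(tfidf_keywords(docs, top_k=1))
```

A term that appears in every document gets a score of zero, which is why frequent but uninformative words drop out of the ranking.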

Large Language Models (LLMs)

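Large language models are trained on large text corpora and can follow natural-language instructions. For concept extraction, they can be fine-tuned on specific datasets or prompted in zero-shot settings, where the model receives only an instruction and the input text.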
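
A zero-shot setup can be sketched as a prompt builder plus a parser for the model's reply. The prompt wording, the comma-separated output format, and the function names `build_zero_shot_prompt` and `parse_concepts` are assumptions for illustration; they are not the prompts or code from the paper, and the model call itself is left out.

```python
def build_zero_shot_prompt(text):
    """Build a zero-shot prompt: an instruction and the input, no worked examples."""
    return (
        "Extract the key concepts from the following text. "
        "Return them as a comma-separated list.\n\n"
        f"Text: {text}\nConcepts:"
    )


def parse_concepts(response):
    """Split a comma-separated reply into a lowercased, de-duplicated list."""
    seen = set()
    concepts = []
    for raw in response.split(","):
        concept = raw.strip().strip(".").lower()
        if concept and concept not in seen:
            seen.add(concept)
            concepts.append(concept)
    return concepts


prompt = build_zero_shot_prompt("Steel alloys are tested for tensile strength.")
# A hand-written stand-in for a model reply, used only to exercise the parser.
example_reply = "Tensile strength, steel alloy, Steel alloy."
print(parse_concepts(example_reply))
```

Keeping prompt construction and response parsing separate from the model call makes it easier to swap models without touching the extraction logic.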

File Structure

batch_submit.sh                  Script for submitting batch jobs
batch_test.sh                    Script for testing batch jobs
download_models.sh               Script for downloading models
main.py                          Main script for running the concept extraction
requirements.txt                 Python package dependencies
run_fs_fixed_LLM-batch.sh        Script for running fixed LLMs in batch mode
run_fs_fixed_LLM-job.sh          Script for running fixed LLM jobs
run_fs_fixed_LLM-scripts.sh      Script for running fixed LLM scripts
run_fs_fixed_LLM_all_datasets.sh Script for running fixed LLMs on all datasets
run_scripts.sh                   Main script for running all other scripts
run_zs_LLM_scripts.sh            Script for running zero-shot LLM scripts
run_zs_fixedLLMscripts.sh        Script for running zero-shot fixed LLM scripts

Citation

If you use this work in your research, please cite it as:

@misc{norouzi2025conexionconceptextractionlarge,
  title     = {ConExion: Concept Extraction with Large Language Models},
  author    = {Ebrahim Norouzi and Sven Hertling and Harald Sack},
  year      = {2025},
  eprint    = {2504.12915},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url       = {https://arxiv.org/abs/2504.12915}
}

License

This project is licensed under the MIT License – see the LICENSE file for details.
