End-To-End ML with LLMs and Semantic Data Management: Experiences from Chemistry 4.0

This repository accompanies the paper:

End-To-End ML with LLMs and Semantic Data Management: Experiences from Chemistry 4.0
Sayed Hoseini, Vincent Herrmann, Christoph Quix
Hochschule Niederrhein University of Applied Sciences, Fraunhofer FIT
📅 DEEM ’25 — Workshop on Data Management for End-to-End Machine Learning

📄 Abstract

Machine Learning (ML) in industrial chemistry is often hindered by the complexity of preprocessing heterogeneous datasets. In this proof-of-concept study, we explore the use of semantic data management to support LLM-driven automation of end-to-end ML pipelines in a real-world Chemistry 4.0 setting. A semantic model is used to capture domain knowledge and metadata in a machine-readable form, guiding LLMs through natural language prompts to generate complete data wrangling and ML modeling code. We evaluate several state-of-the-art LLMs on their ability to autonomously produce functionally correct Python code for preprocessing and Gaussian Process modeling. Our results show that, when guided by structured semantic context, larger LLMs can reliably generate accurate pipelines, significantly reducing the need for manual intervention. These findings provide an encouraging starting point for further exploration toward leveraging the semantic model to improve the robustness of code generation by systematically integrating relevant information into the generation process, rather than relying solely on the raw intelligence of the LLM.
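The core idea is that the semantic model is handed to the LLM as context inside a natural-language prompt. A minimal sketch of how such a prompt could be assembled, assuming the semantic model and the example data points (cf. `data_points.txt`) are available as plain text. The helper `build_prompt`, its section headings, and its wording are hypothetical; the prompts actually used in the study are collected in `prompts.ipynb`.

```python
def build_prompt(semantic_model: str, data_points: str, task: str) -> str:
    """Assemble a prompt that grounds an LLM in the semantic model.

    Hypothetical helper: the section headings and wording are illustrative
    and are not the prompts used in the paper.
    """
    sections = [
        "You are a data engineer preparing chemistry lab data for machine learning.",
        "## Semantic model (machine-readable metadata)",
        semantic_model.strip(),
        "## Example data points",
        data_points.strip(),
        "## Task",
        task.strip(),
        "Return one complete, runnable Python script.",
    ]
    # Blank lines between sections keep the structure visible to the model.
    return "\n\n".join(sections)


if __name__ == "__main__":
    # Toy stand-ins for the semantic model and data_points.txt (illustrative only).
    sm = ':Viscosity a :Measurement ; :unit "mPa*s" .'
    points = "sample_id,viscosity\nS1,120.5"
    print(build_prompt(sm, points, "Build an ML-ready table from the raw measurements."))
```

In practice the two context strings would be read from the repository files (for example with `pathlib.Path.read_text()`) and the resulting prompt pasted into the chat window of the LLM under test.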

We use a semantic model to provide structured metadata and guide LLMs (e.g., GPT-4, Gemini, LLaMA) via natural language prompts for code generation in data wrangling and Gaussian Process modeling. The results show that, with structured context, larger LLMs can generate functional pipelines with minimal human intervention.
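The modeling step the LLMs are asked to generate is Gaussian Process (GP) regression. As a rough, dependency-light sketch (not code produced or evaluated in the paper), the GP posterior mean and variance under a squared-exponential (RBF) kernel can be computed directly in NumPy. The kernel choice, the fixed length scale, variance, and noise level, and the helper names `rbf_kernel` and `gp_predict` are all assumptions for illustration.

```python
import numpy as np


def _as_2d(x) -> np.ndarray:
    """Treat a 1-D array as n samples with a single feature."""
    x = np.asarray(x, dtype=float)
    return x.reshape(-1, 1) if x.ndim == 1 else x


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel: variance * exp(-||a - b||^2 / (2 * length_scale^2))."""
    sq_dists = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative distances caused by floating-point cancellation.
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def gp_predict(x_train, y_train, x_test, length_scale=1.0, variance=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP with Gaussian observation noise."""
    x_train, x_test = _as_2d(x_train), _as_2d(x_test)
    y_train = np.asarray(y_train, dtype=float)
    k_train = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    k_cross = rbf_kernel(x_test, x_train, length_scale, variance)
    # Cholesky factor L with L @ L.T == k_train, used instead of an explicit inverse.
    chol = np.linalg.cholesky(k_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_cross @ alpha
    v = np.linalg.solve(chol, k_cross.T)
    # The prior variance k(x, x) of the RBF kernel is `variance`; clip rounding below zero.
    var = np.maximum(variance - np.sum(v**2, axis=0), 0.0)
    return mean, var


if __name__ == "__main__":
    x = np.linspace(0.0, 5.0, 8)
    y = np.sin(x)
    mean, var = gp_predict(x, y, np.array([1.3, 4.1]))
    print(mean, var)
```

Here the hyperparameters are fixed by hand; a generated pipeline would more typically use a library such as scikit-learn's `GaussianProcessRegressor`, which by default tunes kernel hyperparameters by maximizing the log marginal likelihood.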

🧪 Project Structure

DEEM/
├── abrasion.csv                  # Raw abrasion test data
├── compare_dataframes.py         # Utility for comparing processed DataFrames
├── data_points.txt               # Example data points for the initial prompt
├── evaluation/                   # LLM output evaluations
├── measurements/                 # Raw measurement data
├── prompts.ipynb                 # Jupyter notebook collecting the prompts for copy-pasting into the chat window
├── target.csv                    # Target ML-ready dataset
├── Testing_for_Evaluation.ipynb  # Notebook for evaluating LLM outputs
├── SM.ttl                        # Semantic model in RDF Turtle (data source definition)
└── viskos_means.csv              # Raw viscosity measurements
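Evaluating an LLM's output comes down to checking whether the ML-ready table it produces matches `target.csv`. The repository's `compare_dataframes.py` is not reproduced here; the following is a hypothetical sketch of such a check with pandas, assuming that column order and row order should not matter and that floating-point values may differ slightly. The function name `frames_match` and the tolerance are assumptions.

```python
import pandas as pd


def frames_match(generated: pd.DataFrame, target: pd.DataFrame, rtol: float = 1e-6) -> bool:
    """Return True if both frames hold the same data, ignoring column and row order.

    Hypothetical helper, not the repository's compare_dataframes.py.
    """
    # A missing or extra column is an immediate mismatch.
    if set(generated.columns) != set(target.columns):
        return False
    cols = sorted(target.columns)
    # Align column order, then sort rows by all columns so row order is irrelevant.
    left = generated[cols].sort_values(cols).reset_index(drop=True)
    right = target[cols].sort_values(cols).reset_index(drop=True)
    try:
        # Shape, index, and values (within rtol) must agree; dtypes may differ.
        pd.testing.assert_frame_equal(left, right, check_dtype=False, rtol=rtol)
    except AssertionError:
        return False
    return True


if __name__ == "__main__":
    target = pd.DataFrame({"sample": ["A", "B"], "viscosity": [1.0, 2.0]})
    generated = pd.DataFrame({"viscosity": [2.0, 1.0], "sample": ["B", "A"]})
    print(frames_match(generated, target))
```

Sorting rows by value is a simple way to make the comparison order-independent, but it can misalign rows whose key values differ only by floating-point noise; a comparison on a known identifier column would be more robust where one exists.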

If you use this work, please cite the following paper:

@inproceedings{10.1145/3735654.3735942,
author = {Hoseini, Sayed and Herrmann, Vincent and Quix, Christoph},
title = {End-To-End ML with LLMs and Semantic Data Management: Experiences from Chemistry 4.0},
year = {2025},
isbn = {9798400719240},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3735654.3735942},
doi = {10.1145/3735654.3735942},
articleno = {6},
numpages = {10},
keywords = {AutoML, Data Wrangling, LLMs, Semantic Data Management},
location = {Berlin, Germany},
series = {DEEM '25}
}
