This repository was archived by the owner on Jul 28, 2025. It is now read-only.

Official implementation of the IEEE EUROCON 2025 paper "A Computational Approach to Modeling Conversational Systems: Analyzing Large-Scale Quasi-Patterned Dialogue Flows". Authors: Mohamed Achref Ben Ammar – National Institute of Applied Science and Technology (INSAT), University of Carthage, Tunisia; Mohamed Taha Bennani – University of Tunis El Manar (FST).


achrefbenammar404/quasi-patterned-conversations-analysis


A Computational Approach to Modeling Conversational Systems

Analyzing Large-Scale Quasi-Patterned Dialogue Flows


Official Implementation of the IEEE EUROCON 2025 Paper
A Computational Approach to Modeling Conversational Systems Analyzing Large-Scale Quasi-Patterned Dialogue Flows

Mohamed Achref Ben Ammar – National Institute of Applied Science and Technology (INSAT), University of Carthage, Tunisia
Mohamed Taha Bennani – University of Tunis El Manar (FST)


Abstract

The rise of large language models (LLMs) has led to increasingly complex and loosely structured dialogues. In this work, we introduce a computational graph-based framework that models these quasi-patterned conversations. Central to our approach is the Filter & Reconnect method, a graph simplification technique that reduces conversational noise while preserving semantic structure.

Key outcomes:

  • 2.06× improvement in semantic metric S over prior methods
  • A δ-hyperbolicity of 0, indicating a tree-like, interpretable structure

This framework offers practical tools for monitoring and analyzing chatbot behavior, dialogue management systems, and user interaction patterns at scale.
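To make the δ-hyperbolicity outcome concrete, here is a minimal sketch of the standard four-point definition on small undirected graphs. It is not taken from this repository: the function names are hypothetical, the graph is assumed to be connected and unweighted, and how the paper measures δ on the directed conversational graph is not specified here.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node of an undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def delta_hyperbolicity(adj):
    """Four-point delta of a connected undirected graph (brute force over quadruples)."""
    d = {u: bfs_distances(adj, u) for u in adj}
    delta = 0.0
    for w, x, y, z in combinations(adj, 4):
        # The three ways of pairing four points; delta is half the gap
        # between the two largest pairwise-distance sums.
        sums = sorted([d[w][x] + d[y][z], d[w][y] + d[x][z], d[w][z] + d[x][y]])
        delta = max(delta, (sums[2] - sums[1]) / 2)
    return delta


# Illustrative graphs (assumptions, not data from the paper).
tree = {"a": ["b", "c"], "b": ["a", "d", "e"], "c": ["a"], "d": ["b"], "e": ["b"]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}

tree_delta = delta_hyperbolicity(tree)
cycle_delta = delta_hyperbolicity(cycle)
```

A tree always yields δ = 0 under this definition, while a cycle yields a positive value, which is why a δ of 0 is read as tree-like structure.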


Methodology Overview

The methodology consists of the following core steps:

  1. Utterance Extraction
    Conversational utterances are extracted from a structured dataset consisting of multi-turn dialogues.

  2. Semantic Embedding
    Each utterance is transformed into a dense vector using a pre-trained text embedding model, capturing the semantic meaning of the message.

  3. Clustering of Intents
    Using hierarchical clustering techniques and a large language model (LLM), similar utterances are grouped together to identify key communicative intents.

  4. Markov Chain Construction
    A Markov Chain is built where nodes represent clustered intents and edges represent transitions between them in the dialogue flow.

  5. Graph Simplification: Filter & Reconnect
    The conversational graph undergoes a noise reduction process by removing irrelevant transitions while preserving semantic and structural coherence.

  6. Flow Pattern Analysis
    The resulting graph is then analyzed to identify quasi-patterned conversational flows, enabling improved interpretability and dialogue system evaluation.
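Step 4 can be illustrated with a short sketch that estimates first-order transition probabilities from sequences of intent labels. The function name `transition_matrix` and the example intent labels are hypothetical, and the repository's actual implementation may differ.

```python
from collections import Counter, defaultdict


def transition_matrix(intent_sequences):
    """Estimate P(next intent | current intent) from labeled conversations."""
    counts = defaultdict(Counter)
    for sequence in intent_sequences:
        # Count each consecutive pair of intents within a conversation.
        for src, dst in zip(sequence, sequence[1:]):
            counts[src][dst] += 1

    probabilities = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        probabilities[src] = {dst: n / total for dst, n in outgoing.items()}
    return probabilities


# Hypothetical intent labels, one list per conversation.
sequences = [
    ["greet", "ask_account", "verify", "resolve"],
    ["greet", "ask_account", "resolve"],
    ["greet", "ask_refund", "verify", "resolve"],
]
chain = transition_matrix(sequences)
```

Each key of `chain` is a node of the conversational graph, and each inner dictionary holds the weighted outgoing edges that the simplification step would later prune.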


Setup

1. Install Dependencies

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate        # Linux/MacOS
venv\Scripts\activate           # Windows

# Install required packages
pip install -r requirements.txt

# Download required NLP model
python -m spacy download en_core_web_md

2. Create a .env File

At the project root, create a .env file and configure the following environment variables:

# Python setup
PYTHONPATH=${PYTHONPATH}:.

# Environment mode
ENVIRONMENT="local"

# API Keys
GOOGLE_API_KEY=
MISTRAL_API_KEY=

Ensure your API keys are valid and have the appropriate access privileges.
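Before running the pipeline, it can help to confirm that the keys are actually set. The sketch below uses only the standard library; the helper `missing_keys` is hypothetical, it assumes the .env values have already been loaded into the process environment, and whether both keys are needed depends on which models you select.

```python
import os

# Variable names from the .env template above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "MISTRAL_API_KEY")


def missing_keys(environ=os.environ, required=REQUIRED_KEYS):
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not environ.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```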


Input Data Format

This framework supports ABCD v1.1, MultiWOZ 2.0, or any custom dataset formatted as follows:

{
  "conversation_1": [
    {"role": "agent", "content": "Hello, how can I help you today?"},
    {"role": "customer", "content": "I need assistance with my account."},
    {"role": "action", "content": "Agent opened account details."}
  ]
}
  • Save your data file as: data/processed_formatted_conversations.json
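A custom dataset can be checked against this format before it is passed to the pipeline. The following is a minimal sketch using only the standard library; `extract_utterances` is a hypothetical helper, not a function from this repository, and it checks only that each turn carries `role` and `content`.

```python
import json


def extract_utterances(conversations):
    """Flatten {conversation_id: [turns]} into (conversation_id, role, content) tuples."""
    utterances = []
    for conv_id, turns in conversations.items():
        if not isinstance(turns, list):
            raise ValueError(f"{conv_id}: expected a list of turns")
        for index, turn in enumerate(turns):
            if not isinstance(turn, dict) or not turn.keys() >= {"role", "content"}:
                raise ValueError(f"{conv_id}[{index}]: each turn needs 'role' and 'content'")
            utterances.append((conv_id, turn["role"], turn["content"]))
    return utterances


# Inline copy of the example above; in practice, read the JSON file instead.
raw = """
{
  "conversation_1": [
    {"role": "agent", "content": "Hello, how can I help you today?"},
    {"role": "customer", "content": "I need assistance with my account."},
    {"role": "action", "content": "Agent opened account details."}
  ]
}
"""
utterances = extract_utterances(json.loads(raw))
```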

Run the Pipeline

python main.py \
    --file_path data/processed_formatted_conversations.json \
    --num_sampled_data 500 \
    --min_clusters 10 \
    --max_clusters 30 \
    --model_name 'sentence-transformers/all-mpnet-base-v2' \
    --label_model 'open-mixtral-8x22b' \
    --tau 0.15 \
    --top_k 2 \
    --alpha 0.8

Advanced Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| --num_sampled_data | Number of conversations to sample | 100 |
| --min_clusters | Minimum cluster count for elbow method | 5 |
| --max_clusters | Maximum cluster count for elbow method | 15 |
| --model_name | Sentence embedding model | 'all-MiniLM-L12-v2' |
| --label_model | LLM for labeling dialogue state clusters | 'open-mixtral-8x22b' |
| --tau | Minimum transition probability threshold | 0.1 |
| --top_k | Number of outgoing edges to retain per node | 1 |
| --alpha | Balance between semantic similarity and topology | 1.0 |
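The effect of `--tau` and `--top_k` can be pictured with a small sketch that prunes a transition table. It follows only the parameter descriptions above: the function `filter_transitions` is hypothetical, the choice of `>=` for the threshold is an assumption, and the reconnect stage and the `--alpha` weighting are not modeled.

```python
def filter_transitions(chain, tau=0.1, top_k=1):
    """Keep, per node, at most top_k outgoing edges with probability >= tau."""
    filtered = {}
    for src, outgoing in chain.items():
        # Drop edges below the probability threshold.
        kept = [(dst, p) for dst, p in outgoing.items() if p >= tau]
        # Rank the survivors by probability, highest first.
        kept.sort(key=lambda item: item[1], reverse=True)
        filtered[src] = dict(kept[:top_k])
    return filtered


# Hypothetical transition probabilities between intent clusters.
chain = {
    "greet": {"ask_account": 0.6, "ask_refund": 0.32, "goodbye": 0.08},
    "ask_account": {"verify": 0.5, "resolve": 0.5},
}
pruned = filter_transitions(chain, tau=0.15, top_k=2)
strict = filter_transitions(chain)  # defaults from the table: tau=0.1, top_k=1
```

With the values from the run command (`tau=0.15`, `top_k=2`), the low-probability `goodbye` edge is dropped, while the defaults keep only the single strongest edge per node.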

Citation

If you use this codebase for your research, please cite:

@inproceedings{achref2025conversationalgraph,
  title={A Computational Approach to Modeling Conversational Systems: Analyzing Large-Scale Quasi-Patterned Dialogue Flows},
  author={Mohamed Achref Ben Ammar and Mohamed Taha Bennani},
  booktitle={IEEE EUROCON 2025 - The 21st International Conference on Smart Technologies},
  year={2025},
  publisher={IEEE},
}

Contact

For questions, collaborations, or feedback, feel free to reach out:
