
# OASST-ETC Dataset

Code and data for the OASST-ETC (Open Assistant Eye Tracking Corpus) Dataset, which provides eye-tracking data for analyzing human reading patterns of Large Language Model responses. This repository contains the data and analysis code for our paper "Alignment Signals from Eye-tracking Analysis of LLM Responses".

## Dependencies

- See `requirements.txt` for the full list of dependencies
- Key packages include:
  - OpenCV
  - NumPy
  - Pandas
  - HuggingFace Hub
  - PyTorch

## Installation

To install and run the project:

1. Clone the repository
2. Create a virtual environment
3. Install the required dependencies from `requirements.txt`
4. Install the tokenizer aligner package:
   `pip install git+https://github.com/anlopez94/tokenizer_aligner.git@v1.0.0`
5. Install the eyetrackpy package:
   `pip install git+https://github.com/anlopez94/eyetrackpy.git@v1.0.0`
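
For reference, the full installation sequence might look like this (the repository URL is this page's; the environment name is arbitrary):

```bash
# clone and enter the repository
git clone https://github.com/angelalopezcardona/oasstetc.git
cd oasstetc

# create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# install the dependencies and the two companion packages
pip install -r requirements.txt
pip install git+https://github.com/anlopez94/tokenizer_aligner.git@v1.0.0
pip install git+https://github.com/anlopez94/eyetrackpy.git@v1.0.0
```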

## oasstetc_data

The `oasstetc_data` directory contains the following subdirectories:

- raw_data/
    * Contains raw eye tracking experiment data for each participant and trial
        - fixations.csv: Raw fixation data
        - vertices/word_cor_image_fixations_trial.csv: Fixations mapped to words

- gaze_features_real/
    * Contains reading measures averaged across participants for each trial and user set
        - set_[N]/word_cor_image_fixations_trial.csv: Processed reading measures

- gaze_features_synthetic/ 
    * Contains synthetic reading measures generated by a generative model for each trial and user set
        - set_[N]/word_cor_image_fixations_trial.csv: Generated reading patterns

- attention/
    * Contains attention pattern analysis
        - Attention patterns broken down by trial and user set
        - Statistical correlations between attention and reading measures

- attention_reward/
    * Contains reward model attention analysis
        - Attention patterns by trial/user set with reward model configurations
        - Correlations between attention and reading measures
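
For a quick orientation, the per-word gaze-feature CSVs can be inspected with pandas. A minimal sketch; the path follows the layout above with `[N]` instantiated to set 1, and the printed columns are whatever the file actually contains:

```python
import pandas as pd

# Load the averaged per-word reading measures for one trial in user set 1.
path = "oasstetc_data/gaze_features_real/set_1/word_cor_image_fixations_trial.csv"
df = pd.read_csv(path)

# One row per word (or OCR-merged group of words), one column per reading
# measure (e.g. fixation counts and durations, pupil dilation).
print(df.columns.tolist())
print(df.head())
```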

## process_et_data

- Main functionality for processing eye tracking data and generating reading measures
- Contains classes for handling text and image-based eye tracking data

### Assigning fixations to words

For each user and session, we have:
* 1 file with fixation data: one row per fixation. We read the X, Y coordinates to position it on the screen, and use features such as fixation duration (in seconds) and pupil dilation.

* N images, one for each trial (response to a prompt). Each image is named after the user, session, and trial, and displays the response text shown in that trial. We use these images to obtain the X, Y coordinates of each word.

* N CSV files, one for each trial, containing the X, Y coordinates of each character. These were initially used to obtain word coordinates, but we later switched to using the images.

* 1 file with trial data: one row per trial, from which we obtain the original text and the user ratings.

The class `EyeTrackingDataText(EyeTrackingData)` (in `eye_tracking_data.py`) obtains word coordinates from the CSV files with per-character X, Y coordinates. This was the initial approach, later replaced by the image-based one.

The class `EyeTrackingDataImage(EyeTrackingData)` (in `eye_tracking_data_image.py`) reads the fixation and image data. It obtains word coordinates by running OCR on the images and includes an algorithm to assign fixations to words: once we have each word's bounding box (two coordinates on the X-axis and two on the Y-axis), each fixation is assigned by considering its minimum distance to each word together with the assignment of the previous fixation. The output is a CSV file with one row per word (or group of words, depending on how OCR extracted them) and one column per feature (number of fixations, total fixation duration in seconds, mean fixation duration in seconds, pupil dilation). A sketch of the assignment rule appears after the list below.

* It extracts words from images with OCR, reads the fixation files, assigns fixations to words, and saves the results to a CSV file.
* It can also plot the images with the prompts and the fixations assigned to each word.
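
Below is a minimal sketch of that assignment rule, assuming word bounding boxes as `(x0, x1, y0, y1)` tuples. The `stickiness` discount for the previously fixated word is a hypothetical stand-in for the "assignment of the previous fixation" criterion, not the exact implementation in `EyeTrackingDataImage`:

```python
import math

def point_box_distance(x, y, box):
    """Euclidean distance from a fixation point to a word bounding box
    (zero if the point lies inside the box)."""
    x0, x1, y0, y1 = box
    dx = max(x0 - x, 0, x - x1)
    dy = max(y0 - y, 0, y - y1)
    return math.hypot(dx, dy)

def assign_fixations(fixations, word_boxes, stickiness=0.9):
    """Assign each (x, y) fixation to the closest word box, discounting the
    distance to the previously fixated word so assignments are 'sticky'."""
    assignments, prev = [], None
    for x, y in fixations:
        best, best_dist = None, float("inf")
        for i, box in enumerate(word_boxes):
            d = point_box_distance(x, y, box)
            if i == prev:
                d *= stickiness  # favour the word the previous fixation hit
            if d < best_dist:
                best, best_dist = i, d
        assignments.append(best)
        prev = best
    return assignments
```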

run `python process_et_data/main_asign_fixations.py` to process the raw eye tracking data, assign fixations to words, and compute reading measures per word, per trial, and per user

run `python process_et_data/main.py` to compute reading measures per user set (1 to 8); internally this calls `EyeTrackingAnalyser().average_gaze_features_real_participants(path, path_save)`
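
Conceptually, this averaging step stacks each participant's per-word measures and takes the mean per word. A minimal sketch, assuming a hypothetical per-participant file layout and a hypothetical `word_index` column; the actual logic lives in `average_gaze_features_real_participants`:

```python
import glob
import pandas as pd

# Hypothetical layout: one per-word measures file per participant in a set.
files = glob.glob("set_1/participant_*/word_measures.csv")
stacked = pd.concat(pd.read_csv(f) for f in files)

# Average every numeric gaze feature per word across participants.
averaged = stacked.groupby("word_index").mean(numeric_only=True)
averaged.to_csv("set_1/word_cor_image_fixations_trial.csv")
```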

## generate_syntethic_readingmeasures

Code to generate synthetic fixations for the texts used in the lab experiment, so that they can be compared with the real recordings.
run `python generate_syntethic_readingmeasures/main_generate_synthetic_fixations.py` to generate the synthetic fixations

## analyse_reading_measures

This module analyzes the reading measures extracted from the eye-tracking data: it processes real and synthetic eye-tracking metrics, computes statistical comparisons, and visualizes the results.

Features of the scripts:
- Processes eye-tracking data to compute reading measures, such as fixation duration, first fixation duration, and fixation count.
- Compares reading metrics between preferred and non-preferred conditions.
- Computes paired statistical tests to assess differences between conditions (a minimal sketch follows this list).
- Supports real and synthetic data analysis.
- Synchronizes physiological data (EDA) with eye-tracking data to analyze correlations.
- Produces the boxplot figures shown in Figures 3, 4, and 5.
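
As referenced above, a paired comparison takes one reading-measure value per trial for the preferred (chosen) and non-preferred (rejected) response. A minimal sketch with synthetic numbers; which test each script actually uses is defined in the script itself:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Illustrative paired samples: one value (e.g. total fixation duration, ms)
# per trial for the chosen and the rejected response.
chosen = rng.normal(250, 40, size=30)
rejected = rng.normal(240, 40, size=30)

# Paired t-test, plus the Wilcoxon signed-rank test as a non-parametric check.
t_stat, t_p = stats.ttest_rel(chosen, rejected)
w_stat, w_p = stats.wilcoxon(chosen, rejected)
print(f"paired t-test: t={t_stat:.2f}, p={t_p:.3f}")
print(f"wilcoxon:      W={w_stat:.1f}, p={w_p:.3f}")
```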

run `python analyse_readingmeasures/Figure_3_Total_words_and_length_by_condition.py` to generate Figure 3 (Total words and length by condition)
run `python analyse_readingmeasures/Figure_4_and_5_Reading_measures_Sythetic_and_organic_data.py` to generate Figures 4 and 5 (reading measures in synthetic and organic data)
run `python analyse_readingmeasures/Table_7_EDA_analysis.py` to generate Table 7 (EDA analysis)

## analyse_attention

### Computing the attention of each model

In this folder we compute the attention of different transformer models and compare it with the reading measures.

- Computing attention per model and per layer:
    run `python analyse_attention/main_compute_attention.py` to compute attention patterns across all models and layers.

    Key features:
    - Computes attention scores for each trial and layer across all configured models
    - Supports both standard attention computation and reward model attention computation
    - For reward model attention (run with `--reward=True`), computes attention on combined prompt+response
    - Reward model attention currently supported for:
        * openbmb/UltraRM-13b
        * openbmb/Eurus-RM-7b  
        * nicolinho/QRM-Llama3.1-8B

    Configuration:
    - Models can be enabled/disabled in the models dictionary in main_compute_attention.py
    - Results are saved per layer as CSV files in:
        * Standard attention: attention/model_name/results/set_n/trial_XX/layer_X.csv
        * Reward attention: attention_reward/model_name/results/set_n/trial_XX/layer_X.csv

    Example usage:
    ```bash
    # Standard attention computation
    python analyse_attention/main_compute_attention.py

    # Reward model attention computation 
    python analyse_attention/main_compute_attention.py --reward=True
    ```
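
    For orientation, per-layer attention can be extracted from a Hugging Face model by requesting attentions at inference time. A minimal sketch; the model name and the head/query averaging below are illustrative assumptions, not the repository's exact configuration:

    ```python
    import torch
    from transformers import AutoModel, AutoTokenizer

    model_name = "gpt2"  # illustrative; the repo's models live in main_compute_attention.py
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_attentions=True)
    model.eval()

    inputs = tokenizer("The response shown to the participant.", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # outputs.attentions holds one tensor per layer: (batch, heads, seq, seq).
    for layer, att in enumerate(outputs.attentions):
        # Average over heads and query positions to get one score per token
        # (one possible aggregation; the repository's may differ).
        per_token = att.mean(dim=1).mean(dim=1).squeeze(0)
        print(f"layer {layer}: {per_token.shape[0]} token scores")
    ```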

- Comparing attention with reading measures:
    This script analyzes correlations between model attention patterns and human reading measures.

    Usage:
    ```bash
    # Basic usage
    python analyse_attention/main_compute_compare_trials.py

    # Filter for unanimous responses only
    python analyse_attention/main_compute_compare_trials.py --filter_completed=True

    # Use reward model attention patterns
    python analyse_attention/main_compute_compare_trials.py --folder_attention=attention_reward
    ```

    Key features:
    - Computes correlations between model attention and reading measures (fixation duration, fixation count, etc.); a minimal sketch of the comparison follows the configuration list below
    - Can analyze either all responses or only unanimous ones via the `--filter_completed` flag
    - Supports both standard attention and reward model attention via the `--folder_attention` flag
    - Saves results separately for chosen and rejected responses
    - Results saved to:
        * Standard attention: attention/modelname/results/[chosen|rejected]/
        * Reward attention: attention_reward/modelname/results/[chosen|rejected]/

    Configuration:
    - Models can be enabled/disabled in the `models` dictionary in the script
    - Available reading measures can be configured in the `gaze_features` list
    - For reward models, only UltraRM-13b, Eurus-RM-7b and QRM-Llama3.1-8B are supported
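
    As mentioned above, the comparison reduces to correlating per-word attention with per-word reading measures. A minimal sketch with made-up numbers; which correlation the scripts actually use is configured in the script itself:

    ```python
    import numpy as np
    from scipy import stats

    # Illustrative aligned per-word vectors for one trial: a layer's attention
    # per word, and a human reading measure (e.g. total fixation duration, ms).
    attention = np.array([0.12, 0.08, 0.21, 0.05, 0.18])
    fixation_dur = np.array([310.0, 150.0, 420.0, 90.0, 380.0])

    rho, p = stats.spearmanr(attention, fixation_dur)
    print(f"Spearman rho={rho:.2f}, p={p:.3f}")
    ```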


- Plotting attention-gaze correlations by model layer:
    ```bash
    python analyse_attention/main_plot_attention_layers.py
    ```

    Key features:
    - Visualizes correlations between model attention and reading measures for each layer
    - Configurable options (directly in the file):
        * Model selection
        * Reading measures to analyze (e.g. fixation duration, count)
        * Attention source folder (standard 'attention' or reward 'attention_reward')
    - Generates per-layer correlation plots to analyze attention patterns at different model depths
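
    A per-layer plot is essentially one correlation value per layer. A minimal sketch with placeholder data, assuming the correlations have already been computed:

    ```python
    import matplotlib.pyplot as plt
    import numpy as np

    layers = np.arange(32)                  # the layer count is model-dependent
    rho = 0.3 + 0.2 * np.sin(layers / 5.0)  # placeholder correlation values

    plt.plot(layers, rho, marker="o")
    plt.xlabel("Layer")
    plt.ylabel("Attention-gaze correlation")
    plt.title("Correlation by model depth (illustrative data)")
    plt.show()
    ```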

- Plotting chosen vs rejected attention correlations:
    ```bash 
    python analyse_attention/main_plot_chosen_rejected.py
    ```

    Key features:
    - Compares attention-gaze correlations between chosen and rejected responses across all models
    - Configurable analysis options (directly in the file):
        * Filter for unanimous responses only
        * Select specific reading measure to analyze
    - Results loaded from: attention/modelname/results/[chosen|rejected]/
    - Generates comparative visualizations to analyze attention differences between chosen/rejected responses

- Plotting attention-gaze correlations for reward models:
    ```bash
    python analyse_attention/main_plot_chosen_rejected_reward.py
    ```
    Key features:
    - Compares attention-gaze correlations between chosen and rejected responses for reward models
    - Analyzes both standard attention (response only) and reward attention (prompt + response)
    - Supported reward models:
        * openbmb/UltraRM-13b
        * openbmb/Eurus-RM-7b 
        * nicolinho/QRM-Llama3.1-8B
    
    Configuration (directly in the file):
        - Filter for unanimous responses via completed/not_completed options
        - Select specific reading measures to analyze
    - Results loaded from:
        * Standard attention: attention/modelname/results/[chosen|rejected]/
        * Reward attention: attention_reward/modelname/results/[chosen|rejected]/

## Citation

If you find this work useful for your research, please cite our paper:

```bibtex
@inproceedings{Lopez-Cardona2025OASST,
  title     = {{OASST-ETC} Dataset: Alignment Signals from Eye-tracking Analysis of {LLM} Responses},
  author    = {L{\'o}pez-Cardona, {\'A}ngela and Idesis, Sebasti{\'a}n and Barreda-{\'A}ngeles, Miguel and Abadal, Sergi and Arapakis, Ioannis},
  booktitle = {Proceedings of the 2025 ACM Symposium on Eye Tracking Research \& Applications ({ETRA})},
  year      = {2025},
  location  = {Tokyo, Japan},
  publisher = {ACM},
  address   = {New York, NY, USA},
  note      = {Presented at ETRA 2025, May 26--29, 2025}
}
```
