YouLeQD: YouTube Learners' Questions on Bloom's Taxonomy Dataset

Introduction

YouTube Learners' Questions on Bloom's Taxonomy Dataset (YouLeQD) contains 57,242 learner-posed questions collected from the comments of educational lecture videos on YouTube. Along with the dataset, we developed two RoBERTa-based classification models, leveraging Large Language Models, to detect questions and to analyze their cognitive complexity using Bloom's Taxonomy. The dataset and our findings provide insights into the cognitive complexity of learner-posed questions in educational videos and their relationship with interaction metrics. These insights can aid the development of more effective AI models for education and improve the overall learning experience for students.

Data

Source

Questions are extracted from the comments of the educational videos in the video list.

Datasets

/data/IntVsDecl/: Interrogative Sentence Classification dataset; we have added GPT-4o annotations which were used in Knowledge Distillation training.
/data/transcripts/: Transcripts of the YouTube educational videos.
/data/comments/: Raw comments collected from the YouTube public domain.
/data/DASQBT/: Educational questions with Bloom's Taxonomy labels, plus an augmented version of that dataset.
/data/stem_topics/: STEM topics used to generate educational questions.
/data/ext_questions/: Extracted questions from the comments.
/data/q_bt_human/: Human-annotated Bloom's Taxonomy labels for the extracted questions.

Methodology

Question Extraction

To distinguish questions from other content in users' comments, we fine-tune a model on a publicly available dataset for interrogative sentence classification, which is built upon other datasets such as SQuAD and SPAADIA. The dataset comprises 211,168 sentences (80,167 non-interrogative and 131,001 interrogative), each with a binary label. The dataset's author deliberately removed question marks from some of the examples to prevent the model from overfitting to punctuation. The dataset is available in this repository at data/IntVsDecl/questions_vs_statements_v1.0.csv or from its original public source. scripts/q_extractor.py is used to extract questions from comments.
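The question-mark removal described above can be sketched as follows. This is a minimal illustration, not the dataset author's actual procedure: the function name `strip_question_marks`, the `(sentence, label)` pair layout with label 1 for interrogative sentences, and the default 50% drop rate are all assumptions made for the example.

```python
import random


def strip_question_marks(examples, drop_rate=0.5, seed=0):
    """Remove trailing question marks from a random share of interrogatives.

    `examples` is a list of (sentence, label) pairs, where label 1 marks an
    interrogative sentence and 0 a non-interrogative one (assumed encoding).
    """
    rng = random.Random(seed)  # local RNG keeps the result reproducible
    result = []
    for sentence, label in examples:
        is_marked_question = label == 1 and sentence.rstrip().endswith("?")
        if is_marked_question and rng.random() < drop_rate:
            # Drop the punctuation cue so a classifier must rely on wording.
            sentence = sentence.rstrip().rstrip("?").rstrip()
        result.append((sentence, label))
    return result


# Hypothetical examples; drop_rate=1.0 strips every trailing question mark.
examples = [
    ("What is a gradient?", 1),
    ("How does backpropagation work?", 1),
    ("The lecture covers gradient descent.", 0),
]
augmented = strip_question_marks(examples, drop_rate=1.0)
for sentence, label in augmented:
    print(label, sentence)
```

Labels are left untouched, so a model trained on the output still sees interrogatives as positive examples even when the question mark is missing, which is the situation that frequently arises in informal YouTube comments.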

Bloom's Taxonomy Levels of Questions Extracted from YouTube Educational Video Comments

data/output/q_bt_pred.csv contains the predicted Bloom's Taxonomy (BT) levels of the questions extracted from YouTube educational video comments. scripts/bt_cls_roberta.py is the Python script used for BT level prediction.
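A natural first step with a prediction file is to tally how many questions fall at each Bloom's Taxonomy level. The sketch below assumes a CSV with a column named `bt_level` holding one of the six level names of the revised taxonomy. The column name, the label strings, and the sample rows are hypothetical, since the actual layout of data/output/q_bt_pred.csv is not described here.

```python
import csv
import io
from collections import Counter

# The six levels of the revised Bloom's Taxonomy, lowest to highest.
BLOOM_LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]


def count_levels(csv_file, column="bt_level"):
    """Count rows per Bloom level; `column` is an assumed column name."""
    counts = Counter(row[column] for row in csv.DictReader(csv_file))
    # Report every level in taxonomy order, including levels with no rows.
    return {level: counts.get(level, 0) for level in BLOOM_LEVELS}


# Hypothetical sample standing in for the real prediction file.
sample = io.StringIO(
    "question,bt_level\n"
    "What is entropy,Remember\n"
    "Why does the series converge,Analyze\n"
    "What does this symbol mean,Remember\n"
)
level_counts = count_levels(sample)
print(level_counts)
```

To try this on a real file, pass `open(path, newline="")` in place of the in-memory sample and set `column` to match the file's header.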

Citation

This paper was selected for presentation at the 19th International Conference on Semantic Computing (ICSC 2025), held in Laguna Hills, California. If you find our work helpful or interesting, we would appreciate it if you would consider citing it!

@inproceedings{nong2025YouLeQD,
  title     = "{YouLeQD: Decoding the Cognitive Complexity of Questions and
               Engagement in Online Educational Videos from Learners' Perspectives}",
  author    = "Nong Ming and Sachin Sharma and Jiho Noh",
  booktitle = "{2025 IEEE 19th International Conference on Semantic Computing (ICSC)}",
  year      =  2025
}
