Utilizing transformers on short sentence topic modeling

This repository contains scripts and tools for text analysis research, focusing on reflection datasets annotated by multiple individuals. The workflow includes dataset preparation, preprocessing, fine-tuning transformer-based models, and clustering using embeddings.

Dependencies

pandas, openpyxl, numpy, torch, nltk, transformers, scikit-learn, seaborn, matplotlib, datasets

Code

Dataset Preparation Scripts

prepare_dataset.py : Prepares the dataset from raw reflection data annotated by multiple people.

Utility Scripts

data_preprocessing.py : Contains preprocessing functions for text cleaning and data preparation.
functions.py : Utility functions used across the fine-tuning and clustering workflows.
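
As a rough illustration of the kind of text cleaning a preprocessing step like this might perform, here is a minimal sketch using only the standard library. The function name `preprocess`, the stop-word list, and the sample sentence are all hypothetical and are not taken from data_preprocessing.py; the repository itself may rely on NLTK and apply different rules.

```python
import re

# Small illustrative stop-word list (an assumption; the repository may use NLTK's list).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "was", "i", "it", "that"}


def preprocess(text):
    """Lowercase, replace non-letters with spaces, split on whitespace, drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # digits and punctuation become spaces
    tokens = text.split()
    return [t for t in tokens if t not in STOP_WORDS]


# Hypothetical reflection sentence, for illustration only.
sample = "I learned that the recursion topic was confusing in Lecture 3!"
tokens = preprocess(sample)
print(tokens)
```

For transformer models, aggressive cleaning such as stop-word removal is not always beneficial, so treat the steps above as optional and adjust them to the data.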

Main Scripts

finetune_model_bert.py : Trains a transformer-based model (BERT) on the reflection dataset to generate embeddings for downstream tasks.
clustering.py : Performs clustering using the embeddings generated by the transformer model.
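
To make the clustering step concrete, the sketch below groups a handful of toy vectors with a plain k-means loop. It is only an assumption that k-means is a reasonable stand-in here: this README does not say which algorithm clustering.py uses, and in practice scikit-learn (listed in the dependencies) would be the more usual choice. The function `kmeans` and the 2-D `embeddings` are hypothetical; real sentence embeddings from BERT would have hundreds of dimensions.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster equal-length vectors into k groups with plain Lloyd's k-means."""
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy 2-D vectors standing in for sentence embeddings (hypothetical data).
embeddings = [
    [0.10, 0.20], [0.00, 0.10], [0.20, 0.00],  # group near the origin
    [5.00, 5.10], [5.20, 4.90], [4.90, 5.00],  # group near (5, 5)
]
labels, centroids = kmeans(embeddings, k=2)
print(labels)
```

The cluster ids themselves are arbitrary; what matters is that sentences with nearby embeddings end up sharing a label.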

Eunyoungkim0/StudentReflectionAnalysis
