LaclauGPT scripts for multimodal analysis of EP2024 social media videos. Meant to be run as batch jobs on CSC Puhti supercomputer. For research documentation only.

TomiToivio/LaclauGPT-Multimodal-Analysis


LaclauGPT

LaclauGPT is a political science multimodal data collection and analysis pipeline. It is called LaclauGPT as a tribute to Ernesto Laclau.

LaclauGPT is developed by Tomi Toivio for three Helsinki Hub on Emotions, Populism and Polarisation research projects funded by the European Union:

  • CO3 researches the social contract.
  • ENDURE researches the world after the pandemic.
  • PLEDGE researches grievance politics.

The pipeline was used to collect and analyze multimodal social media data related to the 2024 European Parliament elections. Data was collected from TikTok and Instagram. Collection started on 1 May 2024 and continued until election day on 9 June 2024. It was based on the usernames of official election candidates as well as hashtags and search queries related to the elections. Election data was collected for Bulgaria, Croatia, Finland, France, Germany, Hungary, Portugal, Spain and Sweden. The collected and analyzed data cannot be released yet due to GDPR. This open source version uses dummy data.

LaclauGPT Multimodal Data Analysis

These data analysis scripts are published for research documentation. They cannot be run without modification.

These scripts are used with Ollama running on the CSC Puhti supercomputer.
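As a rough illustration of how a script can talk to Ollama, the sketch below builds a non-streaming request for Ollama's `/api/generate` endpoint with base64-encoded images attached. It is not taken from the repository: the helper names `build_payload` and `ask_ollama`, the model name `llama3.2-vision` and the use of the default local port 11434 are all assumptions made for the example.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is reachable on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, frame_bytes_list, model="llama3.2-vision"):
    """Build a non-streaming request body with base64-encoded images.

    The model name is a placeholder, not necessarily the one used in the project.
    """
    images = [base64.b64encode(data).decode("ascii") for data in frame_bytes_list]
    return {"model": model, "prompt": prompt, "images": images, "stream": False}


def ask_ollama(payload, url=OLLAMA_URL):
    """POST the payload to Ollama and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `ask_ollama(build_payload("Describe this frame.", [frame_bytes]))` requires a running Ollama server with the chosen model already pulled.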

The scripts are submitted as batch jobs in a sequence:

  1. puhti_preprocess.py - This extracts video frames with OpenCV, processes them with EasyOCR and extracts a Whisper transcript of the audio.

  2. puhti_frame.py - This uses Llama to create a multimodal analysis of 1-6 extracted frames.

  3. puhti_summary.py - This creates a Llama summary analysis based on the metadata, Whisper transcript and Llama multimodal analysis results.
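Puhti schedules batch jobs with Slurm, so one way to enforce the order of the three steps above is to chain the submissions with job dependencies. The sketch below is an assumption about how that could be done, not the project's actual submission procedure: the batch script names in `STAGES` and the helper functions are hypothetical.

```python
import subprocess

# Hypothetical batch script names; the actual job files may differ.
STAGES = ["preprocess.sh", "frame.sh", "summary.sh"]


def sbatch_command(script, after_job_id=None):
    """Return the sbatch argument list, optionally gated on an earlier job succeeding."""
    command = ["sbatch", "--parsable"]
    if after_job_id is not None:
        command.append(f"--dependency=afterok:{after_job_id}")
    command.append(script)
    return command


def submit_chain(scripts=STAGES):
    """Submit each script so it starts only after the previous job ends successfully."""
    job_ids = []
    previous = None
    for script in scripts:
        result = subprocess.run(
            sbatch_command(script, previous),
            check=True,
            capture_output=True,
            text=True,
        )
        # With --parsable, sbatch prints "jobid" or "jobid;cluster".
        previous = result.stdout.strip().split(";")[0]
        job_ids.append(previous)
    return job_ids
```

With `afterok`, a failed preprocessing job keeps the later stages from starting, so the frame analysis never runs on missing frames or transcripts.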

Code for the TikTok Scraper used to collect EP2024 data is also available.
