The goal of this work is to conduct an in-depth statistical analysis of French-language tweets about the Russia-Ukraine war published between February 24, 2023 and March 4, 2023. The study will initially focus on exploring the quantitative variables present in the dataset, with particular attention to identifying possible relationships between them. The objective is to develop statistical models capable of describing and explaining the interactions among the variables and of analyzing how they influence each other.

The focus will then shift to the temporal analysis of the data. Beyond describing the trend of tweets over time, statistical models will be constructed to represent the time series, and their potential for forecasting will be evaluated. The developed models will then be tested and compared to identify the most effective one.

Finally, the last part of the project will explore the use of large language models (LLMs) to generate synthetic datasets. The aim is to verify whether such models can produce twin datasets that preserve the structure and information of the original one while containing fewer observations, and thus whether they can help in drawing conclusions without processing a massive amount of data.
The project is divided into the following sections:
- Introduction: Overview of the analysis and objectives.
- Dataset Analysis:
  - Description of variables.
  - Data cleaning and outlier treatment.
  - Analysis of distributions and data transformations.
- Relationships Between Variables (see the regression sketch after this outline):
  - Calculation of correlations.
  - Linear regression for predicting followers and scores.
  - Multinomial regression for sentiment prediction.
- Time Series Analysis:
  - Identification of seasonal patterns and tweet trends.
  - Time series decomposition.
- Forecasting with Machine Learning Models (see the time-series sketch after this outline):
  - ARIMA models for forecasting scores and number of tweets.
  - Auto-Regressive Neural Network for predictive analysis.
  - Model comparison and performance evaluation.
- Synthetic Dataset Generation (see the comparison sketch after this outline):
  - Use of language models to create alternative datasets.
  - Evaluation of the reliability of synthetic data compared to real data.
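
As a rough illustration of the "Relationships Between Variables" step, the sketch below computes correlations, fits a linear regression, and fits a multinomial regression with `nnet`. The file name `tweets_fr.csv` and the column names (`followers`, `retweets`, `score`, `sentiment`) are hypothetical placeholders and should be adapted to the actual dataset.

```r
library(GGally)
library(nnet)
library(dplyr)

# Hypothetical file and column names; adapt to the actual dataset.
tweets <- read.csv("tweets_fr.csv")

# Correlation matrix and scatterplot matrix of the quantitative variables.
num_vars <- tweets %>% select(followers, retweets, score)
cor(num_vars, use = "complete.obs")
ggpairs(num_vars)

# Linear regression: predicting the score from follower and retweet counts.
lm_score <- lm(score ~ followers + retweets, data = tweets)
summary(lm_score)

# Multinomial regression: predicting sentiment (e.g. negative / neutral / positive).
tweets$sentiment <- factor(tweets$sentiment)
fit_sent <- multinom(sentiment ~ followers + retweets + score, data = tweets)
summary(fit_sent)
```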
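
To illustrate the "Time Series Analysis" and "Forecasting" steps, the sketch below decomposes a series of tweet counts and compares an ARIMA model with an auto-regressive neural network using the `forecast` package. The simulated hourly counts and the daily (24-hour) seasonality are assumptions made only so the example runs on its own; in the project the series would come from the real dataset.

```r
library(forecast)

# Simulated hourly tweet counts over 9 days (assumption for illustration only).
set.seed(1)
hourly_counts <- 100 + 40 * sin(2 * pi * (1:216) / 24) + rnorm(216, sd = 10)
tweets_ts <- ts(hourly_counts, frequency = 24)  # assume daily seasonality

# Decompose the series into trend, seasonal and remainder components.
plot(stl(tweets_ts, s.window = "periodic"))

# ARIMA model selected automatically, then a 24-step-ahead (one-day) forecast.
fit_arima <- auto.arima(tweets_ts)
fc_arima  <- forecast(fit_arima, h = 24)

# Auto-regressive neural network (NNAR) fitted to the same series.
fit_nnar <- nnetar(tweets_ts)
fc_nnar  <- forecast(fit_nnar, h = 24)

# Compare training-set accuracy measures (RMSE, MAE, ...) of the two models.
accuracy(fc_arima)
accuracy(fc_nnar)
```

A more faithful comparison would hold out the last part of the series as a test set and evaluate forecasts against it; the in-sample accuracy above is only the simplest check.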
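
For the "Synthetic Dataset Generation" step, one simple way to judge whether an LLM-generated twin dataset preserves the information of the original is to compare the distributions of shared quantitative variables. The synthetic file name and the `score` column below are assumptions for illustration.

```r
# Hypothetical file names; the synthetic dataset would be produced by an LLM.
real      <- read.csv("tweets_fr.csv")
synthetic <- read.csv("tweets_fr_synthetic.csv")

# Compare summary statistics of a shared quantitative variable.
summary(real$score)
summary(synthetic$score)

# Two-sample Kolmogorov-Smirnov test: do the two score distributions differ?
ks.test(real$score, synthetic$score)
```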
To run the project code, the following tools are required:
- Programming Language: R
- Main Libraries:
  - `ggplot2`, `GGally` for data visualization.
  - `forecast`, `tseries` for time series analysis.
  - `nnet`, `caret` for neural networks and regressions.
  - `dplyr`, `tidyverse` for data manipulation.
- Download the dataset: The dataset is available in the repository.
- Install the required libraries:

  ```r
  install.packages(c("ggplot2", "GGally", "forecast", "tseries", "nnet", "caret", "dplyr", "tidyverse"))
  ```
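
Once installed, the packages can be loaded at the top of the analysis script (note that `tidyverse` already attaches `ggplot2` and `dplyr`):

```r
# Load the libraries used throughout the project.
library(ggplot2)
library(GGally)
library(forecast)
library(tseries)
library(nnet)
library(caret)
library(dplyr)
library(tidyverse)
```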