📖 Documentation for project: Detection of fake Ukrainian news

📌 Description:

This is a student project for the Machine Learning course, aimed at developing an application for detecting fake news.

🎯 The main goal:

Create a machine learning model capable of analyzing Ukrainian news texts and determining their authenticity.

✅Approach

1️⃣ Data Collection

Searching for available Ukrainian news datasets on the internet
Downloading the datasets (if we find any)
Analyzing the data structure and identifying key fields (URL, title, text, source, date), including which sources the news comes from
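
A minimal sketch of the structure analysis step, assuming each record is a dict with the key fields listed above. The sample records and the source names `site-one` and `site-two` are hypothetical and are not taken from the real datasets.

```python
from collections import Counter

# Hypothetical sample records; real rows would come from the downloaded dataset.
records = [
    {"url": "https://example.com/a", "title": "A", "text": "Text A",
     "source": "site-one", "date": "2025-02-01"},
    {"url": "https://example.com/b", "title": "B", "text": "Text B",
     "source": "site-two", "date": "2025-02-02"},
    {"url": "https://example.com/c", "title": "C", "text": "",
     "source": "site-one", "date": None},
]

KEY_FIELDS = ("url", "title", "text", "source", "date")


def summarize(rows):
    """Count rows per source and missing (empty or absent) values per key field."""
    per_source = Counter(row.get("source") for row in rows)
    missing = Counter()
    for row in rows:
        for field in KEY_FIELDS:
            if not row.get(field):
                missing[field] += 1
    return per_source, missing


per_source, missing = summarize(records)
print(per_source.most_common())
print(dict(missing))
```

The per-source counts show where the news comes from, and the missing-value counts show which fields can be relied on later.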

2️⃣ Data Preprocessing

Extracting the information needed for further work. For example:
- News title
- News text
- Publication time
- Additional information that will be needed later
Removing duplicates and unnecessary data
Analyzing the data distribution and identifying news sources. We divide the sources into the following groups:
- Sources we trust:
- Sources we do not trust:
  - Вокс Україна
  - Война с фейками
Balancing classes (if necessary)
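
The duplicate removal and class balancing steps above could look roughly like this. It is a sketch under assumptions: rows are dicts with `title`, `text`, and `label` keys, duplicates are defined as identical title and text after trimming and lowercasing, and balancing is done by random downsampling. The project may define or handle these differently.

```python
import random


def drop_duplicates(rows):
    """Keep the first occurrence of each (title, text) pair."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["title"].strip().lower(), row["text"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def balance_by_downsampling(rows, label_key="label", seed=0):
    """Randomly downsample every class to the size of the smallest one."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    return balanced


# Hypothetical toy rows, only to show the two steps together.
rows = [
    {"title": "A", "text": "x", "label": "real"},
    {"title": "a ", "text": "X", "label": "real"},  # duplicate of the first row
    {"title": "B", "text": "y", "label": "real"},
    {"title": "C", "text": "z", "label": "real"},
    {"title": "D", "text": "w", "label": "fake"},
]
unique_rows = drop_duplicates(rows)
balanced_rows = balance_by_downsampling(unique_rows)
print(len(unique_rows), len(balanced_rows))
```

Downsampling discards data from the larger class; oversampling or class weights are alternatives when the smaller class is very small.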

3️⃣ Model Development

Selecting a machine learning algorithm by trying out and comparing candidates.
Training the selected algorithm.
Evaluating its performance.
Optimizing hyperparameters to improve accuracy.
(Optional) If this approach fails, consider:
- Using an LLM.
- Falling back to GPT-based classification.
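
To make the train, evaluate, and tune loop concrete, here is a small stand-in baseline: a multinomial Naive Bayes text classifier written with the standard library only. This is not the algorithm the project selected (the README does not name one); the class `NaiveBayes`, the whitespace tokenizer, and the toy English sentences are all illustrative assumptions. The smoothing parameter `alpha` is an example of a hyperparameter that could be tuned.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenization; real Ukrainian text needs more care."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with additive smoothing (alpha must be > 0)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the given label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical toy training data.
train_texts = [
    "official report confirms figures",
    "ministry publishes official statement",
    "shocking secret they hide",
    "secret shocking truth revealed",
]
train_labels = ["real", "real", "fake", "fake"]

model = NaiveBayes(alpha=1.0).fit(train_texts, train_labels)
print(model.predict("official statement"))
print(accuracy(model, train_texts, train_labels))
```

Accuracy on the training texts only shows that the code runs. A meaningful estimate needs a held-out test set, and on imbalanced data precision and recall are more informative than accuracy alone.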

4️⃣ Creating a WebUI for easier interaction

Using Streamlit for this purpose.

📊 Datasets Used

1️⃣ Ukrainian Fake and True News

Source: Kaggle
Description: Contains fake and true news about the Russo-Ukrainian war

2️⃣ Ukrainian News

Source: Hugging Face
Description: A dataset of news articles downloaded from various Ukrainian websites and Telegram channels

Divided dataset from Hugging Face
- Source:
  - Filemail. A modified version of the Hugging Face dataset, divided into 22 parts of 1 million rows each.
  - Filemail. A modified version of the Hugging Face dataset, divided into 46 parts of 500 thousand rows each.
- The split was made using this code.
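
The project's own splitting code is linked above and is not reproduced here. As an independent sketch of the same idea, the following splits any iterable of rows into fixed-size parts and writes each part to a JSON Lines file, so only one part is held in memory at a time. The function names, the `part_NNN.jsonl` file naming, and the JSON Lines format are assumptions.

```python
import json
import tempfile
from itertools import islice
from pathlib import Path


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows, reading the input lazily."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def write_parts(rows, chunk_size, out_dir):
    """Write each chunk to its own JSON Lines file and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(iter_chunks(rows, chunk_size)):
        path = out_dir / f"part_{index:03d}.jsonl"
        with path.open("w", encoding="utf-8") as handle:
            for row in chunk:
                # ensure_ascii=False keeps Cyrillic text readable in the files
                handle.write(json.dumps(row, ensure_ascii=False) + "\n")
        paths.append(path)
    return paths


# Toy run: 10 generated rows split into parts of 4 rows.
with tempfile.TemporaryDirectory() as tmp:
    paths = write_parts(({"id": i} for i in range(10)), chunk_size=4, out_dir=tmp)
    part_sizes = [len(p.read_text(encoding="utf-8").splitlines()) for p in paths]
print(part_sizes)
```

With a chunk size of 1 million or 500 thousand rows, the same function would produce parts like the ones listed above, with the last part holding the remainder.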

🚀 Development Journey

⭐ 2025-02-03 – Team Formation.

👥 Team Members:

PM: voinskyi
Data engineer: Yul4onok
Data scientist: highbrow-228

💡2025-02-06 – Development of the Idea to Tackle Disinformation.

🔍2025-02-14 – Searched for datasets with Ukrainian news, but failed.

📰2025-02-16 – First attempt to scrape news from TSN.ua

Scraping was very slow, so we decided to look for other ways of collecting data.

🤔2025-02-17

Ran into the problem of labeling data: how can we label a huge amount of data as fake or real by ourselves?
The second attempt to find a dataset was successful. (We found ukrainian-news on HuggingFace and Ukrainian News on Kaggle.)
We are considering labeling the data this way: choose sources we trust and mark their news as "real", then choose some suspicious sources and mark their news as "fake".
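
The source-based labeling idea can be sketched as follows. The source names in `TRUSTED_SOURCES` and `UNTRUSTED_SOURCES` are hypothetical placeholders, not the project's actual lists, and the `source` field name is an assumption about the dataset schema.

```python
# Hypothetical placeholder names; the project's real lists are not reproduced here.
TRUSTED_SOURCES = {"trusted-site.example"}
UNTRUSTED_SOURCES = {"suspicious-channel.example"}


def label_by_source(row):
    """Return 'real', 'fake', or None when the source is in neither list."""
    source = row.get("source")
    if source in TRUSTED_SOURCES:
        return "real"
    if source in UNTRUSTED_SOURCES:
        return "fake"
    return None


def label_rows(rows):
    """Attach a label to each row and drop rows whose source is unknown."""
    labeled = []
    for row in rows:
        label = label_by_source(row)
        if label is not None:
            labeled.append({**row, "label": label})
    return labeled


# Toy rows to show the behavior.
sample = [
    {"title": "A", "source": "trusted-site.example"},
    {"title": "B", "source": "suspicious-channel.example"},
    {"title": "C", "source": "unknown.example"},
]
labeled = label_rows(sample)
print([(row["title"], row["label"]) for row in labeled])
```

Labels produced this way describe the source rather than each individual article, so a model trained on them may partly learn to recognize a source's writing style.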

🗂️2025-02-18

Tried to download the ukrainian-news dataset from HuggingFace but ran into a lack of resources: the dataset is very large (22 million rows) and takes a lot of RAM to process. So we decided to divide the dataset into 23 subsets for easier processing.

📌 Next Steps... Stay Tuned!
