GitHub - franciellevargas/FactNews: FactNews is the first dataset for predicting the sentence-level factuality of news reporting. Furthermore, we provide baseline results for sentence-level factuality and media bias prediction in Portuguese. FactNews is composed of 6,191 sentences annotated according to the factuality and media bias definitions proposed by AllSides.

A Dataset for Sentence-Level Factuality and Media Bias Prediction in Portuguese

Automated fact-checking and news credibility verification at scale require accurate prediction of news factuality and media bias. Here, we introduce a large sentence-level dataset, FactNews, composed of 6,191 sentences expertly annotated according to the factuality and media bias definitions proposed by AllSides. We used FactNews to assess the overall reliability of news sources by formulating two text classification tasks: predicting the sentence-level factuality of news reporting and the bias of media outlets. Our experiments demonstrate that biased sentences tend to contain more words than factual sentences and exhibit a predominance of emotional content. This fine-grained analysis of subjectivity and impartiality in news articles showed promising results for predicting the reliability of entire media outlets. Finally, due to the severity of fake news and political polarization in Brazil and the lack of research in Portuguese, both the dataset and baselines were developed specifically for Portuguese.
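The sentence-length observation above can be checked with a simple per-label word count. The following is a minimal sketch, not the authors' analysis code: the three sentences are invented placeholders (not taken from FactNews), the label names mirror the dataset's categories, and whitespace splitting is an assumed stand-in for whatever tokenization the paper used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical placeholder rows (sentence, label); NOT real FactNews data.
rows = [
    ("O ministro anunciou o novo plano.", "factual"),
    ("A votação ocorreu na terça-feira.", "factual"),
    ("O desastroso plano do ministro revoltou, mais uma vez, a população cansada.", "biased"),
]


def mean_words_by_label(rows):
    """Return the average whitespace-token count of the sentences under each label."""
    lengths = defaultdict(list)
    for sentence, label in rows:
        lengths[label].append(len(sentence.split()))
    return {label: mean(counts) for label, counts in lengths.items()}


averages = mean_words_by_label(rows)
print(averages)
```

Running the same function over the real annotated sentences would be one way to reproduce the length comparison, once the dataset files are loaded into `(sentence, label)` pairs.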

The following image illustrates the annotation schema used to label FactNews:

The following table describes in detail the FactNews labels, documents, and stories:

| Factual | Quotes | Biased | Total sentences | Total news stories | Total news documents |
|---|---|---|---|---|---|
| 4,242 | 1,391 | 558 | 6,191 | 100 | 300 |
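The per-label counts in the table imply a skewed class distribution, which matters when reading the classification scores. As a small sketch using only the three counts reported above, the label shares can be computed as follows; the variable names are illustrative.

```python
# Label counts copied from the table above.
label_counts = {"Factual": 4242, "Quotes": 1391, "Biased": 558}

# The three labels together cover every annotated sentence.
total = sum(label_counts.values())

# Fraction of the dataset carried by each label.
shares = {label: count / total for label, count in label_counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"Total sentences: {total}")
```

The three counts sum to 6,191 sentences, with factual sentences making up roughly 68.5% of the data, quotes about 22.5%, and biased sentences about 9.0%.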

| Media 1 | Media 2 | Media 3 |
|---|---|---|
| Folha de São Paulo | Estadão | O Globo |

| Sentence-Level Media Bias Prediction | Sentence-Level Factuality Prediction |
|---|---|
| 67% (F1-Score) by Fine-tuned mBERT | 88% (F1-Score) by Fine-tuned mBERT |
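For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below computes it per label on a toy example. The gold and predicted label lists are invented for illustration, and this README does not say which averaging scheme (macro, micro, or weighted) lies behind the reported figures, so the macro average shown here is an assumption.

```python
def f1_for_label(gold, predicted, label):
    """Compute the F1-score of one label from parallel gold and predicted lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # true positives
    fp = sum(1 for g, p in pairs if g != label and p == label)  # false positives
    fn = sum(1 for g, p in pairs if g == label and p != label)  # false negatives
    if tp == 0:
        return 0.0  # no correct prediction for this label
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels; NOT model output on FactNews.
gold = ["factual", "factual", "biased", "quotes", "biased", "factual"]
predicted = ["factual", "biased", "biased", "quotes", "factual", "factual"]

labels = ["factual", "quotes", "biased"]
per_label = {label: f1_for_label(gold, predicted, label) for label in labels}

# Macro average: unweighted mean of the per-label scores (assumed scheme).
macro_f1 = sum(per_label.values()) / len(per_label)
print(per_label, macro_f1)
```

With imbalanced labels such as those in FactNews, the choice of averaging scheme can change the headline number noticeably, so it is worth confirming in the paper before comparing results.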

CITING / BIBTEX

Please cite our paper if you use our dataset:

@inproceedings{vargas-etal-2023-predicting,
    title = "Predicting Sentence-Level Factuality of News and Bias of Media Outlets",
    author = "Vargas, Francielle  and
      Jaidka, Kokil  and
      Pardo, Thiago  and
      Benevenuto, Fabr{\'\i}cio",
    editor = "Mitkov, Ruslan  and
      Angelova, Galia",
    booktitle = "Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing",
    month = sep,
    year = "2023",
    address = "Varna, Bulgaria",
    publisher = "INCOMA Ltd., Shoumen, Bulgaria",
    url = "https://aclanthology.org/2023.ranlp-1.127",
    pages = "1197--1206",
    }
