We scraped Fox News transcripts from here. Our initial scrape yielded around 24k transcripts.
I scraped the data again in 2025; the number of transcripts per year is as follows:
Year | Transcripts |
---|---:|
2003 | 450 |
2004 | 365 |
2005 | 431 |
2006 | 411 |
2007 | 304 |
2008 | 418 |
2009 | 425 |
2010 | 314 |
2011 | 523 |
2012 | 1019 |
2013 | 777 |
2014 | 866 |
2015 | 890 |
2016 | 821 |
2017 | 1259 |
2018 | 1752 |
2019 | 5865 |
2020 | 5995 |
2021 | 5400 |
2022 | 6782 |
2023 | 9585 |
2024 | 8256 |
2025 | 1474 |
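The table does not report a grand total. The sketch below is a quick sanity check that sums the per-year counts from the 2025 re-scrape and computes how much of the corpus comes from 2019 onward. The counts are transcribed by hand from the table, and `COUNTS`, `total`, and `share_since` are illustrative names, not part of any released code. Because the scrape ran during 2025, the 2025 row presumably covers only part of that year.

```python
# Per-year transcript counts from the 2025 re-scrape, transcribed by hand
# from the table in this README (illustrative, not released code).
COUNTS = {
    2003: 450, 2004: 365, 2005: 431, 2006: 411, 2007: 304,
    2008: 418, 2009: 425, 2010: 314, 2011: 523, 2012: 1019,
    2013: 777, 2014: 866, 2015: 890, 2016: 821, 2017: 1259,
    2018: 1752, 2019: 5865, 2020: 5995, 2021: 5400, 2022: 6782,
    2023: 9585, 2024: 8256, 2025: 1474,  # 2025 is presumably a partial year
}


def total(counts: dict[int, int]) -> int:
    """Sum of transcripts across all years."""
    return sum(counts.values())


def share_since(counts: dict[int, int], start_year: int) -> float:
    """Fraction of all transcripts dated start_year or later."""
    recent = sum(n for year, n in counts.items() if year >= start_year)
    return recent / total(counts)


if __name__ == "__main__":
    print(f"Total transcripts: {total(COUNTS):,}")                 # Total transcripts: 54,382
    print(f"Share from 2019 on: {share_since(COUNTS, 2019):.1%}")  # Share from 2019 on: 79.7%
```

Run as a script, it prints the grand total (54,382) and the share of transcripts dated 2019 or later (about 79.7%), which reflects the sharp jump in per-year counts starting in 2019.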

The final dataset, including the HTML files, is posted on Harvard Dataverse.

See also:
- notnews/msnbc_transcripts: MSNBC Transcripts: 2003–2022
- notnews/cnn_transcripts: CNN Transcripts 2000–2025
- notnews/stanford_tv_news: Stanford Cable TV News Dataset
- notnews/nbc_transcripts: NBC transcripts 2011–2014
- notnews/archive_news_cc: Closed Caption Transcripts of News Videos from archive.org 2014–2023