lammypi/MSCAPP_Seminar_Lessons

MSCAPP Seminar Lessons

On 13 Feb 2024, I gave a lunchtime talk on building an analysis with basic text analytics and stylometrics. Using the ISOT Fake News Data Set as the example corpus, the talk covered the following activities:

  • Setting up a text processing pipeline in spaCy.
  • Calculating counts and proportions of text features.
  • Calculating readability metrics, vocabulary richness, and lexical diversity via common Python packages:
    • Automated Readability Index (ARI) via textstat.
    • Type-Token Ratio and Measure of Textual Lexical Diversity via lexicalrichness.
  • Determining emotion and valence of texts using LeXmo.
  • Reviewing three potential projects that use text analytics and stylometrics:
    • Clustering
    • Topic Modeling with BERTopic
    • Predictive Modeling
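
To make the metrics in the list above concrete, here is a minimal, standard-library-only sketch of what a few of them compute: simple feature counts and proportions, the Automated Readability Index (ARI), and the Type-Token Ratio (TTR). It is not the seminar's code and does not call spaCy, textstat, or lexicalrichness; the function names (`tokenize`, `split_sentences`, `text_features`) and the naive regex tokenizer are assumptions made for illustration, so the numbers may differ slightly from what those packages report.

```python
import re


def tokenize(text):
    """Lowercased word tokens (hypothetical helper; naive regex, not spaCy)."""
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", text.lower())


def split_sentences(text):
    """Naive sentence split on runs of '.', '!' or '?' (illustrative only)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def text_features(text):
    """Return counts, proportions, ARI, and TTR for one document."""
    words = tokenize(text)
    sentences = split_sentences(text)
    n_words = len(words)
    n_sentences = len(sentences)
    # ARI counts letters and digits only, ignoring spaces and punctuation.
    n_chars = sum(ch.isalnum() for ch in text)
    n_exclaim = text.count("!")

    if n_words == 0 or n_sentences == 0:
        # Avoid division by zero on empty input.
        return {"n_words": 0, "n_sentences": 0, "exclaim_per_sentence": 0.0,
                "ari": 0.0, "ttr": 0.0}

    # Standard ARI formula: 4.71 * (chars/words) + 0.5 * (words/sentences) - 21.43
    ari = 4.71 * (n_chars / n_words) + 0.5 * (n_words / n_sentences) - 21.43
    # Type-Token Ratio: unique word forms divided by total word tokens.
    ttr = len(set(words)) / n_words

    return {
        "n_words": n_words,
        "n_sentences": n_sentences,
        "exclaim_per_sentence": n_exclaim / n_sentences,
        "ari": ari,
        "ttr": ttr,
    }


if __name__ == "__main__":
    sample = "The cat sat. The cat ran!"
    for name, value in text_features(sample).items():
        print(f"{name}: {value}")
```

TTR falls as texts get longer, which is one reason a length-robust measure such as MTLD is usually reported alongside it; MTLD is not sketched here.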

About

Files related to the MSCAPP seminar on text analytics and stylometrics on 13-Feb-2024.
