screddy1313/Language-modelling

Language-modelling

In this project, we generate sentences using n-gram language models.

Dataset

We use the 20 Newsgroups dataset, a standard dataset for text-related tasks.
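As a rough illustration of how the newsgroup files could be turned into training sentences, here is a minimal sketch. It assumes the dataset has been unpacked locally with one folder per newsgroup; the function names `tokenize` and `load_sentences`, the `latin-1` encoding, and the crude punctuation-based sentence splitting are all assumptions made for this example, not taken from the notebook.

```python
import re
from pathlib import Path


def tokenize(text):
    """Lowercase the text and keep only runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def load_sentences(root, categories):
    """Read every file under root/<category> and return a list of token lists.

    Sentences are split naively on '.', '!' and '?'; empty sentences are dropped.
    """
    sentences = []
    for category in categories:
        for path in sorted(Path(root, category).iterdir()):
            if not path.is_file():
                continue
            # latin-1 maps every byte to a character, so decoding never fails.
            text = path.read_text(encoding="latin-1")
            for raw_sentence in re.split(r"[.!?]+", text):
                tokens = tokenize(raw_sentence)
                if tokens:
                    sentences.append(tokens)
    return sentences
```

A call such as `load_sentences("20_newsgroups", ["rec.sport.baseball"])` would then return one token list per sentence, where the root folder name is a placeholder for wherever the data lives. Message headers are not stripped here, so a real pipeline would likely want extra cleaning.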

Code

  • This project covers the following tasks:

    • Train unigram, bigram, and trigram models on all files of rec.sport.baseball and rec.motorcycles.
    • Given a sentence, compute its log probability under each of these models.
    • Given a sentence, compute its perplexity under each of these models.
    • Given a sentence, compute its log probability under each model with Good-Turing smoothing.
  • The code is documented inline in the Python notebook.
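The tasks above can be sketched in a few functions. This is a minimal illustration, not the notebook's code: the names `ngrams`, `train`, `log_prob`, `perplexity`, and `good_turing_prob` are hypothetical, the `<s>` and `</s>` padding symbols are an assumed convention, and perplexity is normalized by the number of predicted tokens including the end symbol. The Good-Turing function works on n-gram events (joint probabilities) and falls back to the raw count when the next frequency-of-frequency is zero. For the unigram model, summing the logs of these probabilities gives a smoothed sentence log probability; conditioning on context for bigram and trigram models is left out of this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the sentence's n-grams, padded with n-1 start symbols and one end symbol."""
    padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


def train(sentences, n):
    """Count n-grams and their (n-1)-gram contexts over tokenized sentences."""
    counts, contexts = Counter(), Counter()
    for tokens in sentences:
        for gram in ngrams(tokens, n):
            counts[gram] += 1
            contexts[gram[:-1]] += 1
    return counts, contexts


def log_prob(tokens, counts, contexts, n):
    """Unsmoothed maximum-likelihood log probability; -inf if any n-gram is unseen."""
    total = 0.0
    for gram in ngrams(tokens, n):
        if counts[gram] == 0:
            return float("-inf")
        total += math.log(counts[gram] / contexts[gram[:-1]])
    return total


def perplexity(tokens, counts, contexts, n):
    """exp of the negative average log probability per predicted token."""
    lp = log_prob(tokens, counts, contexts, n)
    if lp == float("-inf"):
        return float("inf")
    return math.exp(-lp / len(ngrams(tokens, n)))


def good_turing_prob(gram, counts, unseen_types):
    """Good-Turing probability of a single n-gram event.

    Seen n-grams use c* = (c + 1) * N_{c+1} / N_c, falling back to the raw
    count c when N_{c+1} is zero. Unseen n-grams share the mass N_1 / N
    equally among `unseen_types`, an estimate supplied by the caller.
    The result is not renormalized.
    """
    total = sum(counts.values())
    freq_of_freq = Counter(counts.values())  # N_c: number of n-grams seen c times
    c = counts[gram]
    if c == 0:
        return freq_of_freq[1] / (total * unseen_types)
    if freq_of_freq[c + 1] > 0:
        c = (c + 1) * freq_of_freq[c + 1] / freq_of_freq[c]
    return c / total


# Toy corpus standing in for the tokenized newsgroup sentences.
corpus = [["a", "b"], ["a", "c"]]
uni_counts, uni_contexts = train(corpus, 1)
bi_counts, bi_contexts = train(corpus, 2)
print(log_prob(["a", "b"], bi_counts, bi_contexts, 2))
print(perplexity(["a"], uni_counts, uni_contexts, 1))
print(good_turing_prob(("z",), uni_counts, unseen_types=2))
```

Without smoothing, any sentence containing an unseen n-gram gets a log probability of negative infinity and an infinite perplexity, which is the motivation for the Good-Turing step. Recomputing the frequency-of-frequency table on every call keeps the sketch short; a real implementation would compute it once.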
