A curated collection of hands-on NLP tasks that connect theory to real-world application, from tokenization to translation and from sentiment analysis to spell correction. Explore core concepts of computational linguistics using Python, NLTK, spaCy, and modern ML frameworks.

kuldeep562/Natural_Language_Processing

🧠 NLP Task Repository

This repository serves as a central hub for various Natural Language Processing (NLP) assignments, experiments, and projects. It includes practical tasks focused on core NLP techniques and tools using Python and popular libraries like NLTK and Scikit-learn.


📚 Included Work

🔹 Practical Assignment – I – 2024 (Branch: assignment-i-2024)

| Task No. | Topic | Description |
|---|---|---|
| 1 | Text Preprocessing | Tokenization, stopword removal, stemming, and lemmatization |
| 2 | POS Tagging | Part-of-speech tagging using NLTK and evaluation with the Penn Treebank |
| 3 | Named Entity Recognition (NER) | Entity detection using spaCy with CoNLL-2003 dataset |
| 4 | Ambiguity Analysis | Lexical, syntactic, and semantic ambiguities using Brown Corpus |
| 5 | Sentiment Analysis | ML-based sentiment model on IMDB movie reviews |
| 6 | Text Classification | News article classification using 20 Newsgroups dataset |
| 7 | Language Modeling | N-gram language model evaluated with WikiText-2 |
| 8 | Machine Translation | English-to-French translation using seq2seq model on WMT14 |
| 9 | Text Generation | RNN-based text generator trained on literary data from Project Gutenberg |
| 10 | Rule-Based Chatbot | Simple chatbot with predefined rules and dialogue corpus |
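
As a rough illustration of the idea behind Task 7 (N-gram language modeling), the sketch below trains a bigram model with add-one smoothing on a toy corpus and scores sentences by perplexity. It is not the code from the assignment branch: the function names, the boundary markers, and the whitespace tokenizer are assumptions made for this example, and it does not load WikiText-2.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers


def train_bigram(sentences):
    """Count bigrams and their left contexts over whitespace-tokenized text."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = [BOS] + sentence.lower().split() + [EOS]
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
            vocab.add(word)
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, word, context_counts, bigram_counts, vocab):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


def perplexity(sentence, context_counts, bigram_counts, vocab):
    """Per-token perplexity of one sentence under the bigram model.

    Words never seen in training still receive a small smoothed probability;
    a fuller model would map them to an explicit unknown-word token.
    """
    tokens = [BOS] + sentence.lower().split() + [EOS]
    pairs = list(zip(tokens, tokens[1:]))
    log_prob = sum(
        math.log(bigram_prob(p, w, context_counts, bigram_counts, vocab))
        for p, w in pairs
    )
    return math.exp(-log_prob / len(pairs))
```

On a two-sentence corpus such as `["the cat sat", "the dog sat"]`, a sentence in training word order should get a lower perplexity than the same words scrambled, which is the property an evaluation on held-out text relies on.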

➡️ See branch: assignment-i-2024
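
Task 10 above describes a chatbot driven by predefined rules. A minimal sketch of that pattern, assuming regular-expression rules checked in order with a fallback reply, might look like the following. The rules, replies, and the `respond` name are hypothetical and are not taken from the assignment branch or its dialogue corpus.

```python
import re

# Hypothetical rules: (compiled pattern, reply template). The first match wins,
# so more specific rules are listed before more general ones.
RULES = [
    (re.compile(r"\bmy name is (\w+)", re.IGNORECASE), "Nice to meet you, {0}."),
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE), "Hello! How can I help you?"),
    (re.compile(r"\b(bye|goodbye)\b", re.IGNORECASE), "Goodbye!"),
]
FALLBACK = "Sorry, I did not understand that."


def respond(message):
    """Return the reply of the first rule whose pattern matches the message."""
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            # Captured groups fill any placeholders in the reply template.
            return template.format(*match.groups())
    return FALLBACK
```

Because rules are tried in order, a message like "Hi, my name is Asha" is answered by the name rule rather than the generic greeting.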


🔹 Practical Assignment – II – 2024 (Branch: assignment-ii-2024)

| Task No. | Topic | Description |
|---|---|---|
| 1 | Tokenization | Sentence and word tokenizer using Reuters-21578 dataset |
| 2 | Stemming | Porter Stemmer applied on Brown Corpus |
| 3 | Lemmatization | WordNet lemmatizer with comparison to stemming using Gutenberg Corpus |
| 4 | Bag of Words (BoW) | Convert documents into numerical vectors using 20 Newsgroups dataset |
| 5 | TF-IDF | Feature extraction from IMDB Movie Reviews |
| 6 | Morphological Analysis | Root form detection using Universal Dependencies |
| 7 | Regex Pattern Extraction | Extract dates, emails, etc. from Enron Email Dataset |
| 8 | Levenshtein Edit Distance | Compare word pairs using edit distance (WordNet or custom dataset) |
| 9 | Preprocessing Pipeline | Includes tokenization, normalization, and vectorization (Amazon Reviews) |
| 10 | Spell Checker | Suggest spelling corrections using edit distance and Birkbeck corpus |
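
Tasks 8 and 10 both rest on edit distance. The sketch below shows one common way to compute Levenshtein distance with dynamic programming and to rank spelling suggestions by it. It assumes unit costs for insertion, deletion, and substitution, and the `suggest` helper with its `max_distance` and `limit` parameters is a hypothetical illustration rather than the notebook's actual interface. The vocabulary is passed in directly instead of being loaded from the Birkbeck corpus.

```python
def levenshtein(a, b):
    """Edit distance between two strings with unit-cost edits."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def suggest(word, vocabulary, max_distance=2, limit=3):
    """Return up to `limit` vocabulary words within `max_distance` edits."""
    scored = ((levenshtein(word, candidate), candidate) for candidate in vocabulary)
    close = sorted(pair for pair in scored if pair[0] <= max_distance)
    return [candidate for _, candidate in close[:limit]]
```

Comparing the input against every vocabulary word is fine for small word lists; larger dictionaries usually call for candidate generation or indexing before scoring.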

➡️ See branch: assignment-ii-2024
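
To make Tasks 4 and 5 concrete, here is a small pure-Python sketch of Bag of Words and TF-IDF weighting. It assumes whitespace tokenization, term frequency normalized by document length, and an unsmoothed `log(N / df)` inverse document frequency. scikit-learn's `TfidfVectorizer` uses a smoothed variant with normalization by default, so its numbers will differ. The function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(documents):
    """Return a sorted vocabulary and one term-count vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for doc in tokenized for token in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


def tf_idf(documents):
    """Weight each count vector by an unsmoothed inverse document frequency."""
    vocabulary, vectors = bag_of_words(documents)
    n_docs = len(documents)
    # Every vocabulary term occurs in at least one document, so df >= 1.
    doc_freq = [sum(1 for vec in vectors if vec[i] > 0) for i in range(len(vocabulary))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    weighted = []
    for vec in vectors:
        total = sum(vec) or 1  # guard against empty documents
        weighted.append([(count / total) * w for count, w in zip(vec, idf)])
    return vocabulary, weighted
```

A term that appears in every document gets a weight of zero under this formula, which is why words shared across all reviews contribute nothing to the TF-IDF features.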


🔹 Learning Task Folder (Recent Addition)

A new folder titled Learning Task has been added to the repository. It currently includes:

  • 📝 Natural Language Preprocessing.ipynb – A notebook demonstrating core text preprocessing techniques
  • 🧪 Small Task.ipynb – A mini NLP task or experiment (details inside notebook)

This section will grow as more ad-hoc or exploratory tasks are added.


🚀 Getting Started

Clone the Repo

git clone https://github.com/kuldeep562/Natural_Language_Processing.git
cd Natural_Language_Processing

View Specific Work

Switch to the relevant branch:

git checkout assignment-i-2024
# or
git checkout assignment-ii-2024

🛠 Tech Stack

  • Python 3.8+
  • NLTK
  • spaCy
  • Scikit-learn
  • Pandas & NumPy
  • TensorFlow / PyTorch (as required)
  • Hugging Face Transformers (optional)
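
Before opening the notebooks, it can help to see which of these libraries are available in the current environment. The sketch below is one way to check; the import names in `PACKAGES` are the usual ones for these projects, and the `check_packages` helper is a hypothetical name introduced here, not part of the repository.

```python
from importlib.util import find_spec

# Display name -> import name. Optional entries may legitimately be missing.
PACKAGES = {
    "NLTK": "nltk",
    "spaCy": "spacy",
    "Scikit-learn": "sklearn",
    "Pandas": "pandas",
    "NumPy": "numpy",
    "TensorFlow": "tensorflow",
    "PyTorch": "torch",
    "Hugging Face Transformers": "transformers",
}


def check_packages(packages=PACKAGES):
    """Map each display name to True if its module can be found, else False."""
    return {label: find_spec(module) is not None for label, module in packages.items()}


if __name__ == "__main__":
    for label, installed in check_packages().items():
        print(f"{label}: {'installed' if installed else 'missing'}")
```

The check only looks up whether a module can be located; it does not import it or verify its version.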

📦 Dataset Sources


🙌 Acknowledgements

Datasets and tools are drawn from:

  • NLTK
  • Stanford AI
  • UCI ML Repository
  • Hugging Face Datasets
  • Kaggle
  • Universal Dependencies
