Statistical Analysis of Hindi and Sanskrit Languages

Authors

[Rudra Gohel] [Dev Kabra] [Kirtan Shah]

Overview

This repository contains the code and datasets for a research project on the statistical analysis of the Hindi and Sanskrit languages. The study examines the linguistic structure of both languages and explores their relationship with culture and society. The findings are relevant to fields such as cryptanalysis, machine translation, natural language processing, and sentiment analysis.

Research Highlights

Dataset Selection: Meticulous selection and evaluation of datasets for both Hindi and Sanskrit languages.
Linguistic Aspects Explored:
- Frequency Analysis
- Character Grouping
- Digrams and Trigrams
- Average Word Length
- Zipf’s Law
- Word Entropy
- N-gram Entropy
Encouraging Results:
- Distinct patterns in character occurrences
- Structural complexities
- Adherence to Zipf’s Law in both languages
- Balanced mix of structured and variable word usage based on Word Entropy analysis
Comparisons with English:
- N-gram Entropy comparisons with English for insights into symbol relationships.
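
Several of the measures listed above can be illustrated with a short Python sketch. This is not the code from analysis.ipynb; the function names, the whitespace tokenizer, and the toy sentence are assumptions made for illustration. It counts character n-grams within words, computes the average word length and Shannon entropy, and estimates a Zipf slope by a least-squares fit of log frequency against log rank. Characters are treated as Unicode code points, so Devanagari vowel signs count separately from their base consonants; the project's own character grouping may differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Split on whitespace after dropping the danda (assumed tokenizer)."""
    return text.replace("।", " ").split()


def ngram_counts(words, n):
    """Count overlapping character n-grams inside each word (n=1: characters)."""
    counts = Counter()
    for word in words:
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def average_word_length(words):
    """Mean number of code points per word."""
    return sum(len(w) for w in words) / len(words)


def shannon_entropy(counts):
    """Entropy in bits of the distribution given by a table of counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is what Zipf's law predicts. Needs at least two items.
    """
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Toy sentence for illustration only; real corpora are far larger.
    words = tokenize("राम घर जाता है और राम खाना खाता है।")
    word_counts = Counter(words)
    print("digrams:", ngram_counts(words, 2).most_common(3))
    print("average word length:", average_word_length(words))
    print("word entropy (bits):", shannon_entropy(word_counts))
    print("Zipf slope:", zipf_slope(word_counts))
```

On a corpus this small the numbers carry no meaning; the sketch only shows how each quantity is defined.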

Repository Structure

The repository contains two main folders, Hindi and Sanskrit. Each folder contains an analysis.ipynb notebook that generates the results as CSV files and images.

Usage

To reproduce the results of the research, follow these steps:

Clone the repository and change into its directory:
```
git clone https://github.com/guptalab/hindisanskritstat.git
cd hindisanskritstat
```
Install the required packages:
```
pip install -r requirements.txt
```
Run the analysis.ipynb notebook in each of the two folders to generate the results.
