American Indian Language Dictionaries in Digital Archives: Adaptive Learning Models for Archival Description
This repository explores the methodologies and frameworks used in creating American Indian language dictionaries for digital archives. It presents an improved model for automated archival processing, comparing adaptive learning models with traditional manual processing to enhance archival descriptions. The broader implications for large-scale digital projects and the future of archival science are also discussed.
With advancements in Natural Language Processing (NLP) and machine learning, the ability to process large volumes of typewritten and handwritten text within minutes—rather than days or months—represents a transformative leap in digital archival science. By refining text analysis, entity recognition, and terminology control, this model accelerates metadata standardization, making historical records more accessible and meaningful.
See the American Indian Language Working List and Resources.
This project aims to:
- Develop and improve dictionaries for American Indian languages within archival frameworks.
- Implement adaptive learning models to process and interpret archival descriptions.
- Standardize terminology across digital archives while preserving unique tribal distinctions.
- Enhance entity recognition, linking historical events, policies, and individuals in American Indian history.
- Improve metadata accuracy through feedback loops that refine machine learning predictions over time.
Traditional archival description and metadata creation rely heavily on manual human input, which can be inconsistent, biased, and time-consuming. Adaptive learning models enhance this process by:
- Accelerating Text Processing: Automating the recognition of entities, subjects, policies, and people.
- Refining Controlled Vocabularies: Standardizing terminology (e.g., ‘American Indian’ vs. ‘Native American’ vs. ‘Muscogee’).
- Enhancing Metadata Linkage: Strengthening connections between records and ensuring semantic relationships between terms.
- Detecting Patterns in Archival Texts: Identifying key themes, dates, and historical context through NLP techniques.
Key challenges:
- Inconsistent terminology across archival collections.
- Difficulty in recognizing text in handwritten or typewritten documents.
- Inaccurate metadata due to human cognitive biases.
- Challenges in identifying ceremonial, legal, or political references in tribal history.
Solution: Our model improves accuracy by integrating Named Entity Recognition (NER), sentiment analysis, and entity-linking techniques to detect, categorize, and interconnect important archival information.
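As a minimal illustration of the NER step, the sketch below runs spaCy's off-the-shelf English pipeline over a sample sentence. The model name, sample text, and label set are illustrative assumptions, not this project's production configuration.

```python
# Minimal NER sketch using spaCy's general-purpose English pipeline.
# Assumptions: en_core_web_sm is installed and the sentence is a toy
# example (pip install spacy && python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")

text = ("In 1978 Congress debated federal policy affecting the "
        "Muscogee (Creek) Nation in Oklahoma.")

# Each recognized entity carries a surface form and a predicted label
# (e.g., DATE, ORG, GPE) that can seed archival metadata fields.
for ent in nlp(text).ents:
    print(ent.text, ent.label_)
```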
Dictionary standardization:
- Standardized dictionaries ensure consistent data annotation.
- Language models are trained to recognize, classify, and translate diverse terms.
- Example: ‘Mvskoke’ vs. ‘Muscogee Creek’—ensuring all related documents are linked under the same classification.
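A minimal sketch of that linking step, assuming a hypothetical variant-to-canonical mapping (the repository's actual JSON mappings are the authoritative source):

```python
# Hypothetical variant-to-canonical mapping for illustration only;
# the project's real mappings live in its JSON data files.
VARIANTS = {
    "mvskoke": "Muscogee (Creek)",
    "muscogee creek": "Muscogee (Creek)",
    "muscogee (creek)": "Muscogee (Creek)",
}

def normalize_term(term: str) -> str:
    """Return the canonical form of a term, or the term unchanged."""
    return VARIANTS.get(term.strip().lower(), term)

# Both variants resolve to the same classification.
assert normalize_term("Mvskoke") == normalize_term("Muscogee Creek")
```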
Core NLP techniques:
- Named Entity Recognition (NER): Identifies names, organizations, locations, and historical terms.
- Topic Modeling: Groups related documents by themes (e.g., treaties, sovereignty, land policies).
- Text Classification: Maps documents to predefined controlled vocabularies for better searchability.
- Sentiment Analysis: Detects contextual tone (e.g., legal proceedings vs. personal correspondence).
- Entity Linking: Connects references within texts to historical events, people, and organizations.
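To make one of these steps concrete, here is a hedged topic-modeling sketch using scikit-learn's LDA; the toy corpus, topic count, and top-word printout are assumptions for demonstration only.

```python
# Topic-modeling sketch with scikit-learn's LDA. The four-document
# corpus and two-topic setting are toy assumptions; real runs would
# use full archival texts and a tuned topic count.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "treaty negotiations over tribal land and sovereignty",
    "congressional hearing on land allotment policy",
    "personal correspondence about family and ceremonies",
    "legal brief on treaty rights and federal policy",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Print the top words per topic so an archivist can label the themes.
terms = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [terms[j] for j in weights.argsort()[-4:][::-1]]
    print(f"topic {i}: {', '.join(top)}")
```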
Human-in-the-loop feedback:
- Ensures continuous model improvement through automated re-training.
- Allows human reviewers to refine machine predictions.
- Differentiates between literal and figurative language (e.g., ‘Chief’ as a tribal leader vs. a government title).
Example: If a satirical remark appears in congressional records, the system flags it for review to prevent misinterpretation.
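One plausible shape for that review gate is sketched below; the prediction dictionary, confidence threshold, and queue name are hypothetical choices for illustration, not the project's actual interface.

```python
# Sketch of a human-review gate: low-confidence or figurative-language
# predictions are queued for an archivist instead of being auto-applied.
REVIEW_QUEUE = []

def accept_or_flag(prediction: dict, threshold: float = 0.85) -> bool:
    """Auto-accept confident predictions; queue the rest for review."""
    if prediction["confidence"] < threshold or prediction.get("figurative"):
        REVIEW_QUEUE.append(prediction)  # a human reviewer refines this later
        return False
    return True

accept_or_flag({"term": "Chief", "label": "TRIBAL_LEADER",
                "confidence": 0.62, "figurative": True})
print(len(REVIEW_QUEUE))  # -> 1
```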
To ensure archival consistency, this project creates standardized metadata feedback loops:
Figure 2: Metadata Feedback Loops – Terminology, Language
Process Flow:
- Extract terms from historical documents.
- Compare against controlled vocabularies.
- Identify relationships across archival records.
- Normalize metadata while preserving original text.
- Validate accuracy using human feedback.
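A compact sketch of this five-step flow, assuming a toy controlled vocabulary and a simple phrase-scanning extractor; note that the original text is preserved alongside the normalized metadata rather than overwritten.

```python
# End-to-end sketch of the process flow above. The vocabulary, record
# layout, and extraction method are illustrative assumptions.
CONTROLLED_VOCAB = {
    "mvskoke": "Muscogee (Creek)",
    "muscogee creek": "Muscogee (Creek)",
}

def process_record(text: str) -> dict:
    lowered = text.lower()
    # Steps 1-3: extract terms by scanning for known vocabulary variants
    # and map them onto their canonical forms.
    normalized = sorted({canonical
                         for variant, canonical in CONTROLLED_VOCAB.items()
                         if variant in lowered})
    # Step 4: normalize the metadata while keeping the original wording.
    record = {"original_text": text, "normalized_terms": normalized}
    # Step 5: records with no vocabulary hit are routed to human review.
    record["needs_review"] = not normalized
    return record

print(process_record("Letter concerning the Mvskoke delegation"))
# -> normalized_terms: ['Muscogee (Creek)'], needs_review: False
```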
Improved Record Linkage:
By unifying terminology, the system prevents fragmentation and strengthens archival metadata. For example:
- Muscogee Creek → Mvskoke (Linked under a standardized term).
- Indian Affairs Act (1978) → Referenced in multiple congressional records.
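A hedged sketch of that linkage, with hypothetical record IDs and descriptions, showing how variant spellings collapse onto a single index key so a search for either form retrieves all related records:

```python
# Linkage sketch: records described with different variant terms are
# indexed under one canonical key. Record IDs are hypothetical.
from collections import defaultdict

VARIANTS = {"mvskoke": "Muscogee (Creek)",
            "muscogee creek": "Muscogee (Creek)"}

records = [
    ("rec-001", "Mvskoke language materials"),
    ("rec-002", "Muscogee Creek council minutes"),
]

index = defaultdict(list)
for rec_id, description in records:
    for variant, canonical in VARIANTS.items():
        if variant in description.lower():
            index[canonical].append(rec_id)

print(index["Muscogee (Creek)"])  # -> ['rec-001', 'rec-002']
```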
Why this matters:
- Many Indigenous languages remain underrepresented in digital archives.
- Adaptive models can preserve, translate, and classify historical linguistic data.
- Recognizing tribal sovereignty through accurate archival description is vital for historical justice.
- Automated archival processing ensures faster access to cultural records for Indigenous communities, researchers, and educators.
Future directions:
- Expanding NLP models to cover additional tribal languages and dialects.
- Integrating AI-powered handwriting recognition to process handwritten American Indian documents (see the sketch after this list).
- Building interactive digital dictionaries for Indigenous language preservation.
- Collaborating with Indigenous scholars and communities to refine data representation.
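As a hedged sketch of the handwriting-recognition direction, the snippet below calls Amazon Textract, the service named in the Textract pilot listed under related projects. The file path, region, and surrounding workflow are placeholders, not a confirmed integration.

```python
# Handwriting/typescript OCR sketch using Amazon Textract via boto3.
# Assumptions: AWS credentials are configured and the image path is a
# placeholder standing in for a scanned archival document.
import boto3

textract = boto3.client("textract", region_name="us-east-1")

with open("scanned_letter.png", "rb") as f:  # placeholder document
    response = textract.detect_document_text(Document={"Bytes": f.read()})

# Each LINE block carries recognized text plus a confidence score that
# could feed the human-review loop described earlier.
for block in response["Blocks"]:
    if block["BlockType"] == "LINE":
        print(f'{block["Confidence"]:.1f}  {block["Text"]}')
```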
Repository contents:
- Data & Scripts: Contains Python scripts, JSON mappings, and NLP models for language processing.
- Metadata Guidelines: Best practices for integrating standardized American Indian terminology in digital archives.
- Research Papers: Publications on adaptive learning in archival processing.
This project is part of ongoing NEH and NHPRC-funded research dedicated to American Indian sovereignty, policymaking, and historical documentation. We acknowledge the contributions of tribal historians, language experts, and archivists working towards equitable representation of Indigenous knowledge in digital archives.
Relevant Projects & Grants:
- American Congress Digital Archives Portal
- *Historical Collection of Political Campaign Advertisements*
- Congressional Correspondence Handwriting Textract Pilot
Contact:
Email: japryse@ou.edu
GitHub: https://prys0000.github.io/