Document-level machine transliteration

Set up the environment (bash command line)

git clone https://github.com/princetongenizalab/pgp_transliteration.git
cd pgp_transliteration
pyenv virtualenv 3.8 pgp_transliteration
pyenv activate pgp_transliteration
pip install -r global_def/requirements.txt
# create an access token at https://huggingface.co/docs/hub/en/security-tokens
huggingface-cli login --token <hugging_face_token>
export PYTHONPATH="<local_path_to_cloned_repo>/pgp_transliteration:$PYTHONPATH"
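
Optionally, verify the setup before moving on. This is a minimal smoke test, not part of the repo; the module paths are taken from the usage examples below.

import pg_prep.prep_pg_data
import pg_prep.sliding_window
import run.e2e_pipe
print("pgp_transliteration modules import cleanly")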

Prepare the input

Prepare a list of Judaeo-Arabic strings associated with their PGPIDs:

from pg_prep.prep_pg_data import content_by_pgps
ids_texts = content_by_pgps([4268, 444])  # fetch document texts by PGPID
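
Based on how ids_texts is indexed in the next step, each entry appears to be a (pgpid, text) pair. The loop below is a hypothetical illustration of that structure, not actual repo output:

for pgpid, text in ids_texts:
    print(pgpid, text[:50])  # PGPID followed by the first 50 characters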

Break long documents down into smaller, overlapping text sequences:

from pg_prep.sliding_window import slice
# target_window sets the size of each slice; ctxt_window adds
# surrounding context so slices overlap at their boundaries
sliced = slice(contents=[ids_texts[0][1], ids_texts[1][1]],
               pgpids=[ids_texts[0][0], ids_texts[1][0]],
               target_window=300,
               ctxt_window=100)
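
For intuition about the two window parameters, here is a self-contained sketch of the sliding-window idea. simple_slice is a hypothetical stand-in, not the repo's slice, and it assumes character-based windows:

def simple_slice(text, target_window=300, ctxt_window=100):
    # walk the text in steps of target_window; each slice keeps
    # ctxt_window characters of context on both sides
    slices = []
    for start in range(0, len(text), target_window):
        before = text[max(0, start - ctxt_window):start]
        target = text[start:start + target_window]
        after = text[start + target_window:start + target_window + ctxt_window]
        slices.append((before, target, after))
    return slices

The context on either side gives the model surrounding text to disambiguate characters near slice boundaries.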

Invoke the BERT-based model

from run.e2e_pipe import PipelineManager
output_format = "by_docx_path"
# stich_back=True (parameter name as in the repo) reassembles the
# transliterated slices into whole documents
pm = PipelineManager(sliced, output_format=output_format, stich_back=True)
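
PipelineManager wraps the model call. As a generic illustration of what a BERT encoder does with a slice (bert-base-multilingual-cased is a stand-in here, not the project's fine-tuned checkpoint):

import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")  # stand-in model
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
inputs = tok("גמיע אלנאס", return_tensors="pt")  # a short Judaeo-Arabic string
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state
print(hidden.shape)  # (1, num_tokens, 768): one contextual embedding per token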

Finally, present the result:

from run.e2e_pipe import present_output  # assumed: present_output lives alongside PipelineManager
present_output(output_format, pm)
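
Putting the steps together, an end-to-end script looks like this (the present_output import location is an assumption, as noted above):

from pg_prep.prep_pg_data import content_by_pgps
from pg_prep.sliding_window import slice
from run.e2e_pipe import PipelineManager, present_output  # present_output location assumed

ids_texts = content_by_pgps([4268, 444])
sliced = slice(contents=[ids_texts[0][1], ids_texts[1][1]],
               pgpids=[ids_texts[0][0], ids_texts[1][0]],
               target_window=300,
               ctxt_window=100)
pm = PipelineManager(sliced, output_format="by_docx_path", stich_back=True)
present_output("by_docx_path", pm)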

About

Customizing Judaeo-Arabic-to-Arabic transliteration to the needs of the Princeton Geniza Project's data.
