Parse etymologies from Wiktionary

Write a function that does the following:

1. Take a language and a wordform (you may need to map from ISO code to language name; CLiCS might have a mapping table, otherwise use the one from Glottolog).
2. Visit the Wiktionary page for the wordform and go to the section for the appropriate language.
3. Go to the "Etymology" subsection (or if there are multiple etymology subsections, loop over them).
4. Extract the first sentence, of the form "From `<language name>` `<ancestral form>`".
5. Take the `<language name>` `<ancestral form>` pairing, and use those as input for another function call. (I.e. use [tail recursion](https://en.wikipedia.org/wiki/Tail_call).)
6. Keep chasing etymologies until you’ve hit a dead end. Along the way, save each etymological link to a table (with the fields "Language", "Wordform", "Source language", and "Source wordform").

Once we have a function that does the above, we can just loop it over all the words in NorthEuraLex.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Parse etymologies from Wiktionary #1

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Parse etymologies from Wiktionary #1

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions