A repository documenting data from the cantfreq project at the SFU Language Production Lab.
The folders HKCAC, HKCanCor, and IARPA document the scripts and results for each of the corpora analyzed in Li, Badrulhisham, & Alderete (2020). Each of these folders contains the script for word frequency generation, the word frequency data, the script for sub-lexical data generation, and the sub-lexical data. Note that we cannot provide the raw data for these corpora, which are required to run the word frequency generation script; users seeking to replicate our results may contact the authors of the respective datasets.
The folders word and sound document the aggregated data from the three corpora as a means of comparison. Details can be found in the README documents in the respective folders.
Please contact Jane Li at jane_li@sfu.ca with any questions regarding the repository.