The SVLM Hebrew Wikipedia Corpus consists of 50,000 Hebrew sentences drawn from the Hebrew Wikipedia, chosen to ensure phoneme coverage for a sentence recording project.
The corpus was built by Dr. Vered Silber-Varod and Prof. Ami Moyal as part of their work on [Varod17].
Corpus: https://github.com/NLPH/SVLM-Hebrew-Wikipedia-Corpus/blob/master/SVLM_Hebrew_Wikipedia_Corpus.txt
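A minimal sketch of loading the corpus in Python, assuming the file has been downloaded locally, is UTF-8 encoded, and holds one sentence per line. None of these assumptions is stated in this README, and `load_sentences` is a hypothetical helper name, so check the file before relying on it.

```python
from pathlib import Path


def load_sentences(path):
    """Return the non-empty lines of a text file, stripped of surrounding whitespace.

    Assumption: the corpus is UTF-8 text with one sentence per line.
    "utf-8-sig" also tolerates a leading byte-order mark, if one is present.
    """
    text = Path(path).read_text(encoding="utf-8-sig")
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Assumed location of a local copy of the corpus file.
    corpus_path = Path("SVLM_Hebrew_Wikipedia_Corpus.txt")
    if corpus_path.exists():
        sentences = load_sentences(corpus_path)
        print(f"Loaded {len(sentences)} sentences")
```

If the one-sentence-per-line assumption holds, the reported count should match the 50,000 sentences described above; a different count would suggest the file uses another layout.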
Because it was generated from Hebrew Wikipedia sources, which are licensed under CC-BY-SA 3.0, this corpus is necessarily licensed under the same license.
[Varod17] Silber-Varod, V., Latin, M., & Moyal, A. (2017). "Frequency of Hebrew phonemes and phoneme clusters in a data-driven approach" (in Hebrew). Literacy and Language (Oryanut Ve-Safa), 6, 22-36.