This repository tracks progress in language and speech technologies for low-resourced Middle Eastern languages. It does not cover Arabic, Farsi, Turkish, or Hebrew.
- For Kurdish and its related varieties, check Awesome Kurdish.
- For Balochi, check Awesome Balochi.
😞 How can I help? If you are a native speaker of any marginalized language in the Middle East, you should feel disappointed about the lack of technological support for your language. While sad, this can change if more data is collected. For translations, for example, try this: https://bouquet.metademolab.com.
- Parallel corpora: 5420 sentences in PARME, 1000 sentences in bitext-mining
- Speech corpus: DOLMA ASR
- Corpora: Wikipedia dumps
- 1000 words in Gilaki with English & Farsi translation
- Machine translation
- Language identification
- Script normalization
- Gilaki keyboard layout for Linux
- Apertium Gilaki
- Parallel corpora: 4345 sentences in PARME
- Speech corpus: DOLMA ASR
- Corpora: Wikipedia dumps
- Parallel corpora: 2106 sentences in PARME
- Parallel corpora: 1998 sentences in PARME
- Parallel corpora:
- Corpora: Wikipedia dumps