Spoken Israeli Hebrew Corpus
The Spoken Israeli Hebrew Corpus contains transcripts of spoken Hebrew conversations and parts of The Corpus of Spoken Israeli Hebrew (CoSIH). Speakers' names and other identifying details have been removed or changed to preserve anonymity. The original text was provided by Shlomo Izre'el and Esti Borochovsky Bar Aba of Tel Aviv University, from the CoSIH project.
-
Plain Text
The files are encoded in cp1255 (Windows-1255). Open them with that encoding to display the Hebrew text correctly.
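As a minimal sketch of reading the plain-text files programmatically, the Python snippet below decodes a file as cp1255 and optionally re-encodes it as UTF-8. The function names and any file paths passed to them are hypothetical and chosen for illustration; only the cp1255 encoding comes from the description above.

```python
from pathlib import Path


def read_cp1255(path):
    """Return the contents of a cp1255-encoded (Windows-1255) file as a str."""
    return Path(path).read_text(encoding="cp1255")


def convert_to_utf8(src, dst):
    """Write a UTF-8 copy of a cp1255-encoded file.

    Both paths are supplied by the caller; nothing here assumes
    how the corpus files are actually named.
    """
    Path(dst).write_text(read_cp1255(src), encoding="utf-8")
```

A UTF-8 copy can be convenient because many editors and tools assume UTF-8 by default and would otherwise show the Hebrew text as garbled characters.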
-
Tokenized Text in XML
The XML schema follows MILA's corpus standards.
-
Morphologically Disambiguated Text in XML
Tokenized text in which each token is tagged with all of its possible morphological analyses.
Analyses that do not fit the sentence context are given a score of 0, and analyses that fit are given a positive score.
The XML schema follows MILA's corpus standards.
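The scoring convention described above can be used to keep only the analyses that fit the context. The sketch below is illustrative only: the element and attribute names (`token`, `analysis`, `surface`, `id`, `score`) and the sample document are assumptions made for the example, not taken from the corpus files, so check them against the actual schema before use. Real files may also declare an XML namespace, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element and attribute names are assumed, not verified
# against the real schema.
SAMPLE = """\
<corpus>
  <sentence id="1">
    <token id="1" surface="w1">
      <analysis id="1" score="1.0"/>
      <analysis id="2" score="0.0"/>
    </token>
    <token id="2" surface="w2">
      <analysis id="1" score="0.0"/>
      <analysis id="2" score="0.6"/>
      <analysis id="3" score="0.4"/>
    </token>
  </sentence>
</corpus>
"""


def appropriate_analyses(xml_text):
    """Return (surface, [analysis ids]) per token, keeping only score > 0."""
    root = ET.fromstring(xml_text)
    result = []
    for token in root.iter("token"):
        kept = [
            analysis.get("id")
            for analysis in token.iter("analysis")
            # A missing score is treated as 0, i.e. as inappropriate.
            if float(analysis.get("score", "0")) > 0
        ]
        result.append((token.get("surface"), kept))
    return result
```

Filtering on a strictly positive score, rather than on a fixed value such as 1, keeps tokens for which more than one analysis remains plausible in context.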