Standards
The corpus and lexicon encoding standards below were developed by MILA and are used throughout its resources and tools, which facilitates resource reuse and compatibility. MILA encourages other researchers and organizations to adopt these standards in their own work.
Transliteration Scheme
Each Hebrew character is represented by its Latin-character equivalent in the transliteration attribute of the XML tag:
א | ב | ג | ד | ה | ו | ז | ח | ט | י | כ | ל | מ | נ | ס | ע | פ | צ | ק | ר | ש | ת |
a | b | g | d | h | w | z | x | v | i | k | l | m | n | s | y | p | c | q | r | e | t |
(Note that final-form Hebrew letters are not distinguished from their regular forms.)
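The table above can be applied mechanically, one character at a time. The following is a minimal sketch, not MILA's own implementation: the names `TO_LATIN` and `transliterate` are hypothetical, the final-form entries assume that each final letter takes the Latin letter of its regular form (one reading of the note above), and passing non-Hebrew characters through unchanged is likewise an assumption.

```python
# Hebrew-to-Latin pairs copied from the transliteration table above.
TO_LATIN = {
    "א": "a", "ב": "b", "ג": "g", "ד": "d", "ה": "h", "ו": "w",
    "ז": "z", "ח": "x", "ט": "v", "י": "i", "כ": "k", "ל": "l",
    "מ": "m", "נ": "n", "ס": "s", "ע": "y", "פ": "p", "צ": "c",
    "ק": "q", "ר": "r", "ש": "e", "ת": "t",
}

# Assumption: final forms share the Latin letter of their regular form,
# since the scheme makes no distinction for final-form letters.
TO_LATIN.update({"ך": "k", "ם": "m", "ן": "n", "ף": "p", "ץ": "c"})


def transliterate(text: str) -> str:
    """Replace each Hebrew letter with its Latin equivalent.

    Characters outside the table (spaces, digits, punctuation) are
    returned unchanged; this pass-through behavior is an assumption.
    """
    return "".join(TO_LATIN.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(transliterate("שלום"))  # elwm
    print(transliterate("ספר"))   # spr
```

Because regular and final forms collapse to the same Latin letter under this reading, the mapping is not reversible by simple table lookup: converting back to Hebrew script would need a positional rule to restore final forms at the end of a word.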
XML Schema for Corpora
The XML schema for the representation of morpho-syntactically annotated Hebrew corpora:
- hebrew_corpus_02_05_2011.xsd (May 2, 2011)
Previous versions:
- hebrew_corpus_11_08_2009.xsd (August 11, 2009)
- hebrew_corpus_22_07_2008.xsd (July 22, 2008)
- hebrew_corpus_28_03_2007.xsd (March 28, 2007)
- hebrew_corpus_15_01_2007.xsd (January 15, 2007)
- hebrew_corpus_07_11_2006.xsd (November 7, 2006)
- hebrew_corpus_25_01_2005.xsd (January 25, 2005)
XML Schema for Lexicons
The XML schema for the representation of morpho-syntactically annotated Hebrew lexicons:
- hebrew_lexicon_02_05_2011.xsd (May 2, 2011)
Previous versions:
- hebrew_lexicon_11_08_2009.xsd (August 11, 2009)
- hebrew_lexicon_22_07_2008.xsd (July 22, 2008)
- hebrew_lexicon_04_02_2007.xsd (February 4, 2007)
- hebrew_lexicon_18_01_2007.xsd (January 18, 2007)
- hebrew_lexicon_07_11_2006.xsd (November 7, 2006)
- hebrew_lexicon_02_03_2005.xsd (March 2, 2005)