Sembase

a database project for the study of Semitic roots

This is an early sample of the results to date for the Sembase semantic category "to fill". At this point, data entry is virtually complete for Biblical Hebrew (Heb) and Geez (Gez). Most of Arabic has been entered, and almost half of Aramaic (Arm, in its many dialects), Mandaic Aramaic (Man), and post-Biblical Hebrew (pbh). Some material from other languages, such as Tigrinya (Tgn), has been entered when found in comparative sources.

There are many alphabetization systems in dictionaries. Some are based on the Latin alphabet order (e.g., the CAD). Others are based on the Hebrew alphabet order. Yet others are based on the Egyptian alphabet order. Those based on the order of a particular ancient language are awkward for users who are not fully accustomed to that order, such as scholars from another language field. Recognizing this, Hannig, in his Egyptian-German dictionary, places the alphabet, one line in transliteration, at the bottom of each right-hand page, for easy reference. None of these alphabets has a significant relationship to etymology. Words that might be related often wind up in very different parts of the dictionary because there has been a sound shift, such as an m-w or b-w correspondence. Sembase attempts to overcome this by using an alphabetization system that is based on proximity in the oral cavity. Of course, the oral cavity is not a linear space, so a methodology has been developed to arrive at this order empirically.
This methodology is based on the analysis of what Sembase calls minimal root pairs. A minimal root pair is defined as a pair of roots that differ in only one consonant, show no metathesis, and have identical or very similar meaning. This analysis too is not finalized, but at present it is based on about 2,000 Arabic minimal root pairs. Arabic is useful because it has the second-largest number of consonants (after OSA, which has all of those in Arabic, plus one) and a very large number of extant roots. Of course, Arabic does not have "p", just as Northwest Semitic lacks "f" (not counting the b-g-d-k-p-t phenomenon).
These roots are arranged in a table, with an a priori order used as a starting point. The table has one column and one row for each consonant, in the same order, both beginning at the upper left-hand cell. Thus there is one cell for each possible consonant pair, in each order. The diagonal is the intersection of each consonant with itself, so it is not useful and is thrown out. Each cell below the diagonal has a corresponding cell above the diagonal with the two consonants in the opposite order. Order is not relevant here, so the cells below the diagonal are also thrown out. This leaves one cell for each possible pairing of consonants, irrespective of order. The most obvious trait of a minimal root pair is the pair of consonants in which its roots differ. An example is sāla and sāra in Arabic, with the common semantic content of "flowing". This pair would be recorded in the cell in the table where "l" and "r" intersect.
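The recording step can be sketched in Python as follows. This is only an illustration, not the Sembase implementation: the function names `differing_consonants` and `build_table` are hypothetical, and as a simplification the sketch compares transliterated forms character by character instead of working on extracted root consonants. Each cell is keyed by an unordered pair (a `frozenset`), which mirrors discarding both the diagonal and the cells below it. Judging whether two roots share a meaning is left to the person supplying the pairs.

```python
from collections import defaultdict


def differing_consonants(root_a, root_b):
    """Return the unordered pair of characters in which two forms differ.

    Returns None unless the forms have the same length and differ in
    exactly one position (so reordered, metathesized forms are rejected).
    """
    if len(root_a) != len(root_b):
        return None
    diffs = [(x, y) for x, y in zip(root_a, root_b) if x != y]
    if len(diffs) != 1:
        return None
    return frozenset(diffs[0])


def build_table(pairs):
    """Record each minimal pair in the cell for its two differing consonants.

    `pairs` is an iterable of (form_a, form_b, shared_meaning) tuples.
    """
    table = defaultdict(list)
    for root_a, root_b, meaning in pairs:
        cell = differing_consonants(root_a, root_b)
        if cell is not None:
            table[cell].append((root_a, root_b, meaning))
    return table


# The example pair from the text lands in the l/r cell.
table = build_table([("sāla", "sāra", "flowing")])
print(table[frozenset("lr")])  # [('sāla', 'sāra', 'flowing')]
```

Because the key is unordered, `frozenset("lr")` and `frozenset("rl")` address the same cell.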
Once a couple of thousand minimal pairs are similarly recorded in their corresponding cells, the analysis begins. If the entries in the table have no relationship, we would expect that they would be distributed randomly in the table. But if they are generally related, they will tend to cluster near the diagonal. Every possible ordering of the consonants would produce a different distribution. It is possible by inspection to identify a handful of orders (potential alphabetization sequences) with the largest number of roots near the diagonal. Each cell can be scored with respect to the degree of displacement from the diagonal. Cells with zero displacement have a score of zero, cells only one step from the diagonal have a score of 1, etc. The score of each cell is multiplied by the number of pairs in that cell, and all cell totals are added. The consonant ordering with the lowest score is the one that displays the strongest overall relationship. So goes the theory. The result does not have to be 100% accurate. The purpose is solely to establish an alphabetization order that would list roots that are more probably related to each other in relatively close proximity, to assist in observing them by inspection.
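The scoring described above can be sketched in Python. Several things here are assumptions made for illustration: the names `ordering_score` and `best_orderings` are hypothetical, the pair counts are invented toy numbers and not Sembase data, and the displacement of a cell is taken to be the distance between the two consonants in the ordering minus one, so that cells directly beside the discarded diagonal score zero. Counting displacement from the diagonal itself would add the same constant (the total number of pairs) to every ordering's score, so the ranking of orderings would not change.

```python
from itertools import permutations


def ordering_score(order, pair_counts):
    """Sum over all cells of (displacement from the diagonal) * (pairs in cell)."""
    position = {consonant: i for i, consonant in enumerate(order)}
    total = 0
    for pair, count in pair_counts.items():
        a, b = pair  # each key is an unordered pair of two consonants
        displacement = abs(position[a] - position[b]) - 1
        total += displacement * count
    return total


def best_orderings(consonants, pair_counts):
    """Try every ordering and return the lowest score with the orderings reaching it.

    Exhaustive search is only practical for a handful of consonants.
    """
    scored = [(ordering_score(p, pair_counts), p) for p in permutations(consonants)]
    best = min(score for score, _ in scored)
    return best, [p for score, p in scored if score == best]


# Invented toy counts for three consonant pairings (not real data).
counts = {frozenset("lr"): 5, frozenset("mw"): 3, frozenset("bw"): 2}

print(ordering_score("bwmlr", counts))  # 0: every pairing sits beside the diagonal
print(ordering_score("lbmrw", counts))  # 17: 2*5 + 1*3 + 2*2
```

For a full consonant inventory the number of possible orderings is far too large to enumerate, which fits the text's description of identifying a handful of candidate orders by inspection and then comparing their scores.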
The entries in the table for the early results for "wife" follow the Latin-based alphabetization, while those for "to fill" are alphabetized using the early results from the minimal-root-pair analysis. Proximity in the oral cavity provides an ordering rationale that makes the system highly intuitive and easy to use.