Testing Our Transliteration Engine with help from James Strong’s Biblical Hebrew Dictionary

Contributor(s): the Hierophant

Shared on: 28 April 2010 under the Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 International copyleft license

The mark of a particularly valuable dictionary is how long it remains in use after its introduction. Marcus Jastrow’s Dictionary of the Targumim, Talmud Babli, Talmud Yerushalmi and Midrashic Literature (1903), Brown-Driver-Briggs’ Hebrew and English Lexicon of the Old Testament (1906), and James Strong’s Concise Dictionary of the Words in the Hebrew Bible with their Renderings (1890) are all standard reference works still used today.

But dictionaries are not only invaluable reference tools for scholarly research. They are also useful features in online applications, and not only for their definitions. Dictionaries also include transliterations. Both translations and transliterations are features we would like to provide for users of the Open Siddur application we are developing. To provide these features, the dictionaries must be digitized and their contents encoded in a standard, searchable format.

Strong’s dictionary, prepared as a companion to his famous concordance, contains a complete list of the Hebrew words that appear in the TaNaKh, transliterated with a consistent ruleset. By formatting the words in the dictionary and replicating the transliteration as it appears there, the Open Siddur Project could test the transliteration engine that will be used to transliterate Hebrew text with nikkud (vowels) into any other script (Latin, Cyrillic, Arabic, Amharic, etc.).
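To illustrate what a consistent ruleset means in practice, here is a minimal Python sketch of a table-driven transliterator for pointed Hebrew. It is not the Open Siddur engine, and the mappings are not Strong’s: the tables (CONSONANTS, VOWELS, IGNORED) and the function name transliterate are hypothetical and cover only a handful of letters and vowel points.

```python
import unicodedata

# Illustrative mappings only. These are NOT Strong's rules and NOT the
# Open Siddur engine's tables; they are assumptions made for this sketch.
CONSONANTS = {
    "\u05D0": "'",   # alef
    "\u05D1": "b",   # bet
    "\u05DC": "l",   # lamed
    "\u05DE": "m",   # mem
    "\u05DD": "m",   # final mem
    "\u05E9": "sh",  # shin
}
VOWELS = {
    "\u05B8": "a",   # qamats
    "\u05B7": "a",   # patah
    "\u05B5": "e",   # tsere
    "\u05B6": "e",   # segol
    "\u05B4": "i",   # hiriq
    "\u05B0": "",    # sheva (treated as silent in this sketch)
}
# Marks this sketch deliberately skips: shin dot and dagesh.
IGNORED = {"\u05C1", "\u05BC"}


def transliterate(word: str) -> str:
    """Map each Hebrew code point to a Latin string, in reading order."""
    out = []
    # NFD puts combining marks into a canonical order after their base letter.
    for ch in unicodedata.normalize("NFD", word):
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch in IGNORED:
            continue
        else:
            # Fail loudly so gaps in the tables are noticed.
            raise ValueError(f"no rule for U+{ord(ch):04X}")
    return "".join(out)


# shin + shin dot + tsere + final mem
print(transliterate("\u05E9\u05C1\u05B5\u05DD"))
# alef + qamats + bet
print(transliterate("\u05D0\u05B8\u05D1"))
```

A table this simple breaks quickly. It has no rule for vav used as a vowel letter, for example. Gaps like that are the reason for checking an engine against every entry in a full dictionary.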

Strong’s dictionary was digitized (PDF) in 1998 by unknown contributors. Converting the digitized, machine-readable text to an open standard format was a milestone sought by a number of projects, including Open Scriptures, an open source project digitizing and encoding variant manuscripts of the Gospels. In partnership with contributing developers from Open Scriptures, the Open Siddur Project created a quality XML encoding of Strong’s dictionary. Work began in early February and was completed by the second week of April. The data and XML are available as public domain text, here.

The project served to test our transliteration engine and to develop a good working and ecumenical relationship between two worthy open source projects sharing technology for advancing the digital humanities. The three main contributors on the project were David Troidl, Ze’ev Clementson, and Efraim Feinstein (lead developer of the Open Siddur). Ze’ev initiated the project in a forum discussion, here. (Ze’ev is a regular contributor to the Open Siddur Project and the creator of innovative Jewish educational software, so far for the iPhone/iPad platforms.) Initially, David obtained Strong’s Hebrew data in the form of a PHP script from Dr. David Instone-Brewer. (The data is the basis for Instone-Brewer’s website 2 Letter Lookup.) David then parsed out the data and converted it to the Open Scripture Information Standard (OSIS) XML schema, “using the best available OSIS structure for the data, since OSIS has no official dictionary module,” he explained. Ze’ev converted the XML to the Text Encoding Initiative (TEI) standard used by the Open Siddur Project. Efraim helped define the tag usage for JLPTEI, the Open Siddur extension to the TEI.
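As a rough picture of this kind of conversion step, the Python sketch below reshapes one dictionary record into a TEI-style entry. The input layout (entries, item, word, translit, gloss) is invented for the example and is not the project’s OSIS file. The output borrows element names from the TEI dictionaries module (entry, form, orth, sense, def), but whether JLPTEI arranges them this way, or marks transliterations with a type attribute, is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical input layout, invented for this sketch.
SOURCE = """
<entries>
  <item n="1">
    <word>\u05D0\u05B8\u05D1</word>
    <translit>'ab</translit>
    <gloss>father</gloss>
  </item>
</entries>
"""


def to_tei_entry(item: ET.Element) -> ET.Element:
    """Build a TEI-style <entry> from one hypothetical <item> record."""
    entry = ET.Element("entry", {"n": item.get("n", "")})

    form = ET.SubElement(entry, "form")
    # The Hebrew headword.
    orth = ET.SubElement(form, "orth")
    orth.text = item.findtext("word", default="")
    # The transliteration; the type value here is an assumption.
    translit = ET.SubElement(form, "orth", {"type": "transliteration"})
    translit.text = item.findtext("translit", default="")

    # The definition goes inside <sense><def>.
    sense = ET.SubElement(entry, "sense")
    definition = ET.SubElement(sense, "def")
    definition.text = item.findtext("gloss", default="")
    return entry


root = ET.fromstring(SOURCE.strip())
entries = [to_tei_entry(item) for item in root.iter("item")]
print(ET.tostring(entries[0], encoding="unicode"))
```

Keeping the conversion as a function of a single record makes it easy to run over every entry and to spot-check individual results by hand.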

James Strong is best known for his concordance, and the scholarly tools he innovated show a prescient interest in linked data. Were Strong to look down from his perch in the heavenly yeshiva/academy at our work, I think he might be quite pleased with this collaboration. I asked David, Ze’ev, and Efraim if they would comment on their work together contributing to the Open Siddur Project.

Why was updating Strong’s Hebrew Dictionary to Unicode and XML such an important target?

David: The existing ASCII transliterations were neither fully accurate, nor a faithful representation of the ones in Strong’s printed dictionary.
Ze’ev: Having a standard “dictionary” in Open Siddur allows us to provide new functionality that we didn’t have before. So, for example, someone might want to make a child’s siddur, or a siddur for people who don’t understand or read Hebrew well, that contains definitions (perhaps at the bottom of the page or in the margin) for some of the less common Hebrew words. We now have the ability to add accurate transliterations (either alongside the Hebrew or instead of the Hebrew in the siddur) for people who can’t read Hebrew.
Efraim: It opened a route of collaboration between projects. Specific to the Open Siddur, working on this provided a specification for a new type of associated data — dictionary words.

How did this collaboration advance your project’s specific goals?

Efraim: By having a second independent implementation of transliteration we could debug our transliterator and discover corrections to make in Strong’s original document.
Ze’ev: The collaboration with David was great in that it helped us identify a number of shortfalls and bugs in the existing Open Siddur transliteration code. The collaboration worked both ways, however. As a result of the email discussions, David was able to make changes to his source document (e.g., adding qamets qatan, holam haser for vav, and spelling corrections), and Open Siddur was able to “robustify” the transliteration code so that it handles more “corner cases” than it did before (e.g., dagesh transliteration support for non-bgdkft letters, improved handling of silent letters, and support for a literally transliterated tetragrammaton). It also helped us test and verify other transliterations more generally. We also now have a “transliteration tester” that lets us automatically validate our programmatically generated transliterations against static versions of the Strong’s transliterations. This can be used in unit tests in the future to ensure that changes to the transliteration code don’t “break” the existing Strong’s transliteration logic.
David: Ze’ev’s questions about Strong’s transliterations, while he was developing his transliteration table, spurred me on to work on a project that I had been wanting to do for some time anyway. After using the existing ASCII transliterations as a comparison metric, we started trading our results to compare the Unicode transliterations. This allowed both of us to fine-tune our code and produced a significantly higher degree of accuracy in the transliterations themselves.
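
The “transliteration tester” Ze’ev describes can be pictured roughly as in the Python sketch below: run the engine over a table of known-good transliterations and report every disagreement. This is not the project’s tester. The names find_mismatches, EXPECTED, and fake_engine are hypothetical, the two sample entries are illustrative rather than copied from Strong’s, and the stand-in engine contains a deliberate bug so that the check has something to catch.

```python
from typing import Callable, Dict, List, Tuple

# Static "known-good" transliterations keyed by pointed Hebrew.
# Illustrative sample values, not copied from Strong's dictionary.
EXPECTED: Dict[str, str] = {
    "\u05D0\u05B8\u05D1": "'ab",           # alef, qamats, bet
    "\u05E9\u05C1\u05B5\u05DD": "shem",    # shin, shin dot, tsere, final mem
}


def fake_engine(word: str) -> str:
    """Stand-in for a real transliteration engine (hypothetical).

    It mishandles shin on purpose, to show what a failure looks like.
    """
    canned = {
        "\u05D0\u05B8\u05D1": "'ab",
        "\u05E9\u05C1\u05B5\u05DD": "sem",  # deliberate bug: should be "shem"
    }
    return canned[word]


def find_mismatches(
    expected: Dict[str, str],
    transliterate: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (hebrew, expected, actual) for every entry that disagrees."""
    mismatches = []
    for hebrew, want in expected.items():
        got = transliterate(hebrew)
        if got != want:
            mismatches.append((hebrew, want, got))
    return mismatches


mismatches = find_mismatches(EXPECTED, fake_engine)
for hebrew, want, got in mismatches:
    print(f"{hebrew}: expected {want!r}, got {got!r}")
```

Wrapped in a unit test that asserts the mismatch list is empty, a check like this would flag any later code change that alters an already-validated transliteration.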

The XML and code generated by this effort are available to everyone with open/free culture licensing. Besides this obvious advantage, how will this work help other folks’ projects worldwide?

David: Many Internet resources that I’ve come across fall far short of a print publisher’s tolerance for error. While the availability of the information is commendable, I would like to see a greater emphasis on the accuracy of existing resources. There are many texts, both print and electronic, that utilize Strong’s numbering system, and a reliable reference will be a benefit.
Ze’ev: The major additional “public” benefit from the exercise is that there is now an open-source, digital Strong’s source document that has accurate transliterations in the style defined in the Strong’s Concordance book and which has been edited to correct misspellings (both Hebrew misspellings and errors in the English transliterations). Already, at least one application (my Hebrew Bible iPhone/iPad app) is utilizing the Strong’s Concordance data and currently provides integrated word definition display when reading the TaNaKh as well as Hebrew root lookup.

What new opportunities for collaboration and development do you see coming out of this work?

Efraim: One thing we (Open Siddur) don’t have is a way for the general public to get access to it! It’s in SVN, and we have a way to convert it to HTML, but no public interface [exists yet] to transform it or to look up words.
David: The ideal markup format is still an open question. My version is pushing the limits of OSIS, in its present form. The TEI form that Ze’ev [Open Siddur] is using is verbose and repetitive. I have some ideas that are still coalescing. Also, we are moving toward adding a Brown, Driver, Briggs layer on top of the Strong’s data, striving for a richer and more accurate dictionary.
Ze’ev: This was a bit of a unique scenario in that we were both attempting to validate independently-developed transliteration code with the same source document. We have now completed the validation of the Strong’s transliteration scheme and have discussed following through with a similar exercise for the SBL transliterations.

“Testing Our Transliteration Engine with help from James Strong’s Biblical Hebrew Dictionary” is shared through the Open Siddur Project with a Creative Commons Attribution-ShareAlike 4.0 International copyleft license.

the Hierophant

A hierophant is a person who invites participants in a sacred exercise into the presence of that which is deemed holy. The title, hierophant, originated in Ancient Greece and combines the words φαίνειν (phainein, "to show") and τὰ ἱερά (ta hiera, "the holy"); hierophants served as interpreters of sacred mysteries and arcane principles. For the Open Siddur Project, the Hierophant welcomes new contributors and explains our mission: ensuring creatively inspired work intended for communal use is shared freely for creative reuse and redistribution.

Stable Link: https://opensiddur.org/?p=503

Terms of Use: Be a mentsch (a conscientious, considerate person) and adhere to the following guidelines:

Properly attribute the work to the Hierophant.
Clearly indicate the date you accessed the work and in what ways, if any, you modified it. (If you have adapted the work, let us know so that the contributor might consider endorsing your revision.)
Provide the stable link to this resource: <https://opensiddur.org/?p=503>.
Indicate that the original work was shared under the Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 International copyleft license. (To redistribute or remix this work in any format, modified or unmodified, you must refer to the terms of the license under which the work is shared.)
