Transcribing Texts

A fundamental value of our project is correctly attributing Jewish liturgy and liturgy-related work. Even when the original author of a work is lost to history, we strive to record every adaptation and variation found in particular manuscripts and extant published works in the Public Domain.

To do this, our volunteers help produce transcriptions that can easily be verified as authentic witnesses of a given work, whether it is the earliest known version or some other variation. We use Wikisource as the collaborative transcription and proofreading environment for Public Domain and free-culture licensed texts. Volunteers have already begun to transcribe and proofread these texts.

If you’d like to begin transcribing a work, let us know in the comments below! We can help you.

How to Get Started Transcribing!

Install a Hebrew Keyboard layout for your Operating System. (We recommend the Biblical Tiro layout.) Refer to the key mapping images and familiarize yourself with the four levels of the Biblical Tiro keyboard layout.
Download and install Unicode Hebrew Fonts supporting the full range of Hebrew diacritics. We recommend installing the Taamey Frank CLM font from the Culmus Project (available in the Open Siddur Font Pack).
Configure your web browser to display Unicode Hebrew fonts supporting the full range of Hebrew diacritics. See below for specific details on changing the default Hebrew fonts displayed in Mozilla Firefox.
Register a new user account with Wikisource.
Log in and set your preferred settings for the language and editing interface in Wikisource (see below).
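
To check whether the font you chose really covers "the full range of Hebrew diacritics," it can help to see every mark in one place. The following is a minimal sketch in Python, not part of the Open Siddur tooling: it lists the combining marks in the Unicode Hebrew block (U+0591 through U+05C7) and builds a line of text that attaches each mark to the letter alef. The function names are hypothetical, chosen only for this illustration.

```python
import unicodedata

# Hebrew accents (te'amim) and points (niqqud) are encoded in U+0591..U+05C7.
HEBREW_MARK_RANGE = range(0x0591, 0x05C8)


def hebrew_combining_marks():
    """Return (code point label, Unicode name) pairs for the Hebrew combining marks."""
    marks = []
    for code_point in HEBREW_MARK_RANGE:
        char = chr(code_point)
        # Category "Mn" means nonspacing mark. Punctuation in the same range,
        # such as maqaf (U+05BE), has a different category and is skipped.
        if unicodedata.category(char) == "Mn":
            marks.append((f"U+{code_point:04X}", unicodedata.name(char)))
    return marks


def font_test_line(base_letter="\u05D0"):
    """Build a test line: the base letter (alef by default) carrying each mark in turn."""
    return " ".join(
        base_letter + chr(int(label[2:], 16))
        for label, _name in hebrew_combining_marks()
    )
```

Printing the result of `font_test_line()` and pasting it into the Wikisource edit box is one way to spot marks that your browser font draws as empty boxes or places incorrectly.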

Preparing Mozilla Firefox for Transcription in Wikisource

Download the fonts in the Open Siddur Unicode Hebrew Font Pack and install them.
Open Firefox Options.
Select the Content tab.
Click the Advanced button. Under Fonts, select Hebrew.
Choose your favorite Hebrew fonts and font size for transcription.
Click OK when finished.

Familiarizing Yourself with Wikisource

Login and Settings

If you haven’t yet registered an account on Wikisource, please create your account now.
To log in to your account at Wikisource, click on the login link at the top right of the web page.
Click “My Preferences” in the top right corner. In Hebrew Wikisource, click “כניסה לחשבון” in the top left corner.
Click “User Profile” to choose your preferred language for working within Wikisource’s interface. To navigate Hebrew Wikisource in English or another language, click “פרטי המשתמש” and the context menu next to “שפת הממשק” to select your preferred language.
Click “Editing” to set your preferred settings for using the editing interface. See below for my preferred settings to edit.

Using the Transcription Interface

The great thing about collaborative transcription and proofreading is that you can correct others’ work and others can correct your errors. The key is knowing how to navigate the Wikisource interface.

To edit a page of text, click on the ‘Edit’ link (next to the ‘Read’ link) above the page image.
When you are done editing or proofreading a page, don’t forget to indicate in what state you’ve left it. Reading the help page “Help:Editing” will help you better understand how Wikisource users track their transcription and proofreading.

What about OCR for Hebrew?

Tesseract-OCR is an excellent open-source OCR engine for Hebrew. Combined with a user interface such as VietOCR.net, it is fairly easy to use.
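
For those who prefer the command line to a graphical front end, Tesseract can also be run directly. The sketch below assumes that the `tesseract` program and its Hebrew language data (language code `heb`) are installed; the function names and the file names in the usage note are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="heb"):
    """Return the argument list for: tesseract <image> <output base> -l <language>."""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base):
    """Run Tesseract on one page image and return the path of the text file it writes."""
    # Assumption: tesseract is installed and on the PATH; fail clearly if it is not.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    # Tesseract appends ".txt" to the output base name.
    return output_base + ".txt"
```

Calling `run_ocr("page_001.png", "page_001")` would, under these assumptions, produce `page_001.txt`, which still needs to be proofread against the page image.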

With technology in its current state, manual transcription (typing) is the only reliable way to transcribe Hebrew text with vowels. Open-source tools for automated transcription cannot yet reliably convert images of Hebrew letters and diacritical marks into machine-readable Hebrew text: proofreading their output takes more work than transcribing the text from scratch.[ref]hOCR is available for testing on Linux. Unfortunately, an effort to continue Kobi Zamir’s work on hOCR stalled in 2010. An early version of hOCR compiled for use on Windows is available for download here.[/ref] Until such tools improve, projects such as the Open Siddur must depend on manual transcription by humans.
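
One rough way to judge how much proofreading an OCR result would need is to compare it with a manually transcribed reference and count the character-level corrections. The sketch below is an illustration only, not a tool used by the project: it computes the edit distance (insertions, deletions, and substitutions) between the two texts, counting each letter and each diacritic as a separate character, and divides by the length of the reference.

```python
import unicodedata


def edit_distance(a, b):
    """Levenshtein distance between two strings, counted per Unicode code point."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def character_error_rate(ocr_text, reference):
    """Corrections needed per reference character; 0.0 means a perfect match."""
    # Normalize both texts so that the same marks typed in a different order compare equal.
    ocr = unicodedata.normalize("NFC", ocr_text)
    ref = unicodedata.normalize("NFC", reference)
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ocr, ref) / len(ref)
```

An OCR result that returns the consonants of a pointed word but none of its vowels scores poorly by this measure, which matches the experience described above.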

Making Hebrew OCR with support for diacritics available and accessible needs attention! While Tesseract cannot OCR Hebrew with niqqud “out of the box,” researchers have had success in training Tesseract to do so. Take a look at Adi Oz and Vered Shani’s work at their project page and in their PowerPoint presentation. If Adi and Vered’s work could be made available to the wider community, we’d be grateful.

Another OCR project to keep an eye on is that of Assaf Urieli. What is interesting about Assaf’s approach is that it checks the OCR output against a list of words so that the software can measure the confidence of its recognition.
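
The general idea of checking recognized words against a word list can be sketched in a few lines. This is not Assaf Urieli’s implementation, only a simplified illustration under stated assumptions: the word list holds unpointed spellings, diacritics are stripped before lookup, and the function names are hypothetical.

```python
import unicodedata


def strip_marks(word):
    """Remove niqqud and cantillation marks, leaving only the base letters."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def lexicon_confidence(ocr_words, lexicon):
    """Return (share of OCR words found in the lexicon, list of words not found)."""
    known = {strip_marks(entry) for entry in lexicon}
    if not ocr_words:
        return 0.0, []
    unknown = [word for word in ocr_words if strip_marks(word) not in known]
    return 1 - len(unknown) / len(ocr_words), unknown
```

Words that are not found in the list are the ones a proofreader would want to inspect first. A real system would also need to handle valid words that are simply missing from the list.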



Copyleft 2002-Present, Contributors to the Open Siddur Project. חלק מהזכויות שמורות | Some Rights Reserved.

All works published on opensiddur.org that are not yet in the Public Domain remain under the copyright of their respective creators and copyright stewards. Unless otherwise indicated, all creators and copyright stewards have graciously shared their work under a libre/free-culture compatible Open Content license until the term of their copyright expires and their work enters the Public Domain. The default license under which all content is shared on this site is the Creative Commons Attribution/ShareAlike (CC BY-SA) 4.0 International license. All fonts rendered through CSS @font-face are licensed with either an SIL-Open Font License (OFL) or a GNU Public License with a Font Exception clause (GPL+FE).

