Open Siddur Project Development Status as of 8/23/2009
This is our first development status post. Normally, this post will try to wrap up what we’ve achieved in the past week. Since this is our first, I’ll be summing up some of the progress we’ve made in the last month or so. It will serve as something like a newsletter, and will be posted on the discussion list and at opensiddur.org. Contact us if you want to include something we haven’t covered.
If you’d like to get news of Open Siddur as it happens, make sure to follow @opensiddur on Twitter.
Contributions (Aharon)
“I am/We are the original author(s) of _______ and I am/We are licensing the following attachments under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Attribution may be given as ‘Contributors to the Jewish Liturgy Project/Open Siddur’, with the author’s name(s) _______ included in the contributors list.”
Efraim reports that all the texts of the TaNaKh have been reformatted for use with our new schema.
Undigitized scanned texts: Aharon discovered some good scans of historical siddurim in the public domain at the Internet Archive, scanned and contributed by the University of Toronto, including the Seder Rav Amram hashalem (1922). Next step: deconstruct PDF into constituent image files according to the JLP/OS scanning guidelines (http://wiki.jewishliturgy.org/Transcription:Scanning_Guidelines).
Wikisource siddurim (קטגוריה:סידורי תפילה) that used to be licensed GFDL are now licensed CC-BY-SA, making their license compatible with Open Siddur. For those siddurim at Wikisource that are truly in the Public Domain, we need to scan the original sources and begin proofreading the transcriptions, e.g. סידור תורה אור. We need help finding a scan or a Public Domain (pre-1923) print copy of Siddur Torah Ohr by R. Schneur Zalman of Liadi.
Scanning: Yonah and yitz_ (the latter of whom we met in our 8/16 live chat) offered to investigate scanning protocols at UT. Ilan offered to do the same for Cornell. We need to review the scanning guidelines.
Automated Transcription (via OCR): Aharon found an open source Hebrew OCR that recognizes nikkud and teamim. Efraim says if we can get 80% accuracy minimum we’re in business — we’d prefer 95% accuracy (1 mistake every 20 characters). We need testers. Also, if you are a coder interested in OCR, both Hocr and qHocr (a cross platform Qt4 port) would love your assistance.
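For anyone who wants to help test, here is a minimal sketch of how the accuracy figures above could be measured. It is not part of Hocr or qHocr, and the function names are our own illustration. It assumes accuracy means one minus the character error rate: the edit distance between a proofread reference and the OCR output, divided by the length of the reference. It also assumes each nikkud or ta'am mark counts as its own character, since Unicode stores them as separate combining code points.

```python
import unicodedata


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Return 1 - (edit distance / reference length), floored at 0."""
    # Normalize so that equivalent orderings of combining marks compare equal.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", ocr_output)
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))
```

Under this measure, one wrong character in a 20-character reference gives 95%, and one wrong character in every 5 gives 80%.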
Manual Transcription: 11 pages of Seder Avodat Yisroel transcribed and ready for proofreading, 470 more to complete (sans commentary).
Software Development
Backend (Efraim): The new XML database, running on eXist, is up. Azriel is using it for the transcription interface.
Toolkit API: Source code for the contributor list management API and the bibliography management API is now available. Some of this code is being rewritten to extract all eXist-specific features and syntax into separate XQuery files. Also working on XML validation (RelaxNG, Schematron) for use during real-time updates. The rendering code requires a complete rewrite for the new JLPTEI encoding.
Documentation
Here’s a document on the wiki that is very useful: http://wiki.jewishliturgy.org/JewishLiturgyProject:Copyrights
The first draft of the JLPTEI guidelines is written on the wiki. It needs review, and we are requesting feedback. An encoding tutorial, intended for application developers, still needs work. http://wiki.jewishliturgy.org/JLPTEI
A working draft of the schema ODD (which compiles to RelaxNG, DTD, and W3C XML Schema) is available in our Google Code-hosted Subversion archive. (ODD stands for One Document Does it all, a
System Administration (Azriel, Efraim, and Aharon):
Efraim purchased a virtual private server service. (Thanks Efraim!)
The new XML database is hosted at http://shell.jewishliturgy.org:8080/exist on the VPS.
New Transcription interface is not ready yet but will be accessible at http://www.jewishliturgy.org
Opensiddur.net has been refreshed, links and material in pages and posts updated.
All JLP/Open Siddur sites are now being tracked with Google Analytics. (Other statistics are also available.)
Volunteer Management (Aharon)
Began replying to volunteer transcribers. Waiting for word of new transcription interface. Interested in what feedback to provide volunteer translators, commentary writers.
Would like an XML encoding interface for digitized texts these folk may want to contribute.
A handful of people would like to donate money. Holding off on soliciting funds for now. Really looking for more in-kind contributions. Researching how to structure foundations to support open source software development (e.g., Mozilla Foundation).
Organization/Structure
Open Siddur and the Jewish Liturgy Project are the names of projects initiated by Aharon Varady and Efraim Feinstein, respectively. Efraim and Aharon are drafting a “team charter” to further define a structure and mission statement compatible with their mutual efforts as well as their shared open source and free culture values.
Communication and Promotion (Aharon)
Efraim asks, What sorts of print materials can we create to help promote Open Siddur and attract developers and volunteers? Can we get Jewish day school and high school computer clubs to take an interest in helping develop Open Siddur and by extension learn about Open Source and Free Culture?
Our first live chat was on August 16th, and we had a minyan! It focused at first on soliciting technical help, but non-software devs also want to help. Initial response: Open Siddur needs help with historical research, scanning, and promotion. Non-software devs can also help with research, transcription, and documentation. Translations, art, and commentaries prepared today can be contributed tomorrow. (Thanks to everyone who attended!)
Rabbi Shalom Berger at the Lookstein Foundation noted Open Siddur in their recent newsletter (8/20). (Thanks!)
Team Member Updates
Sarah Allen, a volunteer translator for Open Siddur, is stepping up as our Israel contact person. (Welcome aboard!) She tweets from @sarahballen
Azriel will need to take a step back from some Open Siddur work with the onset of school, but hopes to have committed a substantial part of what will become the basis for the transcription framework. Azriel is now blogging about technical development problems, solutions, and milestones at realazthat.blogspot.com.
In June, Efraim graduated with a doctoral degree from Harvard in biophysics. Congratulations Efraim! Efraim occasionally tweets from @efraimdf
“Development Status (2009-08-23)” is shared through the Open Siddur Project with a Creative Commons Attribution-ShareAlike 4.0 International copyleft license.