An Economic Argument for Open Data — by Efraim Feinstein (Open Siddur 2009)

Shared on:

9 February 2010 under the Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 International copyleft license

Tags:

economics, free culture, open-source, semantic data, what is free

This post continues the series of advocacy posts directed at Jewish content creators and aggregators. Earlier posts in the series discussed why you should avoid licenses with non-commercial-use-only (NC) terms, issues of copyright license compatibility, and the connection between copyright licensing and remixability. This post details the global communal benefit of free primary data resources. In particular, it urges you to share primary data such as the digital text of a public domain transcription. The author, Efraim Feinstein, is lead developer of the Open Siddur Project.

There are two principles on which the success of data on the contemporary web rests: the web makes content available, and it adds value to that content by linking it to other related information.

When considering bringing old content online, both of these aspects are important. A first level of digitization involves simply making data available. Google Books and Hebrewbooks.org work at this level, providing PDFs and/or OCR-ed transcriptions of the material. A second level of digitization involves semantic linkage of the data, both internal to the site and external to the site. The Open Siddur Project and Open Scriptures digitize at the semantic level. This second-level digitization is required to do all of the cool things we expect to be able to do with online texts: click on a word and find its definition or grammatical form, find the source of a passage in one text in another text, find how the text has evolved historically, etc. Even the simplest form of link, a reference from another site, requires some kind of internal division.
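To make the idea of internal division concrete, here is a minimal sketch in Python of a text stored as addressable segments with typed links between them. Everything in it is an assumption made for illustration: the `Segment` and `Corpus` classes, the identifier scheme (`psalms.90.17`, `siddur.example.1`), and the `quotes` relation are hypothetical and are not the data model of the Open Siddur Project or any other project named here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """One addressable unit of a transcribed text."""
    segment_id: str  # hypothetical stable identifier, e.g. "psalms.90.17"
    text: str


@dataclass
class Corpus:
    """A collection of segments plus typed links between them."""
    segments: dict[str, Segment] = field(default_factory=dict)
    # Maps a source segment id to a list of (relation, target id) pairs.
    links: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def add(self, segment: Segment) -> None:
        self.segments[segment.segment_id] = segment

    def link(self, source_id: str, relation: str, target_id: str) -> None:
        # A link can only be made when both ends are addressable.
        for sid in (source_id, target_id):
            if sid not in self.segments:
                raise KeyError(f"no addressable segment named {sid!r}")
        self.links.setdefault(source_id, []).append((relation, target_id))

    def sources_of(self, segment_id: str) -> list[Segment]:
        """Return the segments that the given segment is marked as quoting."""
        return [
            self.segments[target]
            for relation, target in self.links.get(segment_id, [])
            if relation == "quotes"
        ]


# Illustrative usage with placeholder text and hypothetical identifiers.
corpus = Corpus()
corpus.add(Segment("psalms.90.17", "(text of the verse)"))
corpus.add(Segment("siddur.example.1", "(liturgical passage quoting the verse)"))
corpus.link("siddur.example.1", "quotes", "psalms.90.17")

for source in corpus.sources_of("siddur.example.1"):
    print(source.segment_id)
```

A scanned page image offers nothing for `link` to point at, which is why a first-level digitization alone cannot support "find the source of this passage" features.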

Digitization that takes advantage of the web therefore requires a number of steps: (1) getting the basic text online, (2) getting it in an addressable form (to make it more like typed text, instead of a picture of a page), (3) assuring the text’s accuracy, and (4) marking it up for semantic linkage. Some of these steps, or parts of them, can be done automatically, but overall they require some degree of intelligent input. Even step 1, which is primarily mechanical in nature, requires design of the procedures.
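The four steps can be read as ordered stages that each text passes through. The sketch below, again in Python and again hypothetical (the `Stage` names and the `remaining_steps` helper are invented for this illustration), shows how one might record the stage a text has reached and list the work still outstanding.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical labels for the four digitization steps described above."""
    NOT_ONLINE = 0
    IMAGES_ONLINE = 1         # step 1: the basic text is online
    TRANSCRIBED = 2           # step 2: the text is in an addressable form
    PROOFREAD = 3             # step 3: the text's accuracy has been checked
    SEMANTICALLY_LINKED = 4   # step 4: the text is marked up for linkage


def remaining_steps(current: Stage) -> list[Stage]:
    """Return the stages still to be completed, in order."""
    return [stage for stage in Stage if stage > current]


# A project that must start from scratch has all four steps ahead of it.
print([s.name for s in remaining_steps(Stage.NOT_ONLINE)])

# A project that can reuse an existing proofread transcription has one.
print([s.name for s in remaining_steps(Stage.PROOFREAD)])
```

Seen this way, a freely reusable transcription lets every later project begin at the last step instead of the first.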

I hope this outline of the steps required to get a text online suggests that the most expensive part of making content available is human labor: it takes time to do it, and it takes even more time to do it right.

And now for the rhetorical questions:

How many times has the Tanach been digitized?
… the siddur?
… the Talmud?
… major commentaries on the siddur, Torah, Talmud (Rashi, Tosefot)?
… full codes of Jewish law (Mishneh Torah, Tur, Shulchan Aruch, Aruch Hashulchan)?
… uncommon piyyutim (liturgical poems)?

In some cases, the answer is: it’s been done many times. In other cases, the answer is: it’s never been done. Both answers lead to the all-important question: why? Why are there so many digitizations of the Tanach and no full digitizations of Shulchan Aruch online? Why isn’t the siddur already hyperlinked to its Talmudic sources?

I would posit that we have been wasteful with our resources. Earlier, I pointed out that the primary resources that go into these advanced digitizations are time and human labor. In some cases, these resources equate directly to money; in others, the linkage is more indirect.

The core material of all of the above-mentioned works comes from the public domain. It is ownerless, and free for anyone to copy for any purpose. Every time we encounter a basic text that we have to digitize again because of “new copyright” claims or EULA-style contractual constraints, that is an indication of a failure somewhere in the system. This is particularly true if the claims are being made by non-profits, “social” businesses, or academic institutions. In the Jewish world, even for-profit published books are sometimes donation-supported. Each common text that has to be digitized a second, third, or hundredth time equates to another less common text that is not being digitized. Redoing basic OCR work and transcription takes resources away from establishing semantic linkages.

Some people and organizations get it. As of now, we only need one digitization of the Leningrad Codex (Masoretic Bible). That’s because Christopher Kimball and the J. Alan Groves Center for Advanced Biblical Research digitized it, transcribed it, and released it as free data. The Westminster Leningrad Codex is now perhaps the most widely built-upon version of the Hebrew Bible online. The base texts (which may be used “without restriction”) are present in both commercial and non-commercial products. The Open Siddur Project is using it both for its technology demonstrations and as the basis of all biblical texts in the siddur.

There are precious few examples of free data in the Jewish community, even on the Internet. There are copious examples of donation-funded organizations presenting primarily public domain data with new copyright claims.

Free data removes the need to duplicate effort, which in turn keeps the community as a whole from spending wastefully. Particularly for organizations with a social mission, its use is a win for everyone.

“An Economic Argument for Open Data — by Efraim Feinstein (Open Siddur 2009)” is shared through the Open Siddur Project with a Creative Commons Attribution-ShareAlike 4.0 International copyleft license.

Efraim Feinstein

Efraim Feinstein is the developer of an Open Siddur web application.

Stable Link: https://opensiddur.org/?p=359

Terms of Use: Be a mentsch (a conscientious, considerate person) and adhere to the following guidelines:

Properly attribute the work to Efraim Feinstein.
Clearly indicate the date you accessed the work and in what ways, if any, you modified it. (If you have adapted the work, let us know so that the contributor might consider endorsing your revision.)
Provide the stable link to this resource: <https://opensiddur.org/?p=359>.
Indicate that the original work was shared under the Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 International copyleft license. (To redistribute or remix this work in any format, modified or unmodified, you must refer to the terms of the license under which the work is shared.)

Additional Notes:

The views expressed in this work represent the views of their creator(s) and do not necessarily represent the views of the Open Siddur Project's developers, its diverse community of volunteer contributors, or its institutional partners.
We strongly advise against printing sacred texts and art containing divine names as these copies must be regarded with reverence, complicating their casual treatment and disposal.
If you must dispose of a printed sacred text (one containing Divine Names), please locate the closest genizah (often established by a synagogue) and contact its custodians for further instructions. We also recommend using Morah Yehudis Fishman's Prayer for Adding a Work to the Genizah.

Support this work: The Open Siddur Project is a volunteer-driven, non-profit, non-commercial, non-denominational, non-prescriptive, gratis & libre Open Access archive of contemplative praxes, liturgical readings, and Jewish prayer literature (historic and contemporary, familiar and obscure) composed in every era, region, and language Jews have ever prayed. Our goal is to provide a platform for sharing open-source resources, tools, and content for individuals and communities crafting their own prayerbook (siddur). Through this we hope to empower personal autonomy, preserve customs, and foster creativity in religious culture.

ויהי נעם אדני אלהינו עלינו ומעשה ידינו כוננה עלינו ומעשה ידינו כוננהו "May the pleasantness of אדֹני our elo’ah be upon us; may our handiwork be established for us — our handiwork, may it be established." –Psalms 90:17
