Imaging (Scanning) Books, Manuscripts, and Ephemeral Works

Digitization using cell phone camera, lamp, and snake clamp with universal cell phone mount (credit: Aharon Varady, license: CC BY-SA)

All digital text in the Open Siddur Project database derives from a document in a digital or analog format.

For works preserved in analog formats (printed on paper, handwritten on scrolls or papyrus, inscribed in stone, impressed in metal, etc.), we must first image the document in which the work appears. Imaging is the process of scanning or digitally photographing that document.

Document or manuscript imaging is the necessary first step in processing text that will then be transcribed, proofread, and entered into the database. Volunteers can help the project by scanning Jewish liturgy and related works that are either already in the Public Domain or contributed by their copyright owners under an Open Content license. (If you would like to begin looking for a book to image, please see our wishlist.)

Imaging provides the source image against which collaborators can transcribe and proofread the text. Maintaining links between each document image and its digital text in our database is also important, so that future researchers can verify the text against its authoritative source.

A Short Imaging Tutorial

What to image

Please image the whole book, including the title page and, if available, the copyright page.

Sometimes the desired content makes up only short segments, sections, or pages of a book. In such cases, the title page and copyright page must still be imaged.

Digitization using cell phone camera, lamp, and snake clamp with universal cell phone mount. Glass from a picture frame laid on the book keeps the pages flat enough for satisfactory OCR. (credit: Aharon Varady, license: CC BY-SA)

Image Format

The Open Siddur Project needs scanned images of text that can be transcribed easily. At too low a resolution, it is nearly impossible to make out the many diacritical marks that indicate vowels and other significant distinctions in words and letters.

A good image has a resolution at which each character in the smallest text on the page can be clearly distinguished from every other character when the image is displayed at 100% zoom.

For these reasons, images should be made at a high resolution: at least 300dpi (dots per inch). For our purposes, images can be JPG or PNG files.
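As a rough check on the 300 dpi minimum, divide an image's pixel dimensions by the physical size of the page it shows. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes the photo has been cropped to the page edges so that its pixels cover exactly the measured page (uncropped margins lower the true resolution).

```python
# Minimal sketch: check whether a photo of a page meets a minimum resolution.
# Assumes the photo is cropped to the page, so its pixel dimensions cover
# exactly the measured page size (in inches). Function names are hypothetical.

MIN_DPI = 300  # the minimum resolution recommended above


def effective_dpi(width_px: int, height_px: int,
                  width_in: float, height_in: float) -> float:
    """Return the lower of the horizontal and vertical resolutions, in dpi."""
    return min(width_px / width_in, height_px / height_in)


def required_pixels(width_in: float, height_in: float,
                    dpi: int = MIN_DPI) -> tuple[int, int]:
    """Pixel dimensions needed to image a page of the given size at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


if __name__ == "__main__":
    # A 6 x 9 inch page needs at least 1800 x 2700 pixels at 300 dpi.
    print(required_pixels(6, 9))  # (1800, 2700)
    # A 3000 x 4000 pixel photo of an 8.5 x 11 inch page:
    dpi = effective_dpi(3000, 4000, 8.5, 11)
    print(f"{dpi:.0f} dpi", "OK" if dpi >= MIN_DPI else "too low")  # 353 dpi OK
```

By this measure, a 12-megapixel phone photo (4000 × 3000 pixels) exceeds 300 dpi for a letter-sized page, provided the page fills the frame.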

Naming files

It is best to name files in a sequence where each page is contained in one file, and the filename is the page number followed by the extension. For example, 5.jpg for page 5, or 10.jpg for page 10.

Do not worry about whether your image numbers match the printed page numbers. Simply give each page its own number in order, from the first leaf to the last.

If both sides of the same folio share a page number (for instance, a recto and its verso), you do not need to indicate which side is a and which is b; numbering the images sequentially already keeps them in order.

If, for some reason, you cannot follow these conventions, then send us the scans with each file numbered in sequence (1.jpg, 2.jpg, 3.jpg, etc.).
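If your camera or scanner saves files under its own names (for example IMG_0001.JPG), a short script can produce the sequential names described above. This Python sketch is one possible approach, not a project tool: the function name is hypothetical, and it assumes the original filenames sort in the order the pages were captured, which is usually true for zero-padded camera names but worth spot-checking.

```python
# Minimal sketch: copy camera images into a new folder, renamed 1.jpg, 2.jpg, ...
# Assumes the camera's own filenames (e.g. IMG_0001.JPG) sort in capture
# order and that pages were photographed from the first leaf to the last.
import shutil
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def number_pages(src_dir: Path, dest_dir: Path) -> list[Path]:
    """Copy images from src_dir into dest_dir as 1.ext, 2.ext, ... in sorted order.

    Copying into a separate folder (rather than renaming in place) avoids
    overwriting a file that already happens to be called, say, 2.jpg.
    """
    images = sorted(p for p in src_dir.iterdir()
                    if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    dest_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for number, image in enumerate(images, start=1):
        # Keep the original format but lowercase the extension (.JPG -> .jpg).
        target = dest_dir / f"{number}{image.suffix.lower()}"
        shutil.copy2(image, target)  # copy2 also preserves file timestamps
        written.append(target)
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (folder names are examples): python number_pages.py camera numbered
    for path in number_pages(Path(sys.argv[1]), Path(sys.argv[2])):
        print(path.name)
```

Non-image files in the source folder (notes, sidecar files) are skipped, so they do not consume a page number.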

Image Processing

ScanTailor is open-source software that processes your images (splitting facing pages, deskewing, and cropping) into page images suitable for compiling into a reasonably sized PDF that we can upload to the Internet Archive. If you’d prefer to skip this step, that is fine. The main thing is to take high-quality images of every page of the book or manuscript.

Temporary Storage for Uploading Images

After you’ve imaged a book, you’re left with hundreds of image files. An entire collection can easily exceed 250 MB, far more than can be sent by email. Free services such as dropbox.com offer temporary storage for sharing compressed batches of scanned images.

After you’ve scanned the book,

compress the collection in your preferred compression format (zip, tar.gz, rar),
upload it to an account at dropbox.com, Google Drive, box.net (or other similar service), and
contact us with a link where we can download the file.
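The compression step can also be done with Python's standard library via `shutil.make_archive`. In this sketch the function name and folder names are hypothetical; it zips one folder of page images into a single archive and reports the archive's size.

```python
# Minimal sketch: zip a folder of page images before uploading it.
# The function name and the folder/archive names are examples only.
import shutil
import sys
from pathlib import Path


def compress_scans(image_dir: Path, archive_base: Path) -> Path:
    """Create <archive_base>.zip containing image_dir and return its path."""
    archive = shutil.make_archive(
        str(archive_base),          # output path without the .zip extension
        "zip",
        root_dir=image_dir.parent,  # paths inside the zip are relative to this...
        base_dir=image_dir.name,    # ...so the zip holds one top-level folder
    )
    return Path(archive)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python compress_scans.py numbered
    folder = Path(sys.argv[1]).resolve()
    zip_path = compress_scans(folder, folder.with_name(folder.name + "-scans"))
    size_mb = zip_path.stat().st_size / 1_000_000
    print(f"{zip_path.name}: {size_mb:.1f} MB")
```

Because JPG and PNG files are already compressed, zipping mainly bundles hundreds of files into one download; it should not be expected to shrink the collection much.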

Credit for Imaging

The Open Siddur Project will credit all contributors of high-quality imaging for their scanning or photography.