[Taxacom] BHL survey: scan quality
Neal Evenhuis
neale at bishopmuseum.org
Fri May 7 15:48:15 CDT 2010
At 10:13 AM -1000 5/7/10, Dean Pentcheff wrote:
>I will confirm what Karl Magnacca had to say.
>
>The scanning strategy for most materials at BHL (via Internet Archive,
>of course) seems to be to make the visual appearance of the pages as
>close to the original as possible, including yellowed paper, etc. This
>comes at the expense (since there are always tradeoffs) of highly
>resolved text. When it comes to plates, the results are visually
>appealing, but often of poor actual resolution.
>
>The PDFs are generated using "Luradocument", which achieves excellent
>compression of those images, but with the cost of long rendering times
>for the pages (again, as mentioned by Karl Magnacca). The result is
>that the PDFs can be very cumbersome to use on anything but the very
>fastest desktop computers.
>
I have had the same problems with the rendering speed of each page of
BHL PDFs, and it is indeed frustrating at times. But the solution
already exists; it is just not available on the BHL website.

Internet Archive (http://www.archive.org) has all of the BHL
documents (since they are sent there from BHL nodes for compression,
PDF generation, etc.) and also offers these documents in various formats.
When you google a particular book title, a link to the document on
the Internet Archive site comes up. Once you are there, your book
defaults to the scanned text file, but the navigation box to the left
lists all the available file formats. One of these is a b/w PDF.
Whenever I can, I download these in place of the default color
PDF (the only choice on the BHL website). Once downloaded, pages render
much faster, although still not nearly as fast as Google Books for the
same document (probably because Internet Archive applies no
compression to the b/w pages).

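For anyone who wants to skip the navigation box, the lookup described above might be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `bw_pdf_url` and the item name `exampleitem00auth` are hypothetical, and the `/download/<item>/<item>_bw.pdf` pattern is an assumed naming convention that may not hold for every item, so the resulting link should be checked by hand.

```python
from urllib.parse import quote

# Site named in the message above.
ARCHIVE_BASE = "http://www.archive.org"


def bw_pdf_url(identifier, base=ARCHIVE_BASE):
    """Build a candidate URL for the black-and-white PDF of one item.

    Assumption: the b/w PDF is stored as "<identifier>_bw.pdf" under the
    item's download directory. This only builds a string; it does not
    check that the file actually exists.
    """
    if not identifier or "/" in identifier:
        raise ValueError("identifier must be a non-empty item name without slashes")
    safe = quote(identifier, safe="")  # percent-encode anything unusual
    return f"{base}/download/{safe}/{safe}_bw.pdf"


if __name__ == "__main__":
    # "exampleitem00auth" is a made-up item name used for illustration only.
    print(bw_pdf_url("exampleitem00auth"))
```

The function deliberately makes no network request; it only produces a link to try in a browser.
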
My suggestion is to make more file formats available on the BHL
website than are offered for download now (i.e., "PDF" (color
only), "OCR" (= scanned text files, not proofread), "Images", or "All").
This seems simple to do: just add more links to the downloadable file
selection menu for each document on the BHL site.

Why are these other file types only available on the Internet Archive
site and not the BHL site?
-Neal