[Taxacom] BHL survey: scan quality
Chris Freeland
Chris.Freeland at mobot.org
Fri May 7 18:48:12 CDT 2010
These are all great suggestions, and exactly the kind of detail we want and need to improve BHL. The issue is that not all of BHL's scanned books have b&w PDFs made available by Internet Archive. But if that's a format that more people want ready access to, we'll make that happen.
Chris Freeland
Technical Director, BHL
-----Original Message-----
From: taxacom-bounces at mailman.nhm.ku.edu on behalf of Neal Evenhuis
Sent: Fri 5/7/2010 3:48 PM
To: taxacom at mailman.nhm.ku.edu
Subject: Re: [Taxacom] BHL survey: scan quality
At 10:13 AM -1000 5/7/10, Dean Pentcheff wrote:
>I will confirm what Karl Magnacca had to say.
>
>The scanning strategy for most materials at BHL (via Internet Archive,
>of course) seems to be to make the visual appearance of the pages as
>close to the original as possible, including yellowed paper, etc. This
>comes at the expense (since there are always tradeoffs) of highly
>resolved text. When it comes to plates, the results are visually
>appealing, but often of poor actual resolution.
>
>The PDFs are generated using "LuraDocument", which achieves excellent
>compression of those images, but with the cost of long rendering times
>for the pages (again, as mentioned by Karl Magnacca). The result is
>that the PDFs can be very cumbersome to use on anything but the very
>fastest desktop computers.
>
I have had the same problems with the speed of rendering each page of
BHL PDFs, and it is indeed frustrating at times. But the solution is
already there. It is just not available on the BHL website.
Internet Archive (http://www.archive.org) has all of the BHL
documents (since they are sent there from BHL nodes for compression,
PDF generation, etc.) and also offers these documents in various
formats. When one googles a particular book title, a link to the
document on the Internet Archive site comes up. Once you are there,
your book defaults to the scanned text file, but the navigation box to
the left lists all the file formats available. One of these is b/w
mode for PDFs. I have downloaded these whenever I can in place of the
default color PDF (the only choice) on the BHL website, and pages
render much faster once downloaded, although still not nearly as fast
as Google Books for the same document (probably because Internet
Archive does not compress the b/w pages).
My suggestion is to make more file formats available for download on
the BHL website than are there now (i.e., "PDF" (color only), "OCR"
(= scanned text files, not proofread), "Images", or "All"). It seems
simple to do -- just add more links to the downloadable file
selection menu for each document on the BHL site.
Why are these other file types only available on the Internet Archive
site and not the BHL site?
-Neal
_______________________________________________
Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
The Taxacom archive going back to 1992 may be searched with either of these methods:
(1) http://taxacom.markmail.org
Or (2) a Google search specified as: site:mailman.nhm.ku.edu/pipermail/taxacom your search terms here