Producing PDFs (was Re: Scanned (PDF) original descriptions)

Fabio Moretzsohn fmoretzsohn at HOTMAIL.COM
Tue May 14 19:49:19 CDT 2002


Thank you for the many requests for my paper on the TaxonBank, and for the
comments I received from some of you.

My paper does not give a protocol for producing PDFs, as someone asked, but
I can share the procedure I worked out empirically for making PDFs of
reasonable size:

1) Scan hardcopy at 300-400 dpi, and save image file as TIF. Higher
resolution creates a huge file, and not all OCR (optical character
recognition) programs can handle higher resolutions.

2) Open the OCR program - I use a version of Xerox TextBridge that came with
a cheap flatbed scanner, so it is probably neither very good nor current.
Better/more advanced OCR programs allow you to scan and OCR automatically,
without the need to save the image file. Still, I prefer to save the image
file so that I can clean it up in Photoshop if needed (sometimes the
hardcopy is stained or there are shadows when scanning from books).

3) Run the OCR on the saved image file, or directly from the scanner. Save
the OCR'd text in Rich Text Format (RTF), DOC or another file format that
preserves font and paragraph formatting.

4) Open the document in a word processor - I use MS Word. Use the spell
checker to catch the most obvious OCR errors, then read the text carefully
to catch common misreadings such as the numeral "1" in place of the letter
"l" (lowercase L), or "m" in place of the letters "rn", etc.

5) If you have the time and the document is really important, software that
reads text aloud can be a good help in checking spelling. There are some
free programs, or you can download a trial version of shareware. Some
programs allow you to adjust the reading voice to avoid the robot-like
tone.

6) After correcting spelling, margins, etc., check whether you have the same
font as the original, or a similar one, if you want to preserve the same
look or produce a facsimile. If only the content is important, using a
different font is not a problem. It may be easier to match fonts in recent
publications; fancy or unusual fonts in old publications may not be
recognized by the OCR program.

7) Now produce the PDF file using the Adobe Acrobat macro in your word
processor, or print the document to Acrobat instead of to a printer. I
prefer the latter.

8) There are some settings that you can change in Acrobat, such as whether
font substitution is allowed, how figures are compressed, etc.

9) If your file has images, using compression can reduce the size of the PDF
file. Note that in Acrobat 4.0, the "JPEG low" setting means low
compression: the image is compressed only a little, so it looks nicer but
results in a larger file. If you scanned the figure at high resolution, the
PDF will be large.

10) You can also check the option to embed fonts. If you use unusual fonts
installed on your computer, embedding them will let readers of the document
see those fonts even if they do not have them, but the resulting PDF file
is larger. There is no need to embed common fonts in the document.

11) Producing low- and high-resolution versions of the same file may be
useful if you post the PDF on the web and want to offer both options (one
for fast downloading, the other at higher resolution, which is good for
photos, as Ken Kinman pointed out).
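
As a rough illustration of the proofreading in step 4, here is a minimal
Python sketch that flags words in which the digit 0 or 1 sits next to a
letter, the typical trace of a numeral being read in place of a letter. The
function name and the sample sentence are made up for this example; it is not
part of TextBridge or Word. It will also flag legitimate tokens such as
"1st", and it cannot catch "m" read for "rn", so it only supplements careful
reading.

```python
import re

# Assumption for illustration: a digit 0 or 1 directly next to a letter
# suggests an OCR mix-up such as "1" for "l" or "0" for "O".
DIGIT_NEXT_TO_LETTER = re.compile(r"[A-Za-z][01]|[01][A-Za-z]")


def suspicious_tokens(text):
    """Return the words in text that mix letters with the digits 0 or 1."""
    words = re.findall(r"\w+", text)
    return [w for w in words if DIGIT_NEXT_TO_LETTER.search(w)]


if __name__ == "__main__":
    sample = "The she1l is sma1l and the co1our pattern is modern."
    for word in suspicious_tokens(sample):
        print("check:", word)
```

Plain numbers such as years or page counts are not flagged, because the
pattern requires a letter beside the digit.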

The other way to do PDFs is to simply start Acrobat and open an image file
(in one of the commonly recognized formats), then save the file as PDF, but
this will result in a much larger file than if you go through the trouble of
doing OCR.
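
To give a feeling for the size difference, here is a small Python sketch of
the arithmetic. The page size (US letter, 8.5 x 11 inches), the 8-bit
grayscale depth and the figure of about 3000 characters per page are
assumptions chosen for illustration, and the numbers are for uncompressed
data; real TIF and PDF files are compressed and will be smaller, but the
ratio shows why an image-only PDF is so much larger than OCR'd text.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel=8):
    """Uncompressed size in bytes of a page scanned at the given resolution."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


if __name__ == "__main__":
    # Assumed: about 3000 characters on a page, one byte per character.
    text_bytes = 3000
    for dpi in (300, 400, 600):
        size = raw_scan_bytes(8.5, 11, dpi)
        print(f"{dpi} dpi: {size / 1e6:.1f} MB raw, "
              f"about {size // text_bytes} times the plain text")
```

The size grows with the square of the resolution, which is why going much
above 300-400 dpi quickly produces huge files.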

I am not familiar with LuraWare, which Lynn Raw recommended. Perhaps it can
produce smaller files than Acrobat. There are now programs besides Acrobat
that can write files in PDF format.

The ideal for mass production of PDFs would be a scanner that can do OCR on
the fly and save the file as PDF. Or a copy machine that does the same, and
saves the files in flash memory instead of printing on paper. I do not know
if these are available yet, but I would put both on my wish list.

Aloha,  Fabio


----------------------------------------
Fabio Moretzsohn
PhD candidate in Zoology
Department of Zoology
University of Hawaii
2538 The Mall
Honolulu, Hawaii 96822
fmoretzsohn at hotmail.com (preferred)
fmoretz at hawaii.edu






More information about the Taxacom mailing list