[Taxacom] GBIF data
Shorthouse, David
dps1 at ualberta.ca
Wed Nov 22 15:52:32 CST 2006
Wolfgang,
You have essentially commented on the ultimate goals for GBIF, but it's
unclear to me how your email or text file solution using pipes helps solve
the issue of digitizing invertebrate collections. Folks at GBIF are
considering text file uploads among other data exchange protocols, but this
really isn't the bottleneck. The true bottleneck is the human-power and
associated funds to enter the data. If "bugs" came with geotagged barcodes,
it would be much easier ;)~ To encourage data entry, I'd argue for
value-added features that assist the data entry process and thus streamline it
as much as possible, e.g. real-time nomenclatural checks, geocoding checks,
etc., because what comes out of GBIF is only as good as what goes in.
As for the choice of fields and XML schema, GBIF has a pretty elegant
(though somewhat sluggish) system with DiGIR and other communication
protocols. Not all fields are required, and you may use any database system
you want so long as it is tied to a web server that can serve PHP. Perhaps
what they also need is better documentation of this system and step-by-step
installable packages.
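As a rough illustration of the kind of real-time check meant here, the Python sketch below flags unknown names, names that map to a different accepted name, and out-of-range coordinates while a record is being entered. CHECKLIST, normalize and check_entry are hypothetical; the name pairs are simply copied from the VALIDNAME column of the example records in the quoted message, and a working tool would consult a proper nomenclatural authority instead.

```python
# Hypothetical checklist: entered name (normalized) -> currently accepted name.
# Pairs copied from the VALIDNAME column of the example records; not authoritative.
CHECKLIST = {
    "amara majuscula": "Amara majuscula",
    "amara proxima": "Amara proxima",
    "amara pindica": "Amara proxima",
    "harpalus xanthopus": "Harpalus xanthopus",
    "harpalus winkleri": "Harpalus xanthopus",
}


def normalize(name):
    """Lower-case and collapse whitespace so 'Amara  Pindica' still matches."""
    return " ".join(name.lower().split())


def check_entry(name, lat, lon):
    """Return warnings for one record, meant to run as the data are typed in."""
    warnings = []
    accepted = CHECKLIST.get(normalize(name))
    if accepted is None:
        warnings.append(f"unknown name: {name!r}")
    elif normalize(accepted) != normalize(name):
        warnings.append(f"{name!r} maps to accepted name {accepted!r}")
    # Minimal geocoding sanity check: coordinates must lie on the globe.
    if not -90 <= lat <= 90:
        warnings.append(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        warnings.append(f"longitude out of range: {lon}")
    return warnings
```

Running checks like these at the moment of entry, rather than after upload, is what keeps errors from reaching the portal in the first place.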
David P. Shorthouse
------------------------------------------------------
Department of Biological Sciences
CW-403, Biological Sciences Centre
University of Alberta
Edmonton, AB T6G 2E9
Phone: 1-780-492-3080
mailto:dps1 at ualberta.ca
http://canadianarachnology.webhop.net
http://arachnidforum.webhop.net
------------------------------------------------------
-----Original Message-----
From: taxacom-bounces at mailman.nhm.ku.edu
[mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Faunaplan at aol.com
Sent: Wednesday, November 22, 2006 2:24 PM
To: TAXACOM at mailman.nhm.ku.edu
Subject: [Taxacom] GBIF data
Dear all,
GBIF's data on geographic occurrences of the world's living species are
still highly fragmentary and, in part, rather unreliable, especially in
regard to insects.
I'm just wondering why we don't open yet another gateway for the large
community of data holders who could support GBIF's mission, one that is more
user-friendly, simpler and much faster than current procedures.
Here are some musings, focussing on data from entomological collections:
Insects make up the lion's share of species diversity, but reliable
occurrence data are still mainly written on hundreds of millions of labels
pinned to specimens in museums and collections, practically inaccessible to
most potential users.
Instead of trying to digitize these data according to a highly complex XML
schema, why not just take the data as they are from the specimen labels and
put them into a simple (flat) data file, say, a pipe-delimited text file as
in the following example:
ID-LABEL|LOC-LABEL|SAMPLESIZE|COLLECTION
"Amara (Brad.)\ majuscula Chd.\ det. Hieke 1982"|"Schwanheim a.M.\ Feld, 18.6.53\ coll. H.Hesse"|1|"ZSM"
"Amara (s.str.)\ pindica Apf.\ det. Hieke 1969"|"Collection\ Strasser\\ Caucasus\\ Eriwan"|1|"ZSM"
"Harpalus\ Winkleri Schbg.\ det. Dr. E.Schaub."|"Mokra pl.\\ Golesnica pl.\ Macedonia\\ Sammlung\ Apfelbeck"|1|"ZSM"
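A file like this can be read with Python's standard csv module, since the fields are pipe-delimited and double-quoted. The sketch below assumes, based only on how the example looks, that a single backslash marks a line break on a label and a double backslash separates two labels on the same pin; split_label and read_records are hypothetical names, not part of any GBIF tool.

```python
import csv
import io

# The three example records above, verbatim (raw string keeps the backslashes).
SAMPLE = r'''ID-LABEL|LOC-LABEL|SAMPLESIZE|COLLECTION
"Amara (Brad.)\ majuscula Chd.\ det. Hieke 1982"|"Schwanheim a.M.\ Feld, 18.6.53\ coll. H.Hesse"|1|"ZSM"
"Amara (s.str.)\ pindica Apf.\ det. Hieke 1969"|"Collection\ Strasser\\ Caucasus\\ Eriwan"|1|"ZSM"
"Harpalus\ Winkleri Schbg.\ det. Dr. E.Schaub."|"Mokra pl.\\ Golesnica pl.\ Macedonia\\ Sammlung\ Apfelbeck"|1|"ZSM"
'''


def split_label(text):
    """Split on double backslashes (labels), then on single ones (lines)."""
    # Assumed transcription convention, inferred from the example only.
    return [[line.strip() for line in label.split("\\")]
            for label in text.split("\\\\")]


def read_records(stream):
    """Yield one dict per specimen from a pipe-delimited, quoted text file."""
    for row in csv.DictReader(stream, delimiter="|", quotechar='"'):
        row["SAMPLESIZE"] = int(row["SAMPLESIZE"])
        row["LOC-LINES"] = split_label(row["LOC-LABEL"])
        yield row


records = list(read_records(io.StringIO(SAMPLE)))
```

The verbatim LOC-LABEL text is kept alongside the split lines, so nothing on the original label is lost if the assumed convention turns out to be wrong.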
In a second step, we can add a few fields in order to standardize and unlock
the treasure, e.g.:
- COUNTRY (standard names or codes)
- LATLONG (containing a gross latitude/longitude georeference)
- VALIDNAME (containing the standardized current taxonomic name)
The example above then becomes:
COUNTRY|LATLONG|VALIDNAME|ID-LABEL|LOC-LABEL|SAMPLESIZE|COLLECTION
Germany|NE50008|Amara majuscula|"Amara (Brad.)\ majuscula Chd.\ det. Hieke 1982"|"Schwanheim a.M.\ Feld, 18.6.53\ coll. H.Hesse"|1|"ZSM"
Armenia|NE40044|Amara proxima|"Amara (s.str.)\ pindica Apf.\ det. Hieke 1969"|"Collection\ Strasser\\ Caucasus\\ Eriwan"|1|"ZSM"
Macedonia|NE41021|Harpalus xanthopus|"Harpalus\ Winkleri Schbg.\ det. Dr. E.Schaub."|"Mokra pl.\\ Golesnica pl.\ Macedonia\\ Sammlung\ Apfelbeck"|1|"ZSM"
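The LATLONG codes are not defined in the message, but all three examples fit a reading of two hemisphere letters (N/S, then E/W) followed by whole degrees of latitude (two digits) and longitude (three digits), e.g. NE50008 for roughly 50°N 8°E, where Schwanheim near Frankfurt lies. Under that assumed convention, encoding and decoding take only a few lines; encode_cell and decode_cell are hypothetical names, and decode_cell returns the cell centre.

```python
def encode_cell(lat, lon):
    """Map decimal degrees to an assumed one-degree cell code like 'NE50008'."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinates out of range: {lat}, {lon}")
    ns = "N" if lat >= 0 else "S"
    ew = "E" if lon >= 0 else "W"
    # Whole degrees, truncated; exactly 90 or 180 fold into the last cell.
    return f"{ns}{ew}{min(int(abs(lat)), 89):02d}{min(int(abs(lon)), 179):03d}"


def decode_cell(code):
    """Return the (lat, lon) centre of the cell named by an assumed code."""
    if (len(code) != 7 or code[0] not in "NS" or code[1] not in "EW"
            or not code[2:].isdigit()):
        raise ValueError(f"not a cell code: {code!r}")
    lat = int(code[2:4]) + 0.5
    lon = int(code[4:7]) + 0.5
    return (lat if code[0] == "N" else -lat, lon if code[1] == "E" else -lon)
```

A gross code like this is deliberately coarse: it can be assigned from a gazetteer in seconds during curatorial work and refined later, as suggested below.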
Additional steps, e.g. more detailed georeferencing, could follow later
whenever needed...
Such simple files could be produced during routine curatorial or ID work, and
sharing them via email should be very easy. GBIF could use these handy
datasets to create dynamic distribution maps generated by a rather simple
PHP/MySQL application. On the new GBIF data portal prototype, I've already
seen a pixel world map that seems perfectly suited to displaying standardized
(one-degree lat/long) occurrence data as accurate 2x2-pixel dots.
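The 2x2-pixel idea amounts to a simple coordinate mapping. Assuming an equirectangular world map 720 pixels wide and 360 high (two pixels per degree, 90°N/180°W at the top-left corner) and the same assumed reading of the LATLONG codes as whole-degree cells, each code lights exactly one 2x2 block. The function names are hypothetical; the resulting grid could be painted by any image library, for instance GD in a PHP application.

```python
# Assumed map geometry: equirectangular, 2 pixels per degree, (0, 0) = 90N 180W.
PX_PER_DEG = 2
WIDTH, HEIGHT = 360 * PX_PER_DEG, 180 * PX_PER_DEG  # 720 x 360


def cell_south_west(code):
    """South-west corner (whole degrees) of an assumed code such as 'NE50008'."""
    # Clamp so a pole or the antimeridian falls into the edge cell.
    lat, lon = min(int(code[2:4]), 89), min(int(code[4:7]), 179)
    # In the south/west, the cell labelled 33 spans -34..-33, so shift by one.
    south = lat if code[0] == "N" else -(lat + 1)
    west = lon if code[1] == "E" else -(lon + 1)
    return south, west


def dot_pixels(code):
    """The 2x2 block of map pixels covering one one-degree cell."""
    south, west = cell_south_west(code)
    x0 = (west + 180) * PX_PER_DEG
    y0 = (90 - (south + 1)) * PX_PER_DEG  # top edge of the cell
    return [(x0 + dx, y0 + dy)
            for dy in range(PX_PER_DEG) for dx in range(PX_PER_DEG)]


def render(codes):
    """A HEIGHT x WIDTH grid of 0/1 values with one dot per occupied cell."""
    grid = [[0] * WIDTH for _ in range(HEIGHT)]
    for code in codes:
        for x, y in dot_pixels(code):
            grid[y][x] = 1
    return grid
```

Because duplicate codes simply repaint the same block, the map stays cheap to regenerate however many specimens share a cell.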
In addition to such maps, there could be a display of the corresponding
standardized background information (e.g., VALIDNAME, COUNTRY, LATLONG,
COLLECTION) in simple data lists. Both maps and lists could be downloaded by
the user with just a click.
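A downloadable list of this kind is little more than the distinct combinations of the standardized fields written out as CSV. This sketch assumes each record is a dictionary keyed by the column names used above; LIST_FIELDS, occurrence_list and to_csv are hypothetical names.

```python
import csv
import io

# Standardized columns shown in the public list (LOC-LABEL deliberately omitted).
LIST_FIELDS = ["VALIDNAME", "COUNTRY", "LATLONG", "COLLECTION"]


def occurrence_list(records):
    """Distinct combinations of the list fields, sorted by name first."""
    return sorted({tuple(r[f] for f in LIST_FIELDS) for r in records})


def to_csv(rows):
    """Render the list as CSV text, ready to offer as a download."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(LIST_FIELDS)
    writer.writerows(rows)
    return buf.getvalue()
```

Deduplicating on the standardized fields means that ten specimens from the same cell and collection appear as one list entry, matching one dot on the map.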
In case access to full data details is seen as problematic (see previous
discussions on "sensitive data"), the field LOC-LABEL could be kept under
restricted access while other essential information is freely accessible.
Tracking down all information to its sources would be rather easy anyway.
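Field-level restriction can be as simple as masking the sensitive column before a record leaves the server. In this sketch, which assumes records are dictionaries keyed by the column names above, the hypothetical public_view function replaces LOC-LABEL with a pointer to the holding collection unless the caller is authorized, so the source can still be traced.

```python
# Assumed policy: which columns are withheld from anonymous users.
RESTRICTED_FIELDS = {"LOC-LABEL"}


def public_view(record, authorized=False):
    """Copy of a record with restricted fields masked unless authorized."""
    if authorized:
        return dict(record)
    holder = record.get("COLLECTION", "the holding collection")
    return {
        key: (f"[restricted; contact {holder}]" if key in RESTRICTED_FIELDS else value)
        for key, value in record.items()
    }
```

Masking rather than dropping the field keeps the column layout identical for all users, which simplifies downloads and tells the reader that more detail exists.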
As a side effect, this could also become a central database for collection
inventories; wouldn't that be more useful than individual inventories on each
museum's website?
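To illustrate the inventory idea, the flat records map directly onto a single SQL table, and a per-collection inventory is one GROUP BY query. The sketch uses Python's built-in sqlite3 as a stand-in for the MySQL backend mentioned above; the table and column names are assumptions, not an existing GBIF schema.

```python
import sqlite3

# Column order follows the extended flat file: COUNTRY|LATLONG|VALIDNAME|...
SCHEMA = """CREATE TABLE occurrence (
    country TEXT, latlong TEXT, validname TEXT,
    id_label TEXT, loc_label TEXT, samplesize INTEGER, collection TEXT)"""


def build_db(rows):
    """Load flat-file rows (7-tuples) into an in-memory table."""
    con = sqlite3.connect(":memory:")
    con.execute(SCHEMA)
    con.executemany("INSERT INTO occurrence VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return con


def inventory(con):
    """Specimen counts per collection and accepted name."""
    return con.execute(
        """SELECT collection, validname, SUM(samplesize)
           FROM occurrence
           GROUP BY collection, validname
           ORDER BY collection, validname"""
    ).fetchall()
```

Summing SAMPLESIZE rather than counting rows keeps the inventory correct when one line of the file stands for several specimens.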
"Free and open access to the world's biodiversity data through the
collaborative medium of the Web is an important tool for the sustainable
stewardship of Earth. Unlocking such data will lead to much better policy and
resource-management choices locally, regionally and globally" (cited from an
article by Matt Ball in GeoWorld, Aug. 2005)
Best wishes,
Wolfgang
---------------------------------------
Wolfgang Lorenz
Faunistics and Environmental Planning
Hoermannstr. 4
D-82327 Tutzing, Germany
(P.S.: I'm currently running two test versions of PHP/MySQL map tools on my
websites; please contact me for details)
_______________________________________________
Taxacom mailing list
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/taxacom