[Taxacom] Wikispecies is not a database: part 3 (after thinking about it!)
Stephen Thorpe
s.thorpe at auckland.ac.nz
Thu Aug 13 16:09:31 CDT 2009
Hi Mike and Tony (and all):
Well, "data protection" is a two-edged sword - on the one hand you
want to protect your data from corruption, and from that point of view
open source might seem risky. But on the other hand, the reality is
such that there is a real danger with closed source that you just end
up locking up bad data, without much opportunity to easily fix it. By
"bad data", I mean both incorrect data, and out-of-date data. The
great advantage of open source is that it acts both as open-ended peer
review and provides a virtually instant route by which the data can be
updated. The point I was attempting to stress in my last email is that
a distinction can be made between what you might call "data sources"
and databases. A database gets its data from a data source.
An open data source, like Wikispecies, has some significant advantages
(as explained above) over closed data sources. Bear in mind too that
using working taxonomists as data sources may be problematic. They
tend to promote their own theories over alternatives, and it is not
always possible to find a cooperative data source of this kind with
the time or will to keep you constantly updated with all the latest
developments. There is also a big time lag associated with using
abstracting agencies as data sources and, because they are not
taxonomists themselves, there are problems both with interpreting the
primary taxonomic literature and with reconciling it with other
primary literature that may use alternative classifications and
assumptions...
Cheers,
Stephen
Quoting Mike Sadka <M.Sadka at nhm.ac.uk>:
> Just one final point:
>
>>> We should prioritise data quality over everything else.
>
> That is exactly why you need properly structured databases (in the
> correct sense) - they store and protect data. If you disagree look at
> any book on relational databases.
>
>>> impressively structured and presented websites ("databases" in the
>>> broad sense)
>
> This is an incorrect use of the term "database" - not a broad one.
> Websites, however well structured or presented, are not and cannot
> replace databases. A website is just how the data are presented and
> tells you absolutely nothing about how safely or effectively they are
> stored. This is basic IT.
>
> OK - two final points. But more than enough from me now I am sure!
>
> -----Original Message-----
> From: taxacom-bounces at mailman.nhm.ku.edu
> [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Stephen Thorpe
> Sent: 11 August 2009 02:40
> To: taxacom at mailman.nhm.ku.edu
> Subject: Re: [Taxacom] Wikispecies is not a database: part 3 (after
> thinking about it!)
>
> Hi Mike
>
> I don't know if it is just me, but I find it quite difficult in a
> forum like this to get the details of my argument right first time,
> but then the responses kind of prod me in the right direction, so I
> guess it works out good in the end. Anyway,
>
> [you wrote] This attitude worries me a lot because it seems not to ask
> where that taxonomic information is going or whether best use is made
> of it, and I feel it sells the data short.
> Where does "just ... typing in taxonomic information" get you?
> Without an underlying standard, typing just makes more pages of
> taxonomic information. They may be very useful but they might as well
> be on paper except that they are easier to update and distribute.
> ICT has the potential to search, sort, aggregate and integrate data
> from a range of sources - thereby generating new information, and
> giving those "experienced and knowledgeable" people more and novel
> opportunities to make discoveries - not to replace fieldwork (or
> closetwork), but to maximise the information derived from the data it
> generates. If data are entered willy-nilly into numerous different
> systems without care for their fate, their usefulness is limited and
> maybe shortlived.
> Obviously this isn't an argument against wikispecies, or for
> numerous different OLs. It's an argument for both using common
> standards.
>
> OK, you have made a bit of a straw man here! I am certainly not
> proposing that all we need to do is just type taxonomic information
> willy-nilly on to the web! In fact, it is this very thing which has
> caused many of the current problems, because no two web sources seem
> to agree very often, and so the "willinilliness" has resulted in utter
> chaos! On that I think we can agree, hopefully!
>
> My point is this: we need to prioritise things somewhat. We should
> prioritise data quality over everything else. There is no point
> developing flash databases if you don't know where you are going to
> get hold of good data. Some of the examples I have given in previous
> emails show relatively impressively structured and presented websites
> ("databases" in the broad sense) giving poor quality outputs.
>
> Wikispecies already exists as a REASONABLY adequate infrastructure
> upon which to create a solid pool of taxonomic information, which can
> THEN be used as a solid DATA PROVIDER for any number of other database
> initiatives. I just think that more effort at this early stage should
> go into making that source of information comprehensive (and
> verifiable, by way of full referencing).
>
> The attitude that worries me a lot is "build your database first, and
> worry about where the data is going to come from second (if the
> funding doesn't run out first!)". Another related line that I have
> heard goes something like "well, I know there are going to be issues
> about who to believe when data providers come into conflict, but the
> funding for the first year is just to get the infrastructure up and
> running, so we will just have to sort that problem out somehow later
> on down the track" ...
>
> The reality is that many of the people currently in charge of online
> "databases" obsess too much about presentation, and what it COULD do
> (if it had the data!). AFD recently went to all the trouble of
> changing its user interface from something that was fine to something
> rather less than fine, and still they haven't managed to get even that
> 1999 Apteropanorpa name into their system!
>
> Cheers,
>
> Stephen
>
>
>
>
> Quoting Mike Sadka <M.Sadka at nhm.ac.uk>:
>
>>
>> Hi Stephen
>>
>> And bravo Evgeniy !
>>
>> [You wrote...]
>> Everybody won't just adopt them [standards] - we are even constantly
>> having to defend ourselves against factions who want rid of
>> traditional biological nomenclature altogether!
>>
>> But technological standards are not for "everybody" - they are for
>> machines. Taxonomists don't need even to know that they exist, but
>> machines will not be able to serve taxonomy to their full potential
>> without them.
>>
>> [and...]
>> ... instead of just sitting down at a computer online and typing in
>> taxonomic information,...
>>
>> This attitude worries me a lot because it seems not to ask where
>> that taxonomic information is going or whether best use is made of
>> it, and I feel it sells the data short.
>>
>> Where does "just ... typing in taxonomic information" get you?
>> Without an underlying standard, typing just makes more pages of
>> taxonomic information. They may be very useful but they might as
>> well be on paper except that they are easier to update and distribute.
>>
>> ICT has the potential to search, sort, aggregate and integrate data
>> from a range of sources - thereby generating new information, and
>> giving those "experienced and knowledgeable" people more and novel
>> opportunities to make discoveries - not to replace fieldwork (or
>> closetwork), but to maximise the information derived from the data
>> it generates. If data are entered willy-nilly into numerous
>> different systems without care for their fate, their usefulness is
>> limited and maybe shortlived.
>>
>> Obviously this isn't an argument against wikispecies, or for
>> numerous different OLs. It's an argument for both using common
>> standards.
>>
>>
>> [And...]
>> ...so we don't need working taxonomists to help build our databases...
>>
>> I agree - there's been quite enough of that already! You need IT
>> people to build your databases. ;-)
>>
>> Cheerio, Mike
>>
>>
>>
>> _______________________________________________
>>
>> Taxacom Mailing List
>> Taxacom at mailman.nhm.ku.edu
>> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
>>
>> The Taxacom archive going back to 1992 may be searched with either
>> of these methods:
>>
>> (1) http://taxacom.markmail.org
>>
>> Or (2) a Google search specified as:
>> site:mailman.nhm.ku.edu/pipermail/taxacom your search terms here
>>
>
>
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.
>
>