[Taxacom] GenBank (was The economics of biodiversity database initiatives)
Adam Cotton
adamcot at cscoms.com
Mon Oct 28 09:10:32 CDT 2013
----- Original Message -----
From: "Roderic Page" <r.page at bio.gla.ac.uk>
To: "Adam Cotton" <adamcot at cscoms.com>
Cc: <taxacom at mailman.nhm.ku.edu>
Sent: Monday, October 28, 2013 8:16 PM
Subject: Re: [SPAM?] Re: [Taxacom] GenBank (was The economics of
biodiversity database initiatives)
Hi Adam,
Locality data in GenBank is variable, but increasingly sequences (especially
barcodes) are appearing with GPS-derived coordinates. Some sequences are
linked to voucher specimens (admittedly a lot fewer than would be
desirable); in other cases you can go to the publication that made the
sequences available and get the data there (again, less than desirable; it
would be nice to automate this).
I disagree that
>
> There is absolutely no way to verify that a sequence belongs to taxon A
> rather than taxon B, other than the say-so of the researcher submitting the
> sequence to GenBank.
>
One of the advantages of sequences is that we can build a tree and discover
potentially misidentified sequences; indeed, this is one of the quickest ways
to discover potential problems. It won't always work, but it is more
testable than a simple assertion that a sequence belongs to a given taxon.
Regards
Rod
>
>
Rod,
Sorry for the lack of clarity. Of course it is possible to spot erroneous
sequences by comparing them with verified ones, and erroneous sequences
often stand out like sore thumbs in a tree, so I should have said:
"There is absolutely no way to verify that a sequence belongs to taxon A
rather than taxon B from information on the GenBank webpage for that
sequence."
The point is there will always be errors, but providing useful information
about the specimen from which the sequence originated would actually make
the data much more useful to more people, and mean users would have to spend
less time verifying the identity of the actual taxon that the sequence came
from.
Just as an example, assume a sequence is added with locality "Russia". That
could be anywhere from Europe almost to Alaska, which is not very helpful.
Of course including a GPS location is ideal, but even an approximate
locality is more useful than just the country.
Adam.
PS. I have had private communication with other researchers who agree that it
would be much more useful if GenBank entries were submitted using end-taxon
names for identification purposes. This also works the other way, for
samples that could only be positively identified to genus or species group
for whatever reason.