[Taxacom] Random taxonomy

David Campbell pleuronaia at gmail.com
Fri Nov 29 16:04:26 CST 2013


Confusion on this sort of issue is widespread in science.  About 20 years
ago, Science published a letter claiming that, because the odds that a
human selected at random is the Pope are one in a few billion, statistics
implies that it is highly likely that the Pope is not human.  In reality,
the problem was an inaccurately formulated null hypothesis.  It is
extremely unlikely that the Pope was a statistically randomly selected
example.

Somewhat more common than taxonomic confusion about the Pope is confusion
about the meaning of claimed error bars.  I've seen the claim that
molecular clock techniques are appropriate because they were not rejected
with 95% confidence.  No, the null model of no molecular clock should be
used unless there is strong support for a molecular clock.  Similarly, the
confidence interval output by a molecular clock analysis relates to the
replicability of your analysis with the data as input and cannot take into
consideration those uncertainties that you failed to include in your
calculation of error (e.g., unjustified or wrong calibration).

In the particular case of the near 100% misidentification rate, there are
indeed two quite different questions: What are the chances that a
particular person would achieve this, and what are the odds of this
happening if someone were to simply randomly assign names?  The particular
case is extremely difficult because we don't know how the person made the
assignments.  However, the random chance can be calculated easily.
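
As a minimal sketch of that random-chance calculation: it assumes each of
the 43 specimens independently receives one of the 50 names with equal
probability (sampling with replacement, as clarified in the quoted thread
below), so each assignment is correct with probability 1/50 and the number
of correct names is binomial.  The function names are illustrative only.

```python
from math import comb

N_SPECIMENS = 43      # specimens identified in the case described
P_CORRECT = 1 / 50    # assumed chance that a uniformly random name is right


def prob_exactly(k, n=N_SPECIMENS, p=P_CORRECT):
    """Binomial probability of exactly k correct names out of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k, n=N_SPECIMENS, p=P_CORRECT):
    """Probability of k or more correct names out of n."""
    return sum(prob_exactly(i, n, p) for i in range(k, n + 1))


if __name__ == "__main__":
    print(f"P(none correct)         = {prob_exactly(0):.3f}")
    print(f"P(exactly one correct)  = {prob_exactly(1):.3f}")
    print(f"P(at least one correct) = {prob_at_least(1):.3f}")
    print(f"Expected number correct = {N_SPECIMENS * P_CORRECT:.2f}")
```

Under these assumptions, random naming gets none right about 42% of the
time, exactly one right about 37% of the time, and at least one right
about 58% of the time, so a single correct name out of 43 is roughly what
chance alone would produce.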

In taxonomy, we are dealing with the results of the history of organisms as
it happened.  In one sense, the odds of things turning out exactly as they
have is extremely low - if you assume particular probabilities for every
contingent event that you can think of.  In another sense, the probability
is 100% - what has happened has happened.  In yet another sense, the
probability might be fairly high though not 100%, if you are interested in
whether things would turn out fairly similar to the way that they are.  In
this case, how high the probability is depends on what passes your
criterion for "fairly similar" as well as what odds you assign to each
event.


On Fri, Nov 29, 2013 at 4:32 PM, Stephen Thorpe
<stephen_thorpe at yahoo.co.nz> wrote:

> Can someone who understands probability explain the following to me?
>
> Suppose that there is a symmetrical fork in the road leading to two
> equivalent residential areas. A car approaches the fork. What is the
> probability that it will go left?
>
> There seem to me to be two quite different notions of probability
> involved, and I don't know which of them is what people mean by
> 'probability':
>
> (1) For a large enough sample of randomly selected cars, the probability
> will be 50% (i.e. for 1000 cars, approx. 500 will turn left); but
>
> (2) For any particular car, the probability of it turning left will depend
> on such factors as where the driver lives, etc., and is therefore extremely
> unlikely to be 50%. It is also unlikely to be exactly 100% or 0%, since
> the driver may be going to visit a friend down the opposite fork, or any
> number of other factors might cause them to turn down the other fork
> occasionally, so any probability from 0 to 100% is possible!
>
> So, what is the probability of the car going left?
>
> Stephen
>
> From: JF Mate <aphodiinaemate at gmail.com>
> To: Taxacom <taxacom at mailman.nhm.ku.edu>
> Sent: Saturday, 30 November 2013 10:13 AM
> Subject: Re: [Taxacom] Random taxonomy
>
>
> My mistake. I didn't read your initial post carefully.
>
> Jason
>
>
> On 29 November 2013 22:05, Knut Rognes <knut at rognes.no> wrote:
> > Both cases were meant to be the same. The supply of labels is infinite
> > for each species name (which is sampling with replacement).
> >
> > Knut R
> >
> > -----Original message-----
> > From: taxacom-bounces at mailman.nhm.ku.edu
> > [mailto:taxacom-bounces at mailman.nhm.ku.edu] On behalf of JF Mate
> > Sent: 29 November 2013 21:47
> > To: Taxacom
> > Subject: Re: [Taxacom] Random taxonomy
> >
> > Are we talking about the hypothetical 50 boxes and 50 labels example or
> > the real-life example? Because they are completely different. One has
> > sampling with replacement and the other doesn't.
> >
> > Jason
> >
> > On 29 November 2013 21:31, Peter Rauch <peterar at berkeley.edu> wrote:
> >> Making no assumptions about the likely relative abundance of the 50
> >> species in nature, and about the relative likelihood of those species
> >> being collected, then ...
> >>
> >> The probability that each specimen would be correctly determined by
> >> randomly assigning one of the fifty names to each specimen --i.e.,
> >> pick one specimen from the 43 and randomly assign it a name from among
> >> the fifty names-- is 1 in 50.
> >>
> >> The probability of correctly naming the second specimen is exactly the
> >> same: 1 in 50.
> >>
> >> Etc. through all 43 specimens.
> >>
> >> [Note that by ignoring any assumptions, as stated above, the 1-in-50
> >> probability holds little credibility.]
> >>
> >>
> >> For the robot (i.e., the random process of assigning the name to the
> >> specimen) to "do the job as well as the human", the robot would need
> >> only identify one specimen correctly --i.e., a correctly named
> >> specimen only once among the 43 specimens.
> >>
> >> The probability of the robot doing that is actually very high (esp.
> >> relative to the notion that it is infinitely small, as some have
> >> suggested).
> >>
> >> However, because the question relates to a real situation --about
> >> actual blowflies collected in a particular country-- the assumption
> >> that each of the fifty species known to occur in that country is
> >> equally likely to be collected is probably a very weak assumption.
> >> More likely, some of the fifty species are very likely to be collected
> >> repeatedly, and other species will be rarely collected (this also
> >> assumes that blowfly collectors are not out collecting blowflies with
> >> a biased focus on obtaining particular species, or collecting in
> >> specific "habitats", etc).
> >>
> >> So, assuming that the frequency distribution of species represented
> >> in the collection of 43 specimens is more like the one found in
> >> nature, the game of simple random assignment of species names to a
> >> specimen is a worst-case model; the model could be improved --to be
> >> more realistic-- if the species names were being pulled randomly from
> >> a bucket of names that were found in the bucket with the same
> >> frequency as those species are encountered in nature.
> >>
> >> To look at it another way, this person could have named every one of
> >> the 43 specimens with the name of the most commonly occurring species
> >> in the country. Unless the collection of 43 specimens was built in a
> >> very biased manner, it is highly likely that ONE specimen would be
> >> correctly identified by the person.
> >>
> >> All in all, the answer to the problem is going to be quite suspect
> >> because these various biases are likely to come into play (making
> >> the worst-case model a very poor representation of the reality).
> >>
> >> Peter
> >>
> >> On Fri, Nov 29, 2013 at 7:55 AM, Knut Rognes <knut at rognes.no> wrote:
> >>
> >>> Thanks to all for replying outside and within the list.
> >>>
> >>> My raising the question of random taxonomy was inspired by a real
> >>> case study. 43 specimens of blowflies were identified by a certain
> >>> person. In the person's country there are about 50 species of
> >>> blowflies. All his identifications were erroneous, except for one. My
> >>> thought was then: Would a robot have done better, given a label
> >>> dispenser?
> >>>
> >>> Some replies I have got suggest that the robot might have done a job
> >>> as good as the human.
> >>>
> >>> Knut
> >>>
> >>>
> >>>
> >>>
> >>> On 29 November 2013 11:24, Knut Rognes <knut at rognes.no> wrote:
> >>> > Dear Taxacomers,
> >>> >
> >>> >
> >>> >
> >>> > I have a statistical problem.
> >>> >
> >>> >
> >>> >
> >>> > Consider 50 black boxes, within each is a specimen of fly. Each fly
> >>> > has been identified by someone, its name written on the inside of
> >>> > the box, but this is invisible to you. You cannot peek inside. Each
> >>> > fly belongs to one of 50 possible species.
> >>> >
> >>> >
> >>> >
> >>> > You have at your disposal the 50 possible species names for these
> >>> > flies, each name printed on an adhesive label, the supply of
> >>> > printed labels for each name is limitless.
> >>> >
> >>> >
> >>> >
> >>> > Here is the game: you affix a random label on the outside of a
> >>> > random box.
> >>> >
> >>> >
> >>> >
> >>> > Now the problem: What is the likelihood that you put a correct
> >>> > label on the box, i.e. that the name on the label matches the
> >>> > identity of the fly within?
> >>> >
> >>> >
> >>> >
> >>> > Knut Rognes
> >>> >
> >>> > Oslo, Norway
> >>>
> >>>
> >> _______________________________________________
> >> Taxacom Mailing List
> >> Taxacom at mailman.nhm.ku.edu
> >> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
> >>
> >> The Taxacom Archive back to 1992 may be searched with either of these
> >> methods:
> >>
> >> (1) by visiting http://taxacom.markmail.org/
> >>
> >> (2) a Google search specified as:
> >> site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
> >>
> >> Celebrating 26 years of Taxacom in 2013.



-- 
Dr. David Campbell
Assistant Professor, Geology
Department of Natural Sciences
Box 7270
Gardner-Webb University
Boiling Springs NC 28017


