[Taxacom] Paywall our taxonomic tidbit

Mon Jan 18 13:32:18 CST 2016

Donat,

I agree that taxonomy is not making full use of the power of the internet, but I still think we need to step back and get the basics right. Aggregating data seems to me to be creating a mess, and there is nothing particularly useful to be gained from the data, due to the fact that it is incomplete, biased in various ways, and often contradictory. The primary task of taxonomy is, as it has always been, to put names on taxa in a fairly rigorous manner. This task has already become diluted by "systematics", i.e. the inherently inconclusive investigation of the evolutionary relationships between taxa, with no particular practical application. Taxonomic publications fare badly by recently developed metrics by which employers judge employees.

Stephen

--------------------------------------------
On Mon, 18/1/16, Donat Agosti <agosti at amnh.org> wrote:

 Subject: Re: [Taxacom] Paywall our taxonomic tidbit
 To: "Taxacom" <taxacom at mailman.nhm.ku.edu>
 Received: Monday, 18 January, 2016, 9:02 PM

 In a sense, this whole discussion is
 misguided. The Internet is not about articles as we were
 used to them in the pre-digital age, nor about PDFs, even
 though we can ship these by email or, in some cases,
 access them with a mouse click (Open Access).

 The Internet is about linking data and building a knowledge
 management system or knowledge graph. This goes well beyond
 the sum of the data in the articles. Paywalls are walls that
 inhibit building such a network. If we maintain them, we
 cannot make use of the new possibilities that the Internet
 provides.

 Open linked data also allows text and data mining over
 potentially the entire corpus of not only taxonomic
 literature, but well beyond. 

 Taxonomic articles can be very rich in data, and this data
 allows others to look at our contributions in ways we don't.
 Bibliographic citations allow us to build citation networks
 and measure the use of our literature. Taxonomic treatments
 and the citations they include allow the catalogue of life
 to be built up by machine. DNA barcodes and collection codes
 allow us to understand who has used specimens from which
 collection, and where.
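The point about citation networks can be illustrated with a minimal sketch. It assumes, hypothetically, that each article's reference list has already been extracted as a list of identifiers (the placeholder strings stand in for DOIs); measuring how often a work is used then amounts to counting incoming links in the graph. The record layout and the `times_cited` helper are illustrative assumptions, not part of any existing service.

```python
from collections import Counter

# Hypothetical extracted reference lists: each key is an article, each value
# lists the works it cites. Placeholder strings stand in for real DOIs.
citations = {
    "article-A": ["article-C", "article-D"],
    "article-B": ["article-C"],
    "article-C": ["article-D"],
    "article-D": [],
}


def times_cited(graph):
    """Count incoming citations (in-degree) for every work in the graph."""
    counts = Counter({work: 0 for work in graph})  # uncited works stay at 0
    for cited_works in graph.values():
        counts.update(cited_works)
    return counts


counts = times_cited(citations)
for work, n in counts.most_common():
    print(work, n)
```

Once identifiers are shared across publishers, the same counting works over any collection of linked reference lists, not just one journal's.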

 A network allows us to enter our knowledge from very
 different angles, such as a specimen, a location, a
 collector or a DNA sequence, and to ask questions such as:
 Who collected at location x in a given period? How widely
 are species distributed in a given area? What is the host of
 a particular species? All this is in addition to being able
 to look at a single treatment, a single article, a single key.
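The first of these questions can be sketched as a filter over linked records. This is a minimal, hypothetical example: the `Occurrence` record, its fields and the placeholder identifiers (standing in for specimen HTTP URIs and person identifiers such as ORCIDs) are assumptions for illustration; a real knowledge graph would hold many more relations and would usually be queried with a dedicated graph query language.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical record linking a specimen to who, where and when."""
    specimen_id: str    # placeholder for a specimen HTTP URI
    collector: str      # placeholder for a person identifier (e.g. an ORCID)
    locality: str
    collected_on: date


def collectors_at(records, locality, start, end):
    """Collectors who collected at `locality` between start and end, inclusive."""
    return sorted({r.collector for r in records
                   if r.locality == locality and start <= r.collected_on <= end})


records = [
    Occurrence("specimen-1", "collector-A", "locality-x", date(1998, 5, 2)),
    Occurrence("specimen-2", "collector-B", "locality-x", date(2004, 7, 19)),
    Occurrence("specimen-3", "collector-A", "locality-y", date(2003, 3, 8)),
    Occurrence("specimen-4", "collector-C", "locality-x", date(2010, 1, 30)),
]

print(collectors_at(records, "locality-x", date(2000, 1, 1), date(2009, 12, 31)))
```

The same records can be entered from another angle (by collector, or by specimen) simply by filtering on a different field, which is the sense in which linked data supports many entry points.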

 Many of the elements for this are developed outside our
 community, and we just need to make use of them. DOIs are
 the unit for citing and identifying articles, supported by
 CrossRef and DataCite. Persistent identifiers are being used
 for authors (e.g. ORCID). Solutions adopted in our community
 include HTTP URIs for specimens (e.g. in CETAF and other
 signatories of the Bouchout Declaration), barcodes, names
 (e.g. LSIDs in ZooBank), and HTTP URIs for treatments in
 Plazi. All of them are deployed and are crystallization
 points for the big network, because they are used as such
 for cross-referencing, within the sciences and beyond.
 Wikidata makes use of PIDs from NCBI, ITIS, GBIF, Plazi and
 EOL, all of which are data that originated from published
 records.

 But more importantly, we have one of the most advanced
 publication schemes in the sciences with the Biodiversity
 Data Journal, ZooKeys and the remainder of the Pensoft
 journals. These offer more than a PDF or HTML version of
 the content: at the moment of publication, the data within
 are either pushed directly to GBIF, EOL or Plazi, or passed
 on from Plazi to NCBI and Wikidata.

 These Open Access journals, paid for upfront, are much
 cheaper than the average cost per article Daniel cites
 below. More importantly, any PDF produced now needs somebody
 to extract the data within it (adding the names to dedicated
 databases, extracting body length as traits, extracting
 observation records, images, tables, treatments and
 bibliographic records) if we want to make this piece of
 knowledge accessible on the Internet and open for efficient
 data mining. So it is not just publishing or access costs
 that count, but the almost insurmountable costs of reuse
 that prevent our biodiversity knowledge from becoming part
 of the global knowledge graph, the cloud, or simply our
 cultural heritage.

 There is no way around Open Access. If we don't adopt it,
 and our knowledge is really that relevant, in the very near
 future we will pay dearly, because the big publishers will
 not only ask huge amounts of money to access our journals
 or to publish them as OA, but, more importantly, will charge
 outrageous access fees to the knowledge base they create by
 making use of all the data we deliver to them for free. And
 if you want to do science, you will depend on this access,
 which most of us will not be able to afford.

 So the discussion must be about how we build our knowledge
 management system so that it makes us part of the bigger
 picture. And I think the dire state of biodiversity, under
 increasing pressure, and the exciting, rapidly developing
 genomic data make it imperative that we too use
 state-of-the-art tools to communicate our science and
 provide access to it, especially as our community is among
 the leaders in this area in the sciences.

 Donat

 -----Original Message-----
 From: Taxacom [mailto:taxacom-bounces at mailman.nhm.ku.edu]
 On Behalf Of Daniel Mietchen
 Sent: Monday, January 18, 2016 3:48 AM
 To: Taxacom <taxacom at mailman.nhm.ku.edu>
 Subject: Re: [Taxacom] Paywall our taxonomic tidbit

 It may be worth considering here that in the current system,
 billions of dollars are going to the publishing industry
 every year already (globally, and across all disciplines),
 and have been doing so for many years.

 From http://doi.org/10.1038/495426a : "Data from the
 consulting firm Outsell in Burlingame, California, suggest
 that the science-publishing industry generated $9.4 billion
 in revenue in 2011 and published around 1.8 million
 English-language articles — an average revenue per article
 of roughly $5,000. Analysts estimate profit margins at
 20–30% for the industry, so the average cost to the
 publisher of producing an article is likely to be around
 $3,500–4,000."
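The quoted estimate rests on two steps: dividing revenue by the number of articles, then removing the profit share. The sketch below only re-derives the quoted figures; the constant and function names are illustrative. Rounding the per-article revenue to $5,000 first gives exactly the quoted $3,500–4,000, while the unrounded figure (about $5,222) gives roughly $3,660–4,180.

```python
# Figures from the article quoted above (doi:10.1038/495426a), 2011 estimates.
REVENUE_USD = 9.4e9        # science-publishing revenue
ARTICLES = 1.8e6           # English-language articles published
MARGINS = (0.20, 0.30)     # analysts' estimated profit margins


def cost_per_article(revenue_per_article, margin):
    """Cost to the publisher = revenue x (1 - profit margin)."""
    return revenue_per_article * (1 - margin)


revenue_per_article = REVENUE_USD / ARTICLES   # about 5,222 USD

for margin in MARGINS:
    exact = cost_per_article(revenue_per_article, margin)
    rounded = cost_per_article(5000, margin)   # using the quote's rounded $5,000
    print(f"margin {margin:.0%}: ~${exact:,.0f} unrounded, ~${rounded:,.0f} rounded")
```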

 Most of this is through subscriptions (by libraries,
 corporations or individuals), some of it through
 advertising, some from other sources (e.g. database access,
 membership schemes). Most of this is invisible to most
 researchers, the exceptions being things like page charges
 or color figure charges in traditional venues or OA fees
 more recently.

 Now consider a thought experiment: if every single one of
 the ca. 2 million articles we publish every year were
 published for an OA fee in the PLOS ONE range (ca. USD
 1,500), that would cost USD 3 billion altogether, which is
 roughly the amount of *profit* the publishing industry is
 making now.
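The thought experiment is a back-of-the-envelope comparison that can be checked in a few lines. Like the text, this sketch combines the quote's 2011 revenue figure with the rounder figure of 2 million articles per year, so it indicates orders of magnitude only. Under these assumptions, the all-OA total of $3 billion is of the same order as the $1.9–2.8 billion profit implied by 20–30% margins, and about $6.4 billion less than current revenue.

```python
ARTICLES_PER_YEAR = 2_000_000   # "ca. 2 million articles"
OA_FEE_USD = 1_500              # fee in the PLOS ONE range
INDUSTRY_REVENUE_USD = 9.4e9    # 2011 revenue from the quote above
MARGINS = (0.20, 0.30)          # estimated industry profit margins

oa_total = ARTICLES_PER_YEAR * OA_FEE_USD                     # total OA spend
profit_range = [INDUSTRY_REVENUE_USD * m for m in MARGINS]    # implied profit
savings = INDUSTRY_REVENUE_USD - oa_total                     # left over per year

print(f"all-OA cost: ${oa_total / 1e9:.1f} billion")
print(f"implied profit: ${profit_range[0] / 1e9:.2f}-{profit_range[1] / 1e9:.2f} billion")
print(f"difference from current revenue: ${savings / 1e9:.1f} billion per year")
```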

 While many traditional publishers (and especially their
 hybrid journals) hover well above those 1,500 dollars, many newer
 ones have OA fees well below that, often due to more
 efficient workflows. So if OA at the efficiency of PLOS ONE
 or better were to replace the traditional publishing model,
 this would mean significant savings (billions per year
 eventually) for the scientific community - and thus the
 public - which we could use to build an infrastructure that
 would make scholarly communication more efficient, to
 include things beyond PDF and discovery mechanisms beyond
 citations and journal TOC alerts.

 Besides, the educational value of a paywall to lay readers
 interested in taxonomy rarely tops that of a relevant OA
 paper.

 Daniel
 _______________________________________________
 Taxacom Mailing List
 Taxacom at mailman.nhm.ku.edu
 http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
 The Taxacom Archive back to 1992 may be searched at: http://taxacom.markmail.org

 Celebrating 29 years of Taxacom in 2016.