Web tracking by academic publishers

Providers of texts in electronic format often gather data on their readers, as is the case with e-readers.[1] In the case of academic publishers, web tracking is part of a general trend towards data-driven management of research and higher education, in which the data are collected and sold by private companies.

Context and motivations

New sources of revenue

In the 2010s, major commercial publishers began performing data analytics in addition to providing content. This is particularly the case for Elsevier, Pearson and Cengage.[2] In 2018, the three leading research data analytics vendors were Clarivate, Digital Science (a division of the Holtzbrinck Publishing Group), and Elsevier (a publisher).[2] These companies sell research intelligence data tools to universities, research funders, and governments.

For example, Elsevier sells the information system Pure to universities, with the claim of providing a "comprehensive overview of all their research activities" by aggregating "information from all their data sources".[3] The 2020 partnership of Elsevier with Dutch research institutions bundles a Publish and Read contract with research intelligence services.[4] In 2018, Elsevier won a contract for collecting data for the European Commission's Open Science Monitor.[5][6] The Irish Science Foundation is basing its strategy on data it purchases from Elsevier.[7]

Improved services

Tracking readers allows publishers to improve services, for example by providing targeted reading suggestions, or by adapting search results to personal profiles.[8][9]

Integrating the research workflow

The acquisition of research workflow tools by big publishers has been attributed to a strategy of research workflow embedment, in other words the vertical integration of academic infrastructure.[10] For example, Elsevier acquired the reference manager Mendeley in 2013 and the preprint server SSRN in 2016.

It has been theorized that this integration leads to a data-driven organization of research.[11] The focus is no longer the scholarly article, but the individual researcher, whose online behaviour generates valuable data.[12]

Protecting copyright

The pirate website Sci-Hub has been threatening the subscription revenues of publishers. Sci-Hub downloads articles from publishers' websites using genuine university credentials. Some publishers have claimed that this is a threat to universities' network security, and founded the Scholarly Networks Security Initiative to combat it.[13] The initiative advertised tools for tracking users[14] before declaring in 2021 that it does not advocate the use of spyware.[8]

Methods

Standard methods

Academic publishers use standard methods of web tracking.[15] They gather information on users who connect to their websites, such as login data, browser fingerprints or IP addresses. Additional information can be provided by third-party cookies that publishers place on users' computers. A 2019 study of 15 publisher websites found an average of "18 third-party assets being loaded on their article pages".[15]
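As a rough illustration of what counting "third-party assets" on an article page can mean, the following Python sketch lists the external hosts that a page's HTML asks the browser to contact. It is a minimal sketch under stated assumptions, not the method of the cited study: the page URL, the sample HTML and all host names are hypothetical, only statically declared `script`, `img`, `iframe` and `link` tags are inspected (assets injected at runtime by JavaScript are missed), and two hosts are treated as the same site when their last two hostname labels match, which is a simplification.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def site_of(host):
    """Naive 'site' key: the last two labels of a hostname (an assumption)."""
    return ".".join(host.lower().split(".")[-2:])


class AssetCollector(HTMLParser):
    """Collects URLs from src attributes and from <link href=...> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "img", "iframe") and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.urls.append(attrs["href"])


def third_party_hosts(page_url, html):
    """Return the sorted asset hostnames that lie outside the page's own site."""
    collector = AssetCollector()
    collector.feed(html)
    own_site = site_of(urlparse(page_url).hostname)
    hosts = set()
    for url in collector.urls:
        # Resolve relative and protocol-relative URLs against the page URL.
        host = urlparse(urljoin(page_url, url)).hostname
        if host and site_of(host) != own_site:
            hosts.add(host)
    return sorted(hosts)


# Hypothetical article page; none of these hosts refer to a real publisher.
PAGE = "https://journals.example.org/article/123"
HTML = """
<html><head>
<script src="/static/app.js"></script>
<script src="https://cdn.example.org/lib.js"></script>
<script src="https://analytics.tracker-one.test/collect.js"></script>
<link rel="stylesheet" href="//fonts.adnetwork.test/css">
</head><body>
<img src="https://pixel.tracker-one.test/p.gif">
</body></html>
"""

print(third_party_hosts(PAGE, HTML))
```

Each external host reached this way can observe the reader's IP address and the address of the page being read, which is why such counts are used as a proxy for tracking exposure.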

Data collection is facilitated by tools that are ostensibly designed to help readers access the literature, such as GetFTR, an academic implementation of single sign-on.[1]

Data on individual users can be aggregated using "audience tools", i.e. commercial software from companies such as Adobe, Oracle or Neustar.[15]

Specialized tools

Systems for managing academic libraries, which may be provided by Alma, ExLibris or OCLC, can perform data collection. Libraries can become dependent on such systems.[1]

Criticism

Objections

Web tracking by academic publishers has been criticized for:

  • Infringing on academics' privacy.[15]
  • Threatening academic freedom.[16]
  • Transforming science into a data analytics business, while driving the development of new monopolies.[16]
  • Informing governments on dissident intellectuals.[16]

Protests and statements

  • In 2021, the League of European Research Universities issued a data statement with the aim of tackling the "increasing dependence on dominant platform companies".[17]
  • A 2021 petition demanded that publishers "stop tracking science", and asked research institutions to sign the DORA declaration.[16]
  • A 2021 statement by the Invest in Open Infrastructure organization, also supported by other organizations, called for more oversight and regulation of Clarivate after its acquisition of ProQuest, with the aim of reining in "surveillance capitalism" in scientific research.[18]
  • In 2021, the American Library Association issued a Resolution on the Misuse of Behavioral Data Surveillance in Libraries.[19]

References

  1. Siems, Renke. "Das Lesen der Anderen: Die Auswirkungen von User Tracking auf Bibliotheken". O-Bib. Das Offene Bibliotheksjournal (VDB). doi:10.5282/o-bib/5797.
  2. Aspesi, Claudio; Allen, Nicole Starr; Crow, Raym; Daugherty, Shawn; Joseph, Heather; McArthur, Joseph Thomas William; Shockey, Nick (2019-04-03). SPARC Landscape Analysis: The Changing Academic Publishing Industry – Implications for Academic Institutions. Center for Open Science. doi:10.31229/osf.io/58yhb.
  3. "Pure - The world's leading RIMS or CRIS". Elsevier Solutions. 2021-08-27. Retrieved 2022-01-23.
  4. "Dutch research institutions and Elsevier initiate world's first national Open Science partnership". Universiteiten van Nederland. 2020-05-19. Retrieved 2022-01-23.
  5. Tennant, Jon (2018-06-29). "Elsevier are corrupting open science in Europe". the Guardian. Retrieved 2022-01-24.
  6. "Elsevier serves the global research community to deliver open science". Elsevier Connect. 2018-07-02. Retrieved 2022-01-24.
  7. McCall, Barry (2021-08-26). "Propelling Ireland to first-mover status in research and innovation". The Irish Times. Retrieved 2022-03-19.
  8. Committee on Scientific Library Services and Information Systems of the Deutsche Forschungsgemeinschaft (2021). "Data tracking in research: aggregation and use or sale of usage data by academic publishers" (PDF). Deutsche Forschungsgemeinschaft.
  9. "Comment les éditeurs scientifiques surveillent les chercheurs". Le Monde.fr (in French). 2022-01-17. Retrieved 2022-01-25.
  10. Posada, Alejandro; Chen, George (2018-06-15). Inequality in Knowledge Production: The Integration of Academic Infrastructure by Big Publishers. OpenEdition Press. doi:10.4000/proceedings.elpub.2018.30.
  11. Herb, Ulrich (2018-04-26). "Zwangsehen und Bastarde". Information - Wissenschaft & Praxis (Walter de Gruyter GmbH) 69 (2-3): 81–88. doi:10.1515/iwp-2018-0021. ISSN 1619-4292. 
  12. Moore, Samuel A. (2020-07-28). "Individuation through infrastructure: Get Full Text Research, data extraction and the academic publishing oligopoly". Journal of Documentation (Emerald) 77 (1): 129–141. doi:10.1108/jd-06-2020-0090. ISSN 0022-0418. 
  13. "Combat Cybercrime". Scholarly Networks Security Initiative. Retrieved 2022-01-26.
  14. "Proposal to install spyware in university libraries to protect copyrights shocks academics". Coda Story. 2020-11-13. Retrieved 2022-01-26.
  15. "User Tracking on Academic Publisher Platforms". Cody Hanson. 2019-10-04. Retrieved 2022-01-19.
  16. "Stop Tracking Science". Stop Tracking Science. 2021. Retrieved 2022-01-30.
  17. "LERU Data Statement". LERU. Retrieved 2022-03-19.
  18. Thaney, Kaitlin (2021-06-03). "Take action to stop the lock up of research and learning". Invest in Open Infrastructure. Retrieved 2022-03-22.
  19. "Resolution on the Misuse of Behavioral Data Surveillance in Libraries". Advocacy, Legislation & Issues. 2021-02-25. Retrieved 2022-03-22.