Last month we excitedly boarded trains bound for Washington, DC to attend this year’s Professional & Scholarly Publishing (PSP) Conference, entitled “Data, Diversity, and Discovery: How Professional and Scholarly Publishing Is Accelerating Change and Growth.” Although the keynote speaker opened with a discussion on President Trump and science, the recurring theme of the conference was anything but political — focusing instead on how to better incorporate data and technology growth into the publishing environment. Here is a look at some of the discussions from the event:
The opening discussion, “Tale of Two Continents: Open Access in Europe and the U.S.,” provided an overview of Open Access (OA) from multiple perspectives. The first speaker, Amanda Click, spoke from the perspective of a Business Librarian at American University. Academic libraries support both Green OA and Gold OA. Under Green OA, authors self-archive their papers by depositing versions into institutional or subject repositories, supported by a mixture of funding models and processes. Under Gold OA, publication is typically funded on the author’s side (often through article processing charges), and the final published version of the article is freely available to the public. Rachel Burley, Publishing Director at Springer Nature, spoke next and gave insight into European OA practices. The overall trend in Europe is to go fully OA; progress depends on collaboration among backers, chiefly governments and academic institutions, through coordinated policies and funding. Speaking last was Richard Wilder of the Bill & Melinda Gates Foundation. The Gates Foundation works to ensure “Global Access” to the results of the projects it funds: it wants its funded research to be available and accessible at an affordable price, both in developing countries and in U.S. educational facilities and public libraries.
Next, we eagerly attended the session “Rise of the Machines!” (the name itself enticed our inner nerds). As applied to publishing, the first step in machine learning is to disambiguate terms, steps, and entities. Then, the entities are linked into a network of meaningful relationships. Once this is in place, machine learning can perform literature mining, which can be used to recommend articles on a given subject. It can also help connect the outcome of research (such as a developed drug) to an appropriate audience (such as patients). A pyramid illustrates the overall goal of machine learning, with the largest piece, content (text, books, journals, etc.), at the bottom, data in the middle, and knowledge at the top. The goal is to distill this massive amount of information (data plus content) into knowledge. One weakness is that machine learning can amplify biases: if the data are biased, the system won’t correct them but will work from them, and so the biases become part of the process. Peer review and reproducibility offer safeguards against weaknesses such as these. The session ended with a great audience question: Can machine- and algorithm-generated content be copyrighted? It is a complicated legal question that unfortunately does not yet have an answer.
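For the programmatically curious, the recommendation step described above can be sketched in a few lines. This is purely our own toy illustration, not anything demonstrated at the session: the sample titles and the similarity measure (simple word-overlap, or Jaccard, similarity) are assumptions chosen for brevity.

```python
# Toy sketch of literature mining as article recommendation.
# Titles and the word-overlap similarity measure are illustrative only.

def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().split())

def jaccard(a, b):
    """Word overlap between two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(query, corpus, top_n=2):
    """Rank corpus titles by word overlap with the query."""
    q = tokenize(query)
    scored = [(jaccard(q, tokenize(doc)), doc) for doc in corpus]
    scored.sort(reverse=True)
    return [doc for score, doc in scored[:top_n] if score > 0]

corpus = [
    "machine learning for literature mining",
    "open access publishing in europe",
    "drug discovery and patient outcomes",
]
print(recommend("literature mining with machine learning", corpus))
```

A production system would of course use disambiguated entities and a relationship network rather than raw word overlap, but the shape of the task, scoring candidate articles against a subject and returning the best matches, is the same.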
The following day, a short discussion on data-driven decision-making helped us to understand the application of data across different groups within scholarly publishing: publishers use data to develop products and services to support research; researchers demand and use data to determine their next project or decide which direction to take; and librarians are rapidly evolving not only to provide content, but to create platforms, curate data, and assist researchers with their publication choices. The overall takeaway was that data do not make decisions for you; you still need people to interpret the data and make the decisions.
And, as the saying goes, the best was saved for last. The final panel discussion, “The Innovators,” highlighted the best newcomers to the scholarly publishing industry. Two subjects piqued our interest the most. Pierre Montagano from Code Ocean explained that data are not useful unless there is a way to analyze and reproduce them. With more and more code being used for analysis and figure creation, it is imperative to fit code into publishing. Code Ocean provides a reproducibility database in which researchers can easily access another researcher’s code. It incorporates a link into a published article (e.g., within a figure) that takes you directly to Code Ocean’s database to access the same code used in the published article. The other noteworthy innovator was Gadget Software. Michelle Sereno, its Product Manager, discussed the company’s newest application, Gadget One, which aims to bring journals, books, and other publications into the mobile/digital era. She explained that PDF files are intended to be viewed on desktop monitors and printed paper, not on screens of many sizes and shapes. Increasingly, users prefer to download apps to stream content on their devices. So how can publishing fit into and keep up with this trend? Gadget One provides a solution. It “atomizes” journal articles by breaking them into sections (e.g., abstract, introduction, results). The atoms can then be formatted and manipulated independently; users can search and filter by atom (e.g., searching for a keyword and filtering by abstract so that only abstracts containing that keyword are returned); and users can create playlists. Gadget One is still in early development, but the eventual goal is for this process to be added to publishers’ submission workflows so that all journal articles can be published in this format as well as in traditional formats.
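To make the “atomization” idea concrete, here is a minimal sketch of our own: split an article into labeled section atoms, then run a keyword search optionally filtered to one section type. The section names and sample text are assumptions for illustration; this is not Gadget One’s actual implementation.

```python
# Minimal sketch of atomizing an article and filtering searches by atom.
# Section names and sample text are illustrative assumptions only.

def atomize(article):
    """Turn a {section_name: text} mapping into a list of atom records."""
    return [{"section": name, "text": text} for name, text in article.items()]

def search(atoms, keyword, section=None):
    """Find atoms containing the keyword, optionally within one section type."""
    return [
        atom for atom in atoms
        if keyword.lower() in atom["text"].lower()
        and (section is None or atom["section"] == section)
    ]

article = {
    "abstract": "We study reproducibility in scholarly publishing.",
    "introduction": "Reproducibility has become a central concern.",
    "results": "Code sharing improved reproducibility markedly.",
}
atoms = atomize(article)

# Search everywhere vs. only in abstracts, as in the filtering example above.
print(len(search(atoms, "reproducibility")))              # prints 3
print(len(search(atoms, "reproducibility", "abstract")))  # prints 1
```

Because each atom carries its own label, the same records could just as easily be reflowed for a phone screen or collected into a playlist, which is the flexibility the PDF format lacks.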