Archive for the ‘information ecosystem’ Category

QOTD: “the handling and citing of the scientific literature is not an afterthought” (and my recent work on citation quotation error)

September 12th, 2025

the handling and citing of the scientific literature is not an afterthought or side activity in the conduct of science. The ability to do so rigorously and with integrity is, after all, why the National Library of Medicine exists.

Unfortunately, we don’t treat the handling of the literature as having the same prestige or impact as the execution and description of experiments or the disclosure of novel research findings. That is a fundamental problem that we are all paying the price for.

When politicians dishonestly describe the literature, we react passionately as we should. And we’re pretty busy doing that right now.

But when we describe our own papers and exaggerate the findings, we are doing the same thing. Maybe with less deleterious impact, but it can be used against us and creates confusion.

Even more concerning, a recent study shows that 17% of citations in the general literature misrepresent the findings in the cited paper. Again, many of these errors are likely not that consequential, but the sloppiness is not a good look, and potentially much worse.

– Holden Thorp, in his acceptance speech for the Donald A. B. Lindberg Award for Distinguished Health Communications.

I have the pleasure of serving with Holden on the National Academy of Sciences consensus study on Corrections and Retractions: Upgrading the Scientific Record.

Here’s the meta-analysis that was not yet out when Jeffrey Brainard’s ScienceInsider piece highlighted other recent work on quotation accuracy: Baethge, C., Jergas, H. Systematic review and meta-analysis of quotation inaccuracy in medicine. Res Integr Peer Rev 10, 13 (2025). https://doi.org/10.1186/s41073-025-00173-z

Quotation errors are an ongoing area of my research, most recently discussed last week at the 2025 Peer Review Congress, where first author M. Janina Sarol gave our podium presentation, “Leveraging Large Language Models for detecting citation quotation errors in medical literature.”

This builds on our journal article: M. Janina Sarol, Shufan Ming, Shruthan Radhakrishna, Jodi Schneider, and Halil Kilicoglu. “Assessing citation integrity in biomedical publications: Corpus annotation and NLP models”. Bioinformatics. https://doi.org/10.1093/bioinformatics/btae420

Multiple sources have funded this work. Particularly instrumental funding for our citation quotation error research came from U.S. Office of Research Integrity ORIIR220073 to my collaborator Halil Kilicoglu.

Posted in information ecosystem, scholarly communication

In MedPage Today – Retract Now: Negating Flawed Research Must Be Quicker

June 19th, 2024

Check out my latest piece, “Retract Now: Negating Flawed Research Must Be Quicker” (subtitle: “Incentives and streamlined processes can prevent the spread of incorrect science”), in “Second Opinions”, the editorial section of MedPage Today.

I argue that

“It is urgent to be faster and more responsive in retracting publications.”

– Jodi Schneider in MedPage Today, “Retract Now: Negating Flawed Research Must Be Quicker”

Thanks to The OpEd Project, the Illinois’ Public Voices Fellowship, and my coach Michele Weldon (whose newest book is out in July). Editorial writing is part of my NSF CAREER: Using Network Analysis to Assess Confidence in Research Synthesis. The Alfred P. Sloan Foundation funds my retraction research in Reducing the Inadvertent Spread of Retracted Science including the NISO Communication of Retractions, Removals, and Expressions of Concern (CREC) Working Group.

Posted in future of publishing, information ecosystem, library and information science, scholarly communication

Today in The Hill: Science is littered with zombie studies. Here’s how to stop their spread.

November 26th, 2023

My newest piece is in The Hill today: Science is littered with zombie studies. Here’s how to stop their spread.

Many people think of science as complete and objective. But the truth is, science continues to evolve and is full of mistakes. Since 1980, more than 40,000 scientific publications have been retracted. They either contained errors, were based on outdated knowledge or were outright frauds. 

Identifying these inaccuracies is how science is supposed to work. …Yet these zombie publications continue to be cited and used, unwittingly, to support new arguments. 

Why? Almost always it’s because nobody noticed they had been retracted. 

– Jodi Schneider in The Hill, “Science is littered with zombie studies. Here’s how to stop their spread.”

Thanks to The OpEd Project, the Illinois’ Public Voices Fellowship, and my coach Luis Carrasco. Editorial writing is part of my NSF CAREER: Using Network Analysis to Assess Confidence in Research Synthesis. The Alfred P. Sloan Foundation funds my retraction research in Reducing the Inadvertent Spread of Retracted Science including the NISO Communication of Retractions, Removals, and Expressions of Concern (CREC) Working Group.

Posted in information ecosystem, scholarly communication

Last call for public comments: NISO RP-45-202X, Communication of Retractions, Removals, and Expressions of Concern

November 26th, 2023

I’m pleased that the draft Recommended Practice, NISO RP-45-202X, Communication of Retractions, Removals, and Expressions of Concern (CREC), is open for public comment through December 2, 2023. I’m a member of the NISO Working Group, which is funded in part by the Alfred P. Sloan Foundation in collaboration with my Reducing the Inadvertent Spread of Retracted Science project.

The NISO CREC Recommended Practice will address the dissemination of retraction information (metadata & display) to support a consistent, timely transmission of that information to the reader (machine or human), directly or through citing publications, addressing requirements both of the retracted publication and of the retraction notice or expression of concern. It will not address the questions of what a retraction is or why an object is retracted.

– NISO CREC

Posted in future of publishing, information ecosystem, Information Quality Lab news, library and information science, scholarly communication

What can two-way communication between scientists and citizens enable?

September 24th, 2023

The Washington Post quoted NIH researcher Paul Hwang: “Amazing findings in medicine are sometimes based on one patient”.

The findings here are a breakthrough discovery in a disease called ME/CFS – commonly known as chronic fatigue syndrome or myalgic encephalomyelitis – which led to a recent PNAS paper. This is an amazing moment: Without biomarkers, it’s been a contested disease “you have to fight to get”.

What really strikes me, though, is the individual interactions that created a space for knowledge production: an email from one citizen (Amanda Twinam) to one scientist (Paul Hwang); “serendipitous correspondence” from another scientist (Brian Walitt) with access to “an entire population” (9 of the 14 tested for the PNAS paper were similar to Amanda). Reading the literature, writing well-timed correspondence, and “hearing about” synergistic work going on in another lab all seem to have contributed.

Mady Hornig, a researcher not involved in the project, told the reporter: “It’s not very common that we do all of these … steps, having doctors who are really persistent about what is happening with one individual and applying a scientific lens.”

But what if we did?


Dumit, Joseph (2006). Illnesses you have to fight to get: Facts as forces in uncertain, emergent illnesses. Social Science & Medicine, 62(3), 577–590. https://doi.org/10.1016/j.socscimed.2005.06.018

Wang, Ping-yuan, Ma, Jin, Kim, Young-Chae, Son, Annie Y., Syed, Abu Mohammad, Liu, Chengyu, Mori, Mateus P., Huffstutler, Rebecca D., Stolinski, JoEllyn L., Talagala, S. Lalith, Kang, Ju-Gyeong, Walitt, Brian T., Nath, Avindra, & Hwang, Paul M. (2023). WASF3 disrupts mitochondrial respiration and may mediate exercise intolerance in myalgic encephalomyelitis/chronic fatigue syndrome. Proceedings of the National Academy of Sciences, 120(34), e2302738120. https://doi.org/10.1073/pnas.2302738120

Vastag, Brian (2023, September 19). She wrote to a scientist about her fatigue. It inspired a breakthrough. Washington Post. https://www.washingtonpost.com/health/2023/09/17/fatigue-cfs-longcovid-mitochondria/ Temporarily open to read via this gift link.

Posted in information ecosystem, random thoughts, scholarly communication

Knowledge Graphs: An Aggregation of Definitions

March 3rd, 2019

I am not aware of a consensus definition of knowledge graph. I’ve been discussing this for a while with Liliana Giusti Serra, and the topic came up again with my fellow organizers of the knowledge graph session at US2TS as we prepare for a panel.

I’ve proposed the following main features:

  • RDF-compatible, has a defined schema (usually an OWL ontology)
  • items are linked internally
  • may be a private enterprise dataset (i.e., not necessarily openly available for external linking) or publicly available
  • covers one or more domains

Below are some quotes.

I’d be curious to hear of other definitions, especially if you think there’s a consensus definition I’m just not aware of.

“A knowledge graph consists of a set of interconnected typed entities and their attributes.”
Jose Manuel Gomez-Perez, Jeff Z. Pan, Guido Vetere and Honghan Wu. “Enterprise Knowledge Graph: An Introduction.”  In Exploiting Linked Data and Knowledge Graphs in Large Organisations. Springer. Part of the whole book: http://link.springer.com/10.1007/978-3-319-45654-6

“A knowledge graph is a structured dataset that is compatible with the RDF data model and has an (OWL) ontology as its schema. A knowledge graph is not necessarily linked to external knowledge graphs; however, entities in the knowledge graph usually have type information, defined in its ontology, which is useful for providing contextual information about such entities. Knowledge graphs are expected to be reliable, of high quality, of high accessibility and providing end user oriented information services.”

Boris Villazon-Terrazas, Nuria Garcia-Santa, Yuan Ren, Alessandro Faraotti, Honghan Wu, Yuting Zhao, Guido Vetere and Jeff Z. Pan. “Knowledge graphs: Foundations”. In Exploiting Linked Data and Knowledge Graphs in Large Organisations. Springer. Part of the whole book: http://link.springer.com/10.1007/978-3-319-45654-6


“The term Knowledge Graph was coined by Google in 2012, referring to their use of semantic knowledge in Web Search (“Things, not strings”), and is recently also used to refer to Semantic Web knowledge bases such as DBpedia or YAGO. From a broader perspective, any graph-based representation of some knowledge could be considered a knowledge graph (this would include any kind of RDF dataset, as well as description logic ontologies). However, there is no common definition about what a knowledge graph is and what it is not. Instead of attempting a formal definition of what a knowledge graph is, we restrict ourselves to a minimum set of characteristics of knowledge graphs, which we use to tell knowledge graphs from other collections of knowledge which we would not consider as knowledge graphs. A knowledge graph

  1. mainly describes real world entities and their interrelations, organized in a graph.

  2. defines possible classes and relations of entities in a schema.

  3. allows for potentially interrelating arbitrary entities with each other.

  4. covers various topical domains.”

Paulheim, H. (2017). Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web, 8(3), 489–508.

“ISI’s Center on Knowledge Graphs research group combines artificial intelligence, the semantic web, and database integration techniques to solve complex information integration problems. We leverage general research techniques across information-intensive disciplines, including medical informatics, geospatial data integration and the social Web.”

Just as I was “finalizing” my list to send to colleagues, I found a poster all about definitions:
Ehrlinger, L., & Wöß, W. (2016). Towards a Definition of Knowledge Graphs. SEMANTiCS (Posters, Demos, SuCCESS), 48. http://ceur-ws.org/Vol-1695/paper4.pdf
Its Table 1, “Selected definitions of knowledge graph”, lists the following definitions (for citations, see that paper):

“A knowledge graph (i) mainly describes real world entities and their interrelations, organized in a graph, (ii) defines possible classes and relations of entities in a schema, (iii) allows for potentially interrelating arbitrary entities with each other and (iv) covers various topical domains.” Paulheim [16]

“Knowledge graphs are large networks of entities, their semantic types, properties, and relationships between entities.” Journal of Web Semantics [12]

“Knowledge graphs could be envisaged as a network of all kind things which are relevant to a specific domain or to an organization. They are not limited to abstract concepts and relations but can also contain instances of things like documents and datasets.” Semantic Web Company [3]

“We define a Knowledge Graph as an RDF graph. An RDF graph consists of a set of RDF triples where each RDF triple (s, p, o) is an ordered set of the following RDF terms: a subject s ∈ U ∪ B, a predicate p ∈ U, and an object o ∈ U ∪ B ∪ L. An RDF term is either a URI u ∈ U, a blank node b ∈ B, or a literal l ∈ L.” Färber et al. [7]
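
To make the set constraints in that definition concrete, here is a minimal Python sketch of my own (it is not from Färber et al. and does not use any RDF library). The class names, the helper functions, and the example.org URIs are all hypothetical, chosen only for illustration: a subject must be a URI or blank node, a predicate must be a URI, and an object may be a URI, blank node, or literal.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the three kinds of RDF terms (the sets U, B, and L).
@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_valid_triple(s, p, o) -> bool:
    """Check s in U ∪ B, p in U, and o in U ∪ B ∪ L."""
    return (
        isinstance(s, (URI, BlankNode))
        and isinstance(p, URI)
        and isinstance(o, (URI, BlankNode, Literal))
    )


def is_rdf_graph(triples) -> bool:
    """Under the quoted definition, an RDF graph is a set of valid triples."""
    return all(is_valid_triple(s, p, o) for (s, p, o) in triples)


# Illustrative data only; the example.org identifiers are made up.
kg = URI("http://example.org/KnowledgeGraph")
graph = {
    (kg, URI("http://example.org/coinedBy"), URI("http://example.org/Google")),
    (kg, URI("http://example.org/coinedInYear"), Literal("2012")),
}

print(is_rdf_graph(graph))
```

Under this sketch, a triple with a literal as its subject, or a blank node as its predicate, would be rejected.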

“[…] systems exist, […], which use a variety of techniques to extract new knowledge, in the form of facts, from the web. These facts are interrelated, and hence, recently this extracted knowledge has been referred to as a knowledge graph.” Pujara et al. [17]


“A knowledge graph is a graph that models semantic knowledge, where each node is a real-world concept, and each edge represents a relationship between two concepts”

Fang, Y., Kuan, K., Lin, J., Tan, C., & Chandrasekhar, V. (2017). Object detection meets knowledge graphs.
https://oar.a-star.edu.sg/jspui/handle/123456789/2147


“things not strings” – Google

Posted in information ecosystem, semantic web

QOTD: Doing more requires thinking less

December 1st, 2018

by the aid of symbolism, we can make transitions in reasoning almost mechanically by the eye which would otherwise call into play the higher faculties of the brain.

…Civilization advances by extending the number of important operations that we can perform without thinking about them. Operations of thought are like cavalry charges in a battle — they are strictly limited in number, they require fresh horses, and must only be made at decisive moments.

One very important property for symbolism to possess is that it should be concise, so as to be visible at one glance of the eye and be rapidly written.

– Whitehead, A.N. (1911). An introduction to mathematics, Chapter 5, “The Symbolism of Mathematics” (page 61 in this version)
HT to Santiago Nuñez-Corrales (Illinois page for Santiago Nuñez-Corrales, LinkedIn for Santiago Núñez-Corrales), who used part of this quote in a Conceptual Foundations Group talk, Nov 29.

From my point of view, this is why memorizing multiplication tables is still relevant, why new words for concepts are important, and what underlies a lot of scientific advancement.

Posted in information ecosystem, random thoughts