Archive for the ‘information ecosystem’ Category

QOTD: “the handling and citing of the scientific literature is not an afterthought” (and my recent work on citation quotation error)

September 12th, 2025

the handling and citing of the scientific literature is not an afterthought or side activity in the conduct of science. The ability to do so rigorously and with integrity is, after all, why the National Library of Medicine exists.

Unfortunately, we don’t treat the handling of the literature as having the same prestige or impact as the execution and description of experiments or the disclosure of novel research findings. That is a fundamental problem that we are all paying the price for.

When politicians dishonestly describe the literature, we react passionately as we should. And we’re pretty busy doing that right now.

But when we describe our own papers and exaggerate the findings, we are doing the same thing. Maybe with less deleterious impact, but it can be used against us and creates confusion.

Even more concerning, a recent study shows that 17% of citations in the general literature misrepresent the findings in the cited paper. Again, many of these errors are likely not that consequential, but the sloppiness is not a good look, and potentially much worse.

– Holden Thorp, in his acceptance speech for the Donald A. B. Lindberg Award for Distinguished Health Communications.

I have the pleasure of serving with Holden on the National Academy of Sciences consensus study on Corrections and Retractions: Upgrading the Scientific Record.

Here’s the meta-analysis that was not yet out when Jeffrey Brainard’s ScienceInsider piece highlighted other recent work on quotation accuracy: Baethge, C., Jergas, H. Systematic review and meta-analysis of quotation inaccuracy in medicine. Res Integr Peer Rev 10, 13 (2025). https://doi.org/10.1186/s41073-025-00173-z

Quotation errors are an ongoing area of my research, most recently discussed last week at the 2025 Peer Review Congress, where first author M. Janina Sarol gave our podium presentation, “Leveraging Large Language Models for detecting citation quotation errors in medical literature.”

This builds on our journal article: M. Janina Sarol, Shufan Ming, Shruthan Radhakrishna, Jodi Schneider, and Halil Kilicoglu. “Assessing citation integrity in biomedical publications: Corpus annotation and NLP models”. Bioinformatics. https://doi.org/10.1093/bioinformatics/btae420

Multiple sources have funded this work. Particularly instrumental funding for our citation quotation error research came from U.S. Office of Research Integrity grant ORIIR220073 to my collaborator Halil Kilicoglu.

Tags: citation, information disorder, information disorder in science, quotation error, scholarly publishing
Posted in information ecosystem, scholarly communication | Comments (0)

In MedPage Today – Retract Now: Negating Flawed Research Must Be Quicker

June 19th, 2024

Check my latest piece, Retract Now: Negating Flawed Research Must Be Quicker — Incentives and streamlined processes can prevent the spread of incorrect science in “Second Opinions”, the editorial section of MedPage Today.

I argue that

“It is urgent to be faster and more responsive in retracting publications.”
Retract Now: Negating Flawed Research Must Be Quicker Jodi Schneider in MedPage Today

Thanks to The OpEd Project, the Illinois’ Public Voices Fellowship, and my coach Michele Weldon (whose newest book is out in July). Editorial writing is part of my NSF CAREER: Using Network Analysis to Assess Confidence in Research Synthesis. The Alfred P. Sloan Foundation funds my retraction research in Reducing the Inadvertent Spread of Retracted Science including the NISO Communication of Retractions, Removals, and Expressions of Concern (CREC) Working Group.

Tags: editorials, Public Voices Fellowship, retraction, scholarly publishing, The OpEd Project
Posted in future of publishing, information ecosystem, library and information science, scholarly communication | Comments (0)

A Retraction Notice Not Retrieved: Wrong DOI

February 25th, 2024

Part 2 of an occasional series on the Empirical Retraction Lit bibliography

Our systematic search for the Empirical Retraction Lit bibliography EXCLUDES retraction notices or retracted publications using database filters. Still, some turn up. (Isn’t there always a metadata mess?)

While most retraction notices and retracted publications can be excluded at the title screening stage, a few make it through to the abstract screening, and, for items with no abstracts, to the full-text screening. Today’s example is “Retraction of unreliable publication”. It was kept at the title-screening stage** and has no abstract, so it’s part of the full-text screening. PubMed metadata would have told us it’s a “Retraction of Publication” – but this particular record came from Scopus.

The Zotero-provisioned article, “Clinical guidelines: too much of a good thing”, had nothing to do with retraction, so I went back to the record (which had this link with the Scopus EID). To see what went wrong, I searched Scopus for EID(2-s2.0-84897800625), which finds the Scopus record, complete with an incorrect DOI: 10.1308/xxx, which today takes me to a third article with another DOI.***

Scopus Preview is even more interesting because it shows the EMTREE terms “note” and “retracted article” (which are not so accurate in my opinion):

In my 2020 Scientometrics article, I cataloged challenges in getting to the full-text retraction notice for a single article. It’s not clear how common such errors are, nor how to systematically check for errors.

I’m continuing to think about this, since, for RISRS II, I’m on the lookout for metadata disasters (in research-ese: What are the implications of specific instances of successes and failures in the metadata pipeline, for designing consensus practices?)

This particular retrieval error is due to the wrong DOI – which could affect any article (not just retraction notices). I’ve reported the DOI error to the Scopus document correction team.
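One possible way to catch this kind of wrong-DOI error is to compare the title in the database record with the title of whatever the DOI actually resolves to. The sketch below is only an illustration under assumptions: the function names and the 0.8 similarity threshold are hypothetical, and fetching the resolved title (for example from a DOI registry) is left out entirely.

```python
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase a title and replace punctuation with spaces so that
    trivial formatting differences do not count as mismatches."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return " ".join(cleaned.split())


def doi_title_mismatch(record_title: str, resolved_title: str,
                       threshold: float = 0.8) -> bool:
    """Return True when the record's title and the title the DOI resolves to
    look like different articles. The threshold is an assumed cutoff."""
    ratio = SequenceMatcher(
        None, normalize_title(record_title), normalize_title(resolved_title)
    ).ratio()
    return ratio < threshold


# The two titles from this post: the record and the article the DOI led to.
flagged = doi_title_mismatch(
    "Retraction of unreliable publication",
    "Clinical guidelines: too much of a good thing",
)
```

A check like this would only flag candidates for a human to look at; short or generic titles (such as “Retractions”) could easily produce false matches.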

It’s helpful that working on the Empirical Retraction Lit bibliography surfaces anomalous situations.

**Keeping “Retraction of unreliable publication” for abstract screening may seem overgenerous. But consider the title “Retractions”. Surely “Retractions” is the title of a bulk retraction notice! Nope, it’s a research article in the Review of Economics and Statistics by Azoulay, Furman, Krieger, and Murray. Thanks, folks. While plurals are more likely than singulars to signal research articles and editorials, I try to keep vague/ambiguous titles for a closer look.

***For 10.1308/xxx Crossref just lists this latest article. Same with Scopus.

But my university library system has multiple results – a mystery!

Tags: DOI, metadata disasters, metadata errors, metadata mess, retraction notices, Retraction of Publication, Scopus
Posted in Empirical Retraction Lit, information ecosystem, library and information science, scholarly communication | Comments (0)

Today in The Hill: Science is littered with zombie studies. Here’s how to stop their spread.

November 26th, 2023

My newest piece is in The Hill today: Science is littered with zombie studies. Here’s how to stop their spread.

Many people think of science as complete and objective. But the truth is, science continues to evolve and is full of mistakes. Since 1980, more than 40,000 scientific publications have been retracted. They either contained errors, were based on outdated knowledge or were outright frauds.

Identifying these inaccuracies is how science is supposed to work. …Yet these zombie publications continue to be cited and used, unwittingly, to support new arguments.

Why? Almost always it’s because nobody noticed they had been retracted.
Science is littered with zombie studies. Here’s how to stop their spread. Jodi Schneider in The Hill

Thanks to The OpEd Project, the Illinois’ Public Voices Fellowship, and my coach Luis Carrasco. Editorial writing is part of my NSF CAREER: Using Network Analysis to Assess Confidence in Research Synthesis. The Alfred P. Sloan Foundation funds my retraction research in Reducing the Inadvertent Spread of Retracted Science including the NISO Communication of Retractions, Removals, and Expressions of Concern (CREC) Working Group.

Tags: opinion pieces, retraction, RISRS, The Hill
Posted in information ecosystem, scholarly communication | Comments (0)

Last call for public comments: NISO RP-45-202X, Communication of Retractions, Removals, and Expressions of Concern

November 26th, 2023

I’m pleased that the draft Recommended Practice, NISO RP-45-202X, Communication of Retractions, Removals, and Expressions of Concern (CREC) is open for public comment through December 2, 2023. I’m a member of the NISO Working Group which is funded in part by the Alfred P. Sloan Foundation in collaboration with my Reducing the Inadvertent Spread of Retracted Science project.

The NISO CREC Recommended Practice will address the dissemination of retraction information (metadata & display) to support a consistent, timely transmission of that information to the reader (machine or human), directly or through citing publications, addressing requirements both of the retracted publication and of the retraction notice or expression of concern. It will not address the questions of what a retraction is or why an object is retracted.
NISO CREC

Tags: CREC, NISO, NISO Recommended Practice, retraction, RISRS, standards
Posted in future of publishing, information ecosystem, Information Quality Lab news, library and information science, scholarly communication | Comments (0)

What can two-way communication between scientists and citizens enable?

September 24th, 2023

The Washington Post quoted NIH researcher Paul Hwang: “Amazing findings in medicine are sometimes based on one patient”.

The findings here are a breakthrough discovery in a disease called ME/CFS – commonly known as chronic fatigue syndrome or myalgic encephalomyelitis – which led to a recent PNAS paper. This is an amazing moment: Without biomarkers, it’s been a contested disease “you have to fight to get”.

What really strikes me, though, is the individual interactions that created a space for knowledge production: an email from one citizen (Amanda Twinam) to one scientist (Paul Hwang); “serendipitous correspondence” from another scientist (Brian Walitt) with access to “an entire population” (9 of the 14 tested for the PNAS paper were similar to Amanda). Reading the literature, writing well-timed correspondence, and “hearing about” synergistic work going on in another lab all seem to have contributed.

Mady Hornig, a researcher not involved in the project, told the reporter: “It’s not very common that we do all of these … steps, having doctors who are really persistent about what is happening with one individual and applying a scientific lens.”

But what if we did?

Dumit, Joseph (2006). Illnesses you have to fight to get: Facts as forces in uncertain, emergent illnesses. Social Science & Medicine, 62(3), 577–590. https://doi.org/10.1016/j.socscimed.2005.06.018

Wang, Ping-yuan, Ma, Jin, Kim, Young-Chae, Son, Annie Y., Syed, Abu Mohammad, Liu, Chengyu, Mori, Mateus P., Huffstutler, Rebecca D., Stolinski, JoEllyn L., Talagala, S. Lalith, Kang, Ju-Gyeong, Walitt, Brian T., Nath, Avindra, & Hwang, Paul M. (2023). WASF3 disrupts mitochondrial respiration and may mediate exercise intolerance in myalgic encephalomyelitis/chronic fatigue syndrome. Proceedings of the National Academy of Sciences, 120(34), e2302738120. https://doi.org/10.1073/pnas.2302738120

Vastag, Brian (2023, September 19). She wrote to a scientist about her fatigue. It inspired a breakthrough. Washington Post. https://www.washingtonpost.com/health/2023/09/17/fatigue-cfs-longcovid-mitochondria/ Temporarily open to read via this gift link.

Tags: illnesses you have to fight to get, ME/CFS, PNAS, two-way communication of science, Washington Post
Posted in information ecosystem, random thoughts, scholarly communication | Comments (0)

Graduate Hourly Position: Metadata Quality Investigation

September 19th, 2022

Start Date: ASAP

Descriptions, Responsibilities, and Qualifications
This project offers an excellent opportunity for a University of Illinois Urbana-Champaign MSLIS student interested in metadata, data quality, database search, information retrieval and related topics. The incumbent will collect information about how well databases track retracted information, under the mentorship of Dr. Jodi Schneider, Assistant Professor and Director of the Information Quality Lab. The project will produce data analyses and reports to support a NISO Working Group in information gathering about how to improve metadata quality and display standards for retracted publications, in the Alfred P. Sloan Foundation grant “Reducing the Inadvertent Spread of Retracted Science II: Research and Development towards the Communication of Retractions, Removals, and Expressions of Concern Recommended Practice”.

We will first search multidisciplinary databases (Scopus and Web of Science) as well as other sources (e.g., Crossref, Retraction Watch) for retracted publications. Then, we will compile a list of known retracted publications across these sources. We will compare across sources to identify retracted publications that have inconsistent information about whether or not they are retracted. We will also calculate what percentage of the retracted publications indexed in each source are correctly indexed as retracted. We will then investigate how retractions are indexed in specific domain databases, using established retraction type indexing in biomedicine (PubMed, PubMed Europe) and psychology (PsycINFO), and investigating how retracted publications are indexed in chemistry (CAS SciFinder) and engineering (IEEE Xplore). We will also manually check indexing on a small dataset in search engines such as Google Scholar and Semantic Scholar.
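The cross-source comparison described above might be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: it assumes each source has already been reduced to a mapping from DOI to a retracted/not-retracted flag, and the function name, source names, and DOIs are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def compare_retraction_indexing(
    sources: Dict[str, Dict[str, bool]]
) -> Tuple[Set[str], Dict[str, List[str]], Dict[str, float]]:
    """sources maps a source name to {doi: flagged_as_retracted}."""
    # Known retracted: any DOI that at least one source flags as retracted.
    known_retracted = {
        doi
        for records in sources.values()
        for doi, flagged in records.items()
        if flagged
    }

    # Inconsistent: a known-retracted DOI that some source indexes
    # without a retraction flag. Values list the sources missing the flag.
    inconsistent: Dict[str, List[str]] = {}
    for doi in known_retracted:
        missing = sorted(
            name for name, records in sources.items()
            if doi in records and not records[doi]
        )
        if missing:
            inconsistent[doi] = missing

    # Per source: share of its indexed known-retracted DOIs that it flags.
    percent_correct: Dict[str, float] = {}
    for name, records in sources.items():
        indexed = [doi for doi in records if doi in known_retracted]
        if indexed:
            correct = sum(1 for doi in indexed if records[doi])
            percent_correct[name] = 100.0 * correct / len(indexed)

    return known_retracted, inconsistent, percent_correct


# Hypothetical example data (placeholder DOIs and source names).
example = {
    "SourceA": {"10.0000/a": True, "10.0000/b": True, "10.0000/c": False},
    "SourceB": {"10.0000/a": True, "10.0000/b": False},
}
known, inconsistent, percent = compare_retraction_indexing(example)
```

In this toy data, SourceB indexes one known-retracted item without flagging it, so it would show up in the inconsistency list and lower that source's percentage.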

Duties include:

Searching databases
Collating publication data
Deduplicating publication data
Documenting all aspects of the projects
Producing project memos and reports

Required Qualifications:

Enrollment in the Master’s in Library and Information Science program at the University of Illinois at Urbana-Champaign
Interest in topics such as metadata, data quality, database search, etc.
Interest in quantitative research using publications as data
Detail orientation
Excellent communication skills in written and spoken English

Preferred Qualifications:

Available for continued work in spring 2023
Project management experience
Experience with quantitative data
Experience in database searching
Experience manipulating data using spreadsheet software (e.g., Excel) and/or scripting languages (e.g., R or Python)
Interest in reproducibility and open science
Interest or experience in writing research reports and/or publications

Compensation: paid as a graduate hourly through the University of Illinois, $20/hour for 10-15 hours a week.

Application Procedures: Interested candidates should send a cover letter and resume to Dr. Jodi Schneider in a single PDF file named Lastname-metadata-hourly.pdf to jodi@illinois.edu

Review of applications will begin immediately. Applications will be accepted until the position is filled. All applications received by Sunday October 2, 2022, will receive full consideration.

Posted on Handshake and on the iSchool website

Tags: CREC, data quality, database search, information retrieval, metadata, metadata quality, retraction, RISRS II
Posted in information ecosystem, Information Quality Lab news, scholarly communication | Comments (0)

Postdoctoral Research Associate (Information Quality Lab) in the School of Information Sciences, University of Illinois at Urbana-Champaign

June 29th, 2022

Applications will be reviewed on a rolling basis each Monday until the position is filled, with first review on Tuesday July 5, 2022.

The School of Information Sciences (iSchool) at the University of Illinois at Urbana-Champaign seeks a Postdoctoral Research Associate, mentored by Dr. Jodi Schneider, to contribute to research projects in the Information Quality Lab in the areas of scientometrics, argumentation, and scholarly communication.

Work arrangements

Remote work within the United States or on-site/hybrid in Champaign, Illinois.
Two-year appointment with the possibility of renewal.
Salary commensurate with experience. University health insurance benefits.
Professional development funding for research-related expenses.

Information Quality Lab: Expected Work
As part of the iSchool’s Postdoctoral Research Associate Program, the selected applicant will receive mentoring and community support to prepare for permanent appointments both inside and outside of academia. The selected applicant will work with Dr. Schneider to create an individual development plan.
The selected applicant will lead, conduct, and publish research, in an interdisciplinary information science research environment. Research contributes to two sponsored projects:

Scientometrics supporting metadata/display standards development in scholarly publishing

To what extent is authoritative information on retracted papers consistently available and accessible in a variety of field-specific and multidisciplinary databases?
How does retraction of code and datasets impact related publications and what are the legal, social, and ethical ramifications?
What are the implications of specific instances of successes and failures in the metadata pipeline, for designing consensus practices?

Document analysis to understand how journalists, activists, Wikipedia editors, and other knowledge brokers assess info on COVID-19, climate change, and artificial intelligence & labor

Construct corpora of news, social media posts, etc. quoting scientific products
Lead argumentation and framing-focused document analysis
In collaboration with the PI and PhD student Research Assistant:
- Create an information behavior model combining interview and document analysis
- Collaborate with public libraries to develop a toolkit of services for public libraries
- Design a scale-up project

Required Qualifications

A PhD in any field (including, but not limited to, informatics, information sciences, library & information science, digital humanities, or computational social sciences, e.g., communications, anthropology, etc.).
Research interest in one or more of the following: scientometrics, document analysis, altmetrics, argumentation analysis, science of science, scholarly communication, academic publishing, public policy, public understanding of science, argumentation mining, and text mining.
Interest in interdisciplinary research.
Excellent critical thinking, written and spoken English, and project management skills.

Preferred Qualifications:

Evidence of interdisciplinary research through scholarly publications or translational / implementation science projects.
Interest or experience in data-intensive and/or mixed methods research.
Publications in areas related to the research.
Interest or experience mentoring undergraduate and graduate student research.
Experience and interest in working with a diverse group of students, faculty, and staff and the ability to contribute to an inclusive climate.

Application Materials – send by email to Dr. Jodi Schneider jodi@illinois.edu:

Current CV
Short statement of interest

Questions about the position can also be sent to Dr. Jodi Schneider at jodi@illinois.edu.

Official ad on the iSchool website

Tags: activists, altmetrics, argumentation, argumentation analysis, argumentation mining, computational social science, digital humanities, document analysis, informatics, jobs, journalists, knowledge brokers, library and information sciences, postdoctoral research, public libraries, public policy, public understanding of science, retraction, scholarly communication, scholarly publishing, scientometrics, Wikipedia editors
Posted in information ecosystem, Information Quality Lab news, scholarly communication | Comments (0)

Knowledge Graphs: An Aggregation of Definitions

March 3rd, 2019

I am not aware of a consensus definition of knowledge graph. I’ve been discussing this for a while with Liliana Giusti Serra, and the topic came up again with my fellow organizers of the knowledge graph session at US2TS as we prepare for a panel.

I’ve proposed the following main features:

RDF-compatible, has a defined schema (usually an OWL ontology)
items are linked internally
may be a private enterprise dataset (i.e., not necessarily openly available for external linking) or publicly available
covers one or more domains

Below are some quotes.

I’d be curious to hear of other definitions, especially if you think there’s a consensus definition I’m just not aware of.

“A knowledge graph consists of a set of interconnected typed entities and their attributes.”

Jose Manuel Gomez-Perez, Jeff Z. Pan, Guido Vetere and Honghan Wu. “Enterprise Knowledge Graph: An Introduction.” In Exploiting Linked Data and Knowledge Graphs in Large Organisations. Springer. Part of the whole book: http://link.springer.com/10.1007/978-3-319-45654-6

“A knowledge graph is a structured dataset that is compatible with the RDF data model and has an (OWL) ontology as its schema. A knowledge graph is not necessarily linked to external knowledge graphs; however, entities in the knowledge graph usually have type information, defined in its ontology, which is useful for providing contextual information about such entities. Knowledge graphs are expected to be reliable, of high quality, of high accessibility and providing end user oriented information services.”

Boris Villazon-Terrazas, Nuria Garcia-Santa, Yuan Ren, Alessandro Faraotti, Honghan Wu, Yuting Zhao, Guido Vetere and Jeff Z. Pan. “Knowledge graphs: Foundations”. In Exploiting Linked Data and Knowledge Graphs in Large Organisations. Springer. Part of the whole book: http://link.springer.com/10.1007/978-3-319-45654-6

“The term Knowledge Graph was coined by Google in 2012, referring to their use of semantic knowledge in Web Search (“Things, not strings”), and is recently also used to refer to Semantic Web knowledge bases such as DBpedia or YAGO. From a broader perspective, any graph-based representation of some knowledge could be considered a knowledge graph (this would include any kind of RDF dataset, as well as description logic ontologies). However, there is no common definition about what a knowledge graph is and what it is not. Instead of attempting a formal definition of what a knowledge graph is, we restrict ourselves to a minimum set of characteristics of knowledge graphs, which we use to tell knowledge graphs from other collections of knowledge which we would not consider as knowledge graphs. A knowledge graph

mainly describes real world entities and their interrelations, organized in a graph.
defines possible classes and relations of entities in a schema.
allows for potentially interrelating arbitrary entities with each other.
covers various topical domains.”

Paulheim, H. (2017). Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic web, 8(3), 489-508.

http://www.semantic-web-journal.net/system/files/swj1167.pdf

“ISI’s Center on Knowledge Graphs research group combines artificial intelligence, the semantic web, and database integration techniques to solve complex information integration problems. We leverage general research techniques across information-intensive disciplines, including medical informatics, geospatial data integration and the social Web.”

http://usc-isi-i2.github.io/home/

Just as I was “finalizing” my list to send to colleagues, I found a poster all about definitions:

Ehrlinger, L., & Wöß, W. (2016). Towards a Definition of Knowledge Graphs. SEMANTiCS (Posters, Demos, SuCCESS), 48. http://ceur-ws.org/Vol-1695/paper4.pdf

Its Table 1: Selected definitions of knowledge graph has the following definitions (for citations see that paper)

“A knowledge graph (i) mainly describes real world entities and their interrelations, organized in a graph, (ii) defines possible classes and relations of entities in a schema, (iii) allows for potentially interrelating arbitrary entities with each other and (iv) covers various topical domains.” Paulheim [16]

“Knowledge graphs are large networks of entities, their semantic types, properties, and relationships between entities.” Journal of Web Semantics [12]

“Knowledge graphs could be envisaged as a network of all kind things which are relevant to a specific domain or to an organization. They are not limited to abstract concepts and relations but can also contain instances of things like documents and datasets.” Semantic Web Company [3]

“We define a Knowledge Graph as an RDF graph. An RDF graph consists of a set of RDF triples where each RDF triple (s, p, o) is an ordered set of the following RDF terms: a subject s ∈ U ∪ B, a predicate p ∈ U, and an object U ∪ B ∪ L. An RDF term is either a URI u ∈ U, a blank node b ∈ B, or a literal l ∈ L.” Färber et al. [7]

“[…] systems exist, […], which use a variety of techniques to extract new knowledge, in the form of facts, from the web. These facts are interrelated, and hence, recently this extracted knowledge has been referred to as a knowledge graph.” Pujara et al. [17]
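To make the Färber et al. definition above a little more concrete, here is a small sketch of its constraints on triples: subjects are URIs or blank nodes, predicates are URIs, and objects may be URIs, blank nodes, or literals. The class and function names and the example.org URIs are made up for illustration. This is not an RDF library, just the set-membership rules written out.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


Term = Union[URI, BlankNode, Literal]


def is_valid_triple(s: Term, p: Term, o: Term) -> bool:
    """Check the term positions: s in U or B, p in U, o in U, B, or L."""
    return (
        isinstance(s, (URI, BlankNode))
        and isinstance(p, URI)
        and isinstance(o, (URI, BlankNode, Literal))
    )


# A tiny graph as a set of triples (all identifiers are hypothetical).
graph = {
    (URI("http://example.org/paper1"),
     URI("http://example.org/cites"),
     URI("http://example.org/paper2")),
    (URI("http://example.org/paper1"),
     URI("http://example.org/title"),
     Literal("An example title")),
}
all_valid = all(is_valid_triple(s, p, o) for (s, p, o) in graph)
```

Note that this captures only the "RDF graph" reading of knowledge graph; several of the other definitions quoted here also require a schema or coverage of multiple domains, which this sketch does not model.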

“A knowledge graph is a graph that models semantic knowledge, where each node is a real-world concept, and each edge represents a relationship between two concepts”

Fang, Y., Kuan, K., Lin, J., Tan, C., & Chandrasekhar, V. (2017). Object detection meets knowledge graphs.
https://oar.a-star.edu.sg/jspui/handle/123456789/2147

“things not strings” – Google

https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html

Tags: knowledge graph, knowledge representation, quotations
Posted in information ecosystem, semantic web | Comments (0)

QOTD: Doing more requires thinking less

December 1st, 2018

by the aid of symbolism, we can make transitions in reasoning almost mechanically by the eye which would otherwise call into play the higher faculties of the brain.

…Civilization advances by extending the number of important operations that we can perform without thinking about them. Operations of thought are like cavalry charges in a battle — they are strictly limited in number, they require fresh horses, and must only be made at decisive moments.

One very important property for symbolism to possess is that it should be concise, so as to be visible at one glance of the eye and be rapidly written.

– Whitehead, A.N. (1911). An introduction to mathematics, Chapter 5, “The Symbolism of Mathematics” (page 61 in this version)
HT to Santiago Núñez-Corrales (Illinois page, LinkedIn), who used part of this quote in a Conceptual Foundations Group talk, Nov 29.

From my point of view, this is why memorizing multiplication tables is still relevant, why new words for concepts are important, and what underlies a lot of scientific advancement.

Tags: cavalry, modes of thought, QOTD, symbolism
Posted in information ecosystem, random thoughts | Comments (0)


jodischneider.com/blog

reading, technology, stray thoughts
