Archive for the ‘library and information science’ Category

Happy Public Domain Day!

January 2nd, 2011

Today, in many countries around the world, new works become public property: January 1st every year is Public Domain Day. Material in the public domain can be used, remixed and shared freely — without violating copyright and without asking permission.

However, in the United States, not a single new work entered the public domain today. Americans must wait 8 more years: Under United States copyright law, nothing more will be added to the public domain until January 1, 2019.

Until the 1970s, the maximum copyright term was 56 years. Under that law, Americans would have been able to truly celebrate Public Domain Day:

  1. All works published in 1954 would be entering the public domain today.
  2. Up to 85% of all copyrighted works from 1982 would be entering the public domain today (Copyright Office and Duke).

Instead, only works published before 1923 are conclusively in the public domain in the U.S. today. What about post-1923 publications? It’s complicated: 609 pages’ worth of complicated.

For more information on Public Domain Day and the United States, Duke’s Center for the Study of the Public Domain has a series of useful pages.

Posted in books and reading, information ecosystem, intellectual freedom, library and information science | Comments (0)

Utopia Documents: pulling scientific data into the PDF for interactive exploration

November 14th, 2010

What if data were accessible within the document itself?

Utopia Documents is a free PDF viewer which recognizes certain enhanced figures, and fetches the underlying data. This allows readers to view and experiment with the tables, graphs, molecular structures, and sequences in situ.


You can download Utopia Documents for Mac and Windows to view enhanced papers, such as those published in The Semantic Biochemical Journal.

These screencasts were made from pages 9 and 10 of the PDF of a paper by the Manchester-based Utopia team: T. K. Attwood, D. B. Kell, P. McDermott, J. Marsh, S. R. Pettifer, and D. Thorne. Calling international rescue: knowledge lost in literature and data landslide! Biochemical Journal, Dec 2009. doi:10.1042/BJ20091474.

In an interview at the Guardian, Utopia’s Phillip McDermott says:

“Utopia Documents links scientific research papers to the data and to the community. It enables publishers to enhance their publications with additional material, interactive graphs and models. It allows the reader to access a wealth of data resources directly from the paper they are viewing, make private notes and start public conversations. It does all this on normal PDFs, and never alters the original file. We are targeting the PDF, since they still have around 80% readership over online viewing.

“Semantics, loose-coupling, fingerprinting and linked-data are the key ingredients. All the data is described using ontologies, and a plug-in system allows third parties to integrate their database or tool within a few lines of script. We use fingerprinting to allow us to recognise what paper a user is reading, and to spot duplicates. All annotations are held remotely, so that wherever you view a paper, the result is the same.”
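
I’m guessing at the details, but the fingerprinting idea is easy to sketch: normalise the extracted text so that two renderings of the same article hash to the same value, then key the remotely stored annotations to that hash. A toy sketch in Python — none of this is Utopia’s actual code, and the function names are mine:

```python
import hashlib
import re

def fingerprint(extracted_text: str) -> str:
    """Toy sketch: derive a stable fingerprint from a paper's extracted text.

    Normalising case, whitespace, and punctuation means two renderings of
    the same article (different line breaks, spacing, etc.) hash alike.
    """
    words = re.findall(r"[a-z0-9]+", extracted_text.lower())
    canonical = " ".join(words)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Two layouts of the same text yield the same fingerprint, so remotely
# stored annotations can be keyed to the paper rather than to one file.
a = fingerprint("Calling international rescue:\nknowledge lost in literature")
b = fingerprint("calling  international rescue: knowledge lost in literature")
assert a == b
```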

I’d still like to see a demo of the commenting functionality.

I’d also be particularly interested in the publisher perspective, about the production work that goes into creating the enhancements. Portland Press’s October news announces that they’ve been promoting Utopia at the Charleston conference and SSP, with an upcoming appearance at the STM Innovations Seminar.

Utopia came to my attention via Steve Pettifer’s mention.

Posted in future of publishing, information ecosystem, library and information science, scholarly communication, semantic web, social semantic web | Comments (4)

A Model-View-Controller perspective of scholarly articles

November 13th, 2010

A scholarly paper is not a PDF. A PDF is merely one view of a scholarly paper. To push ‘beyond the PDF’, we need design patterns that allow us to segregate the user interface of the paper (whether it is displayed as an aggregation of triples, a list of assertions, a PDF, an ePub, HTML, …) from the thing itself.

Towards this end, Steve Pettifer has a Model-View-Controller perspective on scholarly articles, which he shared in a post on the Beyond the PDF listserv, where discussions are leading up to a workshop in January. I am awe-struck: I wish I’d thought of this way of separating the structure and explaining it.

I think a lot of the disagreement about the role of the PDF can be put down to trying to overload its function: to try to imbue it with the qualities of both ‘model’ and ‘view’. … One of the things that software architects (and I suspect designers in general) have learned over the years is that if you try to give something functions that it shouldn’t have, you end up with a mess; if you can separate out the concerns, you get a much more elegant and robust solution.

My personal take on this is that we should keep these things very separate, and that if we do this, then many of the problems we’ve been discussing become more clearly defined (and I hope, many of the apparent contradictions, resolved).

So… a PDF (or come to that, an e-book version or a html page) is merely a *view* of an article. The article itself (the ‘model’) is a completely different (and perhaps more abstract) thing. Views can be tailored for a particular purpose, whether that’s for machine processing, human reading, human browsing, etc etc.

[paragraph break inserted]

The relationship between the views and their underlying model is managed by the concept of a ‘controller’. For example, if we represent an article’s model in XML or RDF (its text, illustrations, associated nanopublications, annotations and whatever else we like), then that model can be transformed into any number of views. In the case of converting XML into human-readable XHTML, there are many stable and mature technologies (XSLT etc). In the case of doing the same with PDF, the traditional controller is something that generates PDFs.

[paragraph break inserted]

The thing that’s been (somewhat) lacking so far is the two-way communication between view and model (via controller) that’s necessary to prevent the views from ossifying and becoming out of date (i.e. there’s no easy way to see that comments have been added to the HTML version of an article’s view if you happen to be reading the PDF version, so the view here can rapidly diverge from its underlying model).

[paragraph break inserted, link added]

Our Utopia software is an attempt to provide this two-way controller for PDFs. I believe that once you have this bidirectional relationship between view and model, then the actual detailed affordances of the individual views (i.e. what can a PDF do well / badly, what can HTML do well / badly) become less important. They are all merely means to channeling the content of an article to its destination (whether that’s human or machine).

The good thing about having this ‘model view controller’ take on the problem is that only the model needs to be pinned down completely …

Perhaps separating out our concerns in this way — that is, treating the PDF as one possible representation of an article — might help focus our criticisms of the current state of affairs? I fear at the moment we are conflating the issues to some degree.

– Steve Pettifer in a Beyond the PDF listserv post
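
The separation Pettifer describes can be sketched in a few lines of Python. This is a toy illustration, not Utopia’s actual architecture, and all the names are mine: the model holds the article’s content and annotations, each view is just a rendering of it, and the controller writes comments back to the model so that every view stays current.

```python
from dataclasses import dataclass, field

@dataclass
class Article:                        # the model: the article itself
    title: str
    body: str
    annotations: list = field(default_factory=list)

def as_html(article: Article) -> str:     # one view of the model
    notes = "".join(f"<li>{n}</li>" for n in article.annotations)
    return f"<h1>{article.title}</h1><p>{article.body}</p><ul>{notes}</ul>"

def as_text(article: Article) -> str:     # another view of the same model
    return "\n".join([article.title, "", article.body] + article.annotations)

def annotate(article: Article, note: str) -> None:  # a controller: view -> model
    article.annotations.append(note)

paper = Article("Beyond the PDF", "A PDF is merely one view of an article.")
annotate(paper, "Agreed!")            # a comment added from, say, the HTML view...
assert "Agreed!" in as_text(paper)    # ...is immediately visible in every other view
```

The point of the sketch is the last two lines: because the annotation lives in the model, no view can ossify the way a static PDF does.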

I’m particularly interested in hearing if this perspective, using the MVC model, makes sense to others.

Posted in books and reading, future of publishing, information ecosystem, library and information science, scholarly communication, social semantic web | Comments (9)

CiTO in the wild

October 18th, 2010

CiTO has escaped the lab and can now be used either directly in the CiteULike interface or with CiteULike machine tags. Go Citation Typing Ontology!

In the CiteULike Interface

To add a CiTO relationship between articles using the CiteULike interface, both articles must be in your own library. You’ll see a “Citations (CiTO)” section after your tags. Click on edit and set the current article as the target.

set the CiTO target

First set the CiTO target

Then navigate around your own library to find a related article. Now you can add a CiTO tag.

Adding a CiTO tag in CiteULike

Adding a CiTO tag in CiteULike

There are a lot of choices. Choose just one. :)

CiTO Object Properties appear in the dropdown

CiTO Object Properties now appear in the dropdown

Congratulations, you’ve added a CiTO relationship! Now mousing over the CiTO section will show details on the related article.

CiTO result

Mouse over the resulting CiTO tag to get details of the related article

Machine Tags

Machine tags take fewer clicks but a little more know-how. They can be added just like any other tag, as long as you know the secret formula: cito--(insert a CiTO Object Property here from this list)--(insert article permalink numbers here). Here are two concrete examples.

First, we can keep a list of articles citing a paper. For example, tagging an article

cito--cites--137511

says “this article CiTO:cites article 137511”. Article 137511 can be found at http://www.citeulike.org/article/137511, aka JChemPaint – Using the Collaborative Forces of the Internet to Develop a Free Editor for 2D Chemical Structures. Then we can get the list of (hand-tagged) citations to the article. Look—a community-generated reverse citation index!

Second, we can indicate specific relationships between articles, whether or not they cite each other. For example, tagging an article

cito--usesmethodin--423382

says “this item CiTO:usesmethodin item 423382”. Item 423382 is found at http://www.citeulike.org/article/423382, aka The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics.
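
The formula is regular enough that a few lines of code can pick a machine tag apart. A sketch of mine — not anything CiteULike provides:

```python
def parse_cito_tag(tag: str):
    """Split a CiteULike CiTO machine tag into (property, article number).

    Expected shape: cito--<object property>--<CiteULike article number>
    """
    parts = tag.split("--")
    if len(parts) != 3 or parts[0] != "cito" or not parts[2].isdigit():
        raise ValueError(f"not a CiTO machine tag: {tag!r}")
    return parts[1], parts[2]

def permalink(article_id: str) -> str:
    """CiteULike article permalinks follow this pattern."""
    return "http://www.citeulike.org/article/" + article_id

prop, target = parse_cito_tag("cito--usesmethodin--423382")
assert prop == "usesmethodin"
assert permalink(target) == "http://www.citeulike.org/article/423382"
```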

Upshot

Automation and improved annotation interfaces will make CiTO more useful. CiTO:cites and CiTO:isCitedBy could be used to mark up existing relationships in digital libraries such as the ACM Digital Library and CiteSeer, and could enhance collections like Google Books and Mendeley, making human navigation and automated use easier. To capture more sophisticated relationships, David Shotton hopes authors will mark up citations before submitting papers; if it’s required, anything is possible. Data curators and article commentators may observe contradictions between papers, or reuse of methodology; in these cases CiTO could be layered with an annotation ontology such as AO in order to make the provenance of such assertions clear.
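
To get a feel for what that markup could look like: cito:isCitedBy is the inverse of cito:cites, so a digital library that records one direction can materialise the other. A sketch, with invented example URIs and the SPAR CiTO namespace assumed:

```python
CITO = "http://purl.org/spar/cito/"   # SPAR CiTO namespace (assumed here)

def cites_triples(citing_uri: str, cited_uri: str):
    """Sketch: emit N-Triples for a citation in both directions.

    cito:isCitedBy is the inverse of cito:cites, so recording one
    direction lets us generate the other automatically.
    """
    return [
        f"<{citing_uri}> <{CITO}cites> <{cited_uri}> .",
        f"<{cited_uri}> <{CITO}isCitedBy> <{citing_uri}> .",
    ]

# The paper URIs below are invented for illustration.
for triple in cites_triples("http://example.org/paper/A", "http://example.org/paper/B"):
    print(triple)
```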

CiTO could put pressure on existing publishers and information providers to improve their data services, perform more data cleanup, or expose bibliographies in open formats. Improved tools will be needed, as well as communities willing to add data by hand, and algorithms for inferring deep citation relationships.

One remaining challenge is aggregating CiTO relationships across bibliographic data providers; article identifiers such as the DOI are unfortunately not universal, and the bibliographic environment is messy, with many types of items, from books to theses to white papers to articles to reports. CiTO and related ontologies will help make the bibliographic web, and the relationships between these items, explicit on the web of (meta)data.

Further Details

CiTO is part of the Semantic Publishing and Referencing (SPAR) ecosystem of ontologies; see also the JISC Open Citation Project, which is taking bibliographic data to the Web, and the JISC Open Bibliography Project. For those familiar with Shotton’s earlier writing on CiTO, note that SPAR breaks out some parts of the earlier formulation of this ontology.

Posted in argumentative discussions, books and reading, information ecosystem, library and information science, PhD diary, scholarly communication, semantic web | Comments (3)

W3C Library Linked Data Incubator Group starting

May 25th, 2010

The W3C has announced an incubator activity around Library Linked Data. I’ll be one of DERI’s participants in the group.

Its mission? To help increase global interoperability of library data on the Web, and to bring together people from archives, museums, publishing, etc. to talk about metadata. See the charter for more details.

Interested in joining? If you’re at a W3C member organization, ask your Advisory Committee Representative to appoint you. Or, get appointed as an invited expert by contacting one of the chairs (Tom Baker, Emmanuelle Bermes, Antoine Isaac); their contact info is available from the participants’ list.

Or, you can follow along on the incubator group’s public mailing list. (For organizing, the Sem lib mailing list was used.)

The first teleconference will be Thursday, 3 June at 1500 UTC.

Posted in library and information science, PhD diary, semantic web | Comments (0)

Opening bibliographic data

February 7th, 2010

I love the CERN library’s message of “Raw bibliographic book data available now!”, framed:
1989: TimBL invented WWW at CERN
2009: TimBL calls for “Open Data Now” at TED

CERN is the latest library to share their book data, as CERN emerging technologies librarian Patrick Danowski announced on Twitter. The Open Book Data Project is further described on their website and in a YouTube video (below) purpose-made for the occasion. The data is dual-licensed as CC0 and PDDL.

This isn’t the first time that library data has been shared with a splash.

After speaking at Code4Lib 2008 (my first Code4Lib conference), Brewster Kahle was presented with MARC records from the Oregon Summit consortium.

In 2007, a number of Library of Congress records were deposited in connection with Open Source Endeca, a faceted catalog Casey Durfee described at Code4Lib 2007. It has gone through several incarnations; the open source Kochief project is the latest.

Further, as Jonathan Gorman and I were discussing in #code4lib earlier this week, there are several collections of MARC records (and more) donated to Open Library and hosted at the Internet Archive. A few are misclassified, so also consider keyword searches (‘MARC’ and ‘MARC libraries’) if you’re trying to find all the MARC records that archive.org has.
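
If you’re scripting the hunt, those keyword searches can be built against archive.org’s advanced-search endpoint. A sketch — the endpoint and parameter names are my best understanding of the archive.org search API, so verify them against the current documentation:

```python
from urllib.parse import urlencode

def archive_search_url(query: str, rows: int = 50) -> str:
    """Build an archive.org advanced-search URL returning JSON results.

    Endpoint and parameters are assumptions about the archive.org API;
    check the current docs before relying on them.
    """
    params = urlencode({
        "q": query,            # e.g. 'MARC libraries'
        "fl[]": "identifier",  # return just the item identifiers
        "rows": rows,
        "output": "json",
    })
    return "https://archive.org/advancedsearch.php?" + params

url = archive_search_url("MARC libraries")
assert url.startswith("https://archive.org/advancedsearch.php?")
```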

Linked data in libraries is coming along more slowly; fruit, perhaps, for another post.

Where do you look for bibliographic records? Feel free to leave tips in the comments!

Updated 2010-04-14, with thanks to Dan Scott for corrections!

Posted in library and information science | Comments (2)

Google Books settlement: a monopoly waiting to happen

October 10th, 2009

Will Google Books create a monopoly? Some people think so. Nigel Kendall, in the Times (UK) Online (“Google Book Search: why it matters”), writes: “Several European nations, including France and Germany, have expressed concern that the proposed settlement gives Google a monopoly in content. Since the settlement was the result of a class action against Google, it applies only to Google. Other companies would not be free to digitise books under the same terms.” Geoffrey Nunberg, in the Chronicle of Higher Education (“Google’s Book Search: A Disaster for Scholars”, behind a paywall), agrees: “Google’s five-year head start and its relationships with libraries and publishers give it an effective monopoly: No competitor will be able to come after it on the same scale. Nor is technology going to lower the cost of entry. Scanning will always be an expensive, labor-intensive project.” Brin claims it won’t:

If Google Books is successful, others will follow. And they will have an easier path: this agreement creates a books rights registry that will encourage rights holders to come forward and will provide a convenient way for other projects to obtain permissions.

-Sergey Brin, New York Times, A Library To Last Forever

Brin is wrong: the proposed Google Books settlement will not smooth the way for other digitization projects. It creates a red carpet for Google while leaving everyone else at risk of copyright infringement.

The safe harbor provisions apply only to Google. Anyone else who wants to use one of these books would face the draconian penalties of statutory copyright infringement if it turned out the book was actually still copyrighted. Even with all this effort, one will not be able to say with certainty that a book is in the public domain. To do that would require a legislative change – and not a negotiated settlement.

– Peter Hirtle, LibraryLawBlog: The Google Book Settlement and the Public Domain.

Monopoly is not the only risk. Others include reader privacy, access to culture, and suitability for bulk and some research uses (metadata, etc.); of course, there are lots of benefits, too! Too bad Brin isn’t acknowledging the risks!

Don’t know what all the fuss is with Google Books and the proposed settlement? Wired has a good outline from April.

Posted in books and reading, future of publishing, information ecosystem, intellectual freedom, library and information science | Comments (1)

Onward and upward

September 4th, 2009

Today is my last day at Appalachian State University.

Monday I begin a new adventure as community organizer, helping launch Acawiki, a “wiki for academic research”. The brainchild of Neeru Paharia, Acawiki strives to make research papers easier to access and understand. Go write your own summary!

The next month will find me living in Massachusetts, my adult home, while preparing for a move to Ireland!

In October, I’ll be joining the Social Software Unit at DERI for a fellowship. The group does fascinating work on social software and the semantic web. This is a 3(or 4)-year Ph.D. project, where I’ll be working on modeling online discussions/arguments. More about that soon!

I’m looking for practical advice of all sorts—about community organizing, about moving to Ireland and living abroad, about success in Ph.D. studies. Consider this your personal solicitation for tips, tricks, and advice!

Posted in computer science, higher education, library and information science, random thoughts | Comments (6)

JCDL 2009 Poster Session in Second Life

June 18th, 2009

Last night I popped into Second Life for a poster session. JCDL 2009 is going on in Austin this week, and several of the posters were on display in the Digital Preserve region of SL. Chris Beer asked for some screenshots.

Here’s the whole poster space from outside. (Click each image for the ginormous full-size screenshot.)
Poster Session Entrance
My avatar (TR Telling) is in a bright orange UIUC GSLIS T-shirt, thanks to a class tour Richard Urban led last year. With a closer look, you can spot the screen that was used to project MinuteMadness.

Here are two posters, “Finding Centuries-Old Hyperlinks” and “Toward Automatic Generation of Image-Text Document Surrogates to Optimize Cognition”.
Two Posters: "Finding Centuries-Old Hyperlinks" and "Toward Automatic Generation of Image-Text Document Surrogates to Optimize Cognition"

Poster numbers were used for the best poster competition, I believe.

Large text sizes really help viewing from afar; deft users can get a closer view with ‘mouse look’. I took a second screenshot of the “Finding Centuries-Old Hyperlinks” poster since it was my favorite. Xiaoyue (Elaine) Wang and Eamonn Keogh suggest cross-referencing manuscript pages using icon similarity.

Closer View of "Finding Centuries-Old Hyperlinks"

Handouts could be really useful for an SL poster session — I had to settle for taking screenshots. Clicking on the poster could give a copy of the poster, which could include links to more information. A mailbox could facilitate sending messages to the presenters.

One presenter ‘attended’ from New York. Several people gathered around her poster, which generated a lot of discussion.
postertalk
In the left corner you can see one of the more visually striking posters, a study of LIS students’ impressions of the Kindle, after using it for something like 3 weeks.

To the right of the entrance is a sign that says “What did you think?”, which linked to a comment form to be completed on the Web. I succeeded with that form, but wasn’t able to figure out how to submit a second, in-world comment form.

My avatar is just stepping down from a rotating lazy-susan which held a striking comment box. Getting a comment form and filling it out was straightforward. However, dragging and dropping the form back onto the box, as suggested, didn’t work for me.

I had several interesting conversations, most notably a chat outside in the Poster Garden with Javier Velasco Martin who helped build and furnish the Preserve. Ed Fox was easily identifiable: his avatar’s first name is EdFox. For social gatherings, handles are useful, but for professional gatherings it can be reassuring to know who you’re talking with.

Here’s one last look at the dome from the outside. I love the bright aqua JCDL lettering. And, what trip to Second Life would be complete without some flying?
Flying by the JCDL Poster Session Dome

With a closer look, you can see the large comment box in the center of the dome.

Posted in computer science, future of publishing, higher education, library and information science | Comments (1)