Archive for the ‘scholarly communication’ Category

Making provenance pay

December 19th, 2010

Provenance, Dan Conover says, can drive the adoption of semantic technologies:

Imagine a global economy in which every piece of information is linked directly to its meaning and origin. In which queries produce answers, not expensive, time-consuming evaluation tasks. Imagine a world in which reliable, intelligent information structures give everyone an equal ability to make profitable decisions, or in many cases, profitable new information products. Imagine companies that get paid for the information they generate or collect based on its value to end users, rather than on the transitory attention it generates as it passes across a screen before disappearing into oblivion.

Now imagine copyright and intellectual property laws that give us practical ways of tracing the value of original contributions and collecting and distributing marginal payments across vast scales.

That’s the Semantic Economy.

- Dan Conover on the semantic economy (my emphasis added).
via Bora Zivkovic on Twitter

I wonder if he’s seen the W3C Provenance XG Final Report yet. Two parts are particularly relevant: the dimensions of provenance and the news aggregator scenario. Truly making provenance pay will require both Management of provenance (especially Access and Scale) and Content provenance around Attribution.

Go read the rest of what Dan Conover says about the semantic economy. Pay particular attention to the end: Dan says that he’s working on a functional spec for a Semantic Content Management System, an RDF-based middleware so easy that writers and editors will want to use it. I know you’re thinking of Drupal and of the Semantic Desktop; we’ll see how he differentiates his system. He invites further conversation.

I’m definitely going to have a closer look at his ideas: I like the way he thinks, and this isn’t the first time I’ve noticed his ideas for making Linked Data profitable.

Posted in future of publishing, information ecosystem, PhD diary, scholarly communication, semantic web | Comments (0)

The Social Semantic Web – a message for scholarly publishers

November 15th, 2010

I always appreciate how Geoffrey Bilder manages to talk about the Social Semantic Web and early modern print in (nearly) the same breath. See for yourself in the presentation he gave to scholarly publishers at the International Society of Managing and Technical Editors last month.

Much of Geoff’s presentation is outlined in an interview he gave 18 months ago (search for “key messages” to find the good bits). I hope to blog further about these messages, because Geoff has so many good things to say that deserve unpacking!

I especially love the timeline from slide 159, which shows that we’re just past the incunabula age of the Internet:

The Early Modern Internet

We're still in the Early Modern era of the Internet. Compare to the history of print.

Posted in future of publishing, information ecosystem, PhD diary, scholarly communication, semantic web, social semantic web, social web | Comments (3)

Accessing genomics workflows from Word documents with GenePattern

November 14th, 2010

What if you could rerun computational experiments from within a scientific paper?

The GenePattern add-on for Word for Windows integrates reusable genomic experiment pipelines into Microsoft Word. Readers can rerun the original or modified experiments from within the document by connecting to a GenePattern server.

Rerunning a pipeline inside Word

I don’t run Windows, so I took this screenshot from a video produced at the Broad Institute of MIT and Harvard, where GenePattern is developed.

Readers without Word for Windows can also access the experimental pipelines by exporting them from the document: just run a GenePatternDocumentExtractor command from a GenePattern server. The GenePattern public server was very easy to access and start using. Here’s what the GenePatternDocumentExtractor command looks like:

Running GenePatternDocumentExtractor at the GenePattern public server

Unfortunately, the jobs I ran didn’t extract any pipelines from the Institute’s sample DOC file. I’ve sent in an inquiry (either I’m doing something wrong or there’s a bug; either way, the report is useful). I was very impressed that I could make my jobs public and then refer to them by URL in my email, to make clear exactly what I did.

The GenePattern add-on for Word is another find from the beyondthepdf list. Its development was funded by Microsoft. See also Jill P. Mesirov’s “Accessible Reproducible Research” (Science 327:415, 2010; doi:10.1126/science.1179653), which describes the underlying philosophy: a Reproducible Research System (RRS) made up of an environment for doing computational work (the Reproducible Research Environment, or RRE) and an authoring environment (the Reproducible Research Publisher, or RRP) that links back to the research system.

Posted in books and reading, future of publishing, information ecosystem, scholarly communication, Uncategorized | Comments (1)

Utopia Documents: pulling scientific data into the PDF for interactive exploration

November 14th, 2010

What if data were accessible within the document itself?

Utopia Documents is a free PDF viewer which recognizes certain enhanced figures, and fetches the underlying data. This allows readers to view and experiment with the tables, graphs, molecular structures, and sequences in situ.


You can download Utopia Documents for Mac and Windows to view enhanced papers, such as those published in The Semantic Biochemical Journal.

These screencasts were made from pages 9 and 10 of the PDF of a paper by the Manchester-based Utopia team: T. K. Attwood, D. B. Kell, P. McDermott, J. Marsh, S. R. Pettifer, and D. Thorne. Calling international rescue: knowledge lost in literature and data landslide! Biochemical Journal, Dec 2009. doi:10.1042/BJ20091474.

In an interview at the Guardian, Utopia’s Phillip McDermott says:

“Utopia Documents links scientific research papers to the data and to the community. It enables publishers to enhance their publications with additional material, interactive graphs and models. It allow the reader to access a wealth of data resources directly from the paper they are viewing, makes private notes and start public conversations. It does all this on normal PDFs, and never alters the original file. We are targeting the PDF, since they still have around 80% readership over online viewing.

“Semantics, loose-coupling, fingerprinting and linked-data are the key ingredients. All the data is described using ontologies, and a plug-in system allows third parties to integrate their database or tool within a few lines of script. We use fingerprinting to allow us to recognise what paper a user is reading, and to spot duplicates. All annotations are held remotely, so that wherever you view a paper, the result is the same.”
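McDermott doesn’t say how Utopia’s fingerprinting works. As a minimal sketch of the general idea, assume a fingerprint is simply a SHA-256 hash of a paper’s extracted text after collapsing whitespace and case (a stand-in technique, not Utopia’s actual method). Two copies of the same paper then map to one key, and annotations stored remotely under that key come back wherever the paper is opened, without touching the PDF. The `AnnotationStore` class is a hypothetical placeholder for a remote service.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash a paper's text after normalizing case and whitespace.

    Assumption: copies whose extracted text differs only in whitespace
    or letter case should be recognized as the same paper.
    """
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class AnnotationStore:
    """Hypothetical stand-in for a remote annotation server.

    Notes are keyed by fingerprint, not by file, so the original
    PDF is never altered.
    """

    def __init__(self) -> None:
        self._notes = defaultdict(list)

    def annotate(self, text: str, note: str) -> None:
        self._notes[fingerprint(text)].append(note)

    def notes_for(self, text: str) -> list:
        return list(self._notes.get(fingerprint(text), []))

    def is_duplicate(self, text_a: str, text_b: str) -> bool:
        return fingerprint(text_a) == fingerprint(text_b)


store = AnnotationStore()
copy_1 = "Calling international rescue:\nknowledge lost in literature"
copy_2 = "calling  international rescue: knowledge lost in literature"
store.annotate(copy_1, "See the data landslide figure")
print(store.notes_for(copy_2))             # ['See the data landslide figure']
print(store.is_duplicate(copy_1, copy_2))  # True
```

A real system would need to be far more robust than an exact hash (text extraction differs between PDF producers), which is one reason to treat this only as an illustration of keying annotations by document identity rather than by file.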

I’d still like to see a demo of the commenting functionality.

I’d also be particularly interested in the publisher perspective, about the production work that goes into creating the enhancements. Portland Press’s October news announces that they’ve been promoting Utopia at the Charleston conference and SSP, with an upcoming appearance at the STM Innovations Seminar.

Utopia came to my attention via Steve Pettifer’s mention.

Posted in future of publishing, information ecosystem, library and information science, scholarly communication, semantic web, social semantic web | Comments (4)

A Model-View-Controller perspective of scholarly articles

November 13th, 2010

A scholarly paper is not a PDF. A PDF is merely one view of a scholarly paper. To push ‘beyond the PDF’, we need design patterns that allow us to segregate the user interface of the paper (whether it is displayed as an aggregation of triples, a list of assertions, a PDF, an ePub, HTML, …) from the thing itself.

Towards this end, Steve Pettifer has a Model-View-Controller perspective on scholarly articles, which he shared in a post on the Beyond the PDF listserv, where discussions are leading up to a workshop in January. I am awe-struck: I wish I’d thought of this way of separating the structure and explaining it.

I think a lot of the disagreement about the role of the PDF can be put down to trying to overload its function: to try to imbue it with the qualities of both ‘model’ and ‘view’. … One of the things that software architects (and I suspect designers in general) have learned over the years is that if you try to give something functions that it shouldn’t have, you end up with a mess; if you can separate out the concerns, you get a much more elegant and robust solution.

My personal take on this is that we should keep these things very separate, and that if we do this, then many of the problems we’ve been discussing become more clearly defined (and I hope, many of the apparent contradictions, resolved).

So… a PDF (or come to that, an e-book version or a html page) is merely a *view* of an article. The article itself (the ‘model’) is a completely different (and perhaps more abstract) thing. Views can be tailored for a particular purpose, whether that’s for machine processing, human reading, human browsing, etc etc.

[paragraph break inserted]

The relationship between the views and their underlying model is managed by the concept of a ‘controller’. For example, if we represent an article’s model in XML or RDF (its text, illustrations, association nanopublications, annotations and whatever else we like), then that model can be transformed in to any number of views. In the case of converting XML into human-readable XHTML, there are many stable and mature technologies (XSLT etc). In the case of doing the same with PDF, the traditional controller is something that generates PDFs.

[paragraph break inserted]

The thing that’s been (somewhat) lacking so far is the two-way communication between view and model (via controller) that’s necessary to prevent the views from ossifying and becoming out of date (i.e. there’s no easy way to see that comments have been added to the HTML version of an article’s view if you happen to be reading the PDF version, so the view here can rapidly diverge from its underlying model).

[paragraph break inserted, link added]

Our Utopia software is an attempt to provide this two-way controller for PDFs. I believe that once you have this bidirectional relationship between view and model, then the actual detailed affordances of the individual views (i.e. what can a PDF do well / badly, what can HTML do well / badly) become less important. They are all merely means to channeling the content of an article to its destination (whether that’s human or machine).

The good thing about having this ‘model view controller’ take on the problem is that only the model needs to be pinned down completely …

Perhaps separating out our concerns in this way — that is, treating the PDF as one possible representation of an article — might help focus our criticisms of the current state of affairs? I fear at the moment we are conflating the issues to some degree.

- Steve Pettifer in a Beyond the PDF listserv post
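Pettifer describes an architecture rather than code, but a minimal Python sketch may make the separation of concerns concrete. It assumes a toy in-memory model and two hypothetical view functions (an HTML rendering, and a plain-text rendering standing in for a PDF). The controller regenerates views from the model and routes a comment made in any view back into the model, which is the two-way link he says has been missing. This illustrates the pattern only; it is not how Utopia is built.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class ArticleModel:
    """The article itself: content plus annotations, independent of any view."""
    title: str
    body: str
    comments: List[str] = field(default_factory=list)


def html_view(model: ArticleModel) -> str:
    """One possible view: an HTML rendering of the model."""
    items = "".join(f"<li>{escape(c)}</li>" for c in model.comments)
    return (f"<h1>{escape(model.title)}</h1><p>{escape(model.body)}</p>"
            f"<ul>{items}</ul>")


def text_view(model: ArticleModel) -> str:
    """Another view, standing in for a paginated PDF-style rendering."""
    lines = [model.title.upper(), "", model.body]
    lines += [f"[comment {i}] {c}" for i, c in enumerate(model.comments, 1)]
    return "\n".join(lines)


class ArticleController:
    """Mediates between the model and its views in both directions."""

    def __init__(self, model: ArticleModel) -> None:
        self.model = model
        self.views = {"html": html_view, "text": text_view}

    def render(self, view_name: str) -> str:
        # Model -> view: every view is regenerated from the current model.
        return self.views[view_name](self.model)

    def add_comment(self, comment: str) -> None:
        # View -> model: a comment made while reading any view updates the
        # shared model, so every other view shows it when next rendered.
        self.model.comments.append(comment)


controller = ArticleController(ArticleModel("Beyond the PDF", "A paper is not a PDF."))
controller.add_comment("Made while reading the HTML view")
print(controller.render("text"))
```

Because views hold no state of their own, a comment added through one view cannot be stranded there, so the views cannot drift apart.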

I’m particularly interested in hearing whether this MVC perspective makes sense to others.

Posted in books and reading, future of publishing, information ecosystem, library and information science, scholarly communication, social semantic web | Comments (9)

CiTO in the wild

October 18th, 2010

CiTO has escaped the lab and can now be used either directly in the CiteULike interface or with CiteULike machine tags. Go Citation Typing Ontology!

In the CiteULike Interface

To add a CiTO relationship between articles using the CiteULike interface, both articles must be in your own library. You’ll see a “Citations (CiTO)” section after your tags. Click on edit and set the current article as the target.

set the CiTO target

First set the CiTO target

Then navigate around your own library to find a related article. Now you can add a CiTO tag.

Adding a CiTO tag in CiteULike

There are a lot of choices. Choose just one. :)

CiTO Object Properties appear in the dropdown

CiTO Object Properties now appear in the dropdown

Congratulations, you’ve added a CiTO relationship! Now mousing over the CiTO section will show details on the related article.

CiTO result

Mouse over the resulting CiTO tag to get details of the related article

Machine Tags

Machine tags take fewer clicks but a little more know-how. They can be added just like any other tag, as long as you know the secret formula: cito--(insert a CiTO Object Property here from this list)--(insert the article’s permalink number here). Here are two concrete examples.

First, we can keep a list of articles citing a paper. For example, tagging an article

cito--cites--137511

says “this article CiTO:cites article 137511”. Article 137511 can be found at http://www.citeulike.org/article/137511, aka JChemPaint – Using the Collaborative Forces of the Internet to Develop a Free Editor for 2D Chemical Structures. Then we can get the list of (hand-tagged) citations to the article. Look: a community-generated reverse citation index!

Second, we can indicate specific relationships between articles, whether or not they cite each other. For example, tagging an article

cito--usesmethodin--423382

says “this item CiTO:usesmethodin item 423382”. Item 423382 is found at http://www.citeulike.org/article/423382, aka The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics.
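The secret formula is regular enough to parse mechanically. As a minimal sketch, assume each article’s tags are available as a list of strings (for example, from an export of one’s CiteULike library; the `library` dictionary and its source ids 1001 to 1003 are hypothetical). The following Python collects `cito--cites--…` tags into the kind of reverse citation index described above, and works the same way for any other property.

```python
import re
from collections import defaultdict
from typing import Optional, Tuple

# Assumed shape of a CiTO machine tag, following the formula above:
# "cito--<object property>--<CiteULike article id>".
MACHINE_TAG = re.compile(r"^cito--([a-z]+)--(\d+)$")


def parse_cito_tag(tag: str) -> Optional[Tuple[str, str]]:
    """Return (property, target article id), or None for an ordinary tag."""
    match = MACHINE_TAG.match(tag.strip().lower())
    return match.groups() if match else None


def reverse_index(library: dict, prop: str = "cites") -> dict:
    """Map each target article id to the articles tagged cito--<prop>--<target>.

    `library` maps a source article id to that article's list of tags.
    """
    index = defaultdict(list)
    for source, tags in library.items():
        for tag in tags:
            parsed = parse_cito_tag(tag)
            if parsed is not None and parsed[0] == prop:
                index[parsed[1]].append(source)
    return dict(index)


# Hypothetical library: the source ids 1001-1003 are made up for illustration.
library = {
    "1001": ["chemistry", "cito--cites--137511"],
    "1002": ["cito--cites--137511", "cito--usesmethodin--423382"],
    "1003": ["cito--usesmethodin--423382"],
}
print(reverse_index(library))                  # {'137511': ['1001', '1002']}
print(reverse_index(library, "usesmethodin"))  # {'423382': ['1002', '1003']}
```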

Upshot

Automation and improved annotation interfaces will make CiTO more useful. CiTO:cites and CiTO:isCitedBy could be used to mark up existing relationships in digital libraries such as the ACM Digital Library and CiteSeer, and could enhance collections like Google Books and Mendeley, making both human navigation and automated use easier. To capture more sophisticated relationships, David Shotton hopes that authors will mark up citations before submitting papers; if it’s required, anything is possible. Data curators and article commentators may observe contradictions between papers, or reuse of methodology; in these cases CiTO could be layered with an annotation ontology such as AO to make the provenance of such assertions clear.
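To make the digital-library idea concrete, here is a minimal sketch that turns one CiteULike-style relationship into N-Triples, using the CiTO namespace `http://purl.org/spar/cito/` and writing out the `isCitedBy` inverse alongside each `cites` statement. Assumptions: CiteULike article permalinks serve as the identifiers, the lookup table covers only three properties, and source id 1001 is hypothetical.

```python
# CiTO namespace; CiteULike article permalinks serve as identifiers here.
CITO = "http://purl.org/spar/cito/"
ARTICLE = "http://www.citeulike.org/article/"

# CiteULike machine tags are lower-case, while CiTO property names are
# camelCase, so a lookup table is needed (only three properties shown).
PROPERTY_NAMES = {
    "cites": "cites",
    "iscitedby": "isCitedBy",
    "usesmethodin": "usesMethodIn",
}
# cito:cites and cito:isCitedBy are inverses of each other.
INVERSES = {"cites": "isCitedBy", "isCitedBy": "cites"}


def to_ntriples(source_id: str, tag_property: str, target_id: str) -> list:
    """Express one CiTO relationship between two articles as N-Triples lines."""
    prop = PROPERTY_NAMES[tag_property.lower()]
    subject, obj = ARTICLE + source_id, ARTICLE + target_id
    triples = [f"<{subject}> <{CITO}{prop}> <{obj}> ."]
    inverse = INVERSES.get(prop)
    if inverse is not None:
        # Materialize the inverse so "what cites this?" needs no reasoner.
        triples.append(f"<{obj}> <{CITO}{inverse}> <{subject}> .")
    return triples


for line in to_ntriples("1001", "cites", "137511"):
    print(line)
```

Materializing inverses trades a little storage for simpler queries; a library with an OWL reasoner could instead infer `isCitedBy` from `cites`.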

CiTO could put pressure on existing publishers and information providers to improve their data services, perform more data cleanup, or expose bibliographies in open formats. Improved tools will be needed, as well as communities willing to add data by hand, and algorithms for inferring deep citation relationships.

One remaining challenge is aggregating CiTO relationships across bibliographic data providers: article identifiers such as DOIs are unfortunately not universal, and the bibliographic environment is messy, with many types of items, from books to theses to white papers to articles to reports. CiTO and related ontologies will help make explicit the bibliographic web and the relationships between these items on the web of (meta)data.
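A minimal sketch of the aggregation problem: two providers may write the same DOI differently (`doi:` prefix, resolver URL, different letter case), and some items have no DOI at all. Since DOI names are case-insensitive, normalizing them lets duplicate relationships merge, while records lacking a DOI are set aside for other matching. The record layout (`source`, `relation`, `target`) and the `10.1234/...` DOIs are hypothetical placeholders.

```python
import re
from typing import List, Optional, Set, Tuple

# Prefixes under which the same DOI is commonly written.
_DOI_PREFIX = re.compile(r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Reduce a DOI in any common form to a bare, lower-cased '10.xxxx/...' key."""
    if not raw:
        return None
    doi = _DOI_PREFIX.sub("", raw.strip())
    # DOI names are case-insensitive, so compare them in lower case.
    return doi.lower() if doi.startswith("10.") else None


def aggregate(*providers: List[dict]) -> Tuple[Set[tuple], List[dict]]:
    """Merge (source, relation, target) records from several providers.

    Records whose ends both have DOIs are de-duplicated; the rest are set
    aside, since matching them needs other evidence (titles, authors).
    """
    merged: Set[tuple] = set()
    leftovers: List[dict] = []
    for records in providers:
        for record in records:
            source = normalize_doi(record.get("source"))
            target = normalize_doi(record.get("target"))
            if source and target:
                merged.add((source, record["relation"], target))
            else:
                leftovers.append(record)
    return merged, leftovers


# Hypothetical records: the 10.1234/... DOIs are placeholders, not real papers.
provider_a = [{"source": "doi:10.1234/ALPHA", "relation": "cites", "target": "10.1234/beta"}]
provider_b = [
    {"source": "https://doi.org/10.1234/alpha", "relation": "cites",
     "target": "http://dx.doi.org/10.1234/Beta"},
    {"source": None, "relation": "cites", "target": "10.1234/beta"},  # e.g. a thesis
]
relationships, unmatched = aggregate(provider_a, provider_b)
print(relationships)   # {('10.1234/alpha', 'cites', '10.1234/beta')}
print(len(unmatched))  # 1
```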

Further Details

CiTO is part of a suite of ontologies called the Semantic Publishing and Referencing Ontologies (SPAR); see also the JISC Open Citation Project, which is taking bibliographic data to the Web, and the JISC Open Bibliography Project. For those familiar with Shotton’s earlier writing on CiTO, note that SPAR breaks out some parts of the earlier formulation of this ontology.

Posted in argumentative discussions, books and reading, information ecosystem, library and information science, PhD diary, scholarly communication, semantic web | Comments (3)

Quoted in Inside Higher Ed

July 17th, 2010

Earlier this week, Inside Higher Ed published an article about wikis in higher education. I’m quoted in connection with my work[1] with AcaWiki, which gathers summaries of research papers, books, etc.

The article was publicized with a tweet asking “Why haven’t #wikis revolutionized scholarship?”

Of course, I’d rather ask “How have wikis impacted scholarship?” (though that’s less sexy!). First, the largest impact is on technological infrastructure: it’s now commonplace to use collaborative, networked tools with built-in version control (though we wouldn’t call Google Docs, Etherpad, or Etherpad’s many clones “wikis”). Second, wikis are ubiquitous in research, if you look in the right places: nLab, OpenWetWare, and numerous departmental wikis. Third, “revolutions” take time, and academia is essentially conservative and slow-moving. For instance, ejournals (~15 years old and counting) are only just starting to depart significantly from the paper form, with multimedia inclusions, storage of data and other materials, public comments, overlay journals, post-publication peer review, etc. Wikis have been used for teaching since roughly 2002,[2] meaning that academic wikis might be only about 8 years old at this point.

Other responses: Viva la wiki, says Brian Lamb, who was also interviewed for the article. Daniel Mietchen thinks big about the future of wikis for science.


  1. I used to be AcaWiki’s Community Liaison and now contribute summaries and help administer the wiki.
  2. See, e.g., Bergin, J. (2002). Teaching on the wiki web. In Proceedings of the 7th Annual Conference on Innovation and Technology in Computer Science Education (p. 195). Aarhus, Denmark: ACM. doi:10.1145/544414.544473, and the related source code.

Posted in future of publishing, higher education, information ecosystem, scholarly communication | Comments (0)

DERI “Research Explained” video series

July 15th, 2010

Word has gotten out about DERI’s “Research Explained” video series, which I’m narrating. These videos explain DERI’s Semantic Web research to a broad audience, so far in three areas: mobile/social sensing, expert finding, and semantic search.

James Lyng, Julie Letierce, Brendan Smith, and Dr. Brian Wall produce these videos in collaboration with DERI scientists. Drawings are by Eoghan Hynes and James Lyng.

screenshot from "Semantic Search Explained" at YouTube

Watch the series at DERI Galway’s YouTube channel.

My voiceover role came thanks to Julie’s instigation, since I had narrated a screencast for our colleague Peyman Nasirifard’s Conterprise project.

Posted in scholarly communication, semantic web | Comments (0)

Amplify your conference with an iPhone app

March 26th, 2010

Via Gene Golovchinsky, I learned of an iPhone app for CHI2010. What a great way to amplify the conference! Thanks to Justin Weisz and the rest of the CMU crew.

I was happy to browse the proceedings while lounging. The papers I mark show up in my personal schedule and in a reading list.

Paper view
Personalized conference schedule, generated from my selections

I think it’s an attractive alternative to making a paper list by hand, using some conferences’ clunky online scheduling tool, or circling events in large conference handouts. If you keep an iPhone/iPod in your pocket, the app could be used during the conference, but I might also want to print out my sessions on an index card. So exporting the list would be a good enhancement: in addition to printing, I’d like to send the list of readings directly to Zotero (or another bibliographic manager).
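As a sketch of what such an export might look like, the reading list could be serialized as RIS, a plain-text citation format that Zotero and most other bibliographic managers can import. Assumptions: the `Paper` class, the choice of fields, and the sample entry are all hypothetical; the app itself offers no such function as far as I know.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Paper:
    title: str
    authors: List[str]  # "Last, First" form
    proceedings: str
    year: int


def to_ris(papers: List[Paper]) -> str:
    """Serialize selected papers as RIS records, one TY ... ER block each."""
    lines = []
    for paper in papers:
        lines.append("TY  - CPAPER")  # RIS reference type for a conference paper
        lines.append(f"TI  - {paper.title}")
        lines.extend(f"AU  - {author}" for author in paper.authors)  # one AU line per author
        lines.append(f"T2  - {paper.proceedings}")
        lines.append(f"PY  - {paper.year}")
        lines.append("ER  - ")
    return "\n".join(lines) + "\n"


# Hypothetical selection; the title and authors are placeholders.
reading_list = [
    Paper("A placeholder paper title", ["Doe, Jane", "Roe, Richard"],
          "Proceedings of CHI 2010", 2010),
]
print(to_ris(reading_list), end="")
```

Saving this output to a `.ris` file would make it importable; the same list could also feed a printable index-card layout.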

The advance program embedded on the conference website still has some advantages: it’s easier to find out more about session types (e.g. alt.chi). Courses and workshops stand out online, too.

map of conference locations
searching the proceedings

Wayfinding is hard in on-screen PDFs, so I hope that in the long run scholarly proceedings become more screen-friendly. While at present I find an iPhone appealing for reading fiction, on-screen scholarly reading is harder: for one thing, it’s not linear.

I’d like to see integrated, reader-friendly environments for conference proceedings, with full-text papers. I envision moving seamlessly between the proceedings and an offline reading environment. Publishers can already support offline reading on a wide variety of smartphones: the HTML5-based Ibis Reader uses ePub, a standard based on XHTML and CSS. There’s no getting around the download step, but an integrated environment can be “download first, choose later”. I’ve never had much luck with CD-ROM and USB-based conference proceedings, except in pulling off 2-3 PDFs of papers to read later.
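To illustrate why “download first, choose later” fits ePub: an ePub file is a ZIP archive whose `META-INF/container.xml` names a package document, and that package’s manifest lists the XHTML content documents. A reader could therefore open one downloaded proceedings file and list the papers inside before choosing what to read. This minimal Python sketch ignores reading order (the spine), metadata, and DRM; `xhtml_documents` and `demo_epub` are hypothetical helpers, and this is not how Ibis Reader is implemented.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def xhtml_documents(epub_bytes: bytes) -> list:
    """List the XHTML content documents declared in an ePub's package file."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as epub:
        # META-INF/container.xml points to the package (.opf) document.
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        package = ET.fromstring(epub.read(opf_path))
    # Manifest hrefs are relative to the package document's folder.
    base = posixpath.dirname(opf_path)
    return [
        posixpath.join(base, item.get("href"))
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
        if item.get("media-type") == "application/xhtml+xml"
    ]


def demo_epub() -> bytes:
    """Build a stripped-down ePub-like archive in memory (hypothetical content).

    Not a complete, valid ePub: it has no metadata, spine, or content files.
    """
    container_xml = (
        '<?xml version="1.0"?>'
        '<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles></container>'
    )
    opf_xml = (
        '<package xmlns="http://www.idpf.org/2007/opf" version="3.0"><manifest>'
        '<item id="p1" href="paper1.xhtml" media-type="application/xhtml+xml"/>'
        '<item id="p2" href="paper2.xhtml" media-type="application/xhtml+xml"/>'
        '<item id="css" href="style.css" media-type="text/css"/>'
        '</manifest></package>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as z:  # default ZIP_STORED: no compression
        z.writestr("mimetype", "application/epub+zip")  # written first, as ePub expects
        z.writestr("META-INF/container.xml", container_xml)
        z.writestr("OEBPS/content.opf", opf_xml)
    return buffer.getvalue()


print(xhtml_documents(demo_epub()))  # ['OEBPS/paper1.xhtml', 'OEBPS/paper2.xhtml']
```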

Posted in future of publishing, information ecosystem, iOS: iPad, iPhone, etc., scholarly communication | Comments (0)

Code4Lib Journal: A Reminiscence

March 23rd, 2010

The Code4Lib Journal published issue 9 today. It’s a bittersweet day for me, because today also marks the end of my editorship on the Journal. I helped found the Journal, thinking when I signed on that I could just do a little copyediting. Along the way, I’ve taken a turn at many tasks (regrettably, I postponed taking a turn at Coordinating Editor too long).

The Journal published issue 1 in December 2007, but work started in April that year. From the beginning, Jonathan Rochkind served as a moving force. His post “Code4Lib journal idea revival?”[1] generated a number of responses, in part because he made it sound so easy:

So pretty much all we would need is:

1) An editorial committee or whatever. [Maybe some people imagined some
more 'revolutionary' egalitarian type of community process, but I figure
keep it simple, and an editorial committee seems simple, and also
provides some people who have explicitly taken responsibility for
getting things done.]
2) A place to host it. [maybe some kind of "institutional repository"
software would be cool, but in a pinch seems to me a WordPress
installation would do. Keep things simple and do-able and good enough is
my motto. I'm sure one of our institutions would donate server
space/cycles for a WordPress installation for such a journal. ]
3) Maybe a wiki would be nice for editorial commitee discussions.
4) Maybe a simple one page description of the mission of the journal and
what the journal is looking for in articles. The editorial committee can
work on that on the hypothetical wiki.
5) Some articles. The editorial committee can solicit some for the first
‘issue’.

Step 6: Profit! I mean, some e-published articles. No profit, sorry.

After that post, 10 of us stepped forward to decide how to get the Journal off the ground. It surprised me how easy some things were: hosting (thanks ibiblio!), getting an ISSN, finding a sysadmin (the incomparable Jonathan Brinley)…

I spoke at Code4Lib2008, my first Code4Lib conference, thanks to Jonathan Brinley’s interest in sharing our publishing methods and Jonathan Rochkind’s encouragement. Although we looked at other systems, we chose WordPress as a platform for its simplicity and customizability. Jonathan Brinley had put in a proposal to Code4Lib2008 to talk about the Journal’s customizations.[2] He graciously shared the podium with me and Ed Corrado to co-present “The Making of the Code4Lib Journal.”

Since then, the Journal has gone CC-BY (thanks to DOAJ’s prodding and to qualify for the SPARC Europe Seal for Open Access Journals) and agreed to indexing in EBSCO. We’ve published numerous articles (73 + 9 editorials, if I’ve got the count right), from authors on at least 3 continents. All in all, a great first couple years!

While I’m sad to be leaving the Journal, I’m delighted to have been a part of it. A strong Editorial Committee, with new blood in the form of 5 new editors, makes it easier to pull back from this project. As Tom Keays said when introducing issue 7: Code4Lib Journal, Long May You Run!

  1. Posted April 11, 2007, to the Code4Lib listserv.
  2. The customizations are documented on the Code4Lib wiki, part of a category about the Code4Lib Journal.

Posted in future of publishing, scholarly communication | Comments (1)