You are very warmly invited to review this paper. You can post the review as a comment to the manuscript page publicly at SWJ’s website. Informal comments by email are also welcome.
Open review
I adore SWJ’s open review process: publicly available manuscripts are useful. In 11 months the landing page has had “1208 reads” and I’m sure that not all of those are mine! Further, knowing who reviewed a paper can add credibility to the process. (It means quite a lot to me when Simon Buckingham-Shum says “I anticipate that this will become a standard reference for the field.”!)
Two earlier versions
The paper evolved from my first-year Ph.D. report. In the process of defining my Ph.D. topic, I reviewed the state of the art in argumentation for the Social Semantic Web. This was further developed in conversations with my coauthors, my colleague Tudor Groza and my advisor Alexandre Passant.
If you were building a user interface for the Web of data, for books, it just might look like Small Demons.
Unfortunately you can’t see much without logging in, so go get yourself a beta account. (I’ve already complained about asking for a birthday. My new one is 29 Feb 1904; you can help me celebrate in 2012!)
Their data on Ireland is pretty sketchy so far. They do offer to help you buy Guinness on Amazon though. :)
Thanksgiving weekend doesn’t really register in Europe. But this year it will for me: I’m going to Amsterdam for Quantified Self Europe, since I’m lucky enough to have a scholarship covering conference fees.
Today I proposed two talks:
Weight and exercise tracking (which I’ve been doing in various forms for 19 months, currently using a Philips DirectLife exercise monitor and a normal scale, collected with the Hacker’s Diet). Mainly, these are less integrated than they could be, and I’d like to advocate interoperability, APIs, and uniform formats, while hopefully getting some ideas from the audience about quick hacks to improve my current system.
Lifetracking, privacy & the surveillance society. This brings together two themes: First, how individuals’ lifetracking can be seen as a re-enactment of privacy, with changed ideas of what that means (e.g. panopticon, sousveillance, etc.). Second, the increased awareness about the wealth of personal data held by corporations (e.g. German politician Malte Spitz sued to get 6 months of his telecom data). The boundary between public life and private life is continually shifting as communication technology and social norms evolve; this talk investigates how lifetracking and the quantified self movement push the privacy/publicity boundaries in multiple ways. QS increases the public audience for data-driven stories of private lives while also highlighting the need for individuals to control access to and the disposition of their own personal data.
Ironically, self-surveillance was an academic interest of mine before it became a personal one: Back in 2009, Nathan Yau and I wrote a paper for the ASIST Bulletin about self-surveillance (PDF) [less pretty in HTML]. It helped interest me in the Semantic Web, too: putting data in standard formats would make it easier to make data-driven visualizations, so lifetracking and the quantified self movement are a great use case for the (social) Semantic Web. QS also shows how privacy cuts both ways and could provide an early-adopter audience for the kind of fine-grained privacy tools a colleague is developing.
We’ve extended the STLR 2011 deadline due to several requests; submissions are now due May 8th.
JCDL workshops are split over two half-days, and we are lucky enough to have *two* keynote speakers: Bernhard Haslhofer of the University of Vienna and Cathy Marshall of Microsoft Research.
Consider submitting!
CALL FOR PARTICIPATION
The 1st Workshop on Semantic Web Technologies for Libraries and Readers
While Semantic Web technologies are successfully being applied to library catalogs and digital libraries, the semantic enhancement of books and other electronic media is ripe for further exploration. Connections between envisioned and emerging scholarly objects (which are doubtless social and semantic) and the digital libraries in which these items will be housed, encountered, and explored have yet to be made and implemented. Likewise, mobile reading brings new opportunities for personalized, context-aware interactions between reader and material, enriched by information such as location, time of day and access history.
This full-day workshop, motivated by the idea that reading is mobile, interactive, social, and material, will be focused on semantically enhancing electronic media as well as on the mobile and social aspects of the Semantic Web for electronic media, libraries and their users. It aims to bring together practitioners and developers involved in semantically enhancing electronic media (including documents, books, research objects, multimedia materials and digital libraries) as well as academics researching more formal aspects of the interactions between such resources and their users. We also particularly invite entrepreneurs and developers interested in enhancing electronic media using Semantic Web technologies with a user-centered approach.
We invite the submission of papers, demonstrations and posters which describe implementations or original research that are related (but are not limited) to the following areas of interest:
Strategies for semantic publishing (technical, social, and economic)
Approaches for consuming semantic representations of digital documents and electronic media
Open and shared semantic bookmarks and annotations for mobile and device-independent use
User-centered approaches for semantically annotating reading lists and/or library catalogues
Applications of Semantic Web technologies for building personal or context-aware media libraries
Approaches for interacting with context-aware electronic media (e.g. location-aware storytelling, context-sensitive mobile applications, use of geolocation, personalization, etc.)
Applications for media recommendations and filtering using Semantic Web technologies
Applications integrating natural language processing with approaches for semantic annotation of reading materials
Applications leveraging the interoperability of semantic annotations for aggregation and crowd-sourcing
Approaches for discipline-specific or task-specific information sharing and collaboration
Social semantic approaches for using, publishing, and filtering scholarly objects and personal electronic media
IMPORTANT DATES
*EXTENDED* Paper submission deadline: May 8th 2011
Acceptance notification: June 1st 2011
Camera-ready version: June 8th 2011
Please use PDF format for all submissions. Semantically annotated versions of submissions, and submissions in novel digital formats, are encouraged and will be accepted in addition to a PDF version.
All submissions must adhere to the following page limits:
Full length papers: maximum 8 pages
Demonstrations: 2 pages
Posters: 1 page
I spoke about my first-year Ph.D. research in December at DERI. The topic of my talk: Wikipedia discussions and the nascent World Wide Argument Web. I was proud to have the video (below) posted to our institute video stream.
The Wikipedia research is drawn from our ACM Symposium on Applied Computing paper:
Jodi Schneider, Alexandre Passant, John G. Breslin, “Understanding and Improving Wikipedia Article Discussion Spaces.” In SAC 2011 (Web Track), TaiChung, Taiwan, March 21-25, 2011.
Yesterday I spoke at Beyond the PDF about use cases for reading. Slides are below; the presentation was also webcast. Update: the video is now on YouTube (part of the Beyond the PDF video playlist) and below.
I always appreciate how Geoffrey Bilder can manage to talk about the Social Semantic Web and early modern print in (nearly) the same breath. See for yourself in the presentation he gave to scholarly publishers at the International Society of Managing and Technical Editors last month.
Geoff’s presentation is outlined, to a large extent, in an interview Geoff gave 18 months ago (search “key messages” to find the good bits). I hope to blog further about these, because Geoff has so many good things to say, which deserve unpacking!
What if data were accessible within the document itself?
Utopia Documents is a free PDF viewer which recognizes certain enhanced figures, and fetches the underlying data. This allows readers to view and experiment with the tables, graphs, molecular structures, and sequences in situ.
These screencasts were made from pages 9 and 10 of the PDF of a paper by the Manchester-based Utopia team: T. K. Attwood, D. B. Kell, P. McDermott, J. Marsh, S. R. Pettifer, and D. Thorne. Calling international rescue: knowledge lost in literature and data landslide! Biochemical Journal, Dec 2009. doi:10.1042/BJ20091474.
“Utopia Documents links scientific research papers to the data and to the community. It enables publishers to enhance their publications with additional material, interactive graphs and models. It allows the reader to access a wealth of data resources directly from the paper they are viewing, make private notes and start public conversations. It does all this on normal PDFs, and never alters the original file. We are targeting the PDF, since they still have around 80% readership over online viewing.
“Semantics, loose-coupling, fingerprinting and linked-data are the key ingredients. All the data is described using ontologies, and a plug-in system allows third parties to integrate their database or tool within a few lines of script. We use fingerprinting to allow us to recognise what paper a user is reading, and to spot duplicates. All annotations are held remotely, so that wherever you view a paper, the result is the same.”
I’d still like to see a demo of the commenting functionality.
I’d also be particularly interested in the publisher perspective, about the production work that goes into creating the enhancements. Portland Press’s October news announces that they’ve been promoting Utopia at the Charleston conference and SSP, with an upcoming appearance at the STM Innovations Seminar.
A scholarly paper is not a PDF. A PDF is merely one view of a scholarly paper. To push ‘beyond the PDF’, we need design patterns that allow us to segregate the user interface of the paper (whether it is displayed as an aggregation of triples, a list of assertions, a PDF, an ePub, HTML, …) from the thing itself.
Towards this end, Steve Pettifer has a Model-View-Controller perspective on scholarly articles, which he shared in a post on the Beyond the PDF listserv, where discussions are leading up to a workshop in January. I am awe-struck: I wish I’d thought of this way of separating the structure and explaining it.
I think a lot of the disagreement about the role of the PDF can be put down to trying to overload its function: to try to imbue it with the qualities of both ‘model’ and ‘view’. … One of the things that software architects (and I suspect designers in general) have learned over the years is that if you try to give something functions that it shouldn’t have, you end up with a mess; if you can separate out the concerns, you get a much more elegant and robust solution.
My personal take on this is that we should keep these things very separate, and that if we do this, then many of the problems we’ve been discussing become more clearly defined (and I hope, many of the apparent contradictions, resolved).
So… a PDF (or come to that, an e-book version or a html page) is merely a *view* of an article. The article itself (the ‘model’) is a completely different (and perhaps more abstract) thing. Views can be tailored for a particular purpose, whether that’s for machine processing, human reading, human browsing, etc etc.
[paragraph break inserted]
The relationship between the views and their underlying model is managed by the concept of a ‘controller’. For example, if we represent an article’s model in XML or RDF (its text, illustrations, associated nanopublications, annotations and whatever else we like), then that model can be transformed into any number of views. In the case of converting XML into human-readable XHTML, there are many stable and mature technologies (XSLT etc). In the case of doing the same with PDF, the traditional controller is something that generates PDFs.
[paragraph break inserted]
The thing that’s been (somewhat) lacking so far is the two-way communication between view and model (via controller) that’s necessary to prevent the views from ossifying and becoming out of date (i.e. there’s no easy way to see that comments have been added to the HTML version of an article’s view if you happen to be reading the PDF version, so the view here can rapidly diverge from its underlying model).
[paragraph break inserted, link added]
Our Utopia software is an attempt to provide this two-way controller for PDFs. I believe that once you have this bidirectional relationship between view and model, then the actual detailed affordances of the individual views (i.e. what can a PDF do well / badly, what can HTML do well / badly) become less important. They are all merely means to channeling the content of an article to its destination (whether that’s human or machine).
The good thing about having this ‘model view controller’ take on the problem is that only the model needs to be pinned down completely …
Perhaps separating out our concerns in this way — that is, treating the PDF as one possible representation of an article — might help focus our criticisms of the current state of affairs? I fear at the moment we are conflating the issues to some degree.
- Steve Pettifer in a Beyond the PDF listserv post
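To make the separation concrete, here is a minimal Python sketch of the idea. It is my own illustration, not Utopia's code: the names `Article`, `render_html`, `render_text` and `Controller` are all hypothetical, and the plain-text view merely stands in for a PDF-like rendering. The point is only that comments are stored on the model, so every view regenerated from it stays current.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Article:
    """The 'model': the article itself, independent of any rendering."""
    title: str
    paragraphs: list[str]
    comments: list[str] = field(default_factory=list)


def render_html(article: Article) -> str:
    """One 'view': an HTML rendering for reading in a browser."""
    parts = [f"<h1>{escape(article.title)}</h1>"]
    parts += [f"<p>{escape(p)}</p>" for p in article.paragraphs]
    parts += [f"<aside>{escape(c)}</aside>" for c in article.comments]
    return "\n".join(parts)


def render_text(article: Article) -> str:
    """Another 'view': plain text, standing in for a PDF-like rendering."""
    lines = [article.title.upper(), ""]
    lines += article.paragraphs
    lines += [f"[comment] {c}" for c in article.comments]
    return "\n".join(lines)


class Controller:
    """Hypothetical controller mediating between model and views, both ways."""

    def __init__(self, article: Article):
        self.article = article
        self.views = {"html": render_html, "text": render_text}

    def show(self, view_name: str) -> str:
        # Model -> view: each view is regenerated from the same model.
        return self.views[view_name](self.article)

    def add_comment(self, comment: str) -> None:
        # View -> model: a comment made while reading any view is stored
        # on the model, so every other view picks it up as well.
        self.article.comments.append(comment)


controller = Controller(
    Article("A scholarly paper is not a PDF", ["A PDF is merely one view."])
)
controller.add_comment("Does MVC make sense here?")
html_view = controller.show("html")
text_view = controller.show("text")
```

Here a comment added through the controller appears in both `html_view` and `text_view`, which is the two-way communication Steve describes; a real system would of course need a much richer model (RDF or XML, annotations, identifiers) than this toy dataclass.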
I’m particularly interested in hearing if this perspective, using the MVC model, makes sense to others.