Archive for the ‘semantic web’ Category

Reading Ontologically?

July 24th, 2011

What are the right ontologies for reading? And what kind of ontology support would let books recombine themselves, on the fly, in novel ways?

Today keyword searches within books and book collections are commonplace, highlighting a word in your ebook reader can bring up a definition, and dictionaries grab recent examples of word use from microblogs. ((In 2003, Gregory Crane wrote that “Already the books in a digital library are beginning to read one another and to confer among themselves before creating a new synthetic document for review by their human readers.” When I first read it in 2006, that article seemed incredibly visionary to me. Yet these commonplace “syntheses” no longer seem extraordinary to me.)) But can’t we do more? And what kind of synthesis do we need (and what is possible) to support readers of literature, classics, and humanities texts?

Current approaches seem to aim at analysis (e.g. getting an overview of the literary works of a period with “distant reading”/“macroanalysis”) and at creating flexible critical editions (e.g. structural, sometimes overlapping markup, as in TEI-based editions and projects like Wendell Piez’s Sonneteer ((currently offline, but brilliant; do check back; meanwhile, see also his Digital Humanities 2010 talk notes))). I would call these “sensemaking” approaches rather than tools for reading.

I was intrigued by the Bible Ontology ((It undersells their work to advertise it as just an ontology: in fact they have applied the ontology, not merely created it.)) because of their tagline: “ever wanted to read and study the Bible Ontologically?” Yet I don’t really know what they mean by reading ontologically ((even though I’ve given a talk about supporting reading with ontologies!)).

Of course, they have recorded various pieces of data. For instance, for Rebekah, we see her children, siblings, birthplace, book and chapters she figures in, etc.: http://bibleontology.com/page/Rebekah. ((The most meaningful of their terms is bop:isRelatedInEvent, perhaps because these events, like Isaac_blesses_Jacob, would require more analysis to discern.))

Rebekah, from bibleontology.com

They offer a SPARQL endpoint, so you can query. For instance, to find all the married women ((Gender is not recorded so we can’t (yet) ask for all the women overall, though I’ve just asked about this.)) (live query result):

PREFIX bop: <http://bibleontology.com/property/>
select ?s ?o where {?s bop:isWifeOf ?o }
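
A second query along the same lines (my own sketch: I am assuming the site exposes resources under http://bibleontology.com/resource/, which I haven’t confirmed against the endpoint) would list the events Rebekah figures in via bop:isRelatedInEvent:

```sparql
PREFIX bop: <http://bibleontology.com/property/>
PREFIX bor: <http://bibleontology.com/resource/>
SELECT ?event WHERE { bor:Rebekah bop:isRelatedInEvent ?event }
```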

Intense and long-term work has gone into Bible concordances, scholarship, etc., so it seems like a great use case for “reading ontologically”. With theologians and others looking at the site, using the SPARQL endpoint, etc., perhaps someone will be able to tell me what that means!

Posted in books and reading, future of publishing, semantic web | Comments (0)

Enabling a Data Web: Is RDF the only choice?

July 8th, 2011

I’ve been slow in blogging about the Web Science Summer School being held in Galway this week. Check Clare Hooper’s blog for more reactions (starting from her day one post from two days ago).

Wednesday was full of deep and useful talks, but I have to start at the beginning, so I had to wait for a copy of Stefan Decker’s slides.

Hidden in the orientation to DERI are a few slides (12-19) which will be new even to DERIans. They’re based on an argument Stefan made to the database community recently: any data format enabling the data Web is “more or less” isomorphic to RDF.

The argument goes:
The three enablers for the (document) Web were:

  1. scalability
  2. no censorship
  3. a positive feedback loop (exploiting Metcalfe’s Law) ((The value of a communication network is proportional to the number of possible connections between nodes, which grows as n^2 for n nodes.)).
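
To make that footnote concrete (a toy sketch of my own, not from Stefan's slides): the number of possible links among n nodes is n(n-1)/2, which grows quadratically, so each new node adds more value than the last.

```python
def possible_links(n: int) -> int:
    """Distinct node pairs among n nodes: n*(n-1)/2, which grows as n^2."""
    return n * (n - 1) // 2

# Doubling the nodes roughly quadruples the possible links:
print(possible_links(10))   # 45
print(possible_links(20))   # 190
```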

Take these as requirements for the data Web. Enabling Metcalfe’s Law, according to Stefan, requires:

  1. Global Object Identity.
  2. Composability: The value of data can be increased if it can be combined with other data.

The bulk of his argument focuses on this composability feature. What sort of data format allows composability?

It should:

  1. Have no schema.
  2. Be self-describing.
  3. Be “object centric”. In order to integrate information about different entities, data must be related to these entities.
  4. Be graph-based, because object-centric data sources, when composed, result in a graph in the general case.
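
A toy sketch of that last point (my own illustration, with made-up identifiers): if each object-centric source is a set of (subject, property, object) statements about globally identified entities, then composition is simple set union, and shared identifiers make the statements join up into one graph.

```python
# Two object-centric sources: facts about globally identified entities,
# written as (subject, property, object) statements, RDF-style.
source_a = {
    ("ex:Rebekah", "ex:spouse", "ex:Isaac"),
}
source_b = {
    ("ex:Rebekah", "ex:child", "ex:Jacob"),
    ("ex:Isaac", "ex:child", "ex:Jacob"),
}

# Composition is just set union; shared identifiers connect the edges
# into one graph, with no schema negotiation required.
graph = source_a | source_b
nodes = {s for s, _, o in graph} | {o for s, _, o in graph}
print(len(graph), sorted(nodes))
```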

Stefan’s claim is that any data format that fulfills the requirements is “more or less” isomorphic to RDF.

Several parts of this argument confuse me. First, it’s not clear to me that a positive feedback loop is the same as exploiting Metcalfe’s Law. Second, can’t information be composed even when it is not object-centric? (Is it obvious that entities are required, in general?) Third, I vaguely understand that composing object-centric data sources results in a (possibly disconnected) graph: but are graphs the only/best way to think about this? Further, how can I convince myself of this (presumably obvious) fact about data integration?

Posted in information ecosystem, semantic web | Comments (1)

Extended deadline for STLR 2011

April 29th, 2011

We’ve extended the STLR 2011 deadline due to several requests; submissions are now due May 8th.

JCDL workshops are split over two half-days, and we are lucky enough to have *two* keynote speakers: Bernhard Haslhofer of the University of Vienna and Cathy Marshall of Microsoft Research.

Consider submitting!

CALL FOR PARTICIPATION
The 1st Workshop on Semantic Web Technologies for Libraries and Readers

STLR 2011

June 16 (PM) & 17 (AM) 2011

http://stlr2011.weebly.com/
Co-located with the ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2011 Ottawa, Canada

While Semantic Web technologies are successfully being applied to library catalogs and digital libraries, the semantic enhancement of books and other electronic media is ripe for further exploration. Connections between envisioned and emerging scholarly objects (which are doubtless social and semantic) and the digital libraries in which these items will be housed, encountered, and explored have yet to be made and implemented. Likewise, mobile reading brings new opportunities for personalized, context-aware interactions between reader and material, enriched by information such as location, time of day and access history.

This full-day workshop, motivated by the idea that reading is mobile, interactive, social, and material, will be focused on semantically enhancing electronic media as well as on the mobile and social aspects of the Semantic Web for electronic media, libraries and their users. It aims to bring together practitioners and developers involved in semantically enhancing electronic media (including documents, books, research objects, multimedia materials and digital libraries) as well as academics researching more formal aspects of the interactions between such resources and their users. We also particularly invite entrepreneurs and developers interested in enhancing electronic media using Semantic Web technologies with a user-centered approach.

We invite the submission of papers, demonstrations and posters which describe implementations or original research related (but not limited) to the following areas of interest:

  • Strategies for semantic publishing (technical, social, and economic)
  • Approaches for consuming semantic representations of digital documents and electronic media
  • Open and shared semantic bookmarks and annotations for mobile and device-independent use
  • User-centered approaches for semantically annotating reading lists and/or library catalogues
  • Applications of Semantic Web technologies for building personal or context-aware media libraries
  • Approaches for interacting with context-aware electronic media (e.g. location-aware storytelling, context-sensitive mobile applications, use of geolocation, personalization, etc.)
  • Applications for media recommendations and filtering using Semantic Web technologies
  • Applications integrating natural language processing with approaches for semantic annotation of reading materials
  • Applications leveraging the interoperability of semantic annotations for aggregation and crowd-sourcing
  • Approaches for discipline-specific or task-specific information sharing and collaboration
  • Social semantic approaches for using, publishing, and filtering scholarly objects and personal electronic media

IMPORTANT DATES

*EXTENDED* Paper submission deadline: May 8th 2011
Acceptance notification: June 1st 2011
Camera-ready version: June 8th 2011

KEYNOTE SPEAKERS

PROGRAM COMMITTEE

Each submission will be independently reviewed by 2-3 program committee members.

ORGANIZING COMMITTEE

  • Alison Callahan, Dept of Biology, Carleton University, Ottawa, Canada
  • Dr. Michel Dumontier, Dept of Biology, Carleton University, Ottawa, Canada
  • Jodi Schneider, DERI, NUI Galway, Ireland
  • Dr. Lars Svensson, German National Library

SUBMISSION INSTRUCTIONS

Please use PDF format for all submissions. Semantically annotated versions of submissions, and submissions in novel digital formats, are encouraged and will be accepted in addition to a PDF version.
All submissions must adhere to the following page limits:
Full length papers: maximum 8 pages
Demonstrations: 2 pages
Posters: 1 page
Use the ACM template for formatting: http://www.acm.org/sigs/pubs/proceed/template.html
Submit using EasyChair: https://www.easychair.org/conferences/?conf=stlr2011

Posted in future of publishing, library and information science, PhD diary, scholarly communication, semantic web, social semantic web | Comments (2)

Making provenance pay

December 19th, 2010

Provenance, Dan Conover says, can drive the adoption of semantic technologies:

Imagine a global economy in which every piece of information is linked directly to its meaning and origin. In which queries produce answers, not expensive, time-consuming evaluation tasks. Imagine a world in which reliable, intelligent information structures give everyone an equal ability to make profitable decisions, or in many cases, profitable new information products. Imagine companies that get paid for the information they generate or collect based on its value to end users, rather than on the transitory attention it generates as it passes across a screen before disappearing into oblivion.

Now imagine copyright and intellectual property laws that give us practical ways of tracing the value of original contributions and collecting and distributing marginal payments across vast scales.

That’s the Semantic Economy.

– Dan Conover on the semantic economy (my emphasis added).
via Bora Zivkovic on Twitter

I wonder if he’s seen the W3 Provenance XG Final Report yet. Two parts are particularly relevant: the dimensions of provenance and the news aggregator scenario. Truly making provenance pay will require both Management of provenance (especially Access and Scale) and Content provenance around Attribution.

Go read the rest of what Dan Conover says about the semantic economy. Pay particular attention to the end: Dan says that he’s working on a functional spec for a Semantic Content Management System — an RDF-based middleware so easy that writers and editors will want to use it. I know you’re thinking of Drupal and of the Semantic Desktop; we’ll see how he’s differentiating: he invites further conversation.

I’m definitely going to have a closer look at his ideas: I like the way he thinks, and this isn’t the first time I’ve noticed his ideas for making Linked Data profitable.

Posted in future of publishing, information ecosystem, PhD diary, scholarly communication, semantic web | Comments (0)

The Social Semantic Web – a message for scholarly publishers

November 15th, 2010

I always appreciate how Geoffrey Bilder can manage to talk about the Social Semantic Web and early modern print in (nearly) the same breath. See for yourself in the presentation he gave to scholarly publishers at the International Society of Managing and Technical Editors last month.

Geoff’s presentation is outlined, to a large extent, in an interview Geoff gave 18 months ago (search “key messages” to find the good bits). I hope to blog further about these, because Geoff has so many good things to say, which deserve unpacking!

I especially love the timeline from slide 159, which shows that we’re just past the incunabula age of the Internet:

The Early Modern Internet

We're still in the Early Modern era of the Internet. Compare to the history of print.

Posted in future of publishing, information ecosystem, PhD diary, scholarly communication, semantic web, social semantic web, social web | Comments (3)

Utopia Documents: pulling scientific data into the PDF for interactive exploration

November 14th, 2010

What if data were accessible within the document itself?

Utopia Documents is a free PDF viewer which recognizes certain enhanced figures, and fetches the underlying data. This allows readers to view and experiment with the tables, graphs, molecular structures, and sequences in situ.


You can download Utopia Documents for Mac and Windows to view enhanced papers, such as those published in The Semantic Biochemical Journal.

These screencasts were made from pages 9 and 10 of the PDF of a paper by the Manchester-based Utopia team: T. K. Attwood, D. B. Kell, P. Mcdermott, J. Marsh, S. R. Pettifer, and D. Thorne. Calling international rescue: knowledge lost in literature and data landslide! Biochemical Journal, Dec 2009. doi:10.1042/BJ20091474.

In an interview at the Guardian, Utopia’s Phillip McDermott says:

“Utopia Documents links scientific research papers to the data and to the community. It enables publishers to enhance their publications with additional material, interactive graphs and models. It allow the reader to access a wealth of data resources directly from the paper they are viewing, makes private notes and start public conversations. It does all this on normal PDFs, and never alters the original file. We are targeting the PDF, since they still have around 80% readership over online viewing.

“Semantics, loose-coupling, fingerprinting and linked-data are the key ingredients. All the data is described using ontologies, and a plug-in system allows third parties to integrate their database or tool within a few lines of script. We use fingerprinting to allow us to recognise what paper a user is reading, and to spot duplicates. All annotations are held remotely, so that wherever you view a paper, the result is the same.”

I’d still like to see a demo of the commenting functionality.

I’d also be particularly interested in the publisher perspective, about the production work that goes into creating the enhancements. Portland Press’s October news announces that they’ve been promoting Utopia at the Charleston conference and SSP, with an upcoming appearance at the STM Innovations Seminar.

Utopia came to my attention via Steve Pettifer’s mention.

Posted in future of publishing, information ecosystem, library and information science, scholarly communication, semantic web, social semantic web | Comments (4)

CiTO in the wild

October 18th, 2010

CiTO has escaped the lab and can now be used either directly in the CiteULike interface or with CiteULike machine tags. Go Citation Typing Ontology!

In the CiteULike Interface

To add a CiTO relationship between articles using the CiteULike interface, both articles must be in your own library. You’ll see a “Citations (CiTO)” section after your tags. Click on edit and set the current article as the target.

set the CiTO target

First set the CiTO target

Then navigate around your own library to find a related article. Now you can add a CiTO tag.

Adding a CiTO tag in CiteULike

Adding a CiTO tag in CiteULike

There are a lot of choices. Choose just one. :)

CiTO Object Properties appear in the dropdown

CiTO Object Properties now appear in the dropdown

Congratulations, you’ve added a CiTO relationship! Now mousing over the CiTO section will show details on the related article.

CiTO result

Mouse over the resulting CiTO tag to get details of the related article

Machine Tags

Machine tags take fewer clicks but a little more know-how. They can be added just like any other tag, as long as you know the secret formula: cito--(insert a CiTO Object Property here from this list)--(insert article permalink number here). Here are two concrete examples.

First, we can keep a list of articles citing a paper. For example, tagging an article

cito--cites--137511

says “this article CiTO:cites article 137511”. Article 137511 can be found at http://www.citeulike.org/article/137511, aka JChemPaint – Using the Collaborative Forces of the Internet to Develop a Free Editor for 2D Chemical Structures. Then we can get the list of (hand-tagged) citations to the article. Look: a community-generated reverse citation index!

Second, we can indicate specific relationships between articles, whether or not they cite each other. For example, tagging an article

cito--usesmethodin--423382

says “this item CiTO:usesmethodin item 423382”. Item 423382 is found at http://www.citeulike.org/article/423382, aka The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics.
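
The machine-tag format is simple enough that tools could generate and parse it in a few lines. A sketch of my own (the helper names are made up, not part of any CiteULike API):

```python
def make_cito_tag(cito_property: str, article_id: str) -> str:
    """Build a CiteULike CiTO machine tag, e.g. cito--cites--423382."""
    return f"cito--{cito_property}--{article_id}"

def parse_cito_tag(tag: str) -> tuple[str, str]:
    """Split a machine tag back into its CiTO property and article id."""
    prefix, cito_property, article_id = tag.split("--")
    if prefix != "cito":
        raise ValueError(f"not a CiTO machine tag: {tag}")
    return cito_property, article_id

print(make_cito_tag("usesmethodin", "423382"))  # cito--usesmethodin--423382
print(parse_cito_tag("cito--cites--137511"))    # ('cites', '137511')
```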

Upshot

Automation and improved annotation interfaces will make CiTO more useful. CiTO:cites and CiTO:isCitedBy could be used to mark up existing relationships in digital libraries such as the ACM Digital Library and CiteSeer, and could enhance collections like Google Books and Mendeley, making human navigation and automated use easier. To capture more sophisticated relationships, David Shotton hopes that authors will mark up citations before submitting papers; if it’s required, anything is possible. Data curators and article commentators may observe contradictions between papers, or reuses of methodology; in these cases CiTO could be layered with an annotation ontology such as AO in order to make the provenance of such assertions clear.

CiTO could put pressure on existing publishers and information providers to improve their data services, perform more data cleanup, or expose bibliographies in open formats. Improved tools will be needed, as well as communities willing to add data by hand, and algorithms for inferring deep citation relationships.

One remaining challenge is aggregation of CiTO relationships between bibliographic data providers; article identifiers such as DOIs are unfortunately not universal, and the bibliographic environment is messy, with many types of items, from books to theses to white papers to articles to reports. CiTO and related ontologies will help explicitly show the bibliographic web and the relationships between these items, on the web of (meta)data.

Further Details

CiTO is part of an ecosystem of citation-related ontologies called the Semantic Publishing and Referencing Ontologies (SPAR); see also the JISC Open Citation Project, which is taking bibliographic data to the Web, and the JISC Open Bibliography Project. For those familiar with Shotton’s earlier writing on CiTO, note that SPAR breaks out some parts of the earlier formulation of this ontology.

Posted in argumentative discussions, books and reading, information ecosystem, library and information science, PhD diary, scholarly communication, semantic web | Comments (3)

Enabling a Social Semantic Web for Argumentation (defining my Ph.D. research problem)

July 23rd, 2010

I’m working on online argumentation: Making it easier to have discussions, get to consensus, and understand disagreements across websites.

Here are the 3 key questions and the most closely related work that I’ve identified in the first 9 months of my Ph.D.

Read on, if you want to know more. Then let me know what you think! Suggestions will be especially helpful since I’m writing my first year Ph.D. report, which will set the direction for my second year at DERI.


Enabling a Social Semantic Web for Argumentation

Argumentative discussions occur informally throughout the Web; however, there is currently no way of bringing together all of the discussions on a given topic along with an indication of who is agreeing and who is disagreeing. Thus substantial human analysis is required to integrate opinions and expertise: for instance, to determine the best policies and procedures to mitigate global warming, or the recommended treatment for a given disease. New techniques for gathering and organising the Social Web using ontologies such as FOAF and SIOC show promise for creating a Social Semantic Web for argumentation.

I am currently investigating three main research questions to establish the Social Semantic Web for argumentation:

  1. How can we best define argumentation for the Social Semantic Web, to isolate the essential problems? We wish to enable reasoning with inconsistent knowledge, to integrate disparate knowledge, and to identify consensus and disputes. Similar questions and techniques come up in related but distinct areas, such as sentiment analysis, dialogue mapping, dispute resolution, question-answering and e-government participation.
  2. What sort of modular framework for argumentation can support distributed, emergent argumentation — a World Wide Argumentation Web? Some Web 2.0 tools, such as Debatepedia, LivingVote, and Debategraph, provide integrated environments for explicit argumentation. But our goal is for individuals to be able to use their own preferred tools — in a social environment — while understanding what else is being discussed.
  3. How can we manage the tension between informality and ease of expression on the one hand and formal semantics and retrievability/reusability on the other hand? Minimal integration of informal arguments requires two pieces of information: a statement of the issue or proposition, and an indication of polarity (agreement or disagreement). How can we gather this information without adding cognitive overhead for users?
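
As a sketch of that minimal representation from question 3 (my own illustration; the names are invented for this post, not an existing ontology):

```python
from dataclasses import dataclass

@dataclass
class Stance:
    issue: str      # a statement of the issue or proposition
    agrees: bool    # polarity: True for agreement, False for disagreement
    source: str     # where on the Web the stance was expressed

# One informal argument, reduced to its two essential pieces of information
# plus a pointer back to the original discussion.
s = Stance(issue="A carbon tax is the best policy to mitigate global warming",
           agrees=False,
           source="http://example.org/blog/post-42")
print(s.issue, s.agrees)
```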

Related Work

Ennals et al. ask: “What is disputed on the Web?” (Ennals 2010b). They use annotation and NLP techniques to develop a prototype system for highlighting disputed claims in Web documents (Ennals 2010a). Cabanac et al. (2010) find that two algorithms for identifying the level of controversy about an issue were up to 84% accurate (compared to human perception), on a corpus of 13 arguments. These are useful prototypes of what could be done; Ennals’ prototype is indeed a Web-scale system, but disputed claims are not arguments.

Rahwan et al. (2007) present a pilot Semantic Web-based system, ArgDF, in which users can create arguments and query to find networks of arguments. ArgDF is backed by the AIF-RDF ontology and uses Semantic Web standards. Rahwan (2008) surveys current Web 2.0 tools, pointing out that integration between these tools is lacking and that only very shallow argument structures are supported; ArgDF and AIF-RDF are presented as an improvement. What is lacking is uptake in end-user-orientated (e.g. Web 2.0) tools.

The Web 2.0 aspect of the problem is explored in several papers, including Buckingham Shum (2008), which presents Cohere, a Web 2.0-style argumentation system supporting existing (non-Semantic Web) argumentation standards, and Groza et al. (2009), which proposes an abstract framework for modeling argumentation. These are either minimally implemented frameworks or stand-alone systems which do not yet support the distributed, emergent argumentation envisioned, as further elucidated by Buckingham Shum (2010).

References with links to preprints

  1. S. Buckingham Shum, “Cohere: Towards Web 2.0 Argumentation,” Computational Models of Argument – Proceedings of COMMA 2008, IOS Press, 2008.
  2. S. Buckingham Shum, AIF Use Case: Iraq Debate, Glenshee, Scotland, UK: 2010. http://projects.kmi.open.ac.uk/hyperdiscourse/docs/AIF-UseCase-v2.pdf
  3. G. Cabanac, M. Chevalier, C. Chrisment, and C. Julien, “Social validation of collective annotations: Definition and experiment,” Journal of the American Society for Information Science and Technology, vol. 61, 2010, pp. 271-287.
  4. R. Ennals, B. Trushkowsky, and J.M. Agosta, “Highlighting Disputed Claims on the Web,” WICOW 2010, Raleigh, North Carolina: 2010.
  5. R. Ennals, D. Byler, J.M. Agosta, and B. Rosario, “What is Disputed on the Web?,” WWW 2010, Raleigh, North Carolina: 2010.
  6. T. Groza, S. Handschuh, J.G. Breslin, and S. Decker, “An Abstract Framework for Modeling Argumentation in Virtual Communities,” International Journal of Virtual Communities and Social Networking, vol. 1, Sep. 2009, pp. 35-47. 
  7. I. Rahwan, “Mass argumentation and the semantic web,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 6, Feb. 2008, pp. 29-37.
  8. I. Rahwan, F. Zablith, and C. Reed, “Laying the foundations for a World Wide Argument Web,” Artificial Intelligence, vol. 171, Jul. 2007, pp. 897-921.

Posted in argumentative discussions, PhD diary, semantic web, social semantic web, social web | Comments (1)

DERI “Research Explained” video series

July 15th, 2010

Word has gotten out about DERI’s “Research Explained” video series, which I’m narrating. These videos explain DERI’s Semantic Web research to a broad audience, so far in three areas: mobile/social sensing, expert finding, and semantic search.

James Lyng, Julie Letierce, Brendan Smith, and Dr. Brian Wall produce these videos in collaboration with DERI scientists. Drawings are by Eoghan Hynes and James Lyng.

screenshot from "Semantic Search Explained" at YouTube

Watch the series at DERI Galway’s youtube video channel.

My voiceover role came thanks to Julie’s instigation, since I had narrated a screencast for our colleague Peyman Nasirifard’s Conterprise project.

Posted in scholarly communication, semantic web | Comments (0)