Archive for the ‘computer science’ Category

Code4Lib 2012 talk proposals are out

November 21st, 2011

Code4Lib2012 talk proposals are now on the wiki. This year there are 72 proposals for 20-25 slots. I pulled out the talks mentioning semantics (linked data, semantic web, microdata, RDF) for my own convenience (and maybe yours).

Property Graphs And TinkerPop Applications in Digital Libraries

  • Brian Tingle, California Digital Library

TinkerPop is an open source software development group focusing on technologies in the graph database space.
This talk will provide a general introduction to the TinkerPop Graph Stack and the property graph model it uses. The introduction will include code examples and explanations of the property graph models used by the Social Networks in Archival Context project and show how the historical social graph is exposed as a JSON/REST API implemented by a TinkerPop rexster Kibble that contains the application’s graph theory logic. Other graph database applications possible with TinkerPop, such as RDF support and citation analysis, will also be discussed.
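For readers who haven't met the term, here is a rough sketch of what a property graph is: vertices and labeled, directed edges, each carrying its own key-value properties. This is plain Python written for illustration, not TinkerPop's API; the class names, the `associatedWith` label, and the sample records are all made up.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    out_v: str  # id of the vertex the edge leaves
    label: str  # relationship type
    in_v: str   # id of the vertex the edge enters
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph (hypothetical, not the TinkerPop API)."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vid, **props):
        self.vertices[vid] = Vertex(vid, props)
        return self.vertices[vid]

    def add_edge(self, out_v, label, in_v, **props):
        # Both endpoints must already exist in the graph.
        if out_v not in self.vertices or in_v not in self.vertices:
            raise KeyError("both endpoints must be added before the edge")
        edge = Edge(out_v, label, in_v, props)
        self.edges.append(edge)
        return edge

    def neighbors(self, vid, label=None):
        # Follow outgoing edges, optionally restricted to one label.
        return [
            self.vertices[e.in_v]
            for e in self.edges
            if e.out_v == vid and (label is None or e.label == label)
        ]


# Made-up sample data, loosely in the spirit of an archival social graph.
g = PropertyGraph()
g.add_vertex("person:1", name="Example Person", entity_type="person")
g.add_vertex("org:1", name="Example Organization", entity_type="corporateBody")
g.add_edge("person:1", "associatedWith", "org:1", source="hypothetical finding aid")
```

Calling `g.neighbors("person:1", "associatedWith")` walks the outgoing edges with that label and returns the organization vertex. Unlike a bare RDF triple, the edge itself can carry properties such as `source`.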

HTML5 Microdata and

  • Jason Ronallo, North Carolina State University Libraries

When the big search engines announced support for HTML5 microdata and the vocabularies, the balance of power for semantic markup in HTML shifted.

  • What is microdata?
  • Where does microdata fit with regards to other approaches like RDFa and microformats?
  • Where do libraries stand in the worldview of and what can they do about it?
  • How can implementing microdata and optimize your sites for search engines?
  • What tools are available?

“Linked-Data-Ready” Software for Libraries

  • Jennifer Bowen, University of Rochester River Campus Libraries

Linked data is poised to replace MARC as the basis for the new library bibliographic framework. For libraries to benefit from linked data, they must learn about it, experiment with it, demonstrate its usefulness, and take a leadership role in its deployment.

The eXtensible Catalog Organization (XCO) offers open-source software for libraries that is “linked-data-ready.” XC software prepares MARC and Dublin Core metadata for exposure to the semantic web, incorporating FRBR Group 1 entities and registered vocabularies for RDA elements and roles. This presentation will include a software demonstration, proposed software architecture for creation and management of linked data, a vision for how libraries can migrate from MARC to linked data, and an update on XCO progress toward linked data goals.

Your Catalog in Linked Data

  • Tom Johnson, Oregon State University Libraries

Linked Library Data activity over the last year has seen bibliographic data sets and vocabularies proliferating from traditional library sources. We’ve reached a point where regular libraries don’t have to go it alone to be on the Semantic Web. There is a quickly growing pool of things we can actually “link to”, and everyone’s existing data can be immediately enriched by participating.

This is a quick and dirty road to getting your catalog onto the Linked Data web. The talk will take you from start to finish, using Free Software tools to establish a namespace, put up a SPARQL endpoint, make a simple data model, convert MARC records to RDF, and link the results to major existing data sets (skipping conveniently over pesky processing time). A small amount of “why linked data?” content will be covered, but the primary goal is to leave you able to reproduce the process and start linking your catalog into the web of data. Appropriate documentation will be on the web.

NoSQL Bibliographic Records: Implementing a Native FRBR Datastore with Redis

  • Jeremy Nelson, Colorado College

In October, the Library of Congress issued a news release, “A Bibliographic Framework for the Digital Age”, outlining a list of requirements for a New Bibliographic Framework Environment. Responding to this challenge, this talk will demonstrate a Redis FRBR datastore proof-of-concept that, with a lightweight python-based interface, can meet these requirements.

Because FRBR is an Entity-Relationship model, it is easily implemented as key-value within the primitive data structures provided by Redis. Redis’ flexibility makes it easy to associate arbitrary metadata and vocabularies, like MARC, METS, VRA or MODS, with FRBR entities and inter-operate with legacy and emerging standards and practices like RDA Vocabularies and LinkedData.
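To make the key-value idea concrete, here is a minimal sketch of how the four FRBR Group 1 entities (Work, Expression, Manifestation, Item) might be laid out as hashes and sets. It is not the speaker's implementation: the key naming scheme, the field names, and the `TinyKeyValueStore` class (an in-memory stand-in so the example runs without a Redis server) are all assumptions.

```python
from collections import defaultdict


class TinyKeyValueStore:
    """In-memory stand-in for a key-value server; hashes and sets only."""

    def __init__(self):
        self.hashes = defaultdict(dict)
        self.sets = defaultdict(set)

    def hset(self, key, mapping):
        self.hashes[key].update(mapping)

    def hgetall(self, key):
        return dict(self.hashes[key])

    def sadd(self, key, *members):
        self.sets[key].update(members)

    def smembers(self, key):
        return set(self.sets[key])


store = TinyKeyValueStore()

# One hash per entity; the key pattern "frbr:<type>:<n>" is hypothetical.
store.hset("frbr:work:1", {"title": "Example Work"})
store.hset("frbr:expression:1", {"language": "eng"})
store.hset("frbr:manifestation:1", {"format": "print"})
store.hset("frbr:item:1", {"barcode": "0000"})

# One set per relationship: realized through, embodied in, exemplified by.
store.sadd("frbr:work:1:expressions", "frbr:expression:1")
store.sadd("frbr:expression:1:manifestations", "frbr:manifestation:1")
store.sadd("frbr:manifestation:1:items", "frbr:item:1")


def items_for_work(kv, work_key):
    """Walk Work -> Expression -> Manifestation -> Item and collect item keys."""
    items = set()
    for expression in kv.smembers(f"{work_key}:expressions"):
        for manifestation in kv.smembers(f"{expression}:manifestations"):
            items |= kv.smembers(f"{manifestation}:items")
    return items
```

Here `items_for_work(store, "frbr:work:1")` follows the three relationship sets down to the item. Attaching another vocabulary to an entity would just mean more fields in its hash, or a sibling key.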

ALL TEH METADATAS! or How we use RDF to keep all of the digital object metadata formats thrown at us.

  • Declan Fleming, University of California, San Diego

What’s the right metadata standard to use for a digital repository? There isn’t just one standard that fits documents, videos, newspapers, audio files, local data, etc. And there is no standard to rule them all. So what do you do? At UC San Diego Libraries, we went down a conceptual level and attempted to hold every piece of metadata and give each holding place some context, hopefully in a common namespace. RDF has proven to be the ideal solution, and allows us to work with MODS, PREMIS, MIX, and just about anything else we’ve tried. It also opens up the potential for data re-use and authority control as other metadata owners start thinking about and expressing their data in the same way. I’ll talk about our workflow which takes metadata from a stew of various sources (CSV dumps, spreadsheet data of varying richness, MARC data, and MODS data), normalizes them into METS by our Metadata Specialists who create an assembly plan, and then ingests them into our digital asset management system. The result is HTML, RSS, METS, and XML, and it opens linked data possibilities that we are just starting to explore.

UDFR: Building a Registry using Open-Source Semantic Software

  • Stephen Abrams, Associate Director, UC3, California Digital Library
  • Lisa Dawn Colvin, UDFR Project Manager, California Digital Library

Fundamental to effective long-term preservation analysis, planning, and intervention is the deep understanding of the diverse digital formats used to represent content. The Unified Digital Format Registry project (UDFR) will provide an open source platform for an online, semantically-enabled registry of significant format representation information.

We will give an introduction to the UDFR tool and its use within a preservation process.

We will also discuss our experiences of integrating disparate data sources and models into RDF: describing our iterative data modeling process and decisions around integrating vocabularies, data sources and provenance representation.

Finally, we will share how we extended an existing open-source semantic wiki tool, OntoWiki, to create the registry.

saveMLAK: How Librarians, Curators, Archivists and Library Engineers Work Together with Semantic MediaWiki after the Great Earthquake of Japan

  • Yuka Egusa, Senior Researcher of National Institute of Educational Policy Research
  • Makoto Okamoto, Chief Editor of Academic Resource Guide (ARG)

On March 11th, 2011, the biggest earthquake and tsunami in the country’s history struck a large area of northeastern Japan. A lot of people have worked together to save people in the area. For the library community, a wiki named “savelibrary” was launched the day after the earthquake for sharing information on damage and rescues. Later, museum curators, archivists, and community learning centers started similar projects. In April we joined the project “saveMLAK” and launched a wiki site using Semantic MediaWiki under

As of November 2011, information on over 13,000 cultural organizations has been posted on the site by 269 contributors since the launch. The gathered information is organized by wiki categories for each type of facility, such as library, museum, school, etc. We have held eight edit-a-thons to encourage people to contribute to the wiki.

We will report on our activity, on how the libraries and museums were damaged and have been recovering through great effort, and on how we can do a new style of collaboration with the MLAK community, the wiki, and other voluntary communities in a crisis.

Conversion by Wikibox, tweaked in Textwrangler. Trimmed email addresses, otherwise these are as-written. Did I miss one? Let me know!

Posted in computer science, library and information science, scholarly communication, semantic web | Comments (0)

Frank van Harmelen’s laws of information

November 1st, 2011

What are the laws of information? Frank van Harmelen proposes seven laws of information science in his keynote to the Semantic Web community at ISWC2011.[1]

  1. Factual knowledge is a graph.[2]
  2. Terminological knowledge is a hierarchy.
  3. Terminological knowledge is much smaller[3] than the factual knowledge.
  4. Terminological knowledge is of low complexity.[4]
  5. Heterogeneity is unavoidable.[5]
  6. Publication should be distributed, computation should be centralized for speed: “The Web is not a database, and I don’t think it ever will be.”
  7. Knowledge is layered.

What do you think? If they are laws, can they be proven/disproven?

Semantic Web vocabularies in the Tower of Babel

I wish every presentation came with this sort of summary: slides and transcript, presented in a linear fashion. But these laws deserve more attention and discussion–especially from information scientists. So I needed something even punchier to share (prioritized thanks to Karen).

  1. He presents them as “computer science laws” underlying the Semantic Web; yet they are laws about knowledge. This makes them candidate laws of information science, in my terminology.
  2. “The vast majority of our factual knowledge consists of simple relationships between things, represented as a ground instance of a binary predicate. And lots of these relations between things together form a giant graph.”
  3. by 1-2 orders of magnitude
  4. This is seen in “the unreasonable effectiveness of low-expressive KR”: “the information universe is apparently structured in such a way that the double exponential worst case complexity bounds don’t hit us in practice.”
  5. But heterogeneity is solvable through mostly social, cultural, and economic means (algorithms contribute a little bit).
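The first law is easier to see with a toy example. Each fact below is a ground instance of a binary predicate, written as a (subject, predicate, object) triple; collect enough of them and you have a labeled graph. This is my own sketch, not code from the keynote, and every name in it is invented.

```python
from collections import defaultdict

# Each fact is a ground instance of a binary predicate: predicate(subject, object).
# All names are invented for illustration.
triples = [
    ("Alice", "knows", "Bob"),
    ("Bob", "worksAt", "ExampleLibrary"),
    ("Alice", "worksAt", "ExampleLibrary"),
]


def build_graph(facts):
    """Index facts by subject, giving an adjacency list of (predicate, object) pairs."""
    graph = defaultdict(list)
    for subject, predicate, obj in facts:
        graph[subject].append((predicate, obj))
    return dict(graph)


def subjects_with(facts, predicate, obj):
    """Return, in sorted order, every subject related to obj by predicate."""
    return sorted(s for s, p, o in facts if p == predicate and o == obj)
```

`build_graph(triples)` gives the adjacency-list view, and `subjects_with(triples, "worksAt", "ExampleLibrary")` answers a simple query over it.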

Posted in computer science, information ecosystem, library and information science, PhD diary, semantic web | Comments (0)

Digital backchannels

September 13th, 2009

A discussion of IRC in the classroom sent me off to my Zotero library for examples.

I remembered reading a few great papers on using IRC at conferences (these days twitter is the rage); what I didn’t remember was writing a mini-bibliography (shared below).

For teachers interested in using digital backchannels like IRC, IM, or twitter, the most pertinent is #6 below: Yardi, The role of the backchannel in collaborative learning environments. A new paper by graduate students at UBC is also worth a read: Nobarany, S., & Haraty, M. (2009 April 20). “Supporting Classroom Discussions Using a Trust-enhanced Private Backchannel.” [author PDF] Proceedings of Human Interface Technologies 2008/9 Conference. Adding to the tools available, MIT Media Lab has a backchannel service, which they’ve written about [ACM copy].

UIUC’s LEEP program uses IRC as a backchannel in distance classes. (Though I hear they’re promoting verbal discussion with Elluminate this fall.) I found it very valuable to have private ‘whispers’ to classmates during our synchronous ‘live session’ classes. For me, it was also great to be able to type a question when I had it, without waiting for a pause in the audio lecture.

Originally written 2007-12-09 for GSLIS LIS 590IIL, Interfaces to Info Systems. Edited for links, formatting, and typos.

Digital backchannels refer to private communication between individuals also taking part in a public digital conversation. Whispers in live session are one example of a digital backchannel.

I reviewed 6 papers on “backchannels”, expanding outwards from 4 papers I found in CHI and CSCW proceedings. See the annotations in references below for more details about these papers. I recommend 2 papers. The seminal paper about digital backchannels, which I expect to become a classic in time, is Cogdill et al. [1]. A briefer, but less meaty, treatment is given by McCarthy and boyd’s analysis of chatlogs from an in-person conference [2].

One caution is that, while linguists have studied “face-to-face oral backchannel for three decades” [1], studies in the digital realm are much newer, and “its spelling has not stabilized yet, so it can be found in all of its forms – backchannel, back-channel, and back channel – in current usage” [1]. When researchers talk about digital backchannels, they sometimes seem to include private messaging between two individuals in the same sweep as group chats concurrent to some other activity. For example, Kellogg et al. [3] discuss characteristics of backchannels which we’ll find familiar from LEEP classes, from the main room and private messages respectively: “They allow listeners to provide non-interruptive feedback to the speaker (‘raising hands,’ asking questions via IM), but at the same time they may take on the more private character of the second more political sense of backchannel (allowing two audience members to chat via IM with one another with no indication to others that it is occurring).”

Here’s a bit more detail about the papers I recommend:

Cogdill et al. [1] present a taxonomy for backchannel communication. They “identified five backchannel categories: process-oriented, content-oriented, participation-enabling, tangential and independent backchannel.”

The CSCW ’04 conference had two events related to digital backchannels:

  • a panel presentation about digital backchannels [4]
  • an IRC chatroom for each of the conference’s three physical rooms

The chatrooms were logged for the duration of the conference, and McCarthy and boyd analyzed the chat logs [2]. (I wish I had a log of the chatroom from the panel session; talk about meta!)

McCarthy and boyd present two papers. One, coauthored with the panel presenters, is based on the panel presentation from CSCW [4]. It is their second paper, “Digital backchannels in shared physical spaces: experiences at an academic conference” [2], that I find worthwhile. This paper organizes the backchannel IRC logs from CSCW ’04. With no apparent knowledge of Cogdill’s taxonomy, they provide concrete examples under their own rubric of “logistics, technology, people logistics, shared work, bonding”. These categories overlap with, but are distinct from, Cogdill’s [1] conception. Additional discussion highlights social issues such as the privacy concerns of logging, the reactions of presenters, and the ingroup/outgroup concerns.

While these two papers [1] and [2] are the best of the lot, the others provide interesting context, because there are several sorts of research going on: pure sociological research (Cogdill [1]), social experiments (McCarthy and boyd [2]), educational research and experimentation ([6]), theoretical views on backchannels [4], and commercial development projects ([3],[5]). (The last of these surprised me, but as Cogdill says, “Software designers can use these results to understand how the backchannel should function in digital conversation applications.”)

I think this variety is a microcosm of the sort of research presented in the ACM digital library. While browsing the suggested journals, I was especially struck by CHI, the JCDL, and CSCW. I downloaded papers on a variety of topics that seem within the province of librarians–even the traditional, non-digital sorts of librarians (perhaps I’ll write more about this sometime). I expected this in JCDL, since it is, of course, a joint conference, but I expected it less in the more mainstream ACM journals. Last semester, my IR class had an ongoing discussion about “what is information science” and where was the divide between computer science and library science, and library and information science. As UIUC embarks on the i-school movement with both ALA-accredited and non-ALA-accredited schools, I hope that this discussion of the relationship between information science and its sister fields will continue in larger forums, both within and outside of our classes.

From a usability perspective, I found it interesting that one paper explicitly referenced usability, while others talked of putting research into practice, tradeoffs, and trials.

And, before I sign off, I’ll note 3 things that struck me particularly from Cogdill et al. [1]:

“We also expect that participation-enabling backchannel takes place in asynchronous environments, but that it deals more with protocols such as how to subscribe and unsubscribe from the discussion.” Cogdill et al. [1] p7. I think they’re really underestimating this. I have off-list conversations about the listserv NGC4LIB and about the work of a journal committee quite regularly.

One disadvantage I’ve noted in the new style of private messaging in live session is the increased difficulty of self-archiving chats: “Users who want to preserve a backchannel conversation must do so for themselves, perhaps using their client software to capture a session log or pasting the contents of their backchannel exchanges into a text file.” Cogdill et al. [1] p5. I’ve considered using my regular IRC software for live session, in part for its automatic logging capabilities. Of course, this still doesn’t address lining up conversations in-context.

Finally, Cogdill notes some disadvantages of the lack of awareness that one has about others’ whispers. In person, whispers may be observed even though their content is unknown. “If two students are silent on the mainchannel but active on the backchannel, the teacher may want to ask the students if they need assistance or need more time to accomplish some task.” Cogdill et al. [1] p5.

If you’ve made it this far, I’d be curious to know what you think.

:) -Jodi

[1] Cogdill, S., Fanderclai, T., Kilborn, J., & Williams, M. (2001). Backchannel: Whispering in Digital Conversation [Citeseer PDF]. Proceedings of the 34th Annual Hawaii International Conference on System Sciences (HICSS-34), Volume 4, 4033. doi: 10.1109/HICSS.2001.926500

Highly recommended (8 pages). Provides a “taxonomy of backchannel discourse” (process-oriented, content-oriented, participation-enabling, tangential, and independent backchannel), with examples of each.

Describes various meanings of backchannel, notes that linguists have studied “face-to-face oral backchannel for three decades”, provides properties of “virtual backchannel” (private, multithreaded, and invisible). Their taxonomy was developed through analysis of “chat transcripts from several MUDs (text-based, persistent, user-extensible virtual environments). Thirty-six transcripts representing a total of 62 person hours of chat were studied”. Discusses possibilities for awareness and persistence of backchannels, and explains how this introduces self-censorship and group censorship. Typesetter’s errors in distinguishing italics from non-italics mar the presentation of the private/public distinctions in the chats analyzed.

[2] McCarthy, J. F., & boyd, D. M. (2005). Digital backchannels in shared physical spaces: experiences at an academic conference [Author PDF]. CHI ’05 Extended Abstracts on Human Factors in Computing Systems, 1641-1644. doi: 10.1145/1056808.1056986 [ACM copy]

Highly recommended (4 pages). Provides a detailed analysis of the IRC channels used at CSCW 2004, including concrete examples of different types of exchanges. A bit different since it’s about supplementing in-person communication with digital backchannels.

[3] Kellogg, W. A., Erickson, T., Wolf, T. V., Levy, S., Christensen, J., Sussman, J., et al. (2006). Leveraging digital backchannels to enhance user experience in electronically mediated communication [Author PDF]. Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work, 451-454. doi: 10.1145/1180875.1180943 [ACM copy]

Discusses backchannels in the context of IBM VoIP conference call software which includes IM and visual backchannels.

[4] McCarthy, J. F., boyd, D., Churchill, E. F., Griswold, W. G., Lawley, E., & Zaner, M. (2004). Digital backchannels in shared physical spaces: attention, intention and contention [Author PDF]. Proceedings of the 2004 ACM conference on Computer Supported Cooperative Work, 550-553. doi: 10.1145/1031607.1031700 [ACM copy]

Notes from the panel held at CSCW 2004 on digital backchannels. Primarily records biographies and prepared statements of panel members. Data collected from the whole of this conference led to the analysis [2].

[5] Yankelovich, N., McGinn, J., Wessler, M., Kaplan, J., Provino, J., & Fox, H. (2005). Private communications in public meetings. CHI ’05 Extended Abstracts on Human Factors in Computing Systems, 1873-1876. doi: 10.1145/1056808.1057044 [ACM copy]

Discusses Sun Microsystems’ Meeting Central software, for distributed audio conferencing, which has private text and voice chats. Discusses usability testing, including screenshots of before and after designs.

[6] Yardi, S. (2006). The role of the backchannel in collaborative learning environments [Author PDF]. Proceedings of the 7th International Conference on Learning Sciences, 852-858. doi: 10.1145/1056808.1057044 [ACM copy]

“Students at UC Berkeley’s School of Information have participated in a persistent, online ‘backchannel’ chatroom during class since the Fall of 2004.” Provides statistics about the chatroom usage, “indicating that a few users participate most often.” Posits the advantages as constructivist learning and peer-to-peer learning. Discusses the need for chatroom etiquette and the potential for distraction, as well as helpful inquiries.

Posted in computer science, information ecosystem | Comments (0)

Onward and upward

September 4th, 2009

Today is my last day at Appalachian State University.

Monday I begin a new adventure as community organizer, helping launch Acawiki, a “wiki for academic research”. The brainchild of Neeru Paharia, Acawiki strives to make research papers easier to access and understand. Go write your own summary!

The next month will find me living in Massachusetts, my adult home, while preparing for a move to Ireland!

In October, I’ll be joining the Social Software Unit at DERI for a fellowship. The group does fascinating work on social software and the semantic web. This is a 3(or 4)-year Ph.D. project, where I’ll be working on modeling online discussions/arguments. More about that soon!

I’m looking for practical advice of all sorts—about community organizing, about moving to Ireland and living abroad, about success in Ph.D. studies. Consider this your personal solicitation for tips, tricks, and advice!

Posted in computer science, higher education, library and information science, random thoughts | Comments (6)

JCDL 2009 Poster Session in Second Life

June 18th, 2009

Last night I popped into Second Life for a poster session. JCDL 2009 is going on in Austin this week, and several of the posters were on display in the Digital Preserve region of SL. Chris Beer asked for some screenshots.

Here’s the whole poster space from outside. (Click each image for the ginormous full-size screenshot.)
Poster Session Entrance
My avatar (TR Telling) is in a bright orange UIUC GSLIS T-shirt, thanks to a class tour Richard Urban led last year. With a closer look, you can spot the screen that was used to project MinuteMadness.

Here are two posters, “Finding Centuries-Old Hyperlinks” and “Toward Automatic Generation of Image-Text Document Surrogates to Optimize Cognition”.
Two Posters: "Finding Centuries-Old Hyperlinks" and "Toward Automatic Generation of Image-Text Document Surrogates to Optimize Cognition"
Poster numbers were used for the best poster competition, I believe.

Large text-sizes really help viewing from afar; deft users can get a closer view with ‘mouse look’. I took a second screenshot of the “Finding Centuries-Old Hyperlinks” poster since it was my favorite. Xiaoyue (Elaine) Wang and Eamonn Keogh suggest cross-referencing manuscript pages using icon similarity.
Closer View of "Finding Centuries-Old Hyperlinks"
Handouts could be really useful for a SL poster session; I had to settle for taking screenshots. Clicking on the poster could give a copy of the poster, which could include links to more information. A mailbox could facilitate sending messages to the presenters.

One presenter ‘attended’ from New York. Several people are gathered around her poster, which generated a lot of discussion.
In the left corner you can see one of the more visually striking posters, a study of LIS students’ impressions of the Kindle, after using it for something like 3 weeks.

To the right of the entrance is a sign that says “What did you think?”, which linked to a comment form to be completed on the Web. I succeeded at that box, but wasn’t able to figure out how to submit a second, in-world comment form.

My avatar is just stepping down from a rotating lazy-susan which held a striking comment box. Getting a comment form and filling it out was straightforward. However, dragging and dropping the form back onto the box, as suggested, didn’t work for me.

I had several interesting conversations, most notably a chat outside in the Poster Garden with Javier Velasco Martin who helped build and furnish the Preserve. Ed Fox was easily identifiable: his avatar’s first name is EdFox. For social gatherings, handles are useful, but for professional gatherings it can be reassuring to know who you’re talking with.

Here’s one last look at the dome from the outside. I love the bright aqua JCDL lettering. And, what trip to Second Life would be complete without some flying?
Flying by the JCDL Poster Session Dome
With a closer look, you can see the large comment box in the center of the dome.

Posted in computer science, future of publishing, higher education, library and information science | Comments (1)

Computational Thinking: quoting Jeannette Wing

December 13th, 2008

Karin Dalziel’s Why every Library Science student should learn programming reminds me that I’ve been thinking about, and meaning to write about, algorithmic (or computational) thinking.

What is computational thinking? It includes

  • Thinking Recursively
  • Thinking Abstractly
  • Thinking Ahead (caching, pre-fetching…)
  • Thinking Procedurally
  • Thinking Logically
  • Thinking Concurrently

That’s from Jeannette Wing’s slide 21 [PDF]; subsequent slides give examples. Or, if you prefer podcasts, she chatted about computational thinking with Jon Udell.
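Two items from the list above, thinking recursively and thinking ahead (caching), are easy to show in a few lines. These are my own toy examples, not Wing's: the nested classification tree and the grid-path counter are made up for illustration.

```python
from functools import lru_cache


def depth(node):
    """Thinking recursively: the depth of a tree is 1 plus its deepest child."""
    children = node.get("children", [])
    if not children:
        return 1
    return 1 + max(depth(child) for child in children)


@lru_cache(maxsize=None)
def count_paths(rows, cols):
    """Thinking ahead: cache sub-results so each grid cell is solved only once.

    Counts the monotone paths across a rows-by-cols grid, moving one step
    down or one step right at a time.
    """
    if rows == 0 or cols == 0:
        return 1
    return count_paths(rows - 1, cols) + count_paths(rows, cols - 1)


# A made-up subject hierarchy, three levels deep.
classification = {
    "name": "Science",
    "children": [
        {"name": "Mathematics", "children": [{"name": "Algebra"}]},
        {"name": "Physics"},
    ],
}
```

`depth(classification)` is 3, and `count_paths(10, 10)` is 184756. Without the cache, the second function would recompute the same sub-grids over and over, which is the point of "thinking ahead".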

I would like to find examples of where librarians and archivists use computational thinking, especially outside the digital realm. It’s hard to argue that programming per se is needed for school media specialists or archivists. Some digital librarians and LIS educators also argue that, for digital librarians, managing programmers and interfacing with users are more pertinent skills than programming per se.

So I’d like to shift the debate. Instead of “should all LIS students learn to program”, I’d like to ask, what can LIS learn from computer science? Programming is only a very small part of computer science; as Jeannette M. Wing writes* [PDF]

Computer science is not computer programming. Thinking like a computer scientist means more than being able to program a computer. It requires thinking at multiple levels of abstraction.

Having to solve a particular problem, we might ask: How difficult is it to solve? and What’s the best way to solve it? Computer science rests on solid theoretical underpinnings to answer such questions precisely.

Can LIS benefit from considering problems in this way? As a librarian or information professional, have you ever considered a problem from this angle? How did it turn out?

* Jeannette M. Wing Computational Thinking [postprint, PDF] (2006 March). Communications of the ACM, Vol 49, No 3, 33-35.

Posted in computer science, library and information science, programming | Comments (1)