Archive for the ‘library and information science’ Category

Citation management means different things to different people

August 3rd, 2011

I got to talking with a mathematician friend about citation management. We came to the conclusion that “manage PDFs” is my primary goal while “get out good citations” is his primary goal. I thought it would be interesting to look at his requirements.

His ideal program would

  1. Organize the PDFs (Papers does this, when it doesn’t botch the author names and the title) preferably in the file system, so I can use Dropbox
  2. Get BibTeX entries from MathSciNet, ACM, etc. EXACTLY AS THEY ARE
  3. Have some decent way to organize notes by “project” or something

He doesn’t care about:

  1. Typing \cite
  2. A “unified” bibliographic database
  3. Social bibliographies (though I am not against them; it is just not a burning issue)

He says:

I guess the point is that, if I am writing something and I know I want to cite it, and I know there is an “official” BibTeX for it, I just need a way to get that more quickly than:

  1. Type the URL
  2. Click on “Proxy this” in my bookmarks bar
  3. Search for the paper
  4. Copy/paste the BibTeX
  5. Edit the cite key to something mnemonic

He followed up with an example of the awful, lossy markup Papers produces, which loses information including the ISSN and DOI; he prefers the minimalist BibTeX. (Oops: he adds, “I understated how bad papers is. The real papers entry (top) not only has screwy names, but junk instead of the full journal name. The papers cite key is meaningless noise too (but mathscinet is meaningful noise).”) To get around this, he does the same search/download “a million times”.

Papers2 BibTeX:
@article{AR78,
author = {L Asimow and B Roth},
journal = {Trans. Amer. Math. Soc.},
title = {The rigidity of graphs},
pages = {279--289},
volume = {245},
year = {1978},
}

The AMS version of the same BibTeX:
@article {AR78,
    AUTHOR = {Asimow, L. and Roth, B.},
     TITLE = {The rigidity of graphs},
   JOURNAL = {Trans. Amer. Math. Soc.},
  FJOURNAL = {Transactions of the American Mathematical Society},
    VOLUME = {245},
      YEAR = {1978},
     PAGES = {279--289},
      ISSN = {0002-9947},
     CODEN = {TAMTAM},
   MRCLASS = {57M15 (05C10 52A40 53B50 73K05)},
  MRNUMBER = {511410 (80i:57004a)},
MRREVIEWER = {G. Laman},
       DOI = {10.2307/1998867},
       URL = {http://dx.doi.org/10.2307/1998867},
}

I’ve just discovered that BibDesk’s[1] ‘minimize’ does what he wants: its output is quite close to the Papers2 version:

@article{AR78,
	Author = {Asimow, L. and Roth, B.},
	Journal = {Trans. Amer. Math. Soc.},
	Pages = {279--289},
	Title = {The rigidity of graphs},
	Volume = {245},
	Year = {1978}}
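To make the idea concrete, here is a rough Python sketch of what a ‘minimize’ step might do: keep a short list of core fields, drop the rest, and build a mnemonic cite key. The field whitelist, the function names, and the one-field-per-line parser are my assumptions, not BibDesk’s actual implementation, and the parser only handles entries laid out like the AMS example above.

```python
import re

# Fields to keep; an assumed whitelist, not BibDesk's actual list.
CORE_FIELDS = ("author", "title", "journal", "volume", "pages", "year")

# Matches one "NAME = {value}," field line, as in MathSciNet exports.
FIELD_RE = re.compile(r"^\s*(\w+)\s*=\s*\{(.*)\},?\s*$")
# Matches the "@article {KEY," header line.
HEADER_RE = re.compile(r"^\s*@(\w+)\s*\{\s*([^,\s]+)\s*,")


def minimize_bibtex(entry: str, keep=CORE_FIELDS) -> str:
    """Return the entry with only the fields named in `keep`."""
    lines = entry.strip().splitlines()
    header = HEADER_RE.match(lines[0])
    if header is None:
        raise ValueError("not a BibTeX entry")
    entry_type, key = header.group(1).lower(), header.group(2)
    fields = {}
    for line in lines[1:]:
        m = FIELD_RE.match(line)
        if m:
            fields[m.group(1).lower()] = m.group(2)
    body = ",\n".join(
        f"  {name} = {{{fields[name]}}}" for name in keep if name in fields
    )
    return f"@{entry_type}{{{key},\n{body}\n}}"


def mnemonic_key(author_field: str, year: str) -> str:
    """Build a key such as AR78 from surname initials and a two-digit year.

    Assumes authors are written "Surname, Given" and joined by " and ".
    """
    surnames = [a.split(",")[0].strip() for a in author_field.split(" and ")]
    return "".join(s[0].upper() for s in surnames if s) + year[-2:]
```

Run on the AMS entry, this would keep the AMS spelling of the author names while dropping fields such as ISSN, CODEN and MRCLASS. Whether that loss is acceptable is exactly the question my friend and I disagree on.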

I’d still like to understand the impact the non-minimal BibTeX is having; could be bad citation styles are causing part of the problem.

While we have different needs for citation management, we’re both annoyed by the default filenames many publishers use – like fulltext.pdf and sdarticle.pdf. But I’ll tolerate these, as long as I can get to them from a database index with a nice frontend.

We of course moved on to discussing how research needs an iTunes or, as Geoff Bilder has called it, an iPapers.

This blog post brought to you by Google chat and the number 3.

  1. See also A short review of BibDesk from MacResearch

Posted in books and reading, information ecosystem, library and information science, scholarly communication | Comments (0)

Sente, a first look

August 1st, 2011

Today I’ve been testing out Sente, on the theory that it might help me organize the PDFs I’m annotating on my iPad.

The desktop application is geared to Mac users who really care about bibliographies, with several fantastic features, including

I like Sente’s statuses; read/unread and Recently Modified and Recently Added are automatically tracked, and you can rate items. I especially like the workflow statuses, which match some of my common tasks:

  • Get Full Text
  • Discuss Further
  • Cite
  • Do Not Cite

“Sort by citation” is surprisingly illuminating: I didn’t realize how many papers from “Discourse Studies” I’d been looking at recently.

Another great feature that could be easily and fruitfully added to most other bibliographic managers: title case and exact case lists (I am *so* sick of seeing lowercased ‘wikipedia’ in bibliographies!), which you can very easily customize.
Sente also has a journal dictionary: You can assign the abbreviations and ISSNs (authority control, yippee!)!

Their visual display could use an update (thankfully it’s on the way) and I find their icons confusing (maybe ‘pencil’ for ‘note’ is sensible, but what in the world about ‘four dots in a diamond shape’ says ‘abstract’ to you?)

I tested the Zotero import. As I wrote Sente’s developers, there are some issues:

In testing it out on my large (5000+ item) Zotero library I see that:

  1. HTML attachments are not copied into the Sente library
  2. Image attachments are not copied into the Sente library
  3. Text note attachments are not copied into the Sente library
  4. Subcollections are not preserved

Since then, I’ve noticed that the keywords don’t get imported. Further, the “date added” and “date modified” fields are not preserved, but instead now reflect the import date and time (as I noted on twitter). But I do like their duplicate detection. Along with promising to consolidate matched items, they provide a report about the discarded matches. For instance:

Rule “DOI rule” flagged these two references as possible duplicates:
Vilar, P., & Žumer, M. (2008). Perceptions and importance of user friendliness of IR systems according to users’ individual characteristics and academic discipline. Journal of the American Society for Information Science & Technology, 59(12), 1995-2007. doi:Article
Quick-Response Barcodes. (2008). Library Technology Reports, 44(5), 46-47. doi:Article
However, the match was rejected because the references differ in: Article Title, pages, Publication Title, URL, Volume, Issue.
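
The report suggests the “DOI rule” fired because both records carry the placeholder value “Article” in the DOI field. Here is a minimal Python sketch of a more defensive rule: treat a DOI as a match key only when it looks like a DOI, then confirm with other fields. The record layout, the field names, and the threshold are hypothetical; this is not Sente’s algorithm.

```python
import re

# Loose shape check: "10.", a registrant code, "/", then a suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
# Fields compared once a DOI match has been found (an assumed list).
CONFIRM_FIELDS = ("title", "pages", "journal", "volume", "issue")


def valid_doi(value) -> bool:
    """True only for strings shaped like a DOI, so "Article" is rejected."""
    return bool(value) and DOI_RE.match(value.strip()) is not None


def check_duplicate(a: dict, b: dict, max_differences: int = 1):
    """Return (is_duplicate, differing_fields) for two reference records."""
    if not (valid_doi(a.get("doi")) and valid_doi(b.get("doi"))):
        return False, ["doi"]  # placeholder DOIs never trigger a match
    if a["doi"].strip().lower() != b["doi"].strip().lower():
        return False, ["doi"]
    differing = [f for f in CONFIRM_FIELDS if a.get(f) != b.get(f)]
    return len(differing) <= max_differences, differing
```

With a check like this, the two references above would never be flagged in the first place, and the report would list only pairs that share a real DOI.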

I have played briefly with Sente’s free iPad viewer, but not yet with their paid ($19.99) app which allows annotation. Based on reviews (why no permalinks, Apple?), “Export seems to be an option but crucially, import is not.” However, if Sente’s annotation is enough, there’s hope, since the description of Sync for the planned 6.5 release (via this) is *very* promising: “As you read a PDF on your iPad on the bus ride home, highlighting passages and taking notes, the highlighting and notes appear in all copies by the time you arrive home.”

By Sente user standards, I am far from a power user: the biggest databases seem to be about 10 times mine. This could be an improvement over Zotero, where my library speed can’t quite keep up some days. I’d be *very* interested to hear from enthusiastic Sente users. Switching seems quite feasible, and it’s probably worth checking out their iPad app.

The main obvious concerns I have are about notetaking and portability. Notetaking of offline/non-fulltext items is important but doesn’t seem to have been a particular focus of development. Portability is incredibly important: I need to ensure that export (and ideally import) brings along files and notes as well as PDFs.

I’ve been thinking of direct, in-file PDF annotation as the best possible way to ensure that my annotations outlive my reference manager. Should I rethink that? So far (according to their draft manual as above): “Highlighting created in Sente 6.2 is not stored in the PDF itself — it is stored in the library database. This change has several very positive effects, notably on syncing.” Let me know what you think in the comments!

Posted in books and reading, library and information science, reviews, scholarly communication | Comments (2)

Extended deadline for STLR 2011

April 29th, 2011

We’ve extended the STLR 2011 deadline due to several requests; submissions are now due May 8th.

JCDL workshops are split over two half-days, and we are lucky enough to have *two* keynote speakers: Bernhard Haslhofer of the University of Vienna and Cathy Marshall of Microsoft Research.

Consider submitting!

CALL FOR PARTICIPATION
The 1st Workshop on Semantic Web Technologies for Libraries and Readers

STLR 2011

June 16 (PM) & 17 (AM) 2011

http://stlr2011.weebly.com/
Co-located with the ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2011 Ottawa, Canada

While Semantic Web technologies are successfully being applied to library catalogs and digital libraries, the semantic enhancement of books and other electronic media is ripe for further exploration. Connections between envisioned and emerging scholarly objects (which are doubtless social and semantic) and the digital libraries in which these items will be housed, encountered, and explored have yet to be made and implemented. Likewise, mobile reading brings new opportunities for personalized, context-aware interactions between reader and material, enriched by information such as location, time of day and access history.

This full-day workshop, motivated by the idea that reading is mobile, interactive, social, and material, will be focused on semantically enhancing electronic media as well as on the mobile and social aspects of the Semantic Web for electronic media, libraries and their users. It aims to bring together practitioners and developers involved in semantically enhancing electronic media (including documents, books, research objects, multimedia materials and digital libraries) as well as academics researching more formal aspects of the interactions between such resources and their users. We also particularly invite entrepreneurs and developers interested in enhancing electronic media using Semantic Web technologies with a user-centered approach.

We invite the submission of papers, demonstrations and posters which describe implementations or original research that are related (but are not limited) to the following areas of interest:

  • Strategies for semantic publishing (technical, social, and economic)
  • Approaches for consuming semantic representations of digital documents and electronic media
  • Open and shared semantic bookmarks and annotations for mobile and device-independent use
  • User-centered approaches for semantically annotating reading lists and/or library catalogues
  • Applications of Semantic Web technologies for building personal or context-aware media libraries
  • Approaches for interacting with context-aware electronic media (e.g. location-aware storytelling, context-sensitive mobile applications, use of geolocation, personalization, etc.)
  • Applications for media recommendations and filtering using Semantic Web technologies
  • Applications integrating natural language processing with approaches for semantic annotation of reading materials
  • Applications leveraging the interoperability of semantic annotations for aggregation and crowd-sourcing
  • Approaches for discipline-specific or task-specific information sharing and collaboration
  • Social semantic approaches for using, publishing, and filtering scholarly objects and personal electronic media

IMPORTANT DATES

*EXTENDED* Paper submission deadline: May 8th 2011
Acceptance notification: June 1st 2011
Camera-ready version: June 8th 2011

KEYNOTE SPEAKERS

  • Bernhard Haslhofer, University of Vienna
  • Cathy Marshall, Microsoft Research

PROGRAM COMMITTEE

Each submission will be independently reviewed by 2-3 program committee members.

ORGANIZING COMMITTEE

  • Alison Callahan, Dept of Biology, Carleton University, Ottawa, Canada
  • Dr. Michel Dumontier, Dept of Biology, Carleton University, Ottawa, Canada
  • Jodi Schneider, DERI, NUI Galway, Ireland
  • Dr. Lars Svensson, German National Library

SUBMISSION INSTRUCTIONS

Please use PDF format for all submissions. Semantically annotated versions of submissions, and submissions in novel digital formats, are encouraged and will be accepted in addition to a PDF version.

All submissions must adhere to the following page limits:
Full length papers: maximum 8 pages
Demonstrations: 2 pages
Posters: 1 page

Use the ACM template for formatting: http://www.acm.org/sigs/pubs/proceed/template.html

Submit using EasyChair: https://www.easychair.org/conferences/?conf=stlr2011

Posted in future of publishing, library and information science, PhD diary, scholarly communication, semantic web, social semantic web | Comments (2)

Supporting Reading

January 21st, 2011

Yesterday I spoke at Beyond the PDF about use cases for reading. Slides are below; the presentation was also webcast, and the video is now on YouTube (part of the Beyond the PDF video playlist) and below.

Thanks to the DERI Social Software Unit for feedback on an earlier version of this presentation. I’m particularly grateful to Allen Renear and Carole Palmer from UIUC, whose call for ontology-aware reading tools pushed me down this path, and to Geoffrey Bilder, who presented these ideas in a way I couldn’t help thinking about and remixing. Cathy Marshall’s clear exposition in Reading and Writing the Electronic Book was fundamental to digging deeper.

Posted in books and reading, future of publishing, library and information science, scholarly communication, social semantic web | Comments (2)

Wanted: the ultimate mobile app for scholarly ereading

January 7th, 2011

Nicole Henning suggests that academic libraries and scholarly presses work together to create the ultimate mobile app for scholarly ereading. I think about it a bit differently, in terms of functional requirements.

The main functions are obtaining materials, reading them, organizing them, keeping them, and sharing them.

For obtaining materials, the key new requirement is to simplify authentication: handle campus authentication systems and personal subscriptions. Multiple credentialed identities should be supported. A secondary consideration is that RSS feeds (e.g. for journal tables of contents) should be supported.

For reading materials, the key requirement is to support multiple formats in the same application. I don’t know of a web app or mobile app that supports PDF, EPUB, and HTML. Reading interfaces matter: look to Stanza and Ibis Reader for best-in-class examples.

For organizing materials, the key is synergy between the user’s data and existing data. Allow tags, folders, and multiple collections. But also leverage existing publisher and library metadata. Keep it flexible, allowing the user to modify metadata for personal use (e.g. for consistency or personal terminology) and to optionally submit corrections.
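
One way to picture this layering, as a minimal Python sketch: keep the publisher or library record untouched, store the user’s changes separately, and read through both. The `ItemMetadata` class and its method names are hypothetical, invented here for illustration.

```python
from collections import ChainMap


class ItemMetadata:
    """Personal edits layered over publisher or library metadata."""

    def __init__(self, source: dict):
        self.source = dict(source)  # as supplied by the publisher or library
        self.personal = {}          # the user's own changes
        # Lookups check personal values first, then fall back to the source.
        self.view = ChainMap(self.personal, self.source)

    def set(self, field: str, value) -> None:
        self.personal[field] = value

    def corrections(self) -> dict:
        """Personal values that differ from the source, for optional submission."""
        return {k: v for k, v in self.personal.items() if self.source.get(k) != v}
```

Because the source record is never overwritten, the user can rename, retag, or fix a title for personal use, and the app can still show (or offer to submit) exactly what was changed.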

For keeping materials, import, export, and sync content from the user’s chosen cloud-based storage and WebDAV servers. No other device (e.g. laptop or desktop) should be needed.

For sharing materials, support lightweight micropublishing on social networks and email; networks should be extensible and user-customizable. Sync to or integrate with citation managers and social cataloging/reading list management systems.

Regardless of the ultimate system, I’d stress that device independence is important, meaning that an HTML5 website would probably be the place to start: look to Ibis Reader as a model.

Posted in books and reading, future of publishing, information ecosystem, library and information science, scholarly communication | Comments (5)

Searching for LaTeX code (Springer only)

January 6th, 2011

Springer’s LaTeX search service (example results) allows searching for LaTeX strings or finding the LaTeX equations in an article. Since LaTeX is used to mark up equations in many scientific publications, this could be an interesting way to find related work or view an equation-centric summary of a paper.

You can provide a LaTeX string, and Springer says that besides exact matches they can return similar LaTeX strings:
[Image: exact matches to a LaTeX search]
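
Springer doesn’t say how “similar” is computed. As a rough illustration only, here is one way it could be approximated in Python: split each string into LaTeX tokens so that spacing is ignored, then score token overlap with the standard library’s `difflib.SequenceMatcher`. The tokenizer, the function names, and the 0.8 threshold are my assumptions, not Springer’s method.

```python
import re
from difflib import SequenceMatcher

# A token is a control word (\alpha), a control symbol (\{), or any other
# single non-space character.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\\.|\S")


def tokenize(latex: str):
    return TOKEN_RE.findall(latex)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two LaTeX strings, ignoring whitespace."""
    return SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


def search(query: str, corpus, threshold: float = 0.8):
    """Return (score, equation) pairs at or above the threshold, best first."""
    scored = [(similarity(query, eq), eq) for eq in corpus]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)
```

A token-level comparison like this treats `x^2+y^2` and `x^2 + y^2` as identical, but it knows nothing about mathematical equivalence: renaming a variable lowers the score even though the equation is the same.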

Or, you can search by DOI or title to get all the equations in a given publication:
[Image: results for a particular title]

Under each equation in the search results you can click “show LaTeX code”:
[Image: show the LaTeX code for an equation]

Right now it just searches Springer’s publications; Springer would like to add open access databases and preprint servers. Coverage even in Springer journals seems spotty: I couldn’t find two particular discrete math papers, so I’ve written Springer for clarification. As far as I can tell, there’s no way to get from SpringerLink to this LaTeX search yet: it’s a shame, because “show all equations in this article” would be useful, even with the proviso that only LaTeX equations were shown.

A nice touch is their sandbox where you can test LaTeX code, with a LaTeX dictionary conveniently below.

via Eric Hellman

Posted in future of publishing, information ecosystem, library and information science, math, scholarly communication | Comments (1)

Happy Public Domain Day!

January 2nd, 2011

Today, in many countries around the world, new works become public property: January 1st every year is Public Domain Day. Material in the public domain can be used, remixed and shared freely — without violating copyright and without asking permission.

However, in the United States, not a single new work entered the public domain today. Americans must wait 8 more years: Under United States copyright law, nothing more will be added to the public domain until January 1, 2019.

Until the 1970s the maximum copyright term was 56 years. Under that law, Americans would have been able to truly celebrate Public Domain Day:

  1. All works published in 1954 would be entering the public domain today.
  2. Up to 85% of all copyrighted works from 1982 would be entering the public domain today (Copyright Office and Duke).
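
The arithmetic behind those two items can be spelled out in a few lines of Python. This is a simplified sketch: it assumes, as the list above does, that a work becomes free on the January 1 after its term runs out, and it uses the old 28-year term with one optional 28-year renewal. The function and constant names are mine.

```python
def public_domain_year(publication_year: int, term_years: int) -> int:
    """First January 1 on which a work is in the public domain.

    Assumes protection runs through the end of the calendar year in which
    the term expires, so the work becomes free the following January 1.
    """
    return publication_year + term_years + 1


# Old law: a 28-year term, renewable once for 28 more (56 in total).
UNRENEWED_TERM = 28
RENEWED_TERM = 28 + 28
```

With these numbers, `public_domain_year(1954, RENEWED_TERM)` and `public_domain_year(1982, UNRENEWED_TERM)` both give 2011, which is why renewed works from 1954 and unrenewed works from 1982 share the same date. Under the current 95-year term, `public_domain_year(1923, 95)` gives 2019, matching the January 1, 2019 date above.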

Instead, only works published before 1923 are conclusively in the public domain in the U.S. today. What about post-1923 publications? It’s complicated in the United States.[1]

For more information on Public Domain Day and the United States, Duke’s Center for the Study of the Public Domain has a series of useful pages.

  1. 609 pages worth of complicated

Posted in books and reading, information ecosystem, intellectual freedom, library and information science | Comments (0)

Let’s link the world’s metadata!

December 9th, 2010

Together we can continue building a global metadata infrastructure. I am tasking you with helping. How can you do that?

For evangelists, practitioners, and consultants:

  • Thanks for bringing Linked Data to where it is today! We’re counting on you for even more yummy cross-disciplinary Linked Data!
  • What tools and applications are most urgently needed? Researchers and developers need to hear your use cases: please partner with them to share these needs!
  • How do you and your clients choose [terms, concepts, schemas, ontologies]? What helps the most?
  • Overall, what is working (and what is not)? How can we amplify what *is* working?

For Semantic Web researchers:

  • Build out the trust and provenance infrastructure.
  • Mature the query languages (e.g. SPARQL) [perhaps someone could say more about what this would mean?]
  • Building tools and applications for end-users is really important: value this work, and get to know some real use cases and end-users!

For information scientists:

  • How can we identify ‘universals’ across languages, disciplines, and cultures? Does the Colon classification help?
  • What are the best practices for sharing and reusing [terms, concepts, schemas, ontologies]? What is working and what is failing with metadata registries? What are the alternatives?

For managers, project leaders, and business people:

  • How do we create and justify the business case for Terminology services [like MIME types, library subject headings, New York Times Topics]?
  • Please collect and share your usage data! Do we need infrastructure for sharing usage data?
  • Share the economic and business successes of Linked Data!

That ends the call to action, but here’s where it comes from.

Yesterday Stuart Weibel gave a talk called “Missing Pieces in the Global Metadata Landscape” [slideshare] at InfoCom International Symposium in Tokyo. Stu asked 11 of us what those missing pieces were, with 3 questions: the conceptual issues, organizational impediments, and the most important overall issue. This last question, “What is the most important missing infrastructural link in establishing globally interoperable metadata systems?”, is my favorite, so I’ll talk about it a little further.

Stu summarizes that the infrastructure is mostly there, but that broad adoption (of standards, conventions, and common practice) is key. Overall these are the key issues he reports:

  • Tools to support and encourage the reuse of terms, concepts, schemas, ontologies (e.g., metadata registries, and more)
  • Widespread, cross-disciplinary adoption of a common metadata approach (Linked Data)
  • Query languages for the open web (SPARQL) are not fully mature
  • Trust and provenance infrastructure
  • Nothing’s missing… just use RDF, Linked Data, and the open web. The key is broad adoption, and that requires better tools and applications. It’s a social problem, not a technical problem.
  • The ability to identify ‘universals’ across languages, disciplines, and cultures – revive Ranganathan’s facets?
  • Terminology services [like MIME types, library subject headings, New York Times Topics] have long been proposed as important services, but they are expensive to create, curate, and manage, and the economic models are weak
  • Stuff that does not work is often obvious. We need usage data to see what does work, and amplify it

You may notice, now, that the “call” looks a little familiar!

Posted in information ecosystem, library and information science, semantic web | Comments (0)

Utopia Documents: pulling scientific data into the PDF for interactive exploration

November 14th, 2010

What if data were accessible within the document itself?

Utopia Documents is a free PDF viewer which recognizes certain enhanced figures, and fetches the underlying data. This allows readers to view and experiment with the tables, graphs, molecular structures, and sequences in situ.


You can download Utopia Documents for Mac and Windows to view enhanced papers, such as those published in The Semantic Biochemical Journal.

These screencasts were made from pages 9 and 10 of the PDF of a paper by the Manchester-based Utopia team: T. K. Attwood, D. B. Kell, P. Mcdermott, J. Marsh, S. R. Pettifer, and D. Thorne. Calling international rescue: knowledge lost in literature and data landslide! Biochemical Journal, Dec 2009. doi:10.1042/BJ20091474.

In an interview at the Guardian, Utopia’s Phillip McDermott says:

“Utopia Documents links scientific research papers to the data and to the community. It enables publishers to enhance their publications with additional material, interactive graphs and models. It allow the reader to access a wealth of data resources directly from the paper they are viewing, makes private notes and start public conversations. It does all this on normal PDFs, and never alters the original file. We are targeting the PDF, since they still have around 80% readership over online viewing.

“Semantics, loose-coupling, fingerprinting and linked-data are the key ingredients. All the data is described using ontologies, and a plug-in system allows third parties to integrate their database or tool within a few lines of script. We use fingerprinting to allow us to recognise what paper a user is reading, and to spot duplicates. All annotations are held remotely, so that wherever you view a paper, the result is the same.”
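
The interview doesn’t say how Utopia computes its fingerprints. Purely to illustrate the idea, here is a small Python sketch: derive a key from the normalized text of a paper, so that two copies of the same paper map to the same key, and store annotations against that key rather than in the file. The SHA-256 hash, the normalization, and the `AnnotationStore` class are my assumptions, not Utopia’s design.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash of the text with case, punctuation and spacing normalized away."""
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class AnnotationStore:
    """Stand-in for a remote service: annotations keyed by fingerprint."""

    def __init__(self):
        self._notes = defaultdict(list)

    def annotate(self, paper_text: str, note: str) -> None:
        self._notes[fingerprint(paper_text)].append(note)

    def annotations_for(self, paper_text: str):
        return list(self._notes.get(fingerprint(paper_text), []))

    def is_duplicate(self, a: str, b: str) -> bool:
        return fingerprint(a) == fingerprint(b)
```

Since the notes live with the fingerprint and not the file, the original PDF is never altered, and any copy of the paper that normalizes to the same text picks up the same annotations. An exact hash is brittle, though: a real system would presumably need something more tolerant of small differences between copies.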

I’d still like to see a demo of the commenting functionality.

I’d also be particularly interested in the publisher perspective, about the production work that goes into creating the enhancements. Portland Press’s October news announces that they’ve been promoting Utopia at the Charleston conference and SSP, with an upcoming appearance at the STM Innovations Seminar.

Utopia came to my attention via Steve Pettifer’s mention.

Posted in future of publishing, information ecosystem, library and information science, scholarly communication, semantic web, social semantic web | Comments (4)

A Model-View-Controller perspective of scholarly articles

November 13th, 2010

A scholarly paper is not a PDF. A PDF is merely one view of a scholarly paper. To push ‘beyond the PDF’, we need design patterns that allow us to segregate the user interface of the paper (whether it is displayed as an aggregation of triples, a list of assertions, a PDF, an ePub, HTML, …) from the thing itself.

Towards this end, Steve Pettifer has a Model-View-Controller perspective on scholarly articles, which he shared in a post on the Beyond the PDF listserv, where discussions are leading up to a workshop in January. I am awe-struck: I wish I’d thought of this way of separating the structure and explaining it.

I think a lot of the disagreement about the role of the PDF can be put down to trying to overload its function: to try to imbue it with the qualities of both ‘model’ and ‘view’. … One of the things that software architects (and I suspect designers in general) have learned over the years is that if you try to give something functions that it shouldn’t have, you end up with a mess; if you can separate out the concerns, you get a much more elegant and robust solution.

My personal take on this is that we should keep these things very separate, and that if we do this, then many of the problems we’ve been discussing become more clearly defined (and I hope, many of the apparent contradictions, resolved).

So… a PDF (or come to that, an e-book version or a html page) is merely a *view* of an article. The article itself (the ‘model’) is a completely different (and perhaps more abstract) thing. Views can be tailored for a particular purpose, whether that’s for machine processing, human reading, human browsing, etc etc.

[paragraph break inserted]

The relationship between the views and their underlying model is managed by the concept of a ‘controller’. For example, if we represent an article’s model in XML or RDF (its text, illustrations, association nanopublications, annotations and whatever else we like), then that model can be transformed in to any number of views. In the case of converting XML into human-readable XHTML, there are many stable and mature technologies (XSLT etc). In the case of doing the same with PDF, the traditional controller is something that generates PDFs.

[paragraph break inserted]

The thing that’s been (somewhat) lacking so far is the two-way communication between view and model (via controller) that’s necessary to prevent the views from ossifying and becoming out of date (i.e. there’s no easy way to see that comments have been added to the HTML version of an article’s view if you happen to be reading the PDF version, so the view here can rapidly diverge from its underlying model).

[paragraph break inserted, link added]

Our Utopia software is an attempt to provide this two-way controller for PDFs. I believe that once you have this bidirectional relationship between view and model, then the actual detailed affordances of the individual views (i.e. what can a PDF do well / badly, what can HTML do well / badly) become less important. They are all merely means to channeling the content of an article to its destination (whether that’s human or machine).

The good thing about having this ‘model view controller’ take on the problem is that only the model needs to be pinned down completely …

Perhaps separating out our concerns in this way — that is, treating the PDF as one possible representation of an article — might help focus our criticisms of the current state of affairs? I fear at the moment we are conflating the issues to some degree.

- Steve Pettifer in a Beyond the PDF listserv post
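
To check my own understanding, here is how I’d sketch the pattern in Python. The article is the model, each output format is a view, and the controller both renders views and routes comments back to the model. All the names (`Article`, `html_view`, `text_view`, `Controller`) are mine, for illustration; this is not how Utopia is implemented.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Article:
    """The model: content and annotations, independent of any rendering."""
    title: str
    paragraphs: list
    comments: list = field(default_factory=list)


def html_view(article: Article) -> str:
    parts = [f"<h1>{escape(article.title)}</h1>"]
    parts += [f"<p>{escape(p)}</p>" for p in article.paragraphs]
    parts += [f"<aside>{escape(c)}</aside>" for c in article.comments]
    return "\n".join(parts)


def text_view(article: Article) -> str:
    lines = [article.title.upper(), ""] + list(article.paragraphs)
    lines += [f"[comment] {c}" for c in article.comments]
    return "\n".join(lines)


class Controller:
    """Mediates both ways: renders views and routes edits back to the model."""

    def __init__(self, article: Article):
        self.article = article
        self.views = {"html": html_view, "text": text_view}

    def render(self, name: str) -> str:
        return self.views[name](self.article)

    def add_comment(self, comment: str) -> None:
        # A comment made while reading any view updates the shared model,
        # so every view rendered afterwards includes it.
        self.article.comments.append(comment)
```

The point of the toy is the last method: because comments go to the model and not to one rendering, the HTML and plain-text views cannot drift apart, which is the divergence Steve describes between a commented HTML page and a static PDF.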

I’m particularly interested in hearing if this perspective, using the MVC model, makes sense to others.

Posted in books and reading, future of publishing, information ecosystem, library and information science, scholarly communication, social semantic web | Comments (9)