Annotation summaries: standardization needed

August 4th, 2011

I’m finding an iPad amazing for reading PDFs: it’s like instant printing, with no paper to carry around (printouts are heavy, and they get wet). And with software like iAnnotatePDF and GoodReader, I can annotate with just a bit more effort than with pen and paper.

iAnnotate (video review) is the killer app that convinced me to buy an iPad. But it has a killer flaw: I couldn’t keep my reading organized with it.

Hence I started looking into reference managers that would work well on the iPad: allowing annotation, making it easy to keep PDFs organized, and ensuring that annotations were kept in a sensible place.

Sente fulfills many of my requirements. Sync seems to work effortlessly, far better than in other products I’ve tried. The annotation process is reasonably smooth, but so far I haven’t found a way to export annotations directly.

This is a bit problematic because PDF editors don’t seem to play nice with each other’s annotations. For instance, iAnnotate and GoodReader both export good summaries of annotations made in their own software. You get something very useful and readable like this:

Page 1, Highlight (Yellow):
Content: “The scientific use of Twitter has received some attention in previous work: [4] and [5] have performed several automatic analyses of tweets collected for different conference hashtags, including for example time series and lists of most active twitterers. [3] and [9] have furthermore carried out manual analyses of tweet contents for conference tweet datasets to determine, what conference participants are tweeting about. [10] are develop ing automatic methods for extracting semantic information from conference tweets. [6] have focused on tweets published by a set of manually identified scientists and have investigated their citation behavior.”

Page 1, Highlight (Yellow):
Content: “citations and references are two sides of the same coin.”
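
A summary in this shape is regular enough to turn back into structured data. Here is a minimal Python sketch, assuming every entry starts with a "Page N, Kind (Color):" header followed by optional text lines, as in the sample above. The function name `parse_summary` and the dictionary keys are made up for illustration; this is not an official iAnnotate format description.

```python
import re

# Assumed header shape, taken from the sample: "Page 1, Highlight (Yellow):"
HEADER = re.compile(r"^Page (\d+), (\w+) \((.+)\):$")


def parse_summary(text):
    """Split an annotation summary into a list of dicts (hypothetical helper)."""
    records = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = HEADER.match(line)
        if match:
            # A header line starts a new annotation record.
            current = {
                "page": int(match.group(1)),
                "kind": match.group(2),
                "color": match.group(3),
                "text": "",
            }
            records.append(current)
        elif line and current is not None:
            # Drop the "Content:" label if present; keep the text itself.
            if line.startswith("Content:"):
                line = line[len("Content:"):].strip()
            current["text"] = (current["text"] + " " + line).strip()
    return records


if __name__ == "__main__":
    sample = (
        "Page 1, Highlight (Yellow):\n"
        "Content: “citations and references are two sides of the same coin.”\n"
    )
    for record in parse_summary(sample):
        print(record)
```

An entry whose header has no text under it comes back with an empty "text" field, which makes it easy to spot highlights whose content was lost along the way.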

But when you annotate in one program and get notes from another program, things get messier.

For PDFs annotated externally, iAnnotate lists highlights without their text and only grabs text from the notes, like this:

Page 1, Highlight (Custom Color: #fdf7bc):

Page 2, Highlight (Custom Color: #fdf7bc):

Page 2, Note (Custom Color: #fdffaa):
Not sure why this stands out from other lists by individuals.

GoodReader plays a bit nicer with annotations from other programs: it keeps the highlighted text, but splits each annotation at line boundaries. This makes summaries a little difficult to read, but at least there’s some content:

Highlight (color #FDF7BC):
first of all it will have to start with the general problem in

Highlight (color #FDF7BC):
analyzing scientific impact of Twitter:

Highlight (color #FDF7BC):
[6] define

Highlight (color #FDF7BC):
tweet to a peer-U
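
One possible workaround is to stitch the fragments back together afterwards. The Python sketch below joins consecutive highlights that share a color into one passage. This is a heuristic of my own, not something GoodReader offers, and it will wrongly merge two separate highlights that happen to sit next to each other in the summary. The function names and the header pattern are assumptions based on the sample above.

```python
import re

# Assumed header shape, taken from the sample: "Highlight (color #FDF7BC):"
HEADER = re.compile(r"^Highlight \(color (#[0-9A-Fa-f]{6})\):$")


def parse_fragments(text):
    """Return (color, text) pairs, one per content line under a header."""
    fragments = []
    color = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = HEADER.match(line)
        if match:
            color = match.group(1).upper()
        elif line and color is not None:
            fragments.append((color, line))
    return fragments


def merge_fragments(fragments):
    """Join runs of consecutive same-color fragments with single spaces."""
    merged = []
    for color, text in fragments:
        if merged and merged[-1][0] == color:
            merged[-1] = (color, merged[-1][1] + " " + text)
        else:
            merged.append((color, text))
    return merged
```

Calling `merge_fragments(parse_fragments(summary_text))` gives one entry per run of same-colored fragments. Using a different highlight color for each passage while reading would keep unrelated passages apart under this heuristic.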

I’m currently looking into standardization of annotation summaries.

I’d be very interested to hear about how you detect metadata and annotation differences in PDFs. As examples, I’ve marked up a recent WebSci poster, with some annotations from GoodReader, from iAnnotatePDF, and from Sente.

Sente, a first look

August 1st, 2011

Today I’ve been testing out Sente, on the theory that it might help me organize the PDFs I’m annotating on my iPad.

The desktop application is geared to Mac users who really care about bibliographies, with several fantastic features, including

I like Sente’s statuses; read/unread and Recently Modified and Recently Added are automatically tracked, and you can rate items. I especially like the workflow statuses, which match some of my common tasks:

  • Get Full Text
  • Discuss Further
  • Cite
  • Do Not Cite

“Sort by citation” is surprisingly illuminating: I didn’t realize how many papers from “Discourse Studies” I’d been looking at recently.

Another great feature that could be easily and fruitfully added to most other bibliographic managers: title case and exact case lists (I am *so* sick of seeing lowercased ‘wikipedia’ in bibliographies!), which you can very easily customize.
Sente also has a journal dictionary: You can assign the abbreviations and ISSNs (authority control, yippee!)!

Their visual display could use an update (thankfully it’s on the way) and I find their icons confusing (maybe ‘pencil’ for ‘note’ is sensible, but what in the world about ‘four dots in a diamond shape’ says ‘abstract’ to you?)

I tested the Zotero import. As I wrote to Sente’s developers, there are some issues:

In testing it out on my large (5000+ item) Zotero library I see that:

  1. HTML attachments are not copied into the Sente library
  2. Image attachments are not copied into the Sente library
  3. Text note attachments are not copied into the Sente library
  4. Subcollections are not preserved

Since then, I’ve noticed that the keywords don’t get imported. Further, the “date added” and “date modified” fields are not preserved, but instead now reflect the import date and time (as I noted on Twitter). But I do like their duplicate detection. Along with promising to consolidate matched items, they provide a report about the discarded matches. For instance:

Rule “DOI rule” flagged these two references as possible duplicates:
Vilar, P., & Žumer, M. (2008). Perceptions and importance of user friendliness of IR systems according to users’ individual characteristics and academic discipline. Journal of the American Society for Information Science & Technology, 59(12), 1995-2007. doi:Article
Quick-Response Barcodes. (2008). Library Technology Reports, 44(5), 46-47. doi:Article
However, the match was rejected because the references differ in: Article Title, pages, Publication Title, URL, Volume, Issue.
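
The report reads like a two-step check: a rule flags a pair when one identifying field matches, and the match is then rejected if other fields disagree. Here is a minimal Python sketch of that logic as I understand it from the report. It is not Sente’s actual implementation; the field names, the `check_duplicate` function, and the `max_differences` threshold are all hypothetical.

```python
# Fields compared after a rule fires (hypothetical names, modeled on the
# fields listed in the report).
COMPARE_FIELDS = ["title", "pages", "publication", "url", "volume", "issue"]


def differing_fields(a, b, fields=COMPARE_FIELDS):
    """List the fields whose values differ between two reference dicts."""
    return [field for field in fields if a.get(field) != b.get(field)]


def check_duplicate(a, b, key="doi", max_differences=0):
    """Return (flagged, accepted, differences) for a pair of references.

    flagged:  the key field is present in both and identical.
    accepted: flagged, and at most max_differences other fields differ.
    """
    flagged = bool(a.get(key)) and a.get(key) == b.get(key)
    if not flagged:
        return False, False, []
    differences = differing_fields(a, b)
    return True, len(differences) <= max_differences, differences
```

In the example report, both records carry the placeholder value “Article” in the DOI field, which is presumably why the DOI rule fired on two unrelated references; the second step is what catches the false positive.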

I have played briefly with Sente’s free iPad viewer, but not yet with their paid ($19.99) app, which allows annotation. Based on reviews (why no permalinks, Apple?), “Export seems to be an option but crucially, import is not.” However, if Sente’s annotation is enough, there’s hope: Sync functionality is already documented for the current (6.2) version, and the description of Sync for the planned 6.5 release (via this) is *very* promising: “As you read a PDF on your iPad on the bus ride home, highlighting passages and taking notes, the highlighting and notes appear in all copies by the time you arrive home.”

By Sente user standards, I am far from a power user: the biggest databases seem to be about 10 times the size of mine. This could be an improvement over Zotero, where my library speed can’t quite keep up some days. I’d be *very* interested to hear from enthusiastic Sente users. Switching seems quite feasible, and it’s probably worth checking out their iPad app.

The main obvious concerns I have are about notetaking and portability. Notetaking of offline/non-fulltext items is important but doesn’t seem to have been a particular focus of development. Portability is incredibly important: I need to ensure that export (and ideally import) brings along files and notes as well as PDFs.

I’ve been thinking of direct, in-file PDF annotation as the best possible way to ensure that my annotations outlive my reference manager. Should I rethink that? So far (according to their draft manual as above): “Highlighting created in Sente 6.2 is not stored in the PDF itself — it is stored in the library database. This change has several very positive effects, notably on syncing.” Let me know what you think in the comments!

