Archive for August, 2011

Understanding Wikipedia through the evolution of a single page

August 26th, 2011

“The only constant is change.” – Heraclitus

How well do you know Wikipedia? Get to know it a little better by looking at how your favorite article changes over time. To inspire you, here are two examples.

Jon Udell’s screencast about ‘Heavy Metal Umlaut’ is a classic, looking back (in 2005) at the first two years of that article. It points out the accumulation of information, vandalism (and its swift reversion), formatting changes, and issues around the verifiability of facts.

In a recent article for the Awl[1], Emily Morris sifts through 2,303 edits of ‘Lolita’ to pull out nitpicking revision comments, interesting diffs, and statistics.

  1. The Awl is *woefully* distracting. I urge you not to follow any links. (Thanks a lot, Louis!)

Posted in books and reading, future of publishing, information ecosystem, library and information science, social web | Comments (0)

Forking conversations, forking documents

August 7th, 2011

When the topic of discussion changes, how do you indicate that? Tender Support seems clunky in some ways, but their forking mechanism helps conversations stay focused on their topic:

Forking with Tender Support

Lately forking has also been on my mind as the Library Linked Data group edits and reorganizes our draft report: wiki history and version control are helpful, but insufficient. What I miss most is a “fork” feature, where you could temporarily take ownership of a copy (socially, this indicates that something is a possibility, rather than the consensus; technically, it indicates provenance, would allow “show all forks of this”, and might help in merging changes back). Perhaps naming and tagging particular history items in MediaWiki could help address this, but I think I really want something like git.

I’ve seen a few examples of writing and editing prose with git; I’d like to get a better understanding of the best practices for making collaborative changes in texts with distributed version control systems. Surely somebody’s written up manuals on this?

Posted in argumentative discussions, library and information science, PhD diary, random thoughts | Comments (2)

Annotation summaries: standardization needed

August 4th, 2011

I’m finding an iPad amazing for reading PDFs: it’s like instant printing, but with no paper to carry around (printouts are heavy, and they get wet). And with software like iAnnotatePDF and GoodReader, I can annotate with just a bit more effort than with pen and paper.

iAnnotate (video review) is the killer app that convinced me to buy an iPad. But it has a killer flaw: I couldn’t keep my reading organized with it.

Hence I started looking into reference managers that would work well on the iPad: allowing annotation, making it easy to keep PDFs organized, and ensuring that annotations were kept in a sensible place.

Sente fulfills many of my requirements. Sync seems to work effortlessly — well exceeding my experience with other products. The annotation process is reasonably smooth but so far I haven’t found a way to export annotations directly.

This is a bit problematic because PDF editors don’t seem to play nice with each other’s annotations. For instance, iAnnotate and GoodReader both export summaries of annotations made in their own software. You get something very useful and readable like this:

Page 1, Highlight (Yellow):
Content: “The scientific use of Twitter has received some attention in previous work: [4] and [5] have performed several automatic analyses of tweets collected for different conference hashtags, including for example time series and lists of most active twitterers. [3] and [9] have furthermore carried out manual analyses of tweet contents for conference tweet datasets to determine, what conference participants are tweeting about. [10] are develop ing automatic methods for extracting semantic information from conference tweets. [6] have focused on tweets published by a set of manually identified scientists and have investigated their citation behavior.”

Page 1, Highlight (Yellow):
Content: “citations and references are two sides of the same coin.”
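To make the shape of these summaries concrete, here is a minimal Python sketch that reads one into records. It is only an illustration: the header pattern is inferred from the examples in this post, not from any documented iAnnotate or GoodReader format, and the names `Annotation` and `parse_summary` are my own.

```python
import re
from dataclasses import dataclass

# Header shape inferred from the summaries in this post,
# e.g. "Page 1, Highlight (Yellow):". This is an assumption, not a spec.
HEADER = re.compile(r"^Page (?P<page>\d+), (?P<kind>\w+) \((?P<color>[^)]*)\):$")


@dataclass
class Annotation:
    page: int
    kind: str
    color: str
    text: str = ""


def parse_summary(summary: str) -> list[Annotation]:
    """Turn an exported summary into one record per annotation."""
    annotations: list[Annotation] = []
    for line in summary.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            annotations.append(
                Annotation(int(match["page"]), match["kind"], match["color"])
            )
        elif annotations:
            # Body line: drop the optional "Content:" label and curly quotes,
            # then append it to the most recent annotation.
            body = line.removeprefix("Content:").strip().strip("“”")
            current = annotations[-1]
            current.text = f"{current.text} {body}".strip()
    return annotations
```

A record whose `text` comes back empty is a highlight whose content did not survive the export, which is an easy way to spot annotations that were made in another program.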

But when you annotate in one program and get notes from another program, things get messier.

For PDFs annotated externally, iAnnotate lists highlights without their text and only grabs text from the notes, like this:

Page 1, Highlight (Custom Color: #fdf7bc):

Page 2, Highlight (Custom Color: #fdf7bc):

Page 2, Note (Custom Color: #fdffaa):
Not sure why this stands out from other lists by individuals.

GoodReader plays a bit nicer with annotations from other programs: it breaks annotations made by other programs at line boundaries. This makes summaries a little difficult to read, but at least there’s some content:

Highlight (color #FDF7BC):
first of all it will have to start with the general problem in

Highlight (color #FDF7BC):
analyzing scientific impact of Twitter:

Highlight (color #FDF7BC):
[6] define

Highlight (color #FDF7BC):
tweet to a peer-U
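One way to make a summary like this more readable is to stitch consecutive fragments back together. The Python sketch below does that with a deliberately crude heuristic: it joins runs of consecutive fragments that share a color. That is an assumption of mine, not GoodReader behavior, and it cannot tell one long highlight from two separate highlights that happen to follow each other; the function names are hypothetical.

```python
import re
from itertools import groupby

# Header shape inferred from the GoodReader summary in this post.
HEADER = re.compile(r"^Highlight \(color (?P<color>#[0-9A-Fa-f]{6})\):$")


def read_fragments(summary: str) -> list[tuple[str, str]]:
    """Return (color, text) pairs, one per exported text line."""
    fragments: list[tuple[str, str]] = []
    color = None
    for line in summary.splitlines():
        line = line.strip()
        match = HEADER.match(line)
        if match:
            color = match["color"].upper()
        elif line and color is not None:
            fragments.append((color, line))
    return fragments


def merge_fragments(fragments: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Join runs of consecutive same-color fragments into single passages.

    Heuristic only: adjacent but unrelated highlights of the same color
    will be merged too.
    """
    merged = []
    for color, run in groupby(fragments, key=lambda pair: pair[0]):
        merged.append((color, " ".join(text for _, text in run)))
    return merged
```

Page numbers or positions would make a much better grouping key than color, but the summary shown here does not include them.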

I’m currently checking into the standardization around annotation summaries.

I’d be very interested to hear about how you detect metadata and annotation differences in PDFs. As examples, I’ve marked up a recent WebSci poster, with some annotations from GoodReader, from iAnnotatePDF, and from Sente.

Posted in books and reading, iOS: iPad, iPhone, etc. | Comments (1)

091labs again!

August 4th, 2011

Yesterday, our local hackerspace/makerspace re-opened!

For a while now, Fiacre O’Duinn has been talking about the shared purpose between libraries and these spaces:

The ideas that fuel hackerspaces, such as cooperation, resource and information sharing, self-directed education, and a diversity of views are concepts that are central to our profession’s ethos.

Not to mention the cool tech (3D printers, laser engravers, tool lending libraries, …) we’d like to see in libraries in the not-so-distant future.

It’s a conversation I hope to pick up with Willow & others (thx!) at CCCamp.

Posted in library and information science, random thoughts | Comments (0)

Citation management means different things to different people

August 3rd, 2011

I got to talking with a mathematician friend about citation management. We came to the conclusion that “manage PDFs” is my primary goal while “get out good citations” is his primary goal. I thought it would be interesting to look at his requirements.

His ideal program would

  1. Organize the PDFs (Papers does this, when it doesn’t botch the author names and the title) preferably in the file system, so I can use Dropbox
  2. Get BibTeX entries from MathSciNet, ACM, etc. EXACTLY AS THEY ARE
  3. Have some decent way to organize notes by “project” or something

He doesn’t care about:

  1. Typing \cite
  2. A “unified” bibliographic database
  3. Social bibliographies (though I am not against them; it is just not a burning issue)

He says:

I guess the point is that, if I am writing something and I know I want to cite it, and I know there is a “official” BibTeX for it, I just need a way to get that more quickly than:

  1. Type the URL
  2. Click on “Proxy this” in my bookmarks bar
  3. Search for the paper
  4. Copy/paste the BibTeX
  5. Edit the cite key to something mnemonic

He followed up with an example of the “awful”, lossy markup Papers produces, which loses information including the ISSN and DOI; he prefers the minimalist BibTeX. (Oops! He adds: “I understated how bad papers is. The real papers entry (top) not only has screwy names, but junk instead of the full journal name. The papers cite key is meaningless noise too (but mathscinet is meaningful noise).”) To get around this, he does the same search/download “a million times”.

Papers2 BibTeX:
author = {L Asimow and B Roth},
journal = {Trans. Amer. Math. Soc.},
title = {The rigidity of graphs},
pages = {279--289},
volume = {245},
year = {1978},

The AMS version of the same BibTeX:
@article {AR78,
    AUTHOR = {Asimow, L. and Roth, B.},
     TITLE = {The rigidity of graphs},
   JOURNAL = {Trans. Amer. Math. Soc.},
  FJOURNAL = {Transactions of the American Mathematical Society},
    VOLUME = {245},
      YEAR = {1978},
     PAGES = {279--289},
      ISSN = {0002-9947},
     CODEN = {TAMTAM},
   MRCLASS = {57M15 (05C10 52A40 53B50 73K05)},
  MRNUMBER = {511410 (80i:57004a)},
MRREVIEWER = {G. Laman},
       DOI = {10.2307/1998867},
       URL = {},

I’ve just discovered that BibDesk’s[1] ‘minimize’ does what he wants: its output is quite close to the Papers2 version:

	Author = {Asimow, L. and Roth, B.},
	Journal = {Trans. Amer. Math. Soc.},
	Pages = {279--289},
	Title = {The rigidity of graphs},
	Volume = {245},
	Year = {1978}}
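The idea behind minimizing can be sketched in a few lines of Python: keep the cite key, keep a short whitelist of fields, and drop the rest. This is not BibDesk’s implementation, which I haven’t looked at. The whitelist simply mirrors the fields in the output above, the name `minimize` is borrowed only as a label, and the line-based regular expressions assume one field per line with the entry’s closing brace (if present) on its own line. That assumption is far too weak for BibTeX in general.

```python
import re

# Fields kept by this sketch: the ones that appear in the minimized entry
# shown in this post. Any other whitelist would work the same way.
KEEP = ("author", "journal", "pages", "title", "volume", "year")

# One "NAME = {value}," per line. Assumes braces-delimited values and that
# the entry's closing brace is not on the same line as a field.
FIELD = re.compile(r"^\s*(?P<name>\w+)\s*=\s*\{(?P<value>.*)\},?\s*$")
# The "@type {key," line that opens an entry.
ENTRY = re.compile(r"^\s*@(?P<type>\w+)\s*\{\s*(?P<key>[^,\s]+)\s*,\s*$")


def minimize(entry: str, keep: tuple[str, ...] = KEEP) -> str:
    """Return the entry with only the fields named in `keep`."""
    header = None
    fields: dict[str, str] = {}
    for line in entry.splitlines():
        entry_match = ENTRY.match(line)
        field_match = FIELD.match(line)
        if entry_match:
            header = f"@{entry_match['type'].lower()}{{{entry_match['key']},"
        elif field_match:
            fields[field_match["name"].lower()] = field_match["value"]
    if header is None:
        raise ValueError("no '@type{key,' line found")
    body = ",\n".join(
        f"\t{name.capitalize()} = {{{fields[name]}}}"
        for name in keep
        if name in fields
    )
    return f"{header}\n{body}}}"
```

Fed the AMS entry, this keeps the cite key and the well-formed author names while discarding ISSN, DOI, and the MathSciNet-specific fields. Whether that loss is acceptable is exactly the disagreement described in this post.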

I’d still like to understand the impact the non-minimal BibTeX is having; could be bad citation styles are causing part of the problem.

While we have different needs for citation management, we’re both annoyed by the default filenames many publishers use – like fulltext.pdf and sdarticle.pdf. But I’ll tolerate these, as long as I can get to them from a database index with a nice frontend.

We of course moved on to discussing how research needs an iTunes or, as Geoff Bilder has called it, an iPapers.

This blog post brought to you by Google chat and the number 3.

  1. See also A short review of BibDesk from MacResearch

Posted in books and reading, information ecosystem, library and information science, scholarly communication | Comments (0)

Sente, a first look

August 1st, 2011

Today I’ve been testing out Sente, on the theory that it might help me organize the PDFs I’m annotating on my iPad.

The desktop application is geared to Mac users who really care about bibliographies, with several fantastic features, including

I like Sente’s statuses; read/unread and Recently Modified and Recently Added are automatically tracked, and you can rate items. I especially like the workflow statuses, which match some of my common tasks:

  • Get Full Text
  • Discuss Further
  • Cite
  • Do Not Cite

“Sort by citation” is surprisingly illuminating: I didn’t realize how many papers from “Discourse Studies” I’d been looking at recently.

Another great feature that could be easily and fruitfully added to most other bibliographic managers: title case and exact case lists (I am *so* sick of seeing lowercased ‘wikipedia’ in bibliographies!), which you can very easily customize.
Sente also has a journal dictionary: You can assign the abbreviations and ISSNs (authority control, yippee!)!
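For other bibliographic managers, the exact-case idea could look roughly like the Python sketch below. It is my own illustration rather than Sente’s algorithm: the `EXACT_CASE` and `SMALL_WORDS` lists are hypothetical examples, and the title-casing rule is deliberately simplistic (it does not, for instance, capitalize the first word after a colon).

```python
# Hypothetical exact-case list: words that must always be written this way.
EXACT_CASE = {"wikipedia": "Wikipedia", "ipad": "iPad", "bibtex": "BibTeX"}
# Hypothetical list of short words left lowercase inside a title.
SMALL_WORDS = {"a", "an", "and", "as", "at", "for", "in", "of", "on", "the", "to"}


def title_case(title: str) -> str:
    """Title-case a string, letting the exact-case list override everything."""
    result = []
    for index, word in enumerate(title.split()):
        # Separate trailing punctuation so "wikipedia:" still matches the list.
        core = word.rstrip(".,:;!?")
        tail = word[len(core):]
        lower = core.lower()
        if lower in EXACT_CASE:
            core = EXACT_CASE[lower]
        elif lower in SMALL_WORDS and index != 0:
            core = lower
        else:
            core = core[:1].upper() + core[1:].lower()
        result.append(core + tail)
    return " ".join(result)
```

The point of checking the exact-case list first is that it wins over both the small-word rule and ordinary capitalization, so ‘wikipedia’ and ‘IPAD’ come out right no matter how the source record spelled them.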

Their visual display could use an update (thankfully it’s on the way) and I find their icons confusing (maybe ‘pencil’ for ‘note’ is sensible, but what in the world about ‘four dots in a diamond shape’ says ‘abstract’ to you?)

I tested the Zotero import. As I wrote to Sente’s developers, there are some issues:

In testing it out on my large (5000+ item) Zotero library I see that:

  1. HTML attachments are not copied into the Sente library
  2. Image attachments are not copied into the Sente library
  3. Text note attachments are not copied into the Sente library
  4. Subcollections are not preserved

Since then, I’ve noticed that the keywords don’t get imported. Further, the “date added” and “date modified” fields are not preserved, but instead now reflect the import date and time (as I noted on twitter). But I do like their duplicate detection. Along with promising to consolidate matched items, they provide a report about the discarded matches. For instance:

Rule “DOI rule” flagged these two references as possible duplicates:
Vilar, P., & Žumer, M. (2008). Perceptions and importance of user friendliness of IR systems according to users’ individual characteristics and academic discipline. Journal of the American Society for Information Science & Technology, 59(12), 1995-2007. doi:Article
Quick-Response Barcodes. (2008). Library Technology Reports, 44(5), 46-47. doi:Article
However, the match was rejected because the references differ in: Article Title, pages, Publication Title, URL, Volume, Issue.
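The report suggests a two-stage process: a rule flags a candidate pair, then a field-by-field comparison can reject it. Here is a minimal Python sketch of that pattern. It is a guess at the general shape, not Sente’s actual logic; the `Reference` fields, the `CHECKED` list, and the `max_differences` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    doi: str
    title: str
    publication: str
    volume: str
    issue: str
    pages: str


# Fields compared once a rule has fired; the names are this sketch's own.
CHECKED = ("title", "publication", "volume", "issue", "pages")


def doi_rule(a: Reference, b: Reference) -> bool:
    """Flag two references as possible duplicates when their DOIs match."""
    return bool(a.doi) and a.doi.strip().lower() == b.doi.strip().lower()


def differing_fields(a: Reference, b: Reference) -> list[str]:
    """List the checked fields whose values differ between a and b."""
    return [name for name in CHECKED if getattr(a, name) != getattr(b, name)]


def check_pair(a: Reference, b: Reference, max_differences: int = 0) -> str:
    """Apply the DOI rule, then reject the match if too many fields differ."""
    if not doi_rule(a, b):
        return "not flagged"
    differences = differing_fields(a, b)
    if len(differences) > max_differences:
        return "rejected: references differ in " + ", ".join(differences)
    return "duplicate"
```

In the report above, both records carry the junk value “Article” in the DOI field, so a DOI rule fires; the second stage is what keeps two unrelated papers from being merged.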

I have played briefly with Sente’s free iPad viewer, but not yet with their paid ($19.99) app which allows annotation. Based on reviews (why no permalinks, Apple?), “Export seems to be an option but crucially, import is not.” However, if Sente’s annotation is enough, there’s hope, since the description of Sync for the planned 6.5 release (via this) is *very* promising: “As you read a PDF on your iPad on the bus ride home, highlighting passages and taking notes, the highlighting and notes appear in all copies by the time you arrive home.”

By Sente user standards, I am far from a power user: the biggest databases seem to be about 10 times the size of mine. This could be an improvement over Zotero, where my library speed can’t quite keep up some days. I’d be *very* interested to hear from enthusiastic Sente users. Switching seems quite feasible, and it’s probably worth checking out their iPad app.

The main obvious concerns I have are about notetaking and portability. Notetaking of offline/non-fulltext items is important but doesn’t seem to have been a particular focus of development. Portability is incredibly important: I need to ensure that export (and ideally import) brings along files and notes as well as PDFs.

I’ve been thinking of direct, in-file PDF annotation as the best possible way to ensure that my annotations outlive my reference manager. Should I rethink that? So far (according to their draft manual as above): “Highlighting created in Sente 6.2 is not stored in the PDF itself — it is stored in the library database. This change has several very positive effects, notably on syncing.” Let me know what you think in the comments!

Posted in books and reading, library and information science, reviews, scholarly communication | Comments (2)