Readers and researchers were annotating texts long before the invention of the printing press. While annotating texts has been relatively easy for centuries thanks to the margins of paper texts, annotating digital items remains difficult. This is an odd quirk of digital content distribution, since the ability to capture and share annotations in a digital environment makes them potentially so much more valuable.

Thinking back to the foundation of the World Wide Web, annotation was actually a critical component of what Sir Tim Berners-Lee conceived of as an interconnected store of research documents for CERN. In fact, one of the examples in Berners-Lee’s 1989 paper describing the World Wide Web (“Mesh” as he termed it at the time; the term “Web” wouldn’t come until 1990) was an annotation about a comment related to a paper by Doug Thompson. In his chart of how documents relate, Berners-Lee described objects that refer to other objects, some of which are comments about other documents. Later in the paper, he lists among the “clear practical requirements” of the new system that, “[o]ne must also be able to annotate links, as well as nodes, privately.”

Annotation was almost an important feature of the first widely distributed web browser, Mosaic, developed by Marc Andreessen and Eric Bina, who at the time worked at the National Center for Supercomputing Applications at the University of Illinois. Andreessen notes this in a blog post about his investment in Rap Genius, an annotation service for rap lyrics (which has much larger ambitions). In a 1993 email to the www-talk list, Andreessen asked if anyone was willing to alpha test the annotation features of Mosaic v1.1. The Web-browsing client had support for “very simple” annotation. In one of the annotations on the Rap Genius post, Andreessen noted that the challenge at the time was scaling the server support to handle the annotations. Andreessen and Bina approached the National Science Foundation (NSF) to fund further development of Mosaic and its annotation features, but, according to Andreessen, NSF “decided the project had no justifiable technical merit.” When Andreessen and Jim Clark took Mosaic and commercialized it as Netscape, annotation fell off the development roadmap and was relegated to the back burner of nice-to-have Web services.

A few start-ups tried to push annotation services forward, notably Third Voice, which failed in 2001 after criticism that it was “defacing websites.” Another notable service for posting and sharing notes on websites was Fleck.com, which, despite substantial angel funding, a patent, and some positive publicity, closed up shop in 2008, with its domain sold in 2010. But work continued and tools continued to be developed. Wikipedia currently lists some 17 web annotation services, but several are missing, and the list includes a few services that, strictly speaking, are bookmarking services rather than online annotation systems.

A few weeks ago, more than 100 technologists interested in digital annotation gathered in San Francisco for the iAnnotate meeting, organized by Dan Whaley at hypothes.is with support from the Andrew W. Mellon Foundation. The meeting gave attendees an opportunity to discuss technical interoperability and how various services are working, to demo new services, and to explore how annotation can be used in different contexts. One particular topic of interest was the W3C Open Annotation Collaboration work. That group has produced a variety of interesting pilots, an Open Annotation Data Model, many publications, and several other advances in annotation systems. Many of the new service providers talked about their services, including Domeo, Maphub, Pelagios, Authorea, dotdotdot, and the iAnnotate host hypothes.is, which recently launched an alpha version of its software. Several of the other speakers touched on annotation of data sets and annotations in scholarly papers. The meeting also included break-out sessions and discussion groups on a range of topics. Much of this was captured in notes and videos, which are (or will be) posted to the iAnnotate agenda page.

There are significant challenges to digital annotation systems, particularly shared annotation systems. While server space was an issue with Mosaic in the 1990s, this isn’t the critical problem any longer, although at scale an annotation system does require some significant hardware support. The real problem associated with public sharing of annotations is getting the model to work across different devices and systems. Locating a reference point is a particular challenge when working with reflowable text. In such a context, referring to page 164 doesn’t mean anything, because page numbering often isn’t used and, even if it were, one could size up the font to such a large scale that only three words might appear on a “page”. Similarly, one can’t rely on a specific character count within the file, since that could change with minor editorial corrections. Creating a hash string of the text characters before and after a reference point also has problems, since text could be repeated (as in song lyrics, or a recurring dream sequence). Matching a point in the first edition of a book to the same point in its annotated fourth edition, for example, is harder still, since there isn’t a widely adopted work identifier to tie together the various manifestations of a work. This problem expands as one moves away from simply annotating text to comparing different media expressions. Those who remember showing up at a literature class with a different edition of a book than the teacher was using can relate to these challenges. NISO has a working group that is exploring standards to address the location problem, as a jumping-off point for work on this topic, which we hope will feed into the IDPF EPUB3 specification.
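The location problem is easier to see in a small example. The Python sketch below is only an illustration, not the approach of any particular service or standard: the `QuoteAnchor` class, the `make_anchor` and `find_matches` helpers, and the sample lyric are all hypothetical. It anchors an annotation by the quoted text plus a few characters of context on either side, then shows that a raw character offset breaks after a small edit, that the quote alone is ambiguous when text repeats, and that the surrounding context narrows the match.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuoteAnchor:
    """Hypothetical anchor: the quoted text plus context on each side."""
    exact: str
    prefix: str = ""
    suffix: str = ""


def make_anchor(text: str, start: int, end: int, context: int = 12) -> QuoteAnchor:
    """Build an anchor for text[start:end], keeping `context` characters around it."""
    return QuoteAnchor(
        exact=text[start:end],
        prefix=text[max(0, start - context):start],
        suffix=text[end:end + context],
    )


def find_matches(text: str, anchor: QuoteAnchor) -> List[int]:
    """Return every start offset where the quote, prefix, and suffix all line up."""
    matches = []
    pos = text.find(anchor.exact)
    while pos != -1:
        end = pos + len(anchor.exact)
        if text[:pos].endswith(anchor.prefix) and text[end:].startswith(anchor.suffix):
            matches.append(pos)
        pos = text.find(anchor.exact, pos + 1)
    return matches


# A made-up lyric in which the annotated phrase occurs twice.
lyrics = "la la la, sing it loud. la la la, sing it low."
start = lyrics.rfind("sing it")            # annotate the second occurrence
anchor = make_anchor(lyrics, start, start + len("sing it"))

# A minor editorial correction shifts every character offset.
revised = "Oh, " + lyrics

print(revised[start:start + 7] == "sing it")           # False: the stored offset is stale
print(find_matches(lyrics, QuoteAnchor("sing it")))    # quote alone: two candidates
print(find_matches(lyrics, anchor))                    # quote plus context: one candidate
print(find_matches(revised, anchor))                   # still found after the edit
```

Context helps only up to a point. A chorus repeated several times can still yield more than one match, and a rewritten sentence yields none, which is why the paragraph above treats matching across editions as a separate and harder problem.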

Other non-technical problems arise that need to be addressed, such as copyright concerns and privacy. If an annotation system allows the capture of a selection of the referenced text, one might be able to collect all of those disparate snippets to recreate the work in its entirety. Realistically, this is among the least likely of pirate scenarios, but some publishers who have engaged in these discussions have noted it with concern.

The question of sharing is also fraught with complexity. How can people choose to share their annotations selectively: only for some works but not others, only with some people and not others, or only in certain situations or circumstances? The members of a book club may want to share their annotations of the work they are discussing only with each other. Similarly, users of the system need to trust that their annotations won’t disappear if a service is sold or goes out of business. There currently isn’t a standard format for annotation exports and imports that could be used in such a scenario. As with all online services, the traditional problems of user identification and ID management are another challenge, but one that, like the rest of the internet, is waiting for a better ID management service.

For all the potential problems, the opportunity that exists within this one aspect of scholarly publishing to advance understanding of science is vast. John Perry Barlow, who introduced the second day of the iAnnotate meeting (he begins speaking about 2:40 into the video), said it well, describing annotations as a critical element of the process by which “we grow, adjust, and expand the paradigm of what is known and which helps propel science forward.”

The scholarly publishing community needs to focus more attention on the new annotation services and models being developed, since scholars are the most likely users of digital annotation services and the ones likely to obtain the most value from them. In some ways, this is part of the functionality that Mendeley provides and one element that makes the service valuable for its members, and valuable to Elsevier, which recently acquired Mendeley. Quality services along these lines, which replicate the traditional tools built up around working with print on paper, are something users are longing for in this new digital environment. Such functionality may finally be on the horizon for the growing community of readers of digital text.
