Page Numbers and URLs in MBooks

We get questions from MBooks users (most recently from dfulmer in the comments to this post) about how to link to pages, what URL parameters such as "num" and "seq" mean, and how links and page numbers work in general.

There are two related issues. The first is about URLs. The most stable and persistent URL is the one we include in the Mirlyn record and at the top of the pageturner along with other descriptive metadata. It's called a "handle," and it is a robust persistent identifier managed by CNRI (more on handles at http://www.handle.net/). Handles look like this:

http://hdl.handle.net/2027/mdp.39015021038404

and this is the URL that we encourage people to use and save. However, because every handle starts with http://hdl.handle.net/2027, people don't recognize these URLs as belonging to the University of Michigan; users are much more familiar with URLs in the umich.edu domain. Nevertheless, because handles are persistent and robust ("2027" is registered with CNRI as belonging to us), these are the URLs that should be used.
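Because a handle is simply the resolver host, the "2027" prefix, and a volume identifier, it is easy to build or check one programmatically. The Python sketch below assumes that everything after "2027/" (for example mdp.39015021038404) is the volume identifier; the function names are hypothetical and not part of any MBooks API.

```python
from urllib.parse import urlparse

HANDLE_HOST = "hdl.handle.net"
UMICH_PREFIX = "2027"  # handle prefix registered with CNRI for Michigan


def handle_url(volume_id: str) -> str:
    """Build the persistent handle URL for a volume ID (hypothetical helper)."""
    return f"http://{HANDLE_HOST}/{UMICH_PREFIX}/{volume_id}"


def volume_id_from_handle(url: str) -> str:
    """Extract the volume ID from a 2027 handle URL, rejecting anything else."""
    parsed = urlparse(url)
    if parsed.netloc != HANDLE_HOST:
        raise ValueError(f"not a handle URL: {url}")
    # The path looks like "/2027/<volume id>"; split off the prefix.
    prefix, _, suffix = parsed.path.lstrip("/").partition("/")
    if prefix != UMICH_PREFIX or not suffix:
        raise ValueError(f"not a 2027 handle: {url}")
    return suffix
```

For example, `handle_url("mdp.39015021038404")` returns `http://hdl.handle.net/2027/mdp.39015021038404`, and `volume_id_from_handle` turns that URL back into `mdp.39015021038404`. A URL on a local host such as sdr.lib.umich.edu is rejected, which matches the advice to save the handle rather than the local address.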

Other URLs will be less stable. The sharper-eyed among our readers will have noticed that our URLs recently changed from starting with "mdp.lib.umich.edu" to "sdr.lib.umich.edu". We will redirect users any time they use a URL starting with "mdp.lib.umich.edu", but local domain names like these will change over time. The same is true for URL parameters such as "page," "num," "seq," "orient," etc. Phil Farber's response to the same post documents what these parameters mean, but be aware that they can change without warning. URL hacking will lead to tears before bedtime.

The other related issue has to do with page numbers and other metadata. Many MBooks include a table of contents with page numbers on the left-hand side, such as this one. You may also notice that some books lack this table of contents and use "sequence" instead of page numbers. Here's an example of a book for which we do not have page numbers.

It all comes down to the metadata. At a minimum, we know the order (the sequence) in which the pages of any given book should be displayed. The pageturner's forward and back buttons rely on this sequence, and for some books it is the only information we have. Because the sequence starts with the front cover, the sequence number is unlikely to match the printed page number. (And as Suzanne noted in her comments to this post, if anyone has a better term than "sequence," please let us know!) Many of these books without page numbers were early efforts by Google; Google is sending us newer, better versions of these books, so eventually the entire collection will include page numbers.
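To make the distinction concrete, here is a minimal Python sketch, assuming a hypothetical per-volume list of printed page numbers indexed by sequence, with None wherever we have no page number. It illustrates the idea only; it is not how MBooks actually stores its metadata.

```python
from typing import Optional, Sequence


def display_label(page_labels: Sequence[Optional[str]], seq: int) -> str:
    """Return a readable label for the image at 1-based sequence `seq`.

    page_labels[i] holds the printed page number for sequence i + 1,
    or None when no page-number metadata exists for that image.
    """
    if not 1 <= seq <= len(page_labels):
        raise IndexError(f"sequence {seq} out of range")
    label = page_labels[seq - 1]
    # Fall back to the sequence number when there is no printed page number.
    return f"p. {label}" if label is not None else f"sequence {seq}"
```

For a volume whose scans begin with a front cover and two unnumbered leaves, `labels = [None, None, None, "1", "2"]`, sequence 4 displays as "p. 1" while sequence 1 falls back to "sequence 1". A book with no page-number metadata at all is simply a list of None values. Page numbers are kept as strings so that roman numerals such as "xii" fit the same model.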

In many (soon most or all) cases we will have page numbers, along with additional metadata identifying title pages, tables of contents, first pages of sections, and other page features. We get these metadata from Google. We don't know how Google generates them, but the process is undoubtedly automated, which means the results won't be perfect. When we do have metadata indicating the title page, we open the book to the title page by default. If we don't have any metadata about the title page, we open to the first image (usually the front cover).
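The default-opening rule can be sketched in a few lines of Python. The `Page` record and the "TITLE" feature tag are hypothetical stand-ins for whatever structural metadata Google actually supplies.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    seq: int                      # 1-based position in the scanned volume
    label: Optional[str] = None   # printed page number, if known
    features: List[str] = field(default_factory=list)  # hypothetical tags, e.g. "TITLE", "TOC"


def default_start_seq(pages: List[Page]) -> int:
    """Open at the title page if one is tagged, otherwise at the first image."""
    if not pages:
        raise ValueError("volume has no page images")
    ordered = sorted(pages, key=lambda p: p.seq)
    for page in ordered:
        if "TITLE" in page.features:
            return page.seq
    # No title-page metadata: use the first image (usually the front cover).
    return ordered[0].seq
```

Because the fallback is the lowest sequence number, a book with no structural metadata at all still opens somewhere sensible.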

Page numbers are, to quote the kids, whack. In some books they are out of sequence, repeated, misnumbered, or missing. With many journals, the library has bound two or more issues together, each with its own pagination from 1 to whatever. As a result, a single online volume can have multiple pages numbered 207, as in the example David points to in his comments to the post mentioned above. Right now, if you type 207 into the "goto" box, MBooks takes you to the first page numbered 207. We could probably alert people when a volume has multiple pages numbered 207 and give them links to each one.
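Here is a sketch of both the current "goto" behavior and the suggested alternative, assuming page numbers are kept as strings in sequence order (None for unnumbered images). Building an index from each printed page number to every sequence that carries it makes duplicates easy to detect. All names here are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def index_page_labels(page_labels: Sequence[Optional[str]]) -> Dict[str, List[int]]:
    """Map each printed page number to every 1-based sequence that carries it."""
    index: Dict[str, List[int]] = {}
    for seq, label in enumerate(page_labels, start=1):
        if label is not None:
            index.setdefault(label, []).append(seq)
    return index


def goto_first(page_labels: Sequence[Optional[str]], label: str) -> Optional[int]:
    """Current behavior: jump to the first sequence with this page number."""
    matches = index_page_labels(page_labels).get(label, [])
    return matches[0] if matches else None


def goto_all(page_labels: Sequence[Optional[str]], label: str) -> List[int]:
    """Suggested alternative: return every match so each can be offered as a link."""
    return index_page_labels(page_labels).get(label, [])
```

For two issues bound together, `labels = [None, "1", "2", "3", None, "1", "2", "3"]`, `goto_first(labels, "2")` returns 3 while `goto_all(labels, "2")` returns `[3, 7]`; a result longer than one element is exactly the case where the reader should be warned.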

We need to consider offering persistent URLs for individual pages. People want to refer to individual pages, and we should give them a stable URL for doing so. We could also do more to provide a predictable way of referring to a page. Ed Vielmetti recently wrote up some ideas about this on his blog.

We will look at this more carefully soon, once we get through the current round of development for collection builder and other new features.
