Full-Text MBook Searches from the Library Catalog

At the University of Michigan Library, in partnership with Google, we have been busily scanning our collections. This opens up lots of possibilities, including an exciting one that launches today: search the full text of a book from within Mirlyn, the library's catalog.

If a book has been scanned by Google, there is a "search in this book" field within the library catalog record. Depending on the particular book, a search will return either full-text results (if the book is in the public domain) or a search-term-only view (if the book is in copyright).

Here is an example of an out-of-copyright book (with full-text results available): 1931: A Glance at the Twentieth Century. The record in the catalog looks like this:

Screen shot of Mirlyn record with 'search in this book' option

And here are the results of that search:

Screen shot of 'search in this book' results

All books that have been scanned -- one million and counting -- are searchable. Search results are linked to the full text for those works that are in the public domain. Search results for books that are still under copyright are shown in brief view. Brief view displays a phrase or two on either side of the search term, but doesn't include full-text display of the page. In either case, the "search in this book" tool will help you decide whether you want the actual book off the shelf before you visit the library or make a delivery request.

Try these sample records:

Full-text: The Miscellaneous Writings of Lord Macaulay

Search only: 500 Bracelets: An Inspiring Collection of Extraordinary Designs


Matthew Schuhwerk Hampel
on May 31, 6:23pm

That's nice, but "Sequence" means nothing outside the spiraling towers of the Michigan libraries. And perhaps Google. Is a sequence a page? Two pages? A folio? A sentence? A placeholder?

Suzanne E Chapman
on June 1, 8:27am

Sequence is indeed a page - but numbering starts with the cover. We originally didn't receive actual page number data (sequence 12 = page 7) so we had to use this. We've since begun getting actual page number data and when we do have that information, the interface uses "page" instead of the "sequence" number. Eventually we'll have actual page data for all our items. We debated the term but couldn't come up with anything better. I'd welcome suggestions!

David Michael Fulmer
on June 3, 11:02pm

Matt, "Sequence" doesn't even mean anything inside the spiraling towers of the Michigan libraries! Maybe this would be a good place to explain the anatomy of an MDP url. It looks to me like the interface uses "num" instead of "page" and "page" always equals "root" (when it doesn't equal "search"). Also, what does "u=1" mean? The num doesn't seem to work without it. You might also explain all the other parameters and defaults (view, size, seq, num, page, u, q1, start, etc.) and whether they are required or not. What sort of behavior might we expect when trying to link to page 208 in a book with two volumes bound together and hence two pages numbered 208, like this one: http://sdr.lib.umich.edu/cgi/m/mdp/pt?view=image;size=100;id=mdp.39015000547821;page=root;u=1;num=208 Is there a parameter that will always get me to the table of contents?

Perry Willett
on June 4, 9:23am

David, There is a set of complex issues here, with page numbers, URLs and metadata, probably worthy of its own blogpost. We'll work on addressing your questions. The URL you include isn't valid--I think you mean this: http://hdl.handle.net/2027/mdp.39015000547821 Thanks, Perry

Phillip R Farber
on June 4, 10:17am

Following on Perry's post, I'll try to explain parameter semantics in more detail. I'd note, however, that pageturner URL parameters are not intended to provide an API to the data. They have meaning mainly within the context of the pageturner application as experienced by the user. So:

id - the item identifier

page - the web page to display, which is one of 'root' (the view of the item) or 'search' (the search results page)

seq - the sequential number of the scanned page, starting at 1 (usually the front cover)

num - a page number as printed on the page. This could be 2 or 7, or xxi and so on, if the item has page number metadata available. Not all items have page number metadata yet.

view - one of 'image' (a page image), 'pdf' (a page image rendered as a PDF), or 'text' (the OCR of the page)

size - a percentage scaling of the full-resolution TIFF relative to a nominal 680-pixel width, or the size of the result list slice

q1 - the query string entered by the user when searching

start - the beginning offset into the list of search results

u=1 - indicates this page or seq value was entered by the user. It is part of an algorithm that allows us to handle the problem that sometimes the number must be treated as a sequence number and sometimes as a page number.

There is no URL parameter that indicates the table of contents page. If the item has page feature metadata, such as Table of Contents or Title Page, links to the corresponding sequence-numbered page image will appear in the left-hand sidebar.

For a volume that has repeating page numbers, assuming that page number metadata is available, entering a given page number will take you to the first page so numbered. If page number metadata is not available, the number entered is treated as a sequence number, which is always unique.

Phil
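For readers who want to see how the parameters described above fit together, here is a minimal Python sketch. It is an illustration only, not the pageturner's actual code. The function names and the `page_to_seqs` mapping are hypothetical, and the sketch assumes that parameters are separated by semicolons as in the URL quoted earlier in this thread. The source does not say what happens when page number metadata exists but the entered number is not in it, so returning `None` in that case is a placeholder assumption.

```python
from urllib.parse import urlsplit


def parse_pageturner_url(url):
    """Split the query string on ';' and return the parameters as a dict."""
    query = urlsplit(url).query
    params = {}
    for pair in query.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        params[key] = value
    return params


def resolve_sequence(number, page_to_seqs=None):
    """Map a user-entered number to a sequence number (hypothetical helper).

    page_to_seqs maps a printed page number (as a string, e.g. "7" or "xxi")
    to the list of sequence numbers that carry it. None means the item has
    no page number metadata.
    """
    if page_to_seqs is None:
        # No page number metadata: treat the number as a sequence number.
        return int(number)
    seqs = page_to_seqs.get(number)
    # Repeated page numbers resolve to the first page so numbered.
    # Assumption: an unknown page number yields None.
    return min(seqs) if seqs else None


url = ("http://sdr.lib.umich.edu/cgi/m/mdp/pt?view=image;size=100;"
       "id=mdp.39015000547821;page=root;u=1;num=208")
params = parse_pageturner_url(url)
print(params["id"], params["num"])

# Two volumes bound together: page 208 appears twice (made-up sequence numbers).
print(resolve_sequence("208", {"208": [230, 650]}))
```

Because the URL parameters are not intended as an API, anything built this way should be treated as fragile; the persistent handle URL mentioned above is the safer thing to link to.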

David Michael Fulmer
on June 5, 1:43am

Thanks for that information! As for my url, it isn't all showing up but if you triple-click on the line you can copy it.

on June 16, 1:26pm

This makes me think about making links to mbooks from MY catalog, for titles we also hold. How would you all feel about that? I'm trying to think of ways to do that without putting unreasonable load on your servers. It would be great if you wanted to contact me in email to discuss this further (or a phone conversation?).

Perry Willett
on June 16, 3:42pm

Hi Jonathan, As I mentioned on the next gen catalog listserv a few weeks ago, we have OAI records for all the freely available titles. The OAI records have OCLC numbers in them. I'd be happy to discuss other strategies. Thanks, Perry pwillett@umich.edu
