Posts tagged "HathiTrust"
from Library Tech Talk

Problems with Authority

A graph of organization nodes and edges depicting the United States Federal bureaucracy.

MARC Authority records can be used to create a map of the Federal Government that will help with collection development and analysis. Unfortunately, MARC is not designed for this purpose, so we have to find ways to work around the MARC format's limitations.

The Once and Future Text Encoding Model

Map image, corresponding XML, and web search results

Lately I’ve been looking back through the past of the Digital Library Production Service (DLPS) -- in fact, all the way back to the time before DLPS, when we were the Humanities Text Initiative -- to see what, if anything, we’ve learned that will help us as we move forward into a world of Hydra, ArchivesSpace, and collaborative development of repository and digital resource creation tools.

Beyond Google Books: Getting Locally-Digitized Material into HathiTrust

Page from early printed edition of Dante's Divine Comedy with an elaborate border and capital N.

HathiTrust started out with only content digitized by Google, but a goal from early on was to support digitized book material from a variety of sources. One early effort provided a toolkit to partners for preparing content, but which turned out to require more technical effort than was reasonable. We rethought our approach and simplified the requirements for partners while maintaining the same high quality standards for HathiTrust.

Quality in HathiTrust (Re-Posting)

Skew in a Google-digitized volume in HathiTrust

This is a re-posting of a HathiTrust blog post. HathiTrust receives well over a hundred inquiries every month about quality problems with page images or OCR text of volumes in HathiTrust. That’s the bad news. The good news is that in most of these cases, there is something they can do about it. A new blog post is intended to shed some light on the thinking and practices about quality in HathiTrust.

Practical Relevance Ranking for 11 Million Books, Part 3: Document Length Normalization

Relevance is a complex concept which reflects aspects of a query, a document, and the user as well as contextual factors. Relevance involves many factors such as the user's preferences, task, stage in their information-seeking, domain knowledge, intent, and the context of a particular search. This post is the third in a series by Tom Burton-West, one of the HathiTrust developers, who has been working on practical relevance ranking for all the volumes in HathiTrust for a number of years.

Pager

Page 1 of 5