East Asian Language Searching in HathiTrust

HathiTrust is a large-scale digital library developed by the University of Michigan Library together with several other major libraries.  It includes the U-M books scanned as part of the Google Books Project as well as other materials digitized here and elsewhere.  This vast digital collection includes a significant number of books in Chinese, Japanese, and Korean, but full-text searching of East Asian books comes with a few caveats.

First, a note of caution:  the system by which HathiTrust's computers recognize words within the collected books works more smoothly with European languages than with East Asian characters.  Although the accuracy of search results varies widely from text to text, searching the full text of books in the HathiTrust corpus can still be useful.  Second, some books in HathiTrust are available in full text and others are not.  In cases where the full text is not accessible, HathiTrust will display a list of pages on which the search terms appear.

When searching in Chinese and Japanese, enclose each search term in quotation marks and separate the terms with a space.  For example, to find books that include the terms for "feudal" and "system," enter "封建" "制度" in the full-text search bar.  If, however, you want to find books in which "feudal system" appears as a complete phrase, enter "封建制度".  This returns a much smaller number of results, limited to instances in which all four characters are connected in the text.
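
For anyone preparing many Chinese or Japanese searches at once, the quoting rule above can be sketched as a small Python helper.  The function name build_cj_query and its as_phrase parameter are hypothetical, invented for this illustration; the code only assembles the text to paste into the full-text search bar and does not contact HathiTrust.

```python
def build_cj_query(terms, as_phrase=False):
    """Assemble a full-text search string for Chinese or Japanese terms.

    Hypothetical helper for illustration only; it is not part of HathiTrust.
    """
    # Trim stray whitespace and skip empty entries.
    cleaned = [t.strip() for t in terms if t.strip()]
    if as_phrase:
        # One quoted string: the characters must appear connected in the text.
        return '"' + "".join(cleaned) + '"'
    # Separate terms: each one quoted on its own, joined by a single space.
    return " ".join('"' + t + '"' for t in cleaned)


# Separate terms for "feudal" and "system"
print(build_cj_query(["封建", "制度"]))
# The complete phrase "feudal system"
print(build_cj_query(["封建", "制度"], as_phrase=True))
```

The first call produces the two quoted terms separated by a space, and the second produces a single quoted four-character phrase, matching the two examples in the paragraph above.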

When searching in Korean, the quotation marks are unnecessary.  To search for "literature" and "research" as separate terms, simply enter 문학 연구 in the HathiTrust search bar.  To search for "literary research" as a phrase, type 문학연구 without spaces.
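
The Korean rule can be sketched in the same way.  This is a minimal illustration under the assumption that only the spacing matters, as described above; the name build_korean_query and its as_phrase parameter are hypothetical, and the code simply builds the text to type into the search bar.

```python
def build_korean_query(terms, as_phrase=False):
    """Assemble a full-text search string for Korean terms.

    Hypothetical helper for illustration only; it is not part of HathiTrust.
    """
    # Trim stray whitespace and skip empty entries.
    cleaned = [t.strip() for t in terms if t.strip()]
    # A phrase is written without spaces; separate terms are space-delimited.
    separator = "" if as_phrase else " "
    return separator.join(cleaned)


# Separate terms for "literature" and "research"
print(build_korean_query(["문학", "연구"]))
# The phrase "literary research"
print(build_korean_query(["문학", "연구"], as_phrase=True))
```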

Please direct any questions about using HathiTrust in Chinese, Japanese, or Korean to Dawn Lawson in the Asia Library.

Page maintained by Yunah Sung
Last modified: 11/16/2015