Database Names are Hard to Learn

Word cloud showing frequency of incorrect spellings of database names

The search box on the University of Michigan Library’s website is many users’ front door to library resources. Despite the presence of links directly to commonly used databases on the main library page, users still search for databases by name. From January 1, 2014, through April 2, 2014, library users conducted 261,962 searches, of which 7.4% (19,267) were for the 7 most-used databases. About 15% of searches for these 7 databases used incorrect variations of the actual database name.

In descending order of the percentage of incorrect queries, the top 7 databases were:

LexisNexis: 714 queries total / 514 misspellings (71.99%)

The variants used 3% or more of the time:

  • lexis nexis

  • lexisnexis (correct)

  • lexis

  • lexis nexus

  • lexus nexus

Interestingly, of the recognizable queries for this database, the correct spelling -- LexisNexis -- was the second most commonly entered. Almost 40% of recognizable queries were for “lexis nexis”; 28% were for the correct version.

PsycInfo: 3332 queries total / 1588 misspellings (47.66%)

There were 30 variants; the ones used 3% or more of the time were:

  • psycinfo (correct)

  • psychinfo

  • psych info

  • psyc info

WorldCat: 1376 queries total / 187 misspellings (13.59%)

There were 14 variants; the ones used 3% or more of the time were:

  • worldcat (correct)

  • world cat

PubMed: 5792 queries total / 381 misspellings (6.58%)

There were 20 variants; the ones used 3% or more of the time were:

  • pubmed (correct)

  • pub med

JSTOR: 3196 queries total / 128 incorrect (4.01%)

There were 27 variants, but almost all were used only once. The correct one, JSTOR, was used almost 96% of the time.

ProQuest: 2123 queries total / 64 incorrect (3.01%)

There were 18 variants, but almost all were simple typos, used only once. The correct one, ProQuest, was used 97% of the time.

Google Scholar: 2734 queries total / 66 incorrect (2.41%)

There were 26 variants, but almost all were simple typos. The correct one, “Google Scholar”, was used almost 98% of the time.

Please see the full table of results for all the details.

Results

As a result of this study, library subject specialists added the most commonly used non-standard variants of database names to the indexed data. Before this study, a search using the wrong name returned poor database results, if any. Now, searches for these variants retrieve the correct database.
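The idea of matching common variants to the right database can be sketched as a simple alias lookup. This is only an illustration in Python: the variants are the ones listed in this post, but the `ALIASES` table and the `normalize`, `build_index`, and `resolve` names are assumptions, and the post does not describe how the library's index is actually built.

```python
# Variants come from the lists in this post; the lookup structure is a
# hypothetical illustration, not the library's actual indexing.
ALIASES = {
    "LexisNexis": ["lexis nexis", "lexis", "lexis nexus", "lexus nexus"],
    "PsycInfo": ["psychinfo", "psych info", "psyc info"],
    "WorldCat": ["world cat"],
    "PubMed": ["pub med"],
}


def normalize(query):
    """Lowercase the query and collapse runs of whitespace to single spaces."""
    return " ".join(query.lower().split())


def build_index(aliases):
    """Map every normalized name and variant to its canonical database name."""
    index = {}
    for canonical, variants in aliases.items():
        index[normalize(canonical)] = canonical
        for variant in variants:
            index[normalize(variant)] = canonical
    return index


INDEX = build_index(ALIASES)


def resolve(query):
    """Return the canonical database name for a query, or None if unknown."""
    return INDEX.get(normalize(query))
```

With this table, `resolve("Lexus  Nexus")` returns `"LexisNexis"`, while a query that is not in the table, such as `resolve("jstor")`, returns `None` and would fall through to the regular search.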

Method

The University Library maintains an anonymized log of search queries entered in the search tabs on the site. These queries are stored in a MySQL database. For this research, I ran wildcard queries against that database and then filtered out obvious false matches. For example, for WorldCat, I did a case-insensitive search for “w*t” and removed all results that matched “wildcat”, “what”, etc. Similarly, for “Google Scholar”, I looked for “Go*r” and removed terms that were clearly not attempts to reach that resource.
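The wildcard-then-filter approach can be approximated in a few lines. The sketch below uses Python rather than the SQL run against the actual log (in MySQL, a pattern like “w*t” would typically be written with LIKE and % as the wildcard). The sample queries, the exclusion list, and the function names are made up for illustration.

```python
import re
from collections import Counter


def wildcard_to_regex(pattern):
    """Turn a simple '*' wildcard pattern into a case-insensitive regex."""
    parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(parts), re.IGNORECASE)


def candidate_variants(queries, pattern, false_matches):
    """Count queries that match the wildcard and are not known false matches."""
    regex = wildcard_to_regex(pattern)
    excluded = {term.lower() for term in false_matches}
    counts = Counter()
    for query in queries:
        normalized = query.strip().lower()
        if regex.fullmatch(normalized) and normalized not in excluded:
            counts[normalized] += 1
    return counts


def frequent_variants(counts, threshold=0.03):
    """Keep variants that make up at least `threshold` of the matched queries."""
    total = sum(counts.values())
    return {v: n / total for v, n in counts.items() if n / total >= threshold}


# Hypothetical log entries, not real data from the study.
sample = ["WorldCat", "world cat", "wildcat", "what", "worldcat", "Worldcat ", "jstor"]
counts = candidate_variants(sample, "w*t", {"wildcat", "what"})
```

Here `counts` holds three hits for “worldcat” and one for “world cat”; “wildcat” and “what” are dropped as false matches, and “jstor” never matches the pattern. In practice the exclusion list is the manual step: it grows as obvious non-matches turn up in the results.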

4 Comments

Ian Demsky
on Aug. 20, 10:31am

Perhaps the prevalence of Google's "did you mean?" functionality and ability to pull up relevant results even with terrible spelling has translated over to library search behaviors. Also -- especially if we provide helpful services like matching on incorrect spellings -- it's probably quicker and easier to search on a slightly-off name than to browse for a correct name.

Christina
on Sept. 10, 3:41pm

Fascinating! Thanks for the research.

Rachel Vacek
on Sept. 26, 7:36am

This is cool, Ken. Are you able to add these misspellings to your discovery platform to help users?

Ken Varnum
on Sept. 26, 9:00am

Rachel -- yes, we added common misspellings to our database finder. So if you search in the default search box on www.lib.umich.edu for the most commonly misspelled versions, you'll still get the right database.
