Not available, or not found? Lessons from user queries in the Oria catalog at UiO
Since January 2015, academic libraries in Norway have been using Oria, a catalog based on the ExLibris Primo software. Within Oria, various user actions, such as searches and clicks on search filters, are captured over time, and statistical data can be accessed via Primo Analytics. This wealth of data provides new opportunities to analyze user interaction with current library catalogs, and potentially to improve access to library collections.
In an exploratory study within the Visual Navigation Project, we analyzed the log data available for the library catalog of the University of Oslo. In particular, we looked at two datasets: the most popular queries (between January 2015 and September 2016), and the “zero-result” queries, i.e. the queries for which searches did not return any results (between August 2015 and September 2016).
Analysis process: we first normalized all queries (e.g. “stanislav andreski” and stanislav andreski were counted as one query). Then we manually annotated the 50 most popular queries and a random selection of 50 "zero-result" queries. Where derivable from the query, we noted the nature of the query (e.g. a search for a title, topic or database), the resource type sought (e.g. book or journal), whether the query was pensum-related (i.e. related to the reading list for a UiO course), and whether it was successful. We defined a successful query as one for which the first result page included the resource sought.
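The normalization step can be illustrated with a small sketch. The only rule stated above is that quoted and unquoted variants of a query were counted as one; the lowercasing and whitespace collapsing shown here are additional assumptions, and the function names and sample log are hypothetical.

```python
from collections import Counter

# Quotation marks to strip from both ends of a query (assumed set).
QUOTE_CHARS = "\"'“”‘’«»"

def normalize_query(query: str) -> str:
    """Strip surrounding quotes, lowercase, and collapse whitespace.

    Only the quote stripping is described in the text; the other
    two steps are assumptions made for this illustration.
    """
    cleaned = query.strip().strip(QUOTE_CHARS)
    return " ".join(cleaned.lower().split())

def count_queries(raw_queries):
    """Count how often each normalized query occurs."""
    return Counter(normalize_query(q) for q in raw_queries)

# Hypothetical log excerpt, not taken from the actual Oria data.
log = ["“stanislav andreski”", "stanislav andreski", "Stanislav  Andreski", "pubmed"]
counts = count_queries(log)
print(counts.most_common(2))
```

In this sketch, the three variants of the Andreski query are merged into a single entry before the most popular queries are ranked.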
"What do searchers look for?" - Analyzing the most popular queries
We started our analysis by looking at the fifty most popular queries in 2015 and 2016. The top five are: atekst (issued 1,425 times), pubmed (1,221), exphil (719), det kvalitative forskningsintervju (711) and nature (669). Here, we already observe a wide diversity of requested sources: e.g. databases (atekst and pubmed), pensum books (det kvalitative forskningsintervju), and a journal (nature). This diversity led us to dive further into the exact nature of the fifty most popular queries in Oria.
In terms of query types, queries for titles are most common (49%, e.g. det kvalitative forskningsintervju, det norske samfunn). This is followed by general topics (26%, e.g. spesialpedagogikk, neurology), as well as database names (such as atekst, pubmed and duo, in 9% of the cases).
The most common target resource type is books (45%), followed by journals (11%) and databases (10%). Due to the brevity of some of the queries, in 34% of the cases it was not possible to determine the target resource type.
At UiO, details for all university courses, including reading lists, can be found via the "Studies" pages. In fact, the majority of popular queries in Oria (57%) are related to students’ reading lists (the "pensum"), indicating that the library catalog is an important place to actually acquire reading material.
Finally, we analyzed to what extent the popular queries were successful. Many of them were: in 51% of the cases the right result appears in the top 10 of results. However, for almost a fifth (19%) of the most popular queries, the intended result does not appear on the first result page. This includes some important sources, such as pubmed, nature, or science. (Again, due to the brevity of some queries, it was not possible to determine whether the query was a success in 30% of the cases.)
Summarizing, the findings for the popular queries dataset show that some frequently issued queries do not result in direct access to the right resource. To investigate this further, we then looked at the set of queries that did not retrieve any results.
"Why do queries fail?" - Investigating 'zero result' queries
The most common type of 'failed queries' (43%) consists of citations (e.g. Browning, N. (2015). The ethics of two-way symmetry and the dilemmas of dialogic kantianism. Journal of Media Ethics). In many cases, these queries appear to be directly copied from reading lists. Other frequent searches included titles ("Sentralbankens oppgaver i dag og i fremtiden") and authors (e.g. Christopher Hotchens).
In terms of "unfound" resource types, we observe that these were most commonly books (26%) and book chapters (12%), followed by research articles (26%).
Again looking at the share of pensum queries, we were able to determine that at least 28% of the failed queries were likely for resources related to the pensum, for instance the misspelled query basic immubology, most likely aimed at retrieving a book with this title (spelled correctly).
Finally, we delved a bit into why no results were returned. In part, this is because the query is too specific, in most cases because a citation was directly and fully pasted into the query box (22%). Second, misspellings and reference mistakes (e.g. the wrong year) occur in 20% of the cases. For instance, the query why students underacheive does not receive any results. Finally, for 24% of all examined queries, the resource is really not available in Oria, for instance Haugianerliberalistene: En analyse av haugianere som politikere og næringslivsaktører. This reveals the worrying issue that the target resource of 52% of failed queries is actually available in Oria, but not found by the searcher.
Summarizing, we see some similarities between the popular and zero-result queries, for instance the prominence of queries for books and book chapters. However, we also observed that many of the failed queries consisted of pasted citations and misspelled queries, and that many of the intended resources are actually available in the library. Next, we look at which lessons we draw from this analysis.
"How can we help?" - Learned lessons
First of all, spell checking and query correction should be enhanced to help users find more useful results. To take a practical example, the (book) query why students underacheive returns 0 results, while why students underachieve gives 463 results. The analysis of zero-result queries also showed that about a fifth (20%) were caused by misspellings and reference mistakes. Hence, it is important to reduce the susceptibility to errors, since spelling mistakes occur easily (e.g. when querying in a non-native language, or in case of dyslexia). Moreover, autocomplete suggestions (as found in common search engines) would be a useful aid to users when they are formulating their queries.
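As an illustration of what such a query correction could look like, here is a minimal sketch based on Python's standard difflib module. It is not how Oria or Primo handles spelling; the vocabulary, the function name and the similarity cutoff are assumptions chosen for this example. In practice the vocabulary might be drawn from catalog titles or subject terms.

```python
import difflib

def suggest_correction(query, vocabulary, cutoff=0.8):
    """Replace each unknown word by its closest vocabulary match, if any.

    `vocabulary` is a hypothetical list of known words; `cutoff` is an
    assumed similarity threshold between 0 and 1.
    """
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # get_close_matches returns the best matches above the cutoff.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)

# Toy vocabulary for illustration only.
vocabulary = ["why", "students", "underachieve", "basic", "immunology"]

print(suggest_correction("why students underacheive", vocabulary))
print(suggest_correction("basic immubology", vocabulary))
```

A catalog could offer the corrected string as a "did you mean" suggestion whenever the original query returns zero results, rather than silently replacing the user's input.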
A second lesson learned is that pensum queries are very common in UiO's library catalog, but they frequently fail to get immediate results. This indicates a further need for supporting these types of queries, and potentially for aiding or further educating users in performing them. Students may need to be aware that they need more "patience" to find materials in Oria than in regular search engines. Since students at UiO are used to utilizing course codes to access course-related content, an additional possibility could be to support entering course codes (e.g. INF2200) for retrieving pensum books. There would be different ways to achieve this, for instance by cataloging course codes, but these solutions also involve some trade-offs (e.g. more maintenance needs).
Third, common databases show up high in the results list only in some cases (e.g. atekst), while in other cases they do not directly appear (e.g. pubmed). This may inhibit and confuse (novice) library users. Hence, there is a need for separate catalog entries for these queries, since they represent a large share of all queries.
In future work, we will further analyze the usage logs of Oria, and disseminate our work via a short research paper. Further research avenues include the general usage statistics of Oria (e.g. the use of the system, the advanced search features and so forth).
Moreover, we are currently analyzing the broad, exploratory queries in Oria, which can be problematic. For instance, if a user is searching for “economics”, she is confronted with 2.4 million results, and a plethora of facets. In our project, we are exploring how the library's subject vocabularies (Realfagstermer, Humord, etc.) may aid in supporting these types of exploratory queries, by providing users with specific terminology (e.g. important subfields) and context (e.g. linked definitions from the Store norske leksikon).
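The course code idea could be sketched as follows. This is only one possible approach under stated assumptions: the pattern (two to eight letters followed by four digits) is inferred from the single example INF2200 and may not cover all UiO course codes, and the reading list mapping is a hypothetical placeholder, not real pensum data.

```python
import re

# Assumed course code shape: 2-8 letters followed by 4 digits (e.g. INF2200).
COURSE_CODE = re.compile(r"[A-ZÆØÅ]{2,8}\d{4}")

def expand_course_code(query, reading_lists):
    """Return the reading list for a course code query.

    Returns None when the query does not look like a course code, and
    an empty list when the code is well-formed but unknown.
    """
    code = query.strip().upper()
    if COURSE_CODE.fullmatch(code):
        return reading_lists.get(code, [])
    return None

# Hypothetical mapping from course codes to pensum titles.
reading_lists = {"INF2200": ["Placeholder pensum title"]}

print(expand_course_code("inf2200", reading_lists))
print(expand_course_code("pubmed", reading_lists))
```

Keeping the mapping outside the catalog records, as in this sketch, would be an alternative to cataloging course codes directly, though it still has to be updated whenever reading lists change.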