LogCLEF 2010

LogCLEF deals with the analysis of queries as expression of user behavior.
The goal is the analysis and classification of queries in order to improve search systems.


LogCLEF will be a workshop lab at CLEF 2010.

Topic and Goal

Log data are an important resource for evaluating the quality of a search engine and of a multilingual search service. They can be used to study how a search engine is used and to better adapt it to the goals its users were trying to reach.

The goal of LogCLEF is the analysis and classification of queries in order to understand search behavior in multilingual contexts and ultimately to improve search systems.

Data Collection for LogCLEF 2010

The data collection consists of two large logfiles from information providers:

* The European Library (TEL) logs
As in 2009, a large log of activities from The European Library is provided. This service provides access to several national libraries of Europe. Its users and content span many languages.
Click on the following links to download a sample of the logfiles and a description of the files.

* Deutscher Bildungsserver (DBS) logs
The "Deutscher Bildungsserver" is a quality-controlled internet directory for educational resources. A raw server log representing three months of activities on the portal is made available. The size of the files is 5 GB.
Click on the following links to download a sample of the logfiles, a description of the files and a file about the DBS structure.

Task Definition

The main question behind the task definition comes from search service providers who wonder how they can improve their services. To reach that high-level goal, researchers first need to better understand user behavior.

Two objectives of the analysis of the logs are proposed, one for each set of logs:

Deutscher Bildungsserver (DBS) logs

The objective of the analysis of the DBS logs is the exploration of the relation between query and viewed content. The analysis can explore formal issues of the query and content as well as the distribution of words within both.
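One simple way to start exploring the relation between a query and the content viewed is to measure word overlap between the two. The following is a minimal sketch, not a prescribed method: the function names are illustrative, the tokenization is deliberately naive (no stemming or stopword removal), and it assumes that the query string and the text of the viewed page have already been extracted from the logs.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard_overlap(query, content):
    """Size of the shared vocabulary divided by the size of the joint vocabulary."""
    q, c = tokenize(query), tokenize(content)
    if not q and not c:
        return 0.0
    return len(q & c) / len(q | c)


def query_coverage(query, content):
    """Fraction of query tokens that also occur in the viewed content."""
    q, c = tokenize(query), tokenize(content)
    return len(q & c) / len(q) if q else 0.0
```

Jaccard overlap penalizes long pages, because the joint vocabulary grows with page length, so query coverage may be the more informative of the two when pages are much longer than queries.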

Potential analysis
  1. Are query terms related to the content viewed and/or paths taken within the system?
  2. Can query modifications be explained by the content viewed?
  3. Develop metrics to identify successful searches
(If you crawl DBS content, please follow standard politeness policies and do not overburden the DBS server.)
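As an illustration of what a politeness policy can look like in practice, the sketch below uses Python's standard `urllib.robotparser` to honor robots.txt rules and to pause between requests. The user agent name, the default delay of five seconds, the example.org URLs, and the `fetch` callable are all assumptions made for this example; they are not requirements stated by the DBS.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robot_rules(robots_txt):
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch_plan(parser, urls, user_agent="logclef-research-bot",
                      default_delay=5.0):
    """Return the URLs the rules allow and the delay to keep between requests."""
    # Use the site's Crawl-delay if it declares one, else the assumed default.
    delay = parser.crawl_delay(user_agent) or default_delay
    allowed = [u for u in urls if parser.can_fetch(user_agent, u)]
    return allowed, delay


def crawl(parser, urls, fetch, user_agent="logclef-research-bot",
          default_delay=5.0):
    """Fetch the allowed URLs one at a time, sleeping between requests.

    `fetch` is a caller-supplied function (hypothetical here) that
    downloads a single URL and returns its content.
    """
    allowed, delay = polite_fetch_plan(parser, urls, user_agent, default_delay)
    pages = {}
    for i, url in enumerate(allowed):
        if i > 0:
            time.sleep(delay)
        pages[url] = fetch(url)
    return pages
```

Requests are issued sequentially on purpose: parallel fetching would defeat the point of the delay.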

The European Library (TEL) logs

Investigate the issue of query languages with respect to successful search. A search is defined as successful if the user performs one of the following actions, which are listed in the right-hand box shown when a result item is clicked:

* Services: availability at the library, link to other services (Amazon, etc.), collection homepage
* Options: save in favorites, send by email
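The success definition can be turned into a simple session-level check. This is only a sketch: the action labels below are hypothetical names chosen for this example and will need to be mapped to the action codes that actually appear in the TEL logs (see the description of the files). It also assumes the log has already been grouped into sessions.

```python
# Hypothetical labels for the five actions that count as success.
SUCCESS_ACTIONS = {
    "availability_at_library",
    "link_to_other_service",
    "collection_homepage",
    "save_in_favorites",
    "send_by_email",
}


def is_successful(session_actions):
    """Return True if the session contains at least one success action."""
    return any(action in SUCCESS_ACTIONS for action in session_actions)


def success_rate(sessions):
    """Share of successful sessions; `sessions` maps a session id to its actions."""
    if not sessions:
        return 0.0
    hits = sum(is_successful(actions) for actions in sessions.values())
    return hits / len(sessions)
```

Computing this rate separately for each query language is one way to relate query language to successful search.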

Potential analysis
  1. Language identification for the queries
  2. Initial language vs. country of the IP address
  3. Subsequent languages used in the same search
  4. Country of the library vs. language of the query vs. language of the interface

Possible tasks
  1. Explore the relationship between language of the query, origin of the user, selected interface language and language of library viewed
  2. Are different languages used within subsequent searches?
  3. What kind of language changes occur?
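Once each query has a language label, language changes within a session can be counted directly. The sketch below assumes that language identification has already been done by some other means and that the labels (for example ISO 639-1 codes) are given in the order in which the queries were issued; the function name is illustrative.

```python
from collections import Counter


def language_transitions(languages):
    """Count changes between the languages of consecutive queries.

    `languages` is the ordered list of language labels for one session.
    Consecutive queries in the same language are not counted.
    """
    return Counter(
        (prev, curr)
        for prev, curr in zip(languages, languages[1:])
        if prev != curr
    )
```

For a session labeled `["de", "de", "en", "de", "en"]`, this counts two changes from German to English and one from English to German. Summing the counters over all sessions shows which language changes occur most often.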


January - May 2010: Registration
May 2010: Data release
June 8: Final task description release
July 20: Submission of results
August 1: Submission of notebook paper
August 10: Submission of revised notebook paper and extended abstract
September 22-23: Workshop in Padua


Thomas Mandl (mandl at uni-hildesheim.de), University of Hildesheim, Germany
Giorgio Maria Di Nunzio (dinunzio at dei.unipd.it), University of Padua, Italy

Steering Committee

Maristella Agosti, University of Padua, Italy
Jim Jansen, Pennsylvania State University, USA
Jaap Kamps, University of Amsterdam, The Netherlands
Vivien Petras, Humboldt University Berlin, Germany
Johannes Leveling, Dublin City University, Ireland
Inderjeet Mani, MITRE Corp., USA