Crossing the Boundaries of Domains and Languages.

Evaluation Guideline

These are the evaluation guidelines for the CriES Pilot Challenge.

Submission of Results

We expect the retrieval results in TREC result file format:

<question id> <iteration> <expert id> <rank> <sim> <run id>
0000022426    Q0          0000152271  0      0.84  Exp1

Please regard the following formatting rules:

  • <question id> corresponds to the <uri> tag in the Yahoo! Answers XML file
  • <expert id> corresponds to the user identifier in the Yahoo! Answers XML file, without the leading "u"
  • <question id> and <expert id> must have exactly 10 digits (with leading zeros)
  • <iteration> is ignored, but must not be empty

These rules ensure that your results are compatible with the trec_eval tool we use for the evaluation and with our relevance assessments.
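
As an illustration of the formatting rules above, here is a minimal Python sketch that builds one result line. The function names `normalize_id` and `format_result_line` are hypothetical and not part of any challenge tooling, and the sketch assumes that raw identifiers are digit strings, optionally prefixed with "u".

```python
def normalize_id(raw_id: str) -> str:
    """Strip an optional leading 'u' and zero-pad to exactly 10 digits."""
    digits = raw_id[1:] if raw_id.startswith("u") else raw_id
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"cannot normalize identifier: {raw_id!r}")
    return digits.zfill(10)


def format_result_line(question_id, expert_id, rank, sim, run_id):
    """Build one line in the TREC result file format described above."""
    # "Q0" is only a placeholder: <iteration> is ignored but must not be empty.
    return (
        f"{normalize_id(question_id)} Q0 {normalize_id(expert_id)} "
        f"{rank} {sim} {run_id}"
    )


print(format_result_line("22426", "u152271", 0, 0.84, "Exp1"))
# prints: 0000022426 Q0 0000152271 0 0.84 Exp1
```

Raising an error on unexpected identifiers, rather than silently padding them, makes malformed lines visible before submission.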

Each participant may submit result files for 3 different runs, in order of priority. We guarantee to evaluate the first run. Depending on the number of participants, we might also be able to evaluate 2 or all 3 of the runs.

For all runs we will evaluate the first 10 retrieved experts for each question.

Please submit your result files via email to Philipp Sorg.

Relevance Assessment

The relevance assessment is based on the written answers of experts in the dataset. Given a question, the assessors compare it to the past answers of users. It is assumed that experts have no knowledge beyond what they express in their answers.

The assessors classify each expert with respect to a question into one of the following classes:

  1. Expert is likely able to answer.
  2. Expert may be able to answer.
  3. Expert is probably not able to answer.

This leads to two different relevance assessment files. In the first, only users in class 1 are defined as relevant for a question. In the second, users in both class 1 and class 2 are treated as relevant.
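
The two relevance files can be thought of as two binarizations of the same three-class assessments. The following Python sketch shows the idea under stated assumptions: the function name `to_qrels_lines`, the in-memory tuple layout, and all identifiers except the first are made up for illustration, and the output uses the common TREC relevance line layout (topic, iteration, document, relevance).

```python
def to_qrels_lines(assessments, relevant_classes):
    """Binarize (question_id, expert_id, assessed_class) triples.

    An expert counts as relevant (1) if the assessed class is in
    relevant_classes, otherwise as non-relevant (0).
    """
    lines = []
    for question_id, expert_id, assessed_class in assessments:
        relevance = 1 if assessed_class in relevant_classes else 0
        # Columns: topic, iteration (unused), document, relevance.
        lines.append(f"{question_id} 0 {expert_id} {relevance}")
    return lines


# Hypothetical assessments for one question, one expert per class.
assessments = [
    ("0000022426", "0000152271", 1),
    ("0000022426", "0000000042", 2),
    ("0000022426", "0000000077", 3),
]

strict = to_qrels_lines(assessments, {1})      # only class 1 is relevant
lenient = to_qrels_lines(assessments, {1, 2})  # classes 1 and 2 are relevant

print("\n".join(strict))
print("\n".join(lenient))
```

In this example the class 2 expert is non-relevant in the strict file and relevant in the lenient one; the other two lines are identical in both files.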

As a consequence of the task in the pilot challenge, some relevance judgments are given a priori:

  • The author of the best answer to a question is defined as relevant (class 1).
  • Questioners are defined as non-relevant (class 3) for their own questions.
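
The two a priori rules above can be expressed as a small lookup that runs before any manual assessment. This is only a sketch: the function name `apriori_class` and its parameters are hypothetical, and checking the questioner before the best-answer author is an assumption made here, since the guidelines do not say what happens if both roles coincide.

```python
def apriori_class(expert_id, best_answer_author_id, questioner_id):
    """Return the a priori class (1 or 3) for an expert on one question,
    or None if the expert has to be judged by the assessors."""
    if expert_id == questioner_id:
        return 3  # questioners are non-relevant for their own questions
    if expert_id == best_answer_author_id:
        return 1  # the author of the best answer is relevant
    return None  # no a priori judgment applies
```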

Automatic Evaluation

cries_automatic_eval.trec_rel.txt is a TREC-style relevance file that assigns each topic exactly one relevant user, namely the user who wrote the best answer to the topic question. This file can be used for testing and debugging, but it will most probably heavily underestimate the values of the evaluation measures.
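
One way to see the underestimation: with exactly one relevant user per topic and a cut-off of 10 retrieved experts, precision at 10 can never exceed 0.1, however good the ranking is. The Python sketch below illustrates this with made-up identifiers; `precision_at_k` is a hypothetical helper for debugging, not a replacement for trec_eval.

```python
def precision_at_k(ranked_experts, relevant_experts, k=10):
    """Fraction of the first k retrieved experts that are relevant."""
    top_k = ranked_experts[:k]
    hits = sum(1 for expert in top_k if expert in relevant_experts)
    return hits / k


# Hypothetical run for one question: ten 10-digit expert ids.
ranking = [f"{i:010d}" for i in range(1, 11)]

# Automatic relevance: only the best-answer author counts as relevant.
best_answerer_only = {"0000000003"}

print(precision_at_k(ranking, best_answerer_only))  # 0.1
```

Other experts in the top 10 who could plausibly answer the question count as misses under this file, which is why the manual assessments are needed for the final evaluation.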

cries/evaluation_guideline.txt · Last modified: 2010/05/10 13:47 by pso
© 2008 Institute AIFB, University of Karlsruhe & ISWeb, University of Koblenz.
All rights reserved.