Evaluation Guideline

These are the evaluation guidelines for the CriES Pilot Challenge.

Submission of Results

We expect the retrieval results in TREC result file format:

<question id> <iteration> <expert id> <rank> <sim> <run id>
0000022426    Q0          0000152271  0      0.84  Exp1
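As an illustration of this layout, the following is a minimal Python sketch that turns scored experts into result lines. It assumes whitespace-separated fields, the constant iteration value Q0, and ranks that start at 0, all taken from the example line above. The function name format_run, the input structure, and the second expert id are hypothetical.

```python
def format_run(ranked, run_id):
    """Return TREC result lines for ranked = {question_id: [(expert_id, sim), ...]}."""
    lines = []
    for question_id, experts in ranked.items():
        # Order by descending similarity so that the rank follows the score.
        ordered = sorted(experts, key=lambda pair: pair[1], reverse=True)
        for rank, (expert_id, sim) in enumerate(ordered):
            # Assumption: ranks start at 0, as in the example line.
            lines.append(f"{question_id} Q0 {expert_id} {rank} {sim:.2f} {run_id}")
    return lines


# The first expert id is from the example line; the second is made up.
lines = format_run(
    {"0000022426": [("0000000001", 0.50), ("0000152271", 0.84)]},
    run_id="Exp1",
)
print("\n".join(lines))
```

The sketch sorts by similarity before assigning ranks, so the two fields cannot disagree.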

Please observe the following formatting rules:

These rules ensure that your results are compatible with the trec_eval tool we use for the evaluation and with our relevance assessments.

Each participant may submit result files for up to 3 different runs, in order of priority. We guarantee to evaluate the first run. Depending on the number of participants, we may also be able to evaluate 2 or all 3 of the runs.

For all runs we will evaluate the first 10 retrieved experts for each question.
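To check which experts would fall inside this cutoff, a run can be reduced to its 10 best experts per question before submission. The sketch below is only an approximation of that step: it assumes that "first" means highest similarity score, and the function name top_k_experts and the sample run are hypothetical.

```python
from collections import defaultdict


def top_k_experts(run_lines, k=10):
    """Map each question id to the ids of its k highest-scoring experts."""
    scored = defaultdict(list)
    for line in run_lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        question_id, _iteration, expert_id, _rank, sim, _run_id = fields
        scored[question_id].append((float(sim), expert_id))
    top = {}
    for question_id, pairs in scored.items():
        # Assumption: order by descending similarity. sorted() is stable,
        # so tied experts keep their order of appearance in the file.
        ordered = sorted(pairs, key=lambda pair: pair[0], reverse=True)
        top[question_id] = [expert_id for _sim, expert_id in ordered[:k]]
    return top


# Made-up run with 12 experts for one question; only 10 are kept.
run = [f"0000022426 Q0 {i:010d} {i} {1.0 - i / 100:.2f} Exp1" for i in range(12)]
top = top_k_experts(run)
print(len(top["0000022426"]))
```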

Please submit your result files via email to Philipp Sorg.

Relevance Assessment

The relevance assessment is based on the written answers of experts in the dataset. Given a question, the assessors compare it to the past answers of users. It is assumed that experts have no knowledge beyond what they express in their answers.

The assessors can assign each expert to one of the following classes for a given question:

  1. Expert is likely able to answer.
  2. Expert may be able to answer.
  3. Expert is probably not able to answer.

This leads to two different relevance assessment files. In the first, only users in class 1 are defined as relevant for a question. In the second, users in both class 1 and class 2 are treated as relevant.
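The mapping from the three classes to the two relevance files can be sketched as follows. This is an illustration, not the organizers' tooling: it assumes the usual four-column TREC relevance layout (topic, iteration, id, relevance), and the function name build_qrels and the sample ids are hypothetical.

```python
def build_qrels(assessments, relevant_classes):
    """Turn (question_id, expert_id, cls) triples into TREC-style relevance lines.

    cls is the assessor's class: 1 (likely), 2 (may), 3 (probably not).
    """
    lines = []
    for question_id, expert_id, cls in assessments:
        relevance = 1 if cls in relevant_classes else 0
        # Assumed layout: <topic> <iteration> <id> <relevance>
        lines.append(f"{question_id} 0 {expert_id} {relevance}")
    return lines


# Made-up assessments, one expert per class.
assessments = [("q1", "e1", 1), ("q1", "e2", 2), ("q1", "e3", 3)]

strict = build_qrels(assessments, relevant_classes={1})      # class 1 only
lenient = build_qrels(assessments, relevant_classes={1, 2})  # classes 1 and 2
print(strict)
print(lenient)
```

The two files differ only in how class 2 is treated, so scores measured against the second file can never be lower than scores measured against the first for set-based measures such as precision at 10.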

As determined by the task in the pilot challenge, some relevance judgments are given a priori:

Automatic Evaluation

cries_automatic_eval.trec_rel.txt is a TREC-style relevance file that assigns each topic exactly one relevant user, namely the user who wrote the best answer to the topic's question. This file can be used for testing and debugging, but it will most likely heavily underestimate the values of the evaluation measures.
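A small sketch of why the values come out low: with exactly one relevant user per topic, precision over the first 10 retrieved experts cannot exceed 0.1, even for a perfect ranking. The code assumes that the file uses the usual four-column TREC relevance layout (topic, iteration, user id, relevance), which is not confirmed here. The function names and sample ids are hypothetical, and the official scores come from trec_eval, not from this code.

```python
def load_relevant(qrels_lines):
    """Map each topic id to the set of user ids judged relevant."""
    relevant = {}
    for line in qrels_lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, user, relevance = fields
        if int(relevance) > 0:
            relevant.setdefault(topic, set()).add(user)
    return relevant


def precision_at_k(ranked_users, relevant_users, k=10):
    """Fraction of the first k retrieved users that are relevant."""
    hits = sum(1 for user in ranked_users[:k] if user in relevant_users)
    return hits / k


# Made-up data: one relevant user for topic t1, found at position 3.
relevant = load_relevant(["t1 0 u3 1"])
ranking = [f"u{i}" for i in range(1, 11)]
p10 = precision_at_k(ranking, relevant["t1"])
print(p10)
```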