====== Evaluation Guideline ======

These are the evaluation guidelines for the [[cries:challenge|CriES Pilot Challenge]].

===== Submission of Results =====

We expect the retrieval results in TREC result file format:

  0000022426 Q0 0000152271 0 0.84 Exp1

Please observe the following formatting rules:

  * corresponds to the -tag in the Yahoo! Answers XML file
  * corresponds to the user identifier in the Yahoo! Answers XML file without leading "u"
  * and must have exactly 10 digits (with leading zeros)
  * is ignored, but must not be empty

These rules ensure that your results are compatible with the [[http://trec.nist.gov/trec_eval/|trec_eval Tool]] we use for the evaluation and with our relevance assessments.

Each participant may submit result files for up to 3 different runs, in order of priority. We guarantee to evaluate the first run. Depending on the number of participants, we might also be able to evaluate 2 or all 3 of the runs. For all runs we will evaluate the first 10 retrieved experts for each question.

Please submit your result files via email to [[sorg@kit.edu|Philipp Sorg]].

===== Relevance Assessment =====

The relevance assessment is based on the written answers of experts in the dataset. Given a question, the assessors compare it to the past answers of users. It is assumed that experts have no knowledge beyond what is expressed in their answers. The assessors can classify an expert with respect to a question as follows:

  - Expert is likely able to answer.
  - Expert may be able to answer.
  - Expert is probably not able to answer.

This leads to two different relevance assessment files. In the first, only users in class 1 are defined as relevant for a question. In the second, users in class 1 and class 2 are both defined as relevant.

As a consequence of the task definition in the pilot challenge, some relevance judgments are fixed a priori:

  * The author of the best answer to a question is defined as relevant (class 1).
  * Questioners are defined as non-relevant (class 3) for their own questions.
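The derivation of the two relevance assessment files from the three assessor classes can be sketched as follows. This is only an illustration, not the organizers' tooling: the helper ''to_qrels'', the sample assessments, and the line layout (topic, a constant ''0'', user, binary relevance, as commonly used for trec_eval relevance files) are assumptions.

```python
def to_qrels(assessments, relevant_classes):
    """Turn (topic_id, user_id, assessed_class) triples into qrels-style lines.

    A user is marked relevant (1) if the assessed class is contained in
    relevant_classes, and non-relevant (0) otherwise.
    Assumed line layout: "<topic> 0 <user> <relevance>".
    """
    lines = []
    for topic_id, user_id, assessed_class in assessments:
        relevance = 1 if assessed_class in relevant_classes else 0
        lines.append(f"{topic_id} 0 {user_id} {relevance}")
    return lines


# Hypothetical assessments for one topic:
# class 1 = likely able, 2 = may be able, 3 = probably not able to answer.
assessments = [
    ("0000022426", "0000152271", 1),
    ("0000022426", "0000000042", 2),
    ("0000022426", "0000000007", 3),
]

strict_qrels = to_qrels(assessments, {1})      # first file: only class 1 is relevant
lenient_qrels = to_qrels(assessments, {1, 2})  # second file: classes 1 and 2 are relevant
```

In this sketch the class 2 user is non-relevant in ''strict_qrels'' and relevant in ''lenient_qrels'', so the same run can score differently against the two files.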
===== Automatic Evaluation =====

{{:cries:cries_automatic_eval.trec_rel.txt|}} is a TREC-style relevance file that assigns each topic exactly one relevant user, namely the user who wrote the best answer to the topic's question. This file can be used for testing and debugging, but it will most likely substantially underestimate the values of the evaluation measures.
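For debugging a run against this file before submitting, a minimal sketch along the following lines may help. It is not a replacement for trec_eval: the function names, the sample identifiers, and the assumed relevance line layout (topic, a constant ''0'', user, relevance) are illustrative assumptions, and the sketch simply orders results by score and looks at the first 10 experts per topic.

```python
from collections import defaultdict


def format_result_line(topic_id, user_id, rank, score, run_name):
    """Build one TREC result line with 10-digit, zero-padded identifiers."""
    user_digits = str(user_id).lstrip("u")  # drop the leading "u" of the user identifier
    return f"{int(topic_id):010d} Q0 {int(user_digits):010d} {rank} {score} {run_name}"


def precision_at_10(run_lines, qrels_lines):
    """Per-topic fraction of the 10 highest-scored experts that are relevant."""
    relevant = defaultdict(set)
    for line in qrels_lines:
        topic, _, user, relevance = line.split()
        if int(relevance) > 0:
            relevant[topic].add(user)

    retrieved = defaultdict(list)
    for line in run_lines:
        topic, _, user, _rank, score, _run = line.split()
        retrieved[topic].append((float(score), user))

    result = {}
    for topic, scored in retrieved.items():
        # Sort by descending score; the stable sort keeps input order for ties.
        top = [user for _, user in sorted(scored, key=lambda pair: -pair[0])[:10]]
        hits = sum(1 for user in top if user in relevant[topic])
        result[topic] = hits / 10
    return result


# Hypothetical run with two retrieved experts for one topic.
run = [
    format_result_line("22426", "u152271", 0, 0.84, "Exp1"),
    format_result_line("22426", "u7", 1, 0.5, "Exp1"),
]
# Hypothetical relevance line: exactly one relevant user for the topic.
qrels = ["0000022426 0 0000152271 1"]
scores = precision_at_10(run, qrels)
```

Because the file lists only one relevant user per topic, precision over the first 10 experts cannot exceed 0.1 here, which illustrates why measures computed against it are underestimates.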