Preprocessing Tool for CriES

Description

The preprocessing tool performs the following steps on the Yahoo! Answers dataset:

The output of the tool will be:

  1. A subset of the dataset in XML
  2. A user graph modeling questioner-answerer relation in GraphML format
  3. A topic file consisting of 60 multilingual topics in TREC format
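
As a rough illustration of how the GraphML user graph might be consumed, the sketch below counts incoming edges per user with the Python standard library. Only the GraphML namespace is standard; the sample graph, the user IDs, the function name in_degree, and the assumption that edges point from questioner to answerer are hypothetical and should be checked against the generated file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard GraphML namespace; everything else below is illustrative.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical miniature user graph (user IDs are made up).
SAMPLE_GRAPHML = """
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="users" edgedefault="directed">
    <node id="u1"/>
    <node id="u2"/>
    <node id="u3"/>
    <edge source="u1" target="u2"/>
    <edge source="u3" target="u2"/>
  </graph>
</graphml>
"""

def in_degree(graphml_text):
    """Count incoming edges per node.

    Under the assumed questioner -> answerer direction, this is the
    number of questions each user answered.
    """
    root = ET.fromstring(graphml_text.strip())
    counts = Counter()
    for edge in root.iterfind(".//g:edge", NS):
        counts[edge.get("target")] += 1
    return counts

print(in_degree(SAMPLE_GRAPHML).most_common(1))  # [('u2', 2)]
```

For the real output, the file contents would be read from the output directory instead of the inline sample.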

Download

The preprocessing tool is written in Java and can be downloaded as an executable JAR file: cries_preprocessing.jar. If you are interested in the source code, please contact Philipp Sorg.

Documentation

Running the tool requires a Java 1.6 runtime environment.

Command to run the tool:
java -jar cries_preprocessing.jar -Dxml_file=<Yahoo! Answers XML file> -Doutput_dir=<output directory>

Comments:
- The preprocessing tool can handle gzipped XML input files (in this case, the file FullOct2007.xml.gz)
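
For readers who want to inspect the input themselves, the following sketch shows one way to read either a plain or a gzipped XML file in a streaming fashion. It is not the tool's own implementation (the tool is written in Java); the helper names open_maybe_gzip and count_tags are hypothetical, and no assumption is made about the element names in the dataset.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def open_maybe_gzip(path):
    """Open path for binary reading, decompressing if it is gzipped."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rb")
    return open(path, "rb")

def count_tags(path):
    """Stream through an XML file and count elements per tag name."""
    counts = Counter()
    with open_maybe_gzip(path) as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            counts[elem.tag] += 1
            elem.clear()  # drop text and children to reduce memory use
    return counts
```

A call such as count_tags("FullOct2007.xml.gz") would then report how many elements of each kind the file contains, without decompressing it to disk first.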

The following output files will be generated:

Please refer to our Evaluation Guidelines for instructions on how to submit your expert search results.

cries_automatic_eval.trec_rel.txt is a TREC-style relevance file that assigns each topic exactly one relevant user, namely the user who wrote the best answer to the topic question. This file can be used for testing and debugging, but it will most likely heavily underestimate the values of evaluation measures.
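
The effect of having only one relevant user per topic can be seen in a small sketch. It assumes the file follows the usual four-column trec_eval layout (topic, iteration, ID, relevance); the topic ID, user IDs, and ranking below are made up for illustration.

```python
def read_qrels(lines):
    """Parse relevance lines of the assumed form: topic iteration id relevance."""
    relevant = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, user, rel = parts
        if int(rel) > 0:
            relevant.setdefault(topic, set()).add(user)
    return relevant

def precision_at_k(ranking, relevant_users, k=10):
    """Fraction of the top-k ranked users that are marked relevant."""
    return sum(1 for user in ranking[:k] if user in relevant_users) / k

def reciprocal_rank(ranking, relevant_users):
    """1 / rank of the first relevant user, or 0.0 if none is ranked."""
    for rank, user in enumerate(ranking, start=1):
        if user in relevant_users:
            return 1.0 / rank
    return 0.0

# Hypothetical data: one topic, one relevant user, a ranking of ten users.
SAMPLE_QRELS = "t1 0 userA 1\n"
RANKING = ["userB", "userA", "userC", "userD", "userE",
           "userF", "userG", "userH", "userI", "userJ"]

qrels = read_qrels(SAMPLE_QRELS.splitlines())
print(precision_at_k(RANKING, qrels["t1"]))   # 0.1
print(reciprocal_rank(RANKING, qrels["t1"]))  # 0.5
```

With a single relevant user per topic, precision at 10 can never exceed 0.1, even if several other ranked users are in fact experts on the topic. This is why scores computed against this file should be read as a lower bound.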