Math IR Happening at MIR 2012

    The MIR 2012 workshop will feature a friendly competition among the systems presented at the workshop. Since math information retrieval is still a young and developing field, we will not make this an official competition but a happening, where we get together and test our systems on a common set of problems. We expect the happening to extend beyond the workshop proper.

    The aim of the MIR happening is to jointly gain a better understanding of the information retrieval needs of mathematicians and of the strengths and weaknesses of the respective IR approaches and systems. As a tangible result of the happening, the organizers will compile a survey paper reporting this newly gained understanding.

    In particular, it is not an aim of the MIR happening to determine "winners" of the competition in any form. That may be the aim of a subsequent competition, once we have a better grip on the problems and on possible evaluation approaches.

    MIR Challenges

    We plan to conduct the happening via three challenges:

    1. Formula Search (Automated) in the categories:
      1. similarity search for formulae
      2. instance search (query formulae with query variables)
      The judges select/prepare a formula database and a set of formula queries. The formula database contains a list of formulae with identifiers. Every formula is given in two encodings: LaTeX and MathML (parallel-markup presentation/content). The query formulae are in the same format, extended with query variables. Participating IR systems obtain the formula database and the list of formula queries and return, for every query, an ordered list of hits (identifiers of formulae claimed to match the query), plus possible supporting evidence (e.g. a substitution for instance queries). Results will be judged on precision, recall, result ordering, and search time.
    2. Full-Text Search (Automated): This is like formula search above, except that we use a document collection (LaTeX and XHTML+MathML (parallel)) and a set of text/formula queries (in the same formats) instead of pure formulae. IR results are ordered lists of documents (i.e. XPointer references into the documents, with highlighted result snippets as supporting evidence) and will be judged on precision, recall, result ordering, search time, and presentation of the documents.
    3. Open Information Retrieval (Semi-Automated): In contrast to the first two challenges, where the systems are run in batch mode (i.e. without human intervention), in this one mathematicians will challenge the (human) contestants to find specific information in a document corpus via human-readable definite descriptions (natural-language text), which the contestants translate into queries for their IR systems. Results are to be delivered as hits in free form, together with a description of how the results were found.
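    For the instance-search category of the formula search challenge, a system must return, for each hit, a substitution for the query variables as supporting evidence. The following is a minimal Python sketch of one way such matching could work, not a format or method prescribed by the organizers. It assumes formulae have been reduced to operator trees written as nested tuples (a simplified stand-in for content MathML) and that query variables are strings starting with "?". The names `match`, `subterms`, `instance_search` and the sample database are hypothetical. Matching is purely syntactic: it does not account for commutativity or associativity.

```python
from typing import Dict, Iterator, List, Optional, Tuple, Union

# Assumed representation: a leaf is a string (symbol or number), an inner node
# is a tuple (head, arg1, arg2, ...); a simplified stand-in for content MathML.
Term = Union[str, Tuple]
Subst = Dict[str, Term]


def is_qvar(t: Term) -> bool:
    """Assumed convention: query variables are strings starting with '?'."""
    return isinstance(t, str) and t.startswith("?")


def match(query: Term, formula: Term, subst: Optional[Subst] = None) -> Optional[Subst]:
    """Return a substitution that makes `query` identical to `formula`, or None."""
    subst = dict(subst or {})
    if is_qvar(query):
        bound = subst.get(query)
        if bound is None:
            subst[query] = formula  # first occurrence: bind the variable
            return subst
        # Repeated occurrence: it must bind to the same subterm again.
        return subst if bound == formula else None
    if isinstance(query, str) or isinstance(formula, str):
        return subst if query == formula else None
    if len(query) != len(formula):
        return None
    # Match head and arguments position by position (no commutativity).
    for q, f in zip(query, formula):
        subst = match(q, f, subst)
        if subst is None:
            return None
    return subst


def subterms(t: Term) -> Iterator[Term]:
    """Yield t and all of its argument subterms in pre-order."""
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from subterms(arg)


def instance_search(query: Term, database: Dict[str, Term]) -> List[Tuple[str, Subst]]:
    """Return (formula id, substitution) for each formula containing an instance."""
    hits = []
    for fid, formula in database.items():  # linear scan; a real system would index
        for sub in subterms(formula):
            s = match(query, sub)
            if s is not None:
                hits.append((fid, s))
                break  # report only the first matching subterm per formula
    return hits


# Hypothetical sample database: a^2 + b^2 = c^2,  x^2 + x^2,  2 * y^3
db = {
    "f1": ("eq", ("plus", ("power", "a", "2"), ("power", "b", "2")), ("power", "c", "2")),
    "f2": ("plus", ("power", "x", "2"), ("power", "x", "2")),
    "f3": ("times", "2", ("power", "y", "3")),
}
query = ("plus", ("power", "?u", "2"), ("power", "?v", "2"))  # ?u^2 + ?v^2
print(instance_search(query, db))
# [('f1', {'?u': 'a', '?v': 'b'}), ('f2', {'?u': 'x', '?v': 'x'})]
```

    Reusing a variable, as in the query ?u^2 + ?u^2, forces both occurrences to bind to the same subterm, so only f2 would be returned. Since every hit here is an exact instance, the result order is simply database order; ranking, as judged in the challenge, would only become meaningful for similarity search.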

    MIR Judges Panel

    We have invited mathematicians participating in CICM to form a panel of judges who will select/prepare the MIR challenges, judge the solutions of the contestants, and provide overall feedback.

    FAQ: open issues to be discussed

    1. Q: Do we also judge indexing time? A: No, at least not as a main criterion.
    2. Q: Do we give the formula databases or document corpora to the contestants ahead of time? A: Yes, the formula database and document corpora will be released on July 5th, in the format of the previously announced sets.

    The questions and partial answers that we adopt for the MIR happening are listed here.

Last modified: October 08 2012 17:33:32 CEST