Download ConvQuestions

Training Set (6720 Conversations)
Dev Set (2240 Conversations)
Test Set (2240 Conversations)

The ConvQuestions benchmark is licensed under a Creative Commons Attribution 4.0 International License.

What do conversations in ConvQuestions look like?

Books
  Q1: When was the first book of the book series The Dwarves published?
  A1: 2003
  Q2: What is the name of the second book?
  A2: The War of the Dwarves
  Q3: Who is the author?
  A3: Markus Heitz
  Q4: In which city was he born?
  A4: Homburg
  Q5: When was he born?
  A5: 10 October 1971

Movies
  Q1: Who played the joker in The Dark Knight?
  A1: Heath Ledger
  Q2: When did he die?
  A2: 22 January 2008
  Q3: Batman actor?
  A3: Christian Bale
  Q4: Director?
  A4: Christopher Nolan
  Q5: Sequel name?
  A5: The Dark Knight Rises

Soccer
  Q1: Which European team did Diego Costa represent in the year 2018?
  A1: Atletico Madrid
  Q2: Did they win the Super Cup the previous year?
  A2: No
  Q3: Which club was the winner?
  A3: Real Madrid C.F.
  Q4: Which English club did Costa play for before returning to Atletico Madrid?
  A4: Chelsea F.C.
  Q5: Which stadium is this club's home ground?
  A5: Stamford Bridge

Music
  Q1: Led Zeppelin had how many band members?
  A1: 4
  Q2: Which was released first: Houses of the Holy or Physical Graffiti?
  A2: Houses of the Holy
  Q3: Is the rain song and immigrant song there?
  A3: No
  Q4: Who wrote those songs?
  A4: Jimmy Page
  Q5: Name of his previous band?
  A5: The Yardbirds

TV series
  Q1: Who is the actor of James Gordon in Gotham?
  A1: Ben McKenzie
  Q2: What about Bullock?
  A2: Donal Logue
  Q3: Creator?
  A3: Bruno Heller
  Q4: Married to in 2017?
  A4: Miranda Cowley
  Q5: Wedding date first wife?
  A5: 19 June 1993

ConvQuestions was created using a set of seed conversations that contained two paraphrases for each question.

How was ConvQuestions created?

ConvQuestions is the first realistic benchmark for conversational question answering over knowledge graphs. It contains 11,200 conversations which can be evaluated over Wikidata. They are compiled from the inputs of 70 Master crowdworkers on Amazon Mechanical Turk, with conversations from five domains: Books, Movies, Soccer, Music, and TV Series. The questions feature a variety of complex question phenomena like comparisons, aggregations, compositionality, and temporal reasoning. Answers are grounded in Wikidata entities to enable fair comparison across diverse methods. The data gathering setup was kept as natural as possible, with the annotators selecting entities of their choice from each of the five domains, and formulating the entire conversation in one session. All questions in a conversation are from the same Turker, who also provided gold answers to the questions. For suitability to knowledge graphs, questions were constrained to be objective or factoid in nature, but no other restrictive guidelines were set. A notable property of ConvQuestions is that several questions are not answerable by Wikidata alone (as of September 2019), but the required facts can, for example, be found in the open Web or in Wikipedia. For details, please refer to our CIKM 2019 full paper.

CONVEX: A Baseline Method

We also provide CONVEX, an unsupervised method that can answer incomplete questions over a knowledge graph (Wikidata in our case) by maintaining conversation context using entities and predicates seen so far and automatically inferring missing or ambiguous pieces for follow-up questions. The core of our method is a graph exploration algorithm that judiciously expands a frontier to find candidate answers for the current question. Again, please refer to the paper for the details.
GitHub link to CONVEX code
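
To make the idea of context-based frontier expansion concrete, here is a deliberately simplified toy sketch. It is not the CONVEX algorithm: the tiny in-memory graph, the predicate labels, the helper names (tokens, answer_follow_up) and the word-overlap scoring are all assumptions made for illustration. The facts are taken from the Movies sample conversation on this page.

```python
import re

# Toy knowledge graph as (subject, predicate, object) triples.
# Facts mirror the Movies sample conversation; predicate labels are illustrative.
TRIPLES = [
    ("The Dark Knight", "cast member", "Heath Ledger"),
    ("The Dark Knight", "director", "Christopher Nolan"),
    ("Heath Ledger", "date of death", "22 January 2008"),
]


def tokens(text):
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer_follow_up(question, context, triples=TRIPLES):
    """Answer an incomplete follow-up question from the 1-hop frontier.

    The frontier consists of all facts that touch an entity already in the
    conversation context. Each frontier fact is scored by word overlap
    between its predicate label and the question (a stand-in scoring rule,
    assumed here for simplicity). Returns (answer, updated_context).
    """
    question_tokens = tokens(question)
    best, best_score = None, 0
    for subj, pred, obj in triples:
        if subj in context:
            candidate = obj
        elif obj in context:
            candidate = subj
        else:
            continue  # fact is outside the current frontier
        score = len(question_tokens & tokens(pred))
        if score > best_score:
            best, best_score = candidate, score
    if best is None:
        return None, set(context)
    # The answer joins the context so later turns can refer to it.
    return best, set(context) | {best}


context = {"The Dark Knight", "Heath Ledger"}
answer, context = answer_follow_up("Director?", context)
print(answer)
```

The question "Director?" names no entity, so the only way to resolve it is through entities carried over from earlier turns. Pure word overlap is far too weak for real questions (it cannot match "When did he die?" to a "date of death" predicate, for example); please refer to the paper for how CONVEX actually scores and expands its frontier.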


For feedback and clarifications, please contact: Philipp Christmann (pchristm AT mpi HYPHEN inf DOT mpg DOT de), Rishiraj Saha Roy (rishiraj AT mpi HYPHEN inf DOT mpg DOT de) or Gerhard Weikum (weikum AT mpi HYPHEN inf DOT mpg DOT de).

To know more about our group, please visit

ConvQuestions Leaderboard

Model                                  MRR ⇩   P@1
Kaiser et al. '21                      0.279   0.240
Marion et al. '21                      0.260   0.250
Focal Entity Model (Lan et al. '21)    0.248   0.248
Christmann et al. '19                  0.200   0.184
Marion et al. '21                      0.175   0.166
Star Model                             0.175   0.175
Chain Model                            0.075   0.075
Guo et al. '18                         0.061   0.061

* This variant assumes that the gold seed entity for each conversation is given.
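
The leaderboard reports MRR (mean reciprocal rank) and P@1 (precision at 1). The sketch below shows how these two metrics are commonly computed. The function names and the input layout (a ranked answer list plus a set of gold answers per question) are assumptions made for illustration; this is not the official evaluation script.

```python
def reciprocal_rank(ranked_answers, gold_answers):
    """1/rank of the first correct answer, or 0.0 if none is correct."""
    for rank, answer in enumerate(ranked_answers, start=1):
        if answer in gold_answers:
            return 1.0 / rank
    return 0.0


def precision_at_1(ranked_answers, gold_answers):
    """1.0 if the top-ranked answer is correct, else 0.0."""
    return 1.0 if ranked_answers and ranked_answers[0] in gold_answers else 0.0


def evaluate(predictions):
    """Average both metrics over (ranked_answers, gold_answers) pairs."""
    n = len(predictions)
    mrr = sum(reciprocal_rank(r, g) for r, g in predictions) / n
    p_at_1 = sum(precision_at_1(r, g) for r, g in predictions) / n
    return {"MRR": mrr, "P@1": p_at_1}


# Two questions from the Movies sample conversation, with made-up rankings.
predictions = [
    (["Heath Ledger", "Christian Bale"], {"Heath Ledger"}),             # correct at rank 1
    (["Christian Bale", "Christopher Nolan"], {"Christopher Nolan"}),   # correct at rank 2
]
print(evaluate(predictions))  # {'MRR': 0.75, 'P@1': 0.5}
```

MRR rewards a system for placing the correct answer near the top of its ranking, whereas P@1 only counts the top-ranked answer, which is why MRR is never lower than P@1 for the same system.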

The results for CONVEX on this leaderboard differ from those reported in the original CIKM 2019 paper. CONVEX is an unsupervised method that does not require training data: to highlight this point, the paper used 20% of ConvQuestions for tuning the hyperparameters and the remaining 80% of the questions for evaluation. On the leaderboard, results are for the 60:20:20 train-dev-test split provided on this website. This standard split was created for a fair comparison with supervised methods on ConvQuestions, most of which now involve a neural component that benefits from a large training set.
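
As a quick sanity check, the 60:20:20 split of the 11,200 conversations reproduces the set sizes listed in the download section. The snippet is only a worked calculation; the names in it are chosen for illustration.

```python
TOTAL_CONVERSATIONS = 11_200
SPLIT_PERCENT = {"train": 60, "dev": 20, "test": 20}


def split_sizes(total, percents):
    """Number of conversations per split for the given percentages."""
    sizes = {name: total * pct // 100 for name, pct in percents.items()}
    # The integer division must not lose any conversations.
    assert sum(sizes.values()) == total
    return sizes


sizes = split_sizes(TOTAL_CONVERSATIONS, SPLIT_PERCENT)
print(sizes)  # {'train': 6720, 'dev': 2240, 'test': 2240}
```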


"Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion", Philipp Christmann, Rishiraj Saha Roy, Abdalghani Abujabal, Jyotsna Singh, and Gerhard Weikum, in Proceedings of the 28th ACM International Conference on Information and Knowledge Management 2019 (CIKM '19), Beijing, China, 3 - 7 November 2019, pages 729-738.



If you want to run CONVEX on your own dataset of conversational questions over Wikidata, please send it to us by email (without gold answers if you wish). We will get back to you with the outputs as soon as possible.