Download ConvQuestions

Training Set (6720 conversations)
Dev Set (2240 conversations)
Test Set (2240 conversations)

The ConvQuestions benchmark is licensed under a Creative Commons Attribution 4.0 International License.

What do conversations in ConvQuestions look like?

Books
Q: When was the first book of the book series The Dwarves published?
A: 2003
Q: What is the name of the second book?
A: The War of the Dwarves
Q: Who is the author?
A: Markus Heitz
Q: In which city was he born?
A: Homburg
Q: When was he born?
A: 10 October 1971

Movies
Q: Who played the joker in The Dark Knight?
A: Heath Ledger
Q: When did he die?
A: 22 January 2008
Q: Batman actor?
A: Christian Bale
Q: Director?
A: Christopher Nolan
Q: Sequel name?
A: The Dark Knight Rises

Soccer
Q: Which European team did Diego Costa represent in the year 2018?
A: Atletico Madrid
Q: Did they win the Super Cup the previous year?
A: No
Q: Which club was the winner?
A: Real Madrid C.F.
Q: Which English club did Costa play for before returning to Atletico Madrid?
A: Chelsea F.C.
Q: Which stadium is this club's home ground?
A: Stamford Bridge

Music
Q: Led Zeppelin had how many band members?
A: 4
Q: Which was released first: Houses of the Holy or Physical Graffiti?
A: Houses of the Holy
Q: Is the rain song and immigrant song there?
A: No
Q: Who wrote those songs?
A: Jimmy Page
Q: Name of his previous band?
A: The Yardbirds

TV series
Q: Who is the actor of James Gordon in Gotham?
A: Ben McKenzie
Q: What about Bullock?
A: Donal Logue
Q: Creator?
A: Bruno Heller
Q: Married to in 2017?
A: Miranda Cowley
Q: Wedding date first wife?
A: 19 June 1993

How was ConvQuestions created?

ConvQuestions is the first realistic benchmark for conversational question answering over knowledge graphs. It contains 11,200 conversations which can be evaluated over Wikidata. They are compiled from the inputs of 70 Master crowdworkers on Amazon Mechanical Turk, with conversations from five domains: Books, Movies, Soccer, Music, and TV Series. The questions feature a variety of complex question phenomena like comparisons, aggregations, compositionality, and temporal reasoning. Answers are grounded in Wikidata entities to enable fair comparison across diverse methods.

The data gathering setup was kept as natural as possible, with the annotators selecting entities of their choice from each of the five domains, and formulating the entire conversation in one session. All questions in a conversation are from the same Turker, who also provided gold answers to the questions. For suitability to knowledge graphs, questions were constrained to be objective or factoid in nature, but no other restrictive guidelines were set.

A notable property of ConvQuestions is that several questions are not answerable by Wikidata alone (as of September 2019), but the required facts can, for example, be found in the open Web or in Wikipedia. For details, please refer to our CIKM 2019 full paper.
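As a quick illustration, the released conversations can be iterated over with a few lines of Python. Note that the file name and field names used below (a list of conversations, each carrying a "questions" list of question/answer pairs) are assumptions made for this sketch; please check the downloaded files for the actual schema.

```python
import json

def load_conversations(path):
    """Load ConvQuestions-style conversations from a JSON file.

    Assumed (illustrative) schema: a list of conversations, where each
    conversation has a 'questions' list of dicts with 'question' and
    'answer_text' keys. Verify against the actual release files.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def print_conversation(conv):
    """Print the turns of one conversation as Q/A pairs."""
    for turn in conv["questions"]:
        print("Q:", turn["question"])
        print("A:", turn["answer_text"])
```
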

CONVEX: A Baseline Method

We also provide CONVEX, an unsupervised method that can answer incomplete questions over a knowledge graph (Wikidata in our case) by maintaining conversation context using entities and predicates seen so far and automatically inferring missing or ambiguous pieces for follow-up questions. The core of our method is a graph exploration algorithm that judiciously expands a frontier to find candidate answers for the current question. Again, please refer to the paper for the details.
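To convey the core idea of frontier-based exploration, here is a heavily simplified sketch over a toy in-memory graph. The actual CONVEX system judiciously scores and selects frontier nodes over Wikidata before expanding; this sketch omits all scoring and simply expands breadth-first, and all names in it are illustrative.

```python
from collections import deque

def expand_frontier(graph, context_entities, max_hops=2):
    """Explore a graph outward from the current conversation context.

    'graph' maps an entity to a list of (predicate, neighbor) edges.
    Starting from the context entities accumulated over the conversation,
    expand breadth-first up to max_hops and return all newly reached
    entities as answer candidates. (CONVEX additionally scores frontier
    nodes to expand judiciously; that is omitted here.)
    """
    frontier = deque((e, 0) for e in context_entities)
    visited = set(context_entities)
    candidates = set()
    while frontier:
        entity, hops = frontier.popleft()
        if hops >= max_hops:
            continue
        for _predicate, neighbor in graph.get(entity, []):
            if neighbor not in visited:
                visited.add(neighbor)
                candidates.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return candidates
```

For example, starting from the context entity "The Dwarves" in a toy graph with author and place-of-birth edges, a two-hop expansion reaches both "Markus Heitz" and "Homburg" as candidates.
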
The CONVEX code is available on GitHub.


For feedback and clarifications, please contact: Philipp Christmann (pchristm AT mmci DOT uni HYPHEN saarland DOT de), Rishiraj Saha Roy (rishiraj AT mpi HYPHEN inf DOT mpg DOT de) or Gerhard Weikum (weikum AT mpi HYPHEN inf DOT mpg DOT de).

"Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion", Philipp Christmann, Rishiraj Saha Roy, Abdalghani Abujabal, Jyotsna Singh, and Gerhard Weikum, in Proceedings of the 28th ACM International Conference on Information and Knowledge Management 2019 (CIKM '19), Beijing, China, 3 - 7 November 2019, pages 729-738.


Model                                           MRR     P@1
CONVEX with Frontiers (Christmann et al. '19)   0.200   0.184
Star Model                                      0.175   0.175
CONVEX without Frontiers                        0.101   0.084
Chain Model                                     0.075   0.075
D2A, trained on ConvQuestions (Guo et al. '18)  0.061   0.061
D2A, trained on CSQA (Guo et al. '18)           0.059   0.059
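For reference, MRR and P@1 can be computed from per-question ranked answer lists as in the following sketch. This is a generic illustration of the two metrics, not the official evaluation script.

```python
def mrr(ranked_per_question, gold_per_question):
    """Mean reciprocal rank: average over questions of 1/rank of the
    first correct answer, counting 0 when no ranked answer is correct."""
    total = 0.0
    for ranked, gold in zip(ranked_per_question, gold_per_question):
        for rank, ans in enumerate(ranked, start=1):
            if ans in gold:
                total += 1.0 / rank
                break
    return total / len(ranked_per_question)

def precision_at_1(ranked_per_question, gold_per_question):
    """P@1: fraction of questions whose top-ranked answer is correct."""
    hits = sum(
        1
        for ranked, gold in zip(ranked_per_question, gold_per_question)
        if ranked and ranked[0] in gold
    )
    return hits / len(ranked_per_question)
```

For instance, with three questions where the gold answer is ranked 2nd, 1st, and not retrieved at all, MRR is (1/2 + 1 + 0) / 3 = 0.5 and P@1 is 1/3.
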

If you would like to run CONVEX on your own dataset of conversational questions over Wikidata, please send it to us by email (without gold answers, if you wish). We will get back to you with the outputs as soon as possible.