Methodology for evaluating duplicate question detection in a GenAI-based chatbot service
- Project
- 21016 DAIsy
- Type
- Enhancement
- Description
The method assesses question caching: new queries are matched against a question database to determine whether a validated answer can be safely reused. Duplicate detection is strict and requires exact intent equivalence; any change in intent or detail is treated as a non-match to protect patient safety. Evaluation uses a synthetic dataset derived from existing questions, with positive and negative duplicate candidates generated via multiple methods. Performance is measured with mean reciprocal rank (MRR), Hit@K, and precision/recall, supplemented by a review to judge retrieval quality and reliability. Label subjectivity is acknowledged as a source of noise.
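The metrics named in the description can be illustrated with a minimal sketch. It assumes each query has exactly one gold duplicate in the question database and that the retriever returns a ranked list of question IDs. The function names and data layout are hypothetical and are not taken from the project.

```python
from typing import Sequence, Tuple


def reciprocal_rank(ranked_ids: Sequence[str], gold_id: str) -> float:
    """Return 1/rank of the gold duplicate, or 0.0 if it was not retrieved."""
    for rank, candidate in enumerate(ranked_ids, start=1):
        if candidate == gold_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         gold_ids: Sequence[str]) -> float:
    """Average the reciprocal rank over all queries."""
    total = sum(reciprocal_rank(r, g) for r, g in zip(rankings, gold_ids))
    return total / len(gold_ids)


def hit_at_k(rankings: Sequence[Sequence[str]],
             gold_ids: Sequence[str], k: int) -> float:
    """Fraction of queries whose gold duplicate appears in the top k results."""
    hits = sum(1 for r, g in zip(rankings, gold_ids) if g in r[:k])
    return hits / len(gold_ids)


def precision_recall(predicted: Sequence[bool],
                     actual: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall for binary duplicate / non-duplicate decisions."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

As a worked case, if the gold duplicate is ranked first for one query, second for another, and not retrieved for a third, MRR is (1 + 0.5 + 0) / 3 = 0.5. In a safety-sensitive setting such as this one, precision on the final accept decision is likely the more critical figure, since a false positive means reusing an answer for a question it was not validated for.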
- Contact
- Martijn Krans
- martijn.krans@philips.com
- Research area(s)
- Generative AI and LLMs
- Technical features
The system uses a two-stage pipeline. A bi-encoder–based semantic search using all-MiniLM-L6-v2 retrieves candidate questions by computing cosine similarity between embeddings. Final duplicate detection is performed via an LLM prompt, which assigns a discrete score (1–5) to a pair of questions. Only exact matches (score 5) are accepted for answer reuse. In our internal evaluation, this prompt-based approach outperforms fixed cosine-similarity-threshold baselines, which is particularly important from a safety standpoint.
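The two-stage flow described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `embed` stands in for the all-MiniLM-L6-v2 bi-encoder and `score_pair` stands in for the LLM prompt that returns a score from 1 to 5. Both are passed in as plain callables, and all function names and the `top_k` parameter are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_candidates(query: str, questions: Sequence[str],
                        embed: Callable[[str], Vector],
                        top_k: int = 5) -> List[str]:
    """Stage 1: rank stored questions by cosine similarity to the query."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(q)), q) for q in questions]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [q for _, q in scored[:top_k]]


def find_reusable_question(query: str, questions: Sequence[str],
                           embed: Callable[[str], Vector],
                           score_pair: Callable[[str, str], int],
                           top_k: int = 5,
                           accept_score: int = 5) -> Optional[str]:
    """Stage 2: accept a candidate only if the scorer returns the top score."""
    for candidate in retrieve_candidates(query, questions, embed, top_k):
        # score_pair is assumed to return an integer from 1 to 5;
        # anything below accept_score is treated as a non-match.
        if score_pair(query, candidate) == accept_score:
            return candidate
    return None
```

Returning `None` whenever no candidate reaches the top score mirrors the strict policy in the text: the cached answer is reused only for an exact match, and every other case falls through to normal answer generation.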
- Integration constraints
Solutions that use LLMs
- Targeted customer(s)
Philips and any industry developing chatbots
- Conditions for reuse
Originally intended for internal use; licensing to be considered
- Confidentiality
- Public
- Publication date
- 27-01-2026
- Involved partners
- Philips Electronics Nederland BV (NLD)