Methodology for evaluating duplicate-question detection in a GenAI-based chatbot service

Project: 21016 DAIsy
Type: Enhancement
Description: The method assesses question caching: each new query is matched against a database of previously answered questions to determine whether a validated answer can be safely reused. Duplicate detection is strict and requires exact intent equivalence; any change in intent or detail is treated as a non-match to protect patient safety. Evaluation uses a synthetic dataset derived from existing questions, with positive and negative duplicate candidates generated by multiple methods. Performance is measured with mean reciprocal rank (MRR), Hit@K, and precision/recall, supplemented by a review of the results to judge retrieval quality and reliability. Subjectivity in the duplicate labels is treated as a source of noise.
Contact: Martijn Krans
Email: martijn.krans@philips.com
Research area(s): Generative AI and LLMs
Technical features: The system uses a two-stage pipeline. In the first stage, a bi-encoder-based semantic search using all-MiniLM-L6-v2 retrieves candidate questions by computing the cosine similarity between the query embedding and the stored question embeddings. In the second stage, an LLM prompt performs the final duplicate detection by assigning a discrete score (1–5) to each pair of questions. Only exact matches (score 5) are accepted for answer reuse. In our internal evaluation, this prompt-based approach outperforms baselines that apply a fixed cosine-similarity threshold, which is particularly important from a safety standpoint.
Integration constraints: Applicable to solutions that use LLMs
Targeted customer(s): Philips and any industry developing chatbots
Conditions for reuse: Originally intended for internal use; licensing to be considered
Confidentiality: Public
Publication date: 27-01-2026
Involved partners: Philips Electronics Nederland BV (NLD)
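
Illustrative sketch: The two-stage decision and the retrieval metrics named in this record could be sketched roughly as follows. This is not the project's implementation. All function names, the toy 2-dimensional vectors, and the question ids are assumptions made for illustration; in the described system the vectors would come from all-MiniLM-L6-v2 and the 1–5 score from an LLM prompt, neither of which is called here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query_vec, db_vecs, k):
    """Stage 1 (assumed shape): rank stored question ids by cosine similarity."""
    ranked = sorted(db_vecs, key=lambda qid: cosine(query_vec, db_vecs[qid]),
                    reverse=True)
    return ranked[:k]

def accept_for_reuse(llm_score):
    """Stage 2 rule from the record: only a score of 5 permits answer reuse."""
    return llm_score == 5

def reciprocal_rank(ranked_ids, gold_id):
    """1/rank of the true duplicate, or 0.0 if it was not retrieved."""
    for rank, qid in enumerate(ranked_ids, start=1):
        if qid == gold_id:
            return 1.0 / rank
    return 0.0

def mrr(rankings, gold_ids):
    """Mean reciprocal rank over all evaluation queries."""
    scores = [reciprocal_rank(r, g) for r, g in zip(rankings, gold_ids)]
    return sum(scores) / len(scores)

def hit_at_k(rankings, gold_ids, k):
    """Fraction of queries whose true duplicate appears in the top k."""
    hits = [g in r[:k] for r, g in zip(rankings, gold_ids)]
    return sum(hits) / len(hits)

def precision_recall(predicted, actual):
    """Precision and recall for boolean duplicate decisions."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy data (hypothetical): three stored questions and one incoming query.
db_vecs = {"q1": [1.0, 0.0], "q2": [0.0, 1.0], "q3": [0.7, 0.7]}
candidates = retrieve([0.9, 0.1], db_vecs, k=2)
```

In this sketch, a candidate returned by retrieve would be passed to the LLM scoring step, and accept_for_reuse would gate answer reuse on the returned score. Precision matters most under the strict-match policy, since a false positive means reusing an answer for a question with a different intent.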