A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis
We present LingxiDiagBench, a large-scale multi-agent benchmark that evaluates LLMs on both static diagnostic inference and dynamic multi-turn psychiatric consultation in Chinese. At its core is LingxiDiag-16K, a dataset of 16,000 EMR-aligned synthetic consultation dialogues designed to reproduce real clinical demographic and diagnostic distributions across 12 ICD-10 psychiatric categories. Extensive experiments across state-of-the-art LLMs yield three key findings: (1) although LLMs achieve high accuracy on binary depression-anxiety classification (up to 92.3%), performance deteriorates substantially for depression-anxiety comorbidity recognition (43.0%) and 12-way differential diagnosis (28.5%); (2) dynamic consultation often underperforms static evaluation, indicating that ineffective information-gathering strategies significantly impair downstream diagnostic reasoning; (3) consultation quality, as assessed by LLM-as-a-Judge, correlates only moderately with diagnostic accuracy, suggesting that well-structured questioning alone does not ensure correct diagnostic decisions.
-
Technical Cooperation
Using functional connectivity markers of Alzheimer's disease as features, this study applied a non-smooth non-negative matrix factorization algorithm to explore subtypes of functional connectivity impairment in Alzheimer's disease. Patients with Alzheimer's disease were reproducibly divided into four subtypes.