LLM Evaluation Benchmark for Chinese Language Teaching (CLTE)
A comprehensive benchmark for evaluating the capabilities of large language models as Chinese language teachers, organized into three evaluation tasks: basic knowledge, the international teacher examination, and teaching practice.
Evaluation Framework
GitHub URL: https://github.com/Line-Kite/CLTE
Task Overview
Task 1: Basic Knowledge Evaluation
- Objective: Assess foundational knowledge essential for international Chinese education
- Coverage: 32 sub-topics across 5 major categories:
  - Linguistics (307 questions)
  - Chinese Culture (321 questions)
  - Pedagogy (163 questions)
  - World Culture (192 questions)
  - Cross-cultural Communication (217 questions)
- Total: 1,200 questions covering the fundamental knowledge base
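The category sizes above are uneven, so an overall score depends on how per-category results are combined. This card does not state the aggregation used; the sketch below assumes plain accuracy and shows both a question-weighted (micro) and a category-weighted (macro) average. The names `CATEGORY_COUNTS`, `micro_accuracy`, and `macro_accuracy` are hypothetical and not part of the CLTE repository.

```python
# Question counts per category, copied from the Task 1 list above.
CATEGORY_COUNTS = {
    "Linguistics": 307,
    "Chinese Culture": 321,
    "Pedagogy": 163,
    "World Culture": 192,
    "Cross-cultural Communication": 217,
}


def micro_accuracy(correct_by_category):
    """Accuracy over all questions, so larger categories weigh more."""
    total = sum(CATEGORY_COUNTS.values())
    correct = sum(correct_by_category[name] for name in CATEGORY_COUNTS)
    return correct / total


def macro_accuracy(correct_by_category):
    """Unweighted mean of per-category accuracies."""
    per_category = [
        correct_by_category[name] / count
        for name, count in CATEGORY_COUNTS.items()
    ]
    return sum(per_category) / len(per_category)


if __name__ == "__main__":
    # Sanity check: the five categories add up to the stated total.
    print(sum(CATEGORY_COUNTS.values()))
```

Reporting both values can be useful here: a model that is strong on the two largest categories (Linguistics and Chinese Culture) but weak on Pedagogy would look better under the micro average than under the macro average.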
Task 2: International Teacher Examination
- Objective: Evaluate comprehensive teaching literacy using authentic certification materials
- Data Source: Real-world test questions from official International Chinese Language Teacher Certification exams
- Format: Instructional passages accompanied by 2-10 single-choice questions (1,044 total questions)
- Focus: Integrated linguistic and pedagogical reasoning in practical teaching scenarios
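As an illustration of the passage-plus-questions format described above, the following sketch models one exam item and scores a model on it. The field names, the `predict` callback, and the option labels are assumptions made for this example; the actual data schema in the CLTE repository may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Question:
    prompt: str
    options: Dict[str, str]  # option label -> option text (hypothetical layout)
    answer: str              # label of the single correct option


@dataclass
class PassageItem:
    passage: str               # the instructional passage
    questions: List[Question]  # 2-10 single-choice questions per passage


def score(items: List[PassageItem],
          predict: Callable[[str, Question], str]) -> float:
    """Return accuracy over all questions, passing the passage as context."""
    correct = 0
    total = 0
    for item in items:
        for question in item.questions:
            total += 1
            if predict(item.passage, question) == question.answer:
                correct += 1
    return correct / total if total else 0.0
```

Passing the passage together with each question keeps the shared context available to the model, which matters because the questions in one group all refer to the same teaching scenario.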
Task 3: Teaching Practice Evaluation
- Objective: Measure instructional effectiveness through simulated teaching interactions
- Methodology:
  - Teacher models generate educational content from 120 teaching materials and guidelines
  - Student models are tested before and after receiving instruction
  - Effectiveness is measured by the student models' performance improvement on 120 assessment questions
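The before-and-after comparison above can be sketched as a simple accuracy difference. This is a minimal illustration that assumes improvement is the student model's post-instruction accuracy minus its pre-instruction accuracy on the same questions; the exact metric used by the benchmark is not specified in this card, and the function names are hypothetical.

```python
def accuracy(predictions, answers):
    """Fraction of predictions that match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    matches = sum(p == a for p, a in zip(predictions, answers))
    return matches / len(answers)


def teaching_gain(pre_predictions, post_predictions, answers):
    """Post-instruction accuracy minus pre-instruction accuracy.

    A positive value means the student model answered more questions
    correctly after receiving the teacher model's content.
    """
    return accuracy(post_predictions, answers) - accuracy(pre_predictions, answers)
```

For example, a student model that answers 2 of 4 questions correctly before instruction and 3 of 4 afterwards would have a gain of 0.25 under this definition.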
Citation
If this work helps you, please cite our paper:
@inproceedings{xu2025can,
title={Can Large Language Models Be Good Language Teachers?},
author={Xu, LiQing and Li, Qiwei and Peng, Tianshuo and Li, Zuchao and Zhao, Hai and Wang, Ping},
booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
pages={23968--23982},
year={2025}
}
license: cc-by-4.0