LLM Evaluation Benchmark for Chinese Language Teaching (CLTE)

A comprehensive benchmark for evaluating large language models' capabilities as Chinese language teachers, consisting of three core evaluation dimensions.

Evaluation Framework

GitHub URL: https://github.com/Line-Kite/CLTE

Task Overview

Task 1: Basic Knowledge Evaluation

  • Objective: Assess foundational knowledge essential for international Chinese education
  • Coverage: 32 sub-topics across 5 major categories:
    • Linguistics (307 questions)
    • Chinese Culture (321 questions)
    • Pedagogy (163 questions)
    • World Culture (192 questions)
    • Cross-cultural Communication (217 questions)
  • Total: 1,200 questions covering the fundamental knowledge base
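
As an illustration of how Task 1 results could be aggregated, the sketch below computes per-category and overall accuracy from the category sizes listed above. The function names and the input format (a mapping from category name to number of correct answers) are assumptions made for this example; they are not taken from the CLTE repository.

```python
# Category sizes as listed in the Task 1 coverage above.
CATEGORY_SIZES = {
    "Linguistics": 307,
    "Chinese Culture": 321,
    "Pedagogy": 163,
    "World Culture": 192,
    "Cross-cultural Communication": 217,
}


def per_category_accuracy(correct_by_category):
    """Return accuracy per category from counts of correct answers.

    `correct_by_category` is a hypothetical input format: category name
    mapped to the number of questions answered correctly.
    """
    return {
        name: correct_by_category[name] / size
        for name, size in CATEGORY_SIZES.items()
    }


def overall_accuracy(correct_by_category):
    """Micro-averaged accuracy: every question carries the same weight."""
    total_questions = sum(CATEGORY_SIZES.values())
    total_correct = sum(correct_by_category[name] for name in CATEGORY_SIZES)
    return total_correct / total_questions
```

Micro-averaging is only one possible choice; a macro-average over the five categories would give more weight to the smaller ones such as Pedagogy.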

Task 2: International Teacher Examination

  • Objective: Evaluate comprehensive teaching literacy using authentic certification materials
  • Data Source: Real-world test questions from official International Chinese Language Teacher Certification exams
  • Format: Instructional passages accompanied by 2-10 single-choice questions (1,044 total questions)
  • Focus: Integrated linguistic and pedagogical reasoning in practical teaching scenarios
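
The passage-plus-questions format could be represented roughly as follows. This is a minimal sketch under stated assumptions: the class names, field names, and the `predict` callback are hypothetical and do not reflect the actual data schema in the CLTE repository.

```python
from dataclasses import dataclass


@dataclass
class Question:
    stem: str      # question text
    options: dict  # option label -> option text, e.g. {"A": "...", "B": "..."}
    answer: str    # label of the single correct option


@dataclass
class PassageItem:
    passage: str     # instructional passage shared by its questions
    questions: list  # list of Question, expected to hold 2 to 10 entries


def score_passages(items, predict):
    """Score a model over passage-grouped single-choice questions.

    `predict(passage, question)` is a hypothetical callback that returns
    the option label chosen by the model under evaluation.
    """
    correct = 0
    total = 0
    for item in items:
        if not 2 <= len(item.questions) <= 10:
            raise ValueError("each passage should carry 2 to 10 questions")
        for question in item.questions:
            total += 1
            if predict(item.passage, question) == question.answer:
                correct += 1
    return correct / total if total else 0.0
```

Passing the passage together with each question keeps the shared context available to the model for every item in the group.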

Task 3: Teaching Practice Evaluation

  • Objective: Measure instructional effectiveness through simulated teaching interactions
  • Methodology:
    • Teacher models generate educational content from 120 teaching materials and guidelines
    • Student models are tested before and after receiving instruction
    • Effectiveness is measured as the student models' performance improvement on 120 assessment questions
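
One simple way to express the before/after comparison is as a difference in student accuracy. The sketch below assumes the improvement is the post-instruction accuracy minus the pre-instruction accuracy on the same questions; the exact metric used by the benchmark may differ, and the function names are hypothetical.

```python
def accuracy(predictions, answers):
    """Fraction of predictions that match the reference answers."""
    if not answers or len(predictions) != len(answers):
        raise ValueError("predictions and answers must be non-empty and aligned")
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


def teaching_gain(pre_predictions, post_predictions, answers):
    """Assumed metric: student accuracy after instruction minus accuracy before."""
    return accuracy(post_predictions, answers) - accuracy(pre_predictions, answers)


# Toy example with four questions (the benchmark itself uses 120).
answers = ["A", "B", "C", "D"]
before = ["A", "A", "A", "A"]  # 1 of 4 correct before instruction
after = ["A", "B", "C", "A"]   # 3 of 4 correct after instruction
print(teaching_gain(before, after, answers))  # 0.5
```

A positive gain suggests the teacher model's content helped the student model, while a negative gain suggests it misled the student.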

Citation

If you find this work useful, please cite our paper.

@inproceedings{xu2025can,
  title={Can Large Language Models Be Good Language Teachers?},
  author={Xu, LiQing and Li, Qiwei and Peng, Tianshuo and Li, Zuchao and Zhao, Hai and Wang, Ping},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  pages={23968--23982},
  year={2025}
}

license: cc-by-4.0
