
KMMLU: A Korean Benchmark for LLMs

by wlqmfl 2024. 3. 29.
KMMLU: Measuring Massive Multitask Language Understanding in Korean

Guijin Son, Hanwool Lee, Sungdong Kim, et al.

18 Feb 2024

 

MMLU

Various benchmarks exist for evaluating and understanding the capabilities of Large Language Models (LLMs), targeting skills such as commonsense reasoning, code generation, and multi-turn conversation. Massive Multitask Language Understanding (MMLU) is one of these benchmarks: an expansive multiple-choice dataset covering a wide range of knowledge-based topics across 57 subjects.
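To make the evaluation protocol concrete, here is a minimal sketch of how an MMLU-style benchmark scores a model: each item is a four-option multiple-choice question, and the metric is plain accuracy over the predicted option letters. The `model_answer` function below is a hypothetical placeholder, not the paper's actual harness.

```python
# Minimal sketch of MMLU-style scoring (not the paper's actual harness).
# Each benchmark item is a four-option multiple-choice question; the
# metric is accuracy over the predicted option letters.

QUESTIONS = [
    {
        "question": "Which data structure gives O(1) average-case lookup?",
        "options": {"A": "Linked list", "B": "Hash table", "C": "Stack", "D": "Queue"},
        "answer": "B",
    },
]

def model_answer(question: str, options: dict[str, str]) -> str:
    """Hypothetical placeholder: a real harness would prompt an LLM with
    the question and options, then parse the option letter it chooses."""
    return "B"

def accuracy(items: list[dict]) -> float:
    correct = sum(
        model_answer(item["question"], item["options"]) == item["answer"]
        for item in items
    )
    return correct / len(items)

if __name__ == "__main__":
    print(f"accuracy: {accuracy(QUESTIONS):.2%}")
```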

 

KMMLU

Naively translating MMLU into Korean has critical limitations. Most importantly, English slang and culturally specific phrases do not translate cleanly into expressions a Korean speaker would understand. This paper therefore introduces KMMLU, a comprehensive Korean benchmark of 35,030 expert-level multiple-choice questions spanning 45 subjects, and evaluates 26 publicly available and proprietary LLMs on it. The benchmark includes questions from HUMSS (humanities and social sciences), STEM, Applied Science, and other professional-level domains, many of which require an understanding of Korean cultural, regional, and legal contexts.
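For readers who want to inspect the data, here is a hedged sketch of loading one KMMLU subject with the Hugging Face `datasets` library. Both the repository id `HAERAE-HUB/KMMLU` and the config name `Criminal-Law` are assumptions on my part, not taken from the post; check the paper's released artifacts for the exact identifiers.

```python
# Hedged sketch: peeking at one KMMLU subject via the Hugging Face Hub.
# The repository id "HAERAE-HUB/KMMLU" and the config name "Criminal-Law"
# are assumptions; consult the paper's release for the exact identifiers.
from datasets import load_dataset

kmmlu = load_dataset("HAERAE-HUB/KMMLU", "Criminal-Law", split="test")
print(kmmlu[0])  # one multiple-choice item: question, options, answer
```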

 

Evaluation Results

No single factor fully explains the evaluation results, but the paper highlights three findings that stood out, not least because they expose weaknesses of POLYGLOT-KO-12.8B, the representative Korean LLM.

  • Pretraining Compute

    More pretraining compute and training resources lead to superior performance. Surprisingly, multilingually pretrained models such as LLAMA-2, QWEN, and Yi outperform POLYGLOT-KO-12.8B, which was trained exclusively on Korean.

  • Fine-Tuning

    Fine-tuning on top of a pretrained model is not necessarily beneficial: fine-tuned variants do not consistently outperform their base models on KMMLU.

  • No Curse of Multilinguality at scale

    When an LLM is trained on multilingual corpora, its performance on each individual language is expected to degrade noticeably; this is known as the "curse of multilinguality" (Conneau et al., 2019; Pfeiffer et al., 2022). Nevertheless, the paper found no evidence of this effect when evaluating LLMs at scale on KMMLU.

 

Long-Tail AI Tasks

Long-tail AI tasks are tasks that receive relatively little research attention due to a lack of resources. In natural language processing, a well-known example is research on languages with fewer resources, such as Korean. KMMLU is one such effort, addressing necessary but often overlooked problems for Korean. I am always happy to read and share papers like this, which shed light on the finer aspects of our language.