KMMLU: Measuring Massive Multitask Language Understanding in Korean
Guijin Son, Hanwool Lee, Sungdong Kim, et al.
18 Feb 2024
MMLU
There exist various benchmarks for evaluating and understanding the capabilities of Large Language Models (LLMs), such as commonsense reasoning, code generation, and multi-turn conversations. Massive Multitask Language Understanding (MMLU) is one of these benchmarks, covering a wide range of knowledge-based topics through an expansive dataset.
KMMLU
Naively translating the MMLU into Korean has critical limitations. Most importantly, English slang and culturally specific phrases do not translate cleanly into expressions a Korean speaker would understand. This paper therefore introduces KMMLU, a comprehensive Korean benchmark of 35,030 expert-level multiple-choice questions spanning 45 subjects, and evaluates 26 publicly available and proprietary LLMs on it. The benchmark covers HUMSS (humanities and social sciences), STEM, Applied Science, and other professional-level knowledge, much of which requires an understanding of Korean cultural, regional, and legal contexts.
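To make this concrete, here is a minimal sketch of how one KMMLU question might be pulled from the Hugging Face Hub and turned into a zero-shot prompt. The dataset ID `HAERAE-HUB/KMMLU`, the per-subject config name, and the column names (`question`, `A`–`D`, `answer`) are assumptions based on how such benchmarks are commonly released, not details confirmed by the paper.

```python
# Minimal sketch: loading one KMMLU subject and building a zero-shot prompt.
# Assumptions (not confirmed by the paper): the dataset is hosted as
# "HAERAE-HUB/KMMLU", each of the 45 subjects is a separate config, and
# rows carry `question`, `A`-`D`, and a 1-indexed integer `answer` column.
from datasets import load_dataset

subject = "Accounting"  # hypothetical config name for one of the 45 subjects
test_set = load_dataset("HAERAE-HUB/KMMLU", subject, split="test")

def build_prompt(row: dict) -> str:
    """Format one multiple-choice question in the usual MMLU style."""
    return (
        f"{row['question']}\n"
        f"A. {row['A']}\n"
        f"B. {row['B']}\n"
        f"C. {row['C']}\n"
        f"D. {row['D']}\n"
        "정답:"  # "Answer:" in Korean, matching the benchmark's language
    )

example = test_set[0]
print(build_prompt(example))
print("gold:", "ABCD"[example["answer"] - 1])  # assuming 1-indexed answers
```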
Evaluation Results
Many factors contribute to the evaluation results, but the paper highlights three findings in particular. They stood out because they cast doubt on POLYGLOT-KO-12.8B, the representative Korean LLM (a sketch of how such multiple-choice scores are typically computed follows the list below).
- Pretraining Compute
Greater pretraining compute leads to superior performance. Surprisingly, multilingual pretrained models such as LLAMA-2, QWEN, and Yi outperform POLYGLOT-KO-12.8B, which was trained exclusively on Korean.
- Fine-Tuning
Fine-tuning on top of a pretrained model does not guarantee gains: the paper finds that fine-tuned (chat) variants do not consistently outperform their base models on KMMLU.
- No Curse of Multilinguality at Scale
When an LLM is trained on a multilingual corpus, each individual language can suffer noticeable performance degradation. This is the "curse of multilinguality" (Conneau et al., 2019; Pfeiffer et al., 2022). Nevertheless, the paper found no evidence of this when evaluating large-scale LLMs with KMMLU.
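For readers curious how scores like the ones above are typically produced, the sketch below shows one common way to grade a model on an MMLU-style question: compare the model's next-token logits for the option letters and pick the most likely one. This reflects common practice for such benchmarks, not necessarily the paper's exact evaluation harness, and the model name is only a placeholder.

```python
# Sketch of letter-logit scoring for one multiple-choice prompt.
# Common practice for MMLU-style benchmarks; the paper's own harness
# may differ. The model name below is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/polyglot-ko-1.3b"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

@torch.no_grad()
def predict_choice(prompt: str) -> str:
    """Return the option letter whose first token the model rates most likely."""
    inputs = tokenizer(prompt, return_tensors="pt")
    logits = model(**inputs).logits[0, -1]  # next-token logits after the prompt
    # Score each option by the logit of its leading token (" A", " B", ...).
    option_ids = [tokenizer.encode(f" {c}", add_special_tokens=False)[0]
                  for c in "ABCD"]
    scores = logits[option_ids]
    return "ABCD"[int(scores.argmax())]
```

Accuracy per subject is then just the fraction of questions where `predict_choice` matches the gold letter, aggregated over the categories (HUMSS, STEM, Applied Science, Other).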
Long-Tail AI Tasks
Long-tail AI tasks are tasks that receive relatively little research attention due to a lack of resources. In natural language processing, one well-known example is research on low-resource languages such as Korean. KMMLU is one such effort, addressing problems in Korea that are necessary to solve but often overlooked. I am always happy to read and share this kind of paper, which shines a light on the finer points of our language.