
Detour for Domain-Specific Tasks Avoiding Adaptive Pre-training

by wlqmfl 2023. 2. 7.
KALA: Knowledge-Augmented Language Model Adaptation

Minki Kang, Jinheon Baek, Sung Ju Hwang

4 Aug 2022


Abstract

 <Don't Stop Pretraining> is a 2020 paper which maintains that adaptive pretraining, a second phase of in-domain pretraining of a language model, leads to performance gains on several domain-specific tasks. Building on this idea, domain-adaptive and task-adaptive pretraining have recently overcome the limits that language models trained on general, broad-coverage text face on domain-specific tasks such as biomedical or news benchmarks. However, KALA, the paper introduced today, points out two problems with adaptive pretraining: (1) it requires a large training cost, and (2) it can harm performance on downstream tasks by causing catastrophic forgetting. Despite being far more computationally efficient, KALA (Knowledge-Augmented Language model Adaptation) outperforms adaptive pretraining by (1) modulating the intermediate hidden representations of pretrained language models with domain knowledge and (2) utilizing relational facts between entities.

 The main goal of KALA is to inject domain-specific information into a PLM (Pretrained Language Model) while eliminating adaptive pretraining. KALA integrates this knowledge into the PLM through a method called KFM (Knowledge-conditioned Feature Modulation), which treats entities, their relations, and the knowledge graph as the core ingredients for improving downstream performance. KFM (1) does not modify the original architecture of the PLM, (2) requires only marginal computational and memory overhead, and (3) efficiently handles unseen entities, as the sketch below illustrates. In contrast to prior methods, KALA succeeds on several fronts, including downstream performance.
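
 To give a feel for point (1), here is a minimal sketch of my own (not the authors' code) of how a modulation step could be attached to an off-the-shelf PLM without touching its architecture: a PyTorch forward hook on one intermediate encoder layer. The model name, the layer index, and the simple learnable scale-and-shift standing in for the knowledge-conditioned modulation are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import AutoModel  # assumes the HuggingFace transformers library

plm = AutoModel.from_pretrained("bert-base-uncased")  # any transformer-based PLM
hidden_dim = plm.config.hidden_size

# Placeholder parameters: a plain learnable scale and shift standing in for the
# knowledge-conditioned modulation that KALA computes from entity representations.
gamma = nn.Parameter(torch.zeros(hidden_dim))
beta = nn.Parameter(torch.zeros(hidden_dim))

def kfm_hook(module, inputs, output):
    """Modulate the intermediate hidden states; the PLM's own weights stay untouched."""
    hidden = output[0]                        # (batch, seq_len, hidden_dim)
    modulated = (1 + gamma) * hidden + beta   # knowledge-conditioned in the real model
    return (modulated,) + output[1:]          # the returned tuple replaces the layer's output

# Attach after, e.g., the 7th encoder layer; the original architecture is not modified.
handle = plm.encoder.layer[6].register_forward_hook(kfm_hook)
```

 Because the hook only rewrites the layer's output on the fly, only the small modulation parameters add to the memory and compute footprint, which matches the "marginal overhead" claim.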


Model
 KALA tackles the problems PLMs face when solving natural language understanding tasks in a specific domain. The paper contextualizes the input text with domain knowledge, captured by domain-specific entities and their relations. For this purpose it defines several concepts: entities, the set of mentions M, the entity memory E, and the knowledge graph G (sketched below). As mentioned above, the main method of KALA is KFM (Knowledge-conditioned Feature Modulation).
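
 To make these notions concrete, here is a small sketch of how the knowledge sources might be represented; it is my own illustration under simple assumptions, not code from the paper. Mentions M are token spans linked to entity ids, the entity memory E is an embedding table, and the knowledge graph G is a set of (head, relation, tail) triples.

```python
from dataclasses import dataclass

import torch

@dataclass
class Mention:
    """A detected entity mention: a token span in the input linked to an entity id."""
    start: int       # index of the first sub-word token of the mention
    end: int         # index one past the last sub-word token
    entity_id: int   # id of the linked entity, or -1 if it is not in the memory

# Entity memory E: one embedding per known domain entity (sizes are illustrative).
num_entities, dim = 10_000, 768
entity_memory = torch.nn.Embedding(num_entities, dim)

# Knowledge graph G: relational facts stored as (head, relation, tail) triples.
knowledge_graph = [
    (42, 3, 17),   # hypothetical triple: entity 42 --relation 3--> entity 17
    (17, 5, 99),
]
```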

 Between the layers of the transformer-based PLM, KFM linearly transforms the intermediate features into ones conditioned on the knowledge sources M, E, and G. By injecting two learnable modulation parameters conditioned on the entity representation, the model identifies associated tokens and embeds them as similar objects. The paper also proposes a method called relational retrieval to embed unseen entities close to where they would have appeared. This method, which uses an attentive scheme for neighborhood aggregation, has two advantages: (1) it captures richer interactions among entities, and (2) it naturally represents unseen entities. A rough sketch of both parts follows.
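
 The sketch below reflects my reading of the mechanism rather than the authors' implementation: the two modulation parameters are predicted from an entity representation and applied as a FiLM-style scale and shift to the hidden states of the mention span, and an unseen entity is approximated by attending over the memory embeddings of its knowledge-graph neighbors. All names, shapes, and the exact scoring function are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KnowledgeConditionedFeatureModulation(nn.Module):
    """Sketch of KFM: scale and shift the hidden states of mention tokens by their entity embedding."""

    def __init__(self, hidden_dim: int, entity_dim: int):
        super().__init__()
        self.to_gamma = nn.Linear(entity_dim, hidden_dim)  # predicts the multiplicative term
        self.to_beta = nn.Linear(entity_dim, hidden_dim)   # predicts the additive term

    def forward(self, hidden, mentions, entity_reprs):
        # hidden: (seq_len, hidden_dim) intermediate features of one example
        # mentions: list of (start, end) token spans; entity_reprs: (num_mentions, entity_dim)
        out = hidden.clone()
        for (start, end), e in zip(mentions, entity_reprs):
            gamma, beta = self.to_gamma(e), self.to_beta(e)
            out[start:end] = (1 + gamma) * hidden[start:end] + beta
        return out


def relational_retrieval(neighbor_embs, neighbor_rel_embs, query):
    """Approximate an unseen entity by attending over its knowledge-graph neighbors."""
    # neighbor_embs: (k, d) memory embeddings of neighbors; neighbor_rel_embs: (k, d) relation embeddings
    # query: (d,) e.g. the mention's contextual representation from the PLM
    keys = neighbor_embs + neighbor_rel_embs           # combine neighbor and relation information
    scores = keys @ query / query.shape[-1] ** 0.5     # scaled attention logits
    weights = F.softmax(scores, dim=0)
    return weights @ neighbor_embs                     # weighted aggregation as the entity representation
```

 The attentive aggregation is what gives the two advantages quoted above: the weights let informative neighbors dominate, and an entity that never appeared in the memory still gets a representation built from entities that did.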

Before Machine Learning Perfectly Resembles the Brain,
 Although KALA outperforms state-of-the-art baselines on question answering and named entity recognition tasks, the paper still cautions that users should carefully consider the accuracy of the knowledge the model relies on. Moreover, the paper admits that the prediction performance is still far from optimal. This means that in high-risk domains such as biomedicine, people should be aware that the model's predictions can fail. Until machine learning perfectly resembles the brain and produces exact, optimal predictions, experts should double-check the results.