Theories and Methods of Natural Language Processing with Fewer Labels


Current state-of-the-art natural language processing models depend heavily on large-scale labeled data. However, the labeled data available for a specific task tends to be limited, because data labeling is difficult, task types are varied, and domains differ widely while new ones emerge frequently. It is therefore of great significance to study how to construct high-precision natural language processing systems from a small amount of labeled data, i.e., learning with fewer labels (LFL). Nevertheless, existing LFL systems are often inadequate for natural language processing, which is additionally characterized by knowledge-dependent understanding, symbolic representation, and diverse tasks.

This project will conduct a comprehensive and in-depth study of LFL natural language processing at the levels of both theory and method. First, we study cognitive and machine learning theories relevant to LFL. Then, building on these theories, we propose a general pre-trained language model for LFL, focusing on efficient and robust adaptation of the model to downstream tasks as well as knowledge transfer among downstream tasks. To verify the effectiveness of various methods more reasonably, we construct a new evaluation system for LFL natural language processing. The ultimate goal of this project is to significantly improve the ability of LFL natural language processing models and narrow the gap between artificial intelligence and human intelligence.

This project is supported by the National Natural Science Foundation of China (NSFC) under grant 62236004.

Project period: 2023.1.1 -- 2027.12.31