Related Resources

This page collects paper lists, datasets, and talks related to few-shot learning in NLP.


Paper Lists

Awesome Few-shot Learning in NLP

Link: https://github.com/zhjohnchan/awesome-few-shot-learning-in-nlp

Tags: Few-Shot

Intro: A collection of few-shot learning research papers published in 2019-2021.

Awesome Meta Learning

Link: https://github.com/sudharsan13296/Awesome-Meta-Learning

Tags: Meta Learning, MAML, Few-Shot, Zero-Shot

Intro: A curated list of Meta Learning papers, code, books, blogs, videos, datasets and other resources.

Few-shot Learning for NLP - Papers

Link: https://github.com/Duan-JM/awesome-papers-fewshot

Tags: Few-Shot, Pre-trained Model, Prompt-based method, Dialogue

Intro: This repo focuses on collecting papers published at top conferences in the few-shot learning area and organizes them into separate files by category.

Few-shot Learning Literature

Link: https://github.com/wutong8023/Awesome_Few_Shot_Learning

Tags: Few-Shot, Continual Learning, Information Extraction

Intro: A collection of few-shot learning literature, categorized by publication venue.

Meta Learning for NLP - Papers

Link: https://github.com/ha-lins/MetaLearning4NLP-Papers

Tags: Meta Learning, Fundamental NLP Tasks, Dialog System

Intro: A list of Meta Learning papers classified by application scenarios.

Datasets

CLUES

Link: https://github.com/microsoft/CLUES

Intro: CLUES is a benchmark for evaluating the few-shot learning capabilities of NLU models. Tasks in CLUES fall into three distinct categories: classification (SST-2, MNLI), sequence labeling (CoNLL03, WikiANN), and machine reading comprehension (SQuADv2, ReCoRD). To establish a true few-shot learning setting, the benchmark provides no separate validation set for any task, and its labeled examples are randomly sampled from the related datasets.

CrossFit

Link: https://github.com/INK-USC/CrossFit

Intro: CrossFit is a few-shot learning challenge covering 160 different NLP tasks collected from existing open-access datasets. All tasks in the collection are transformed into a unified text-to-text format. The included datasets fall into four general types: classification (sentiment classification, paraphrase identification, natural language inference, etc.), question answering (reading comprehension, multiple-choice QA, closed-book QA), conditional generation (summarization, dialogue), and others.
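As a rough illustration of the text-to-text idea, a classification example can be recast as an input/output string pair. This is a minimal sketch, not CrossFit's actual conversion code; the template and field names are assumptions.

```python
# Minimal sketch of a text-to-text transformation in the spirit of CrossFit.
# The template and field names are illustrative assumptions, not the
# repository's actual format.

def to_text_to_text(example: dict) -> dict:
    """Recast a sentiment-classification example as input/output strings."""
    label_words = {0: "negative", 1: "positive"}
    return {
        "input": f"sentiment classification: {example['sentence']}",
        "output": label_words[example["label"]],
    }

print(to_text_to_text({"sentence": "A charming, heartfelt film.", "label": 1}))
# {'input': 'sentiment classification: A charming, heartfelt film.', 'output': 'positive'}
```

Casting every task into the same format is what lets a single text-to-text model train and evaluate across all 160 tasks without task-specific heads.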

FewCLUE

Link: https://github.com/CLUEbenchmark/FewCLUE

Intro: FewCLUE is a comprehensive few-shot evaluation benchmark for Chinese. The benchmark consists of nine different natural language understanding tasks, including single-sentence classification, sentence-pair classification, and machine reading comprehension. It adopts multiple sampling strategies to draw samples from the original datasets, so as to reflect real-world scenarios.

FewGLUE

Link: https://github.com/timoschick/fewglue

Intro: FewGLUE is a few-shot natural language understanding benchmark based on SuperGLUE. It consists of 32 training examples randomly selected from each SuperGLUE training set using a fixed random seed. The benchmark additionally provides up to 20,000 unlabeled examples for each SuperGLUE task, created by removing all labels from the original training sets.
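The construction can be approximated in a few lines. This is a sketch under assumptions (in-memory examples, an arbitrary seed value), not the repository's actual script.

```python
import random

def sample_few_shot(train_examples: list, k: int = 32, seed: int = 42):
    """Draw a fixed-seed few-shot split in the spirit of FewGLUE.

    Returns k labeled examples plus the remaining examples with their
    labels stripped (capped at 20,000), mimicking the unlabeled sets.
    The seed value here is an arbitrary assumption.
    """
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(train_examples)), k))
    labeled = [ex for i, ex in enumerate(train_examples) if i in chosen]
    unlabeled = [{key: val for key, val in ex.items() if key != "label"}
                 for i, ex in enumerate(train_examples) if i not in chosen]
    return labeled, unlabeled[:20000]
```

Fixing the seed is what makes the 32-example split reproducible across papers that compare on the same benchmark.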

FEWSHOTWOZ

Link: https://github.com/pengbaolin/SC-GPT

Intro: FEWSHOTWOZ is a benchmark that simulates the few-shot adaptation setting in task-oriented dialog systems. The benchmark consists of dialog utterances from 7 domains: Restaurant, Hotel, Laptop, TV, Attraction, Taxi, and Train. For each domain, FEWSHOTWOZ provides fewer than 50 labeled utterances for fine-tuning. Besides, the dialog acts in FEWSHOTWOZ have only 8.82% overlap between training and test sets, so the NLG task defined on FEWSHOTWOZ requires models to generalize over new compositions of intents.
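For intuition, each NLG training pair in this setting maps a structured dialog act to a surface utterance. The serialization below is a hypothetical illustration, not FEWSHOTWOZ's actual file format.

```python
# Hypothetical dialog-act-to-utterance pair for few-shot NLG; the
# serialization is illustrative, not FEWSHOTWOZ's actual file format.
example = {
    "dialog_act": "inform(name=Blue Spice; food=Italian; pricerange=cheap)",
    "utterance": "Blue Spice is a cheap restaurant serving Italian food.",
}
# A model (e.g., SC-GPT) is fine-tuned to generate the utterance
# conditioned on the dialog act, then must handle unseen combinations
# of intents and slot values in the target domain.
```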

FLEX

Link: https://github.com/allenai/flex

Intro: FLEX is a few-shot learning benchmark for NLP that covers both zero-shot and few-shot evaluation across 20 NLP datasets (natural language inference, relation classification, entity typing, etc.). FLEX evaluates four transfer settings: class transfer, domain transfer, task transfer, and pretraining transfer. The suite also contains sampling tools that can create episodes with class imbalance.
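The idea behind imbalanced episodes can be sketched as follows. This is not FLEX's actual sampler; the per-class shot range is an illustrative assumption.

```python
import random
from collections import defaultdict

def sample_imbalanced_episode(examples, rng, max_shots=5):
    """Sample a support set where each class independently receives
    between 1 and max_shots examples, producing class imbalance.
    Not FLEX's actual sampler; max_shots is an assumption."""
    by_class = defaultdict(list)
    for ex in examples:
        by_class[ex["label"]].append(ex)
    support = []
    for pool in by_class.values():
        k = rng.randint(1, min(max_shots, len(pool)))
        support.extend(rng.sample(pool, k))
    return support

rng = random.Random(0)
data = [{"text": f"example {i}", "label": i % 3} for i in range(30)]
print(len(sample_imbalanced_episode(data, rng)))  # varies between 3 and 15
```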

RAFT

Link: https://raft.elicit.org/

Intro: RAFT is a few-shot classification benchmark that focuses on naturally occurring tasks and uses an evaluation setup that mirrors deployment. The benchmark tests language models across multiple domains, including medical data, tweets, and customer interactions, on classification tasks that are economically valuable. It consists of 11 datasets; each task has a training set of 50 examples and a larger unlabeled test set.
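If the benchmark's Hugging Face distribution is used, the split sizes can be inspected directly. The hub name and subset below are assumptions; check the benchmark site for authoritative access instructions.

```python
# Sketch of loading one RAFT task with the `datasets` library; the hub
# name "ought/raft" and the subset name are assumptions.
from datasets import load_dataset

raft_task = load_dataset("ought/raft", "twitter_complaints")
print(raft_task["train"].num_rows)  # expected: 50 labeled examples
print(raft_task["test"].num_rows)   # larger unlabeled test set
```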

Talks

Zero- and Few-Shot NLP with Pretrained Language Models

Link: https://github.com/allenai/acl2022-zerofewshot-tutorial

Intro: The ability to learn efficiently from little-to-no data is critical to applying NLP to tasks where data collection is costly or otherwise difficult. This is a challenging setting both academically and practically, particularly because training neural models typically requires large amounts of labeled data. More recently, advances in pretraining on unlabeled data have opened up the potential of better zero-shot or few-shot learning (Devlin et al., 2019; Brown et al., 2020). In particular, over the past year, a great deal of research has been conducted to better learn from limited data using large-scale language models. In this tutorial, we aim to bring interested NLP researchers up to speed on the recent and ongoing techniques for zero- and few-shot learning with pretrained language models. Additionally, our goal is to reveal new research opportunities to the audience, which will hopefully bring us closer to addressing existing challenges in this domain.