The 2nd workshop on
Efficient Natural Language and Speech Processing (ENLSP)
Friday Dec. 2nd 2022, New Orleans
In-person (Ballroom C) and Virtual
The latest edition of the workshop (NeurIPS ENLSP 2024) is out; you can find it on the new website.
The second edition of the Efficient Natural Language and Speech Processing (ENLSP-II) workshop focuses on fundamental and challenging problems in making natural language and speech processing (especially pre-trained models) more efficient in terms of Data, Model, Training, and Inference. The workshop program offers an interactive platform for gathering experts and talents from academia and industry through invited talks, a panel discussion, paper submissions, reviews, interactive posters, oral presentations, and a mentorship program. This will be a unique opportunity to address the efficiency issues of current models, build connections, exchange ideas, brainstorm solutions, and foster future collaborations. The topics of this workshop should be of interest to people working on general machine learning, deep learning, optimization, theory, and NLP & Speech applications.
Overview
Pre-training a general model with self-supervised learning on huge amounts of data and then fine-tuning it on a specific task has become the standard paradigm for solving many natural language and speech processing tasks. This paradigm has produced different types of pre-trained models (e.g., encoder-only such as BERT, decoder-only such as GPT, encoder-decoder such as T5) at a very diverse range of scales (from millions to more than 500 billion parameters) for different tasks.
A common practice in the literature is to increase the number of parameters of these pre-trained models to improve their performance or their zero-/few-shot abilities. Despite the great success of these pre-trained models, it is evident that most of them are largely over-parameterized and their efficiency is under question. Training or deploying these models on devices or even cloud services with limited memory and computational power can be very expensive and challenging. For example, Megatron-Turing with 530B parameters has shown state-of-the-art results on many NLP tasks, but at the cost of using 560 DGX A100 nodes (more than 4,000 NVIDIA A100 GPUs) for training on more than 300B tokens of data. Moreover, delivering such huge models as a service to different clients requires different copies of the model for different tasks. Even fine-tuning the entire large model on a small labeled dataset can lead to overfitting. Therefore, it is of vital importance to invest in the future of pre-trained models by enhancing their efficiency in terms of data, modeling, training, and inference, from the different perspectives highlighted in this workshop.
Call for Papers
We would like to share some fundamental challenges in improving the efficiency of pre-trained models and encourage the NeurIPS community to submit their solutions, ideas, and ongoing work concerning data, model, training, and inference efficiency for NLP and speech processing. The scope of this workshop includes, but is not limited to, the following topics:
Efficient Pre-Training Pre-training is a very expensive process. Even a small modification to the configuration of the models requires the user to redo pre-training.
- Accelerating the pre-training process
- Continual/Life-long pre-training and adapting pre-trained models to a new domain
- Efficient initialization and hyper-parameter tuning (HPT)
- Better pre-training self-supervised objectives
- Multi-domain pre-training
- Data vs. Scale of pre-trained models
- Pre-training Multimodal (e.g., text–speech) models
- New efficient architectures for pre-trained models
Efficient Fine-tuning Fine-tuning large pre-trained models on downstream tasks can be challenging because these models are heavily over-parameterized.
- Parameter-efficient tuning solutions that tune only a portion of the entire network (e.g. adapters; a minimal sketch follows this list)
- Efficient prompt-based fine-tuning
- Accelerating the fine-tuning process (e.g. optimizer, and layer-skipping)
- Efficient federated learning for NLP: reducing communication costs, handling heterogeneous data and heterogeneous models
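To make the adapter idea above concrete, here is a minimal PyTorch-style sketch of a bottleneck adapter; the module layout, dimensions, and names are illustrative rather than any particular published recipe.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter inserted after a frozen transformer sub-layer.

    Only the adapter parameters are trained; the backbone stays frozen,
    so each downstream task adds only a small number of new parameters.
    """
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the frozen backbone's behaviour
        # largely intact when the adapter is initialized near zero.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Typical usage: freeze the pre-trained backbone, train only adapter weights.
# for p in pretrained_model.parameters():
#     p.requires_grad = False
```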
Data Efficiency Pre-trained models rely on huge amounts of unlabeled data, which makes training very sample-inefficient.
- Sample efficient training, training with less data, few-shot and zero-shot learning
- Sample efficient data-augmentation, identifying which training samples should be augmented
- Data compression, data distillation
- Data selection, how to improve the quality of pre-training data
Inference Efficiency How can we reduce the inference time or memory footprint of a trained model for a particular task?
- Neural model compression techniques such as quantization, pruning, layer decomposition and knowledge distillation (KD) for NLP and Speech (a minimal KD sketch follows this list)
- Impact of different compression techniques on the inductive biases learned by the original models
- Combined compression techniques for more efficient NLP and speech models
- Improving efficiency of KD by removing the teacher
- Extreme model compression (high compression ratio) for very large pre-trained language models
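As a reference point for the compression topics above, the classic soft-target knowledge-distillation objective can be sketched as follows; the temperature and weighting are illustrative defaults, not values any submission is expected to use.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T: float = 2.0, alpha: float = 0.5):
    """Combine a soft-target KL term (teacher -> student) with the usual
    cross-entropy on the ground-truth labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                  # rescale gradients for temperature T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```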
(Special Track) Efficient Graph Learning for NLP
- Automatically transforming natural language into graph-structured data
- Representation learning on multi-relational or heterogeneous graphs
- Learning the mapping between complex data structures, like Graph2Seq, Graph2Tree, Graph2Graph
- Graph learning with pre-trained language models
Other Efficient Applications Pre-trained models are used in many NLP tasks where efficiency is a concern.
- Efficient Dense Retrieval
- Large language model as a service
- Training models on device
- Incorporating external knowledge into pre-trained models
- Unifying different pre-training models
Submission Instructions
You are invited to submit your papers through our CMT submission portal. All submitted papers must be anonymized for double-blind review. We expect each paper to be reviewed by at least three reviewers. The content of the paper (excluding the references and supplementary materials) should not be longer than 4 pages, strictly following the NeurIPS template style (which can be found here).
Authors can submit up to 100 MB of supplementary materials separately, and are highly encouraged to submit their code for reproducibility purposes. According to the NeurIPS workshop guidelines, submission of already published papers is discouraged, but arXiv papers and papers currently under submission are allowed. Moreover, a work that is presented at the main NeurIPS conference should not appear in a workshop. Please make sure to indicate the complete list of conflicts of interest for all authors of your paper. To encourage higher-quality submissions, our sponsors are offering a Best Paper and a Best Poster Award to qualified outstanding original oral and poster presentations (upon nomination by the reviewers). We will also give one outstanding paper certificate for our special track on efficient graph learning for NLP. Bear in mind that our workshop is non-archival, but accepted papers will be hosted on the workshop website.
Important Dates:
- Submission Deadline: September 25, 2022 (AOE)
- Acceptance Notification: October 20, 2022 (AOE)
- Camera-Ready Submission: November 1, 2022 (AOE)
- Workshop Date: Friday, December 2, 2022 (in-person and virtual)
Confirmed Speakers
- Dr. Tara Sainath (Google)
- Prof. Graham Neubig (Carnegie Mellon University)
- Prof. Jimmy Lin (University of Waterloo)
- Prof. Song Han (MIT)
- Prof. Danqi Chen (Princeton University)
- Prof. Yang You (National University of Singapore)
- Dr. Lu Hou (Huawei Noah's Ark Lab)
- Prof. Bang Liu (University of Montreal / MILA)
- Prof. Siva Reddy (McGill University & MILA)
- Tim Dettmers (University of Washington)
- Prof. Kenneth Heafield (University of Edinburgh)
- Prof. Anna Huang (MILA / Google)
Industrial Panelists
- Mohammad Norouzi (Google Brain)
- Vikrant Singh Tomar (Fluent.AI)
- Rahul Gupta (Amazon Alexa)
- Boxing Chen
- Marjan Ghazvininejad (Meta)
- Yu Cheng (Microsoft)
- Jiahao Sun (RBC)
Schedule (New Orleans Time Zone)
Title: (KeyNote Talk) Fine-grained Interactive Vision Language Pre-training
Presenter: Lu Hou
Bio: Dr. Lu Hou is a researcher at the Speech and Semantics Lab of Huawei Noah's Ark Lab. She obtained her Ph.D. from the Hong Kong University of Science and Technology in 2019, under the supervision of Prof. James T. Kwok. Her current research interests include compression and acceleration of deep neural networks, natural language processing, and deep learning optimization.
Abstract: Unsupervised large-scale vision-language pre-training has shown promising advances on various downstream tasks. Existing methods often model the cross-modal interaction either via the similarity of the global feature of each modality, which lacks sufficient information, or via finer-grained interactions using cross/self-attention over visual and textual tokens. However, cross/self-attention suffers from inferior efficiency in both training and inference. In this talk, we introduce a large-scale Fine-grained Interactive Language-Image Pre-training method that achieves finer-level alignment through a cross-modal late interaction mechanism, which uses a token-wise maximum similarity between visual and textual tokens to guide the contrastive objective. The resulting models, FILIP and Wukong, achieve good performance on multiple downstream vision-language tasks while maintaining the inference efficiency of dual-stream models. Visualization of word-patch alignment further shows that FILIP learns meaningful fine-grained features with promising localization ability. Furthermore, we release a 100-million Chinese image-text pair dataset for pre-training.
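A minimal sketch of the token-wise max-similarity late interaction described in the abstract, assuming pre-computed patch and token embeddings; FILIP applies the two directions separately inside its contrastive loss, so this is an illustration rather than the exact implementation.

```python
import torch
import torch.nn.functional as F

def late_interaction_similarity(img_tokens: torch.Tensor, txt_tokens: torch.Tensor) -> torch.Tensor:
    """Token-wise max-similarity late interaction, sketched.

    img_tokens: (n_img, d) visual patch embeddings for one image
    txt_tokens: (n_txt, d) textual token embeddings for one caption
    Returns a scalar image-text similarity to be used in a contrastive loss.
    """
    img = F.normalize(img_tokens, dim=-1)
    txt = F.normalize(txt_tokens, dim=-1)
    sim = img @ txt.T                      # (n_img, n_txt) cosine similarities
    i2t = sim.max(dim=1).values.mean()     # each patch matched to its best word
    t2i = sim.max(dim=0).values.mean()     # each word matched to its best patch
    return 0.5 * (i2t + t2i)
```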
Title: (KeyNote Talk) Efficiency Tradeoffs in the Design of Neural Search Systems
Presenter: Jimmy Lin
Bio: Professor Jimmy Lin holds the David R. Cheriton Chair in the David R. Cheriton School of Computer Science at the University of Waterloo. For a quarter of a century, his research has been driven by the quest to develop methods and build tools that connect users to relevant information. His work mostly lies at the intersection of information retrieval and natural language processing, with a focus on two fundamental challenges: those of understanding and scale.
Abstract: Information retrieval (IR), the challenge of connecting users to previously stored relevant information, has received renewed attention of late due to the advent of pretrained transformer-based models. In recent years, we have seen the introduction of many new types of models (e.g., dense and sparse learned representations, cross-encoders, etc.) in the context of techniques that have been around for decades (e.g., BM25, multi-stage ranking, etc.). What does it mean for a search system to be efficient? In this talk, I'll try to sort through efficiency tradeoffs in the design and construction of end-to-end search systems, organized along the dimensions of time, space, and cost.
Title: (KeyNote Talk) Last Advances in End-to-End Speech Recognition
Presenter: Tara Sainath
Bio: Tara Sainath received her S.B., M.Eng., and Ph.D. in Electrical Engineering and Computer Science (EECS) from MIT. After her PhD, she spent five years in the Speech and Language Algorithms group at the IBM T.J. Watson Research Center before joining Google Research. She has served as a Program Chair for ICLR in 2017 and 2018, and has co-organized numerous special sessions and workshops, including at Interspeech 2010, ICML 2013, Interspeech 2016, ICML 2017, Interspeech 2019, and NeurIPS 2020. In addition, she has served as a member of the IEEE Speech and Language Processing Technical Committee (SLTC) as well as an Associate Editor for IEEE/ACM Transactions on Audio, Speech, and Language Processing. She is an IEEE and ISCA Fellow and the recipient of the 2021 IEEE SPS Industrial Innovation Award. She is currently a Principal Research Scientist at Google, working on applications of deep neural networks for automatic speech recognition.
Abstract: In this talk, we will discuss a multi-year research effort with end-to-end models for speech recognition. We will also discuss how we translated these research findings into productionizable models that are used on our Pixel phones.
Title: Collective Knowledge Graph Completion with Mutual Knowledge Distillation
Presenters: Weihang Zhang, Ovidiu Serban, Jiahao Sun, Yike Guo
Authors: Anonymous
Abstract: Knowledge graph completion (KGC), the task of predicting missing information based on the relational data already present in a knowledge graph (KG), has drawn significant attention in recent years. However, the predictive power of KGC methods is often limited by the completeness of the existing knowledge graphs. In monolingual and multilingual settings, KGs from different sources and languages are potentially complementary to each other. In this paper, we study the problem of multi-KG completion, where we focus on maximizing the collective knowledge from different KGs to alleviate the incompleteness of individual KGs. Specifically, we propose a novel method called CKGC-MKD that uses augmented CompGCN-based encoder models on both individual KGs and a large connected KG in which seed alignments between KGs are regarded as edges for message propagation. Additional mutual knowledge distillation is employed to maximize the knowledge transfer between the "global" connected KG and the "local" individual KGs. Experimental results on multilingual datasets show that our method outperforms all state-of-the-art models.
Title: Attribute Controlled Dialogue Prompting
Presenters: Runcheng Liu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart
Authors: Anonymous
Abstract: Prompt-tuning has become an increasingly popular parameter-efficient method for steering large pretrained language models to downstream tasks. However, both discrete prompting and continuous prompting assume fixed prompts for all data samples within a task, neglecting the fact that inputs vary greatly in some tasks such as open-domain dialogue generation. In this paper, we present a novel, instance-specific prompt-tuning algorithm for dialogue generation. Specifically, we generate prompts based on instance-level control codes, rather than the conversation history, to explore their impact on controlled dialogue generation. Experiments on popular open-domain dialogue datasets, evaluated with both automated metrics and human evaluation, demonstrate that our method is superior to prompting baselines and comparable to fine-tuning with only 5%-6% of total parameters.
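The instance-specific prompting idea can be sketched roughly as follows: a hypothetical PyTorch module maps a discrete control code to soft prompt vectors that are prepended to a frozen LM's input embeddings. Module and dimension names are illustrative, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ControlCodePromptGenerator(nn.Module):
    """Map a discrete control code to a sequence of soft prompt embeddings
    that can be prepended to the (frozen) language model's input embeddings."""
    def __init__(self, num_codes: int, prompt_len: int = 10, hidden: int = 768):
        super().__init__()
        self.code_emb = nn.Embedding(num_codes, hidden)
        self.proj = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.Tanh(),
            nn.Linear(hidden, prompt_len * hidden),
        )
        self.prompt_len, self.hidden = prompt_len, hidden

    def forward(self, code_ids: torch.Tensor) -> torch.Tensor:
        # code_ids: (batch,) -> prompts: (batch, prompt_len, hidden)
        prompts = self.proj(self.code_emb(code_ids))
        return prompts.view(-1, self.prompt_len, self.hidden)

# Illustrative usage with a frozen LM that accepts input embeddings:
# prompts = generator(code_ids)
# inputs_embeds = torch.cat([prompts, lm_input_embeds], dim=1)
```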
Title: Fast DistilBERT on CPUs
Presenters: Haihao Shen, Ofir Zafrir, Bo Dong, Hengyu Meng, Xinyu Ye, Zhe Wang, Yi Ding, Hanwen Chang, Guy Boudoukh, Moshe Wasserblat
Authors: Haihao Shen, Ofir Zafrir, Bo Dong, Hengyu Meng, Xinyu Ye, Zhe Wang, Yi Ding, Hanwen Chang, Guy Boudoukh, Moshe Wasserblat
Abstract: Transformer-based language models have become the standard approach to solving natural language processing tasks. However, industry adoption usually requires maximizing throughput under certain latency constraints, which prevents Transformer models from being used in production. To address this gap, model compression techniques such as quantization and pruning may be used to improve inference efficiency. However, these compression techniques require specialized software to apply and deploy at scale. In this work, we propose a new pipeline for creating and running Fast Transformer models on CPUs, utilizing hardware-aware pruning, knowledge distillation, quantization, and our own Transformer inference runtime engine with optimized kernels for sparse and quantized operators. We demonstrate the efficiency of our pipeline by creating a Fast DistilBERT model with minimal accuracy loss on the question-answering SQuADv1.1 benchmark, and report throughput results under typical production constraints and environments. Our results outperform the state-of-the-art Neural Magic DeepSparse runtime by up to 50%, and show up to a 4.1x speedup over ONNX Runtime.
Title: (KeyNote Talk) SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Presenter: Song Han
Bio: Song Han is an associate professor at MIT EECS. He received his PhD degree from Stanford University. He proposed the "deep compression" technique that is widely used by industry for efficient AI computing, and the "Efficient Inference Engine" that first brought weight sparsity to neural network accelerators. His team's work on hardware-aware neural architecture search (once-for-all network, MCUNet) brought deep learning to IoT devices that have only 256KB of memory and enables learning on the edge. Song received the NSF CAREER Award for "efficient algorithms and hardware for accelerated machine learning" and was named one of the "35 Innovators Under 35" by MIT Technology Review.
Abstract: Large language models (LLMs) show excellent performance but are compute- and memory-intensive. Quantization can reduce memory and accelerate inference. However, for LLMs beyond 100 billion parameters, existing methods cannot maintain accuracy due to outliers or do not run efficiently on hardware. I'll present SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs, including OPT-175B, BLOOM-176B and GLM-130B, achieving faster inference speed with half the number of GPUs. We hope SmoothQuant can inspire economic deployment of LLMs in the future.
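A rough sketch of the smoothing idea described above: per-channel scale is migrated from the activations (which contain outliers) to the weights before standard symmetric int8 quantization. The value of alpha and the absmax quantizer below are illustrative choices, not the exact SmoothQuant implementation.

```python
import torch

def smooth_and_quantize(x: torch.Tensor, w: torch.Tensor, alpha: float = 0.5):
    """x: (tokens, in_features) activations, w: (out_features, in_features) weights.

    A per-input-channel smoothing factor s moves quantization difficulty from
    the activations (which have outlier channels) into the weights; both are
    then quantized with a simple symmetric absmax int8 quantizer.
    """
    act_scale = x.abs().amax(dim=0)          # per input channel
    w_scale = w.abs().amax(dim=0)            # per input channel
    s = (act_scale.clamp(min=1e-5) ** alpha) / (w_scale.clamp(min=1e-5) ** (1 - alpha))

    x_smooth, w_smooth = x / s, w * s        # the product x @ w.T is unchanged

    def absmax_int8(t):
        scale = t.abs().max() / 127.0
        return torch.clamp((t / scale).round(), -127, 127).to(torch.int8), scale

    return absmax_int8(x_smooth), absmax_int8(w_smooth)
```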
Title: (KeyNote Talk) Building Language Models Based on Retrieval
Presenter: Danqi Chen
Bio: Danqi Chen is an Assistant Professor of Computer Science at Princeton University and co-leads the Princeton NLP Group. Her recent research focuses on training, adapting, and understanding large language models, and developing scalable and generalizable NLP systems for question answering, information extraction, and conversational agents. Before joining Princeton, Danqi worked as a visiting scientist at Facebook AI Research. She received her Ph.D. from Stanford University (2018) and B.E. from Tsinghua University (2012), both in Computer Science. Danqi is a recipient of a Sloan Fellowship, a Samsung AI Researcher of the Year award, outstanding paper awards from ACL 2016, EMNLP 2017, and ACL 2022, and multiple research awards from industry.
Abstract: Large language models (LLMs) have utterly transformed the field of natural language processing. However, training LLMs comes at a massive financial and environmental cost, making them out of reach of academic research labs. Meanwhile, these models are costly to update and prone to leaking private text data. In this talk, I will argue that retrieval-based language models are a promising way of scaling LMs and overcoming the above limitations. I will discuss recent developments in retrieval-based language models, compare their pros and cons, and show their benefits in interpretability, adaptability, and privacy. In particular, I will introduce a new training approach for retrieval-based language models called TRIME (TRaining with In-batch MEmories), which can train LMs to retrieve better from the text during inference.
Title: (KeyNote Talk) Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
Presenter: Yang You
Bio: Yang You is a Presidential Young Professor at the National University of Singapore. He is on an early career track at NUS for exceptional young academic talents with great potential to excel. He received his PhD in Computer Science from UC Berkeley, where his advisor was Prof. James Demmel, former chair of the Computer Science Division and EECS Department. Yang You's research interests include Parallel/Distributed Algorithms, High Performance Computing, and Machine Learning. The focus of his current research is scaling up deep neural network training on distributed systems and supercomputers. In 2017, his team broke the world record for ImageNet training speed, which was covered by technology media such as NSF, ScienceDaily, Science NewsLine, and i-programmer. In 2019, his team broke the world record for BERT training speed; these BERT training techniques have been used by many tech giants such as Google, Microsoft, and NVIDIA. Yang You's LARS and LAMB optimizers are available in the industry benchmark MLPerf. He is a winner of the IPDPS 2015 Best Paper Award (0.8%), the ICPP 2018 Best Paper Award (0.3%), and the ACM/IEEE George Michael HPC Fellowship. Yang You is a Siebel Scholar and a winner of the Lotfi A. Zadeh Prize. He was nominated by UC Berkeley for the ACM Doctoral Dissertation Award (2 out of 81 Berkeley EECS PhD students graduated in 2020). He also made the Forbes 30 Under 30 Asia list (2021) and won the IEEE CS TCHPC Early Career Researchers Award for Excellence in High Performance Computing.
Abstract: The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing. Together with better performance come larger model sizes. This imposes challenges on the memory wall of current accelerator hardware such as GPUs. It is never ideal to train large models such as Vision Transformer, BERT, and GPT on a single GPU or a single machine. There is an urgent demand to train models in a distributed environment. However, distributed training, especially model parallelism, often requires domain expertise in computer systems and architecture. It remains a challenge for AI researchers to implement complex distributed training solutions for their models. To solve this problem, we introduce Colossal-AI, a unified parallel training system designed to seamlessly integrate different paradigms of parallelization techniques, including data parallelism, pipeline parallelism, multiple tensor parallelism, and sequence parallelism. Colossal-AI aims to support the AI community in writing distributed models the same way they write models normally. This allows them to focus on developing the model architecture and separates the concerns of distributed training from the development process. Colossal-AI is able to achieve a 2x speedup over state-of-the-art distributed systems for GPT model training.
Title: Efficient Few-Shot Learning Without Prompts
Presenters: Oren Pereg, Daniel Korat, Moshe Wasserblat, Lewis Tunstall, Unso Eun Seo Jo, Luke Bates, Nils Reimers
Authors: Oren Pereg, Daniel Korat, Moshe Wasserblat, Lewis Tunstall, Unso Eun Seo Jo, Luke Bates, Nils Reimers
Abstract: Recent few-shot learning methods, such as parameter-efficient fine-tuning (PEFT) and pattern exploiting training (PET), have achieved impressive results in label-scarce settings. However, they are difficult to employ since they are highly sensitive to handcrafted prompts, and typically require billion-parameter language models to achieve high accuracy. To address these shortcomings, we propose SetFit (Sentence Transformer Fine-tuning), an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers (ST). SetFit works by first fine-tuning a pretrained ST on a small number of labeled text pairs, in a contrastive Siamese manner. The resulting model is then used to generate rich text embeddings, which are used to train a classification head. This simple framework requires no prompts or verbalizers, and achieves high accuracy with orders of magnitude fewer parameters and less runtime than existing techniques. Our experiments show that SetFit achieves results competitive with PEFT and PET techniques, and outperforms them on a variety of classification tasks.
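A condensed sketch of the two stages described above, contrastive fine-tuning of a sentence encoder on labeled pairs followed by a lightweight classification head; the `encoder` call, the loss form, and the dimensions are simplified stand-ins for the actual SetFit implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def contrastive_step(encoder, text_a, text_b, same_label, margin: float = 0.5):
    """Siamese/contrastive objective on a pair of labeled texts:
    pull embeddings together when labels match, push them apart otherwise.

    encoder: maps a batch of texts to (batch, d) embeddings
    same_label: float tensor of 0/1 indicating whether the pair shares a label
    """
    za, zb = encoder(text_a), encoder(text_b)
    dist = 1.0 - F.cosine_similarity(za, zb)              # (batch,)
    pos = same_label * dist.pow(2)                         # attract same-label pairs
    neg = (1 - same_label) * F.relu(margin - dist).pow(2)  # repel different-label pairs
    return (pos + neg).mean()

# Stage 2: keep the tuned encoder and fit a simple classification head on its
# embeddings, e.g. logistic regression or a linear layer:
head = nn.Linear(768, 2)   # embedding dim and number of classes are illustrative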
Title: PCFG-based Natural Language Interface Improves Generalization for Controlled Text Generation
Presenters: Jingyu Zhang, Jim Glass, Tianxing He
Authors: Jingyu Zhang, James Glass, Tianxing He
Abstract: Existing work on controlled text generation (CTG) assumes a control interface of categorical attributes. In this work, we propose a natural language interface, where we craft a PCFG to embed the control attributes into natural language commands, and propose variants of existing CTG models that take commands as input. We design tailored experiments to test the model's generalization abilities. The results show that our PCFG-based command generation approach is effective for handling unseen commands compared to fixed-set templates, and that our proposed NL models can effectively generalize to unseen attributes.
Title: PromptDA: Label-guided Data Augmentation for Prompt-based Few Shot Learners
Presenter:
Authors: Canyu Chen, Kai Shu
Abstract: Recent advances in large pre-trained language models (PLMs) have led to impressive gains on natural language understanding (NLU) tasks with task-specific fine-tuning. However, directly fine-tuning PLMs heavily relies on sufficient labeled training instances, which are usually hard to obtain. Prompt-based tuning of PLMs has been shown to be powerful for various downstream few-shot tasks. Existing works studying prompt-based tuning for few-shot NLU tasks mainly focus on deriving proper label words with a verbalizer or generating prompt templates to elicit semantics from PLMs. In addition, conventional data augmentation strategies such as synonym substitution are also widely adopted in low-resource scenarios. However, the improvements they bring to prompt-based few-shot learning have been shown to be marginal. Thus, an important research question arises: how can we design effective data augmentation methods for prompt-based few-shot tuning? To this end, considering that label semantics are essential in prompt-based tuning, we propose a novel label-guided data augmentation framework, PromptDA, which exploits the enriched label semantic information for data augmentation. Extensive experimental results on few-shot text classification tasks show that our proposed framework achieves superior performance by effectively leveraging label semantics and data augmentation for natural language understanding.
Title: (KeyNote Talk) Efficient Identify Event Causality with Knowledge and Analogy
Presenter: Bang Liu
Bio: Bang Liu is an Assistant Professor in the Department of Computer Science and Operations Research (DIRO) at the University of Montreal. He is a core member of the RALI laboratory (Applied Research in Computer Linguistics) of DIRO, an associate member of Mila – Quebec Artificial Intelligence Institute, and a Canada CIFAR AI (CCAI) Chair. His research interests primarily lie in the areas of natural language processing, data mining, multimodal and embodied learning, and AI + X (e.g., health, material science).
Abstract: Event causality identification (ECI) is an important task in natural language processing (NLP) which aims to identify the causal relationships between events in text pieces, i.e., predict whether one event causes another to happen. Due to the diversity of real-world causality events and the difficulty of obtaining sufficient training data, existing ECI approaches have poor generalizability and struggle to identify the relation between seldom-seen events. We propose to utilize both external knowledge and internal analogy to improve ECI. By utilizing a commonsense knowledge graph to reveal the commonalities or associations between different events, and retrieving similar events as analogy examples to glean useful experience from such analogous neighbors, we can better identify the relationship between a new event pair. Extensive evaluations show that our approach significantly outperforms other baseline methods.
Title: Improving the Robustness of DistilHuBERT to Unseen Noisy Conditions via Data Augmentation, Curriculum Learning, and Multi-Task Enhancement
Presenters: Heitor R Guimarães, Arthur S Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Tiago H Falk
Authors: Heitor R Guimarães, Arthur S Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Tiago H Falk
Abstract: Self-supervised speech representation learning aims to extract meaningful factors from the speech signal that can later be used across different downstream tasks, such as speech and/or emotion recognition. Existing models, such as HuBERT, however, can be fairly large and thus may not be suitable for edge speech applications. Moreover, realistic applications typically involve speech corrupted by noise and room reverberation, hence models need to provide representations that are robust to such environmental factors. In this study, we build on the so-called DistilHuBERT model, which distils HuBERT to a fraction of its original size, with three modifications, namely: (i) augment the training data with noise and reverberation, while the student model needs to distill the clean representations from the teacher model; (ii) introduce a curriculum learning approach where increasing levels of noise are introduced as the model trains, thus helping with convergence and with the creation of more robust representations; and (iii) introduce a multi-task learning approach where the model also reconstructs the clean waveform jointly with the distillation task, thus acting as an enhancement step that ensures additional environmental robustness of the representation. Experiments on three SUPERB tasks show the advantages of the proposed method not only relative to the original DistilHuBERT, but also to the original HuBERT, demonstrating its suitability for "in the wild" edge speech applications.
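The first modification, distilling clean teacher targets from noise-corrupted student inputs, can be sketched roughly as follows; the models and the L1 loss here are placeholders rather than the exact DistilHuBERT training recipe.

```python
import torch
import torch.nn.functional as F

def robust_distill_loss(student, teacher, clean_wave, noisy_wave):
    """The student sees the corrupted waveform but must match the teacher's
    representation of the *clean* waveform, encouraging noise-invariant features."""
    with torch.no_grad():
        target = teacher(clean_wave)        # (batch, frames, dim) clean targets
    pred = student(noisy_wave)              # same shape, computed from noisy input
    return F.l1_loss(pred, target)
```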
Title: Gradient Knowledge Distillation for Pre-trained Language Models
Presenter:
Authors: Lean Wang, Lei Li, Xu Sun
Abstract: Knowledge distillation (KD) is an effective framework to transfer knowledge from a large-scale teacher to a compact yet well-performing student. Previous KD practices for pre-trained language models transfer knowledge by aligning instance-wise outputs between the teacher and the student, while neglecting an important knowledge source, i.e., the gradient of the teacher. The gradient characterizes how the teacher responds to changes in inputs, which we assume is beneficial for the student to better approximate the underlying mapping function of the teacher. Therefore, we propose Gradient Knowledge Distillation (GKD) to incorporate the gradient alignment objective into the distillation process. Experimental results show that GKD outperforms previous KD methods in terms of student performance. Further analysis shows that incorporating gradient knowledge makes the student behave more consistently with the teacher, greatly improving interpretability.
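A minimal sketch, in the spirit of the abstract above, of adding a gradient-alignment term to distillation; it assumes both models read the same input embedding tensor, and the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def gradient_kd_term(student_logits, teacher_logits, embeddings):
    """Gradient-alignment term: encourage the student's output to react to input
    perturbations the way the teacher's does.

    Assumes both logit tensors were computed from the same `embeddings` tensor,
    with embeddings.requires_grad_(True) set before the forward passes.
    """
    g_student = torch.autograd.grad(student_logits.sum(), embeddings, create_graph=True)[0]
    g_teacher = torch.autograd.grad(teacher_logits.sum(), embeddings, retain_graph=True)[0].detach()
    # 1 - cosine similarity between the two gradient fields, averaged over the batch
    return 1.0 - F.cosine_similarity(g_student.flatten(1), g_teacher.flatten(1)).mean()
```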
Title: (KeyNote Talk) Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval
Presenter: Graham Neubig
Bio: Graham Neubig is an associate professor at the Language Technologies Institute of Carnegie Mellon University and CEO of Inspired Cognition. His research focuses on natural language processing, with a focus on multilingual NLP, natural language interfaces to computers, and machine learning methods for NLP system building and evaluation. His ultimate goal is that every person in the world should be able to communicate with each other, and with computers, in their own language. He also contributes to making NLP research more accessible through open publishing of research papers, advanced NLP course materials and video lectures, and open-source software, all of which are available on his website.
Abstract: Retrieval-based language models (R-LM) model the probability of natural language text by combining a standard language model (LM) with examples retrieved from an external datastore at test time. While effective, a major bottleneck of using these models in practice is the computationally costly datastore search, which can be performed as frequently as every time step. In this paper, we present RetoMaton - retrieval automaton - which approximates the datastore search, based on (1) saving pointers between consecutive datastore entries, and (2) clustering of entries into "states". This effectively results in a weighted finite automaton built on top of the datastore, instead of representing the datastore as a flat list. The creation of the automaton is unsupervised, and a RetoMaton can be constructed from any text collection: either the original training corpus or from another domain. Traversing this automaton at inference time, in parallel to the LM inference, reduces its perplexity by up to 1.85, or alternatively saves up to 83% of the nearest neighbor searches over kNN-LM (Khandelwal et al., 2020) without hurting perplexity.
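For context, the kNN-LM interpolation that RetoMaton approximates can be sketched as follows, given distances and target-token ids already retrieved from the datastore; the interpolation weight and distance-to-probability conversion are the usual illustrative choices, not RetoMaton itself.

```python
import torch
import torch.nn.functional as F

def knn_lm_next_token(lm_probs, knn_distances, knn_token_ids, vocab_size, lam=0.25):
    """Interpolate the base LM distribution with a distribution induced by the
    retrieved nearest neighbors: p = lam * p_knn + (1 - lam) * p_lm.

    lm_probs: (vocab_size,) base LM distribution for the current step
    knn_distances: (k,) distances of the k retrieved datastore entries
    knn_token_ids: (k,) LongTensor of the target tokens stored with those entries
    """
    weights = F.softmax(-knn_distances, dim=-1)       # closer neighbors count more
    p_knn = torch.zeros(vocab_size)
    p_knn.scatter_add_(0, knn_token_ids, weights)     # aggregate weights by target token
    return lam * p_knn + (1.0 - lam) * lm_probs
```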
Title: (KeyNote Talk) Do we still need inductive biases after Transformer language models
Presenter: Siva Reddy
Bio: Siva Reddy is an Assistant Professor in the School of Computer Science and Linguistics at McGill University. He is a Facebook CIFAR AI Chair and a core faculty member of Mila, the Quebec AI Institute. Before McGill, he was a postdoctoral researcher at Stanford University. He received his PhD from the University of Edinburgh in 2017, where he was a Google PhD Fellow. His research focuses on representation learning for language that facilitates systematic generalization, reasoning, and conversational modeling. He received the 2020 VentureBeat AI Innovation Award in NLP, and the best paper award at EMNLP 2021.
Abstract: In this talk, I will explore the role of inductive biases when fine-tuning large Transformer language models in three different scenarios: when the output space is structured, for example, semantic parsing from language to code; when performing multi-task learning where tasks may share some latent structure, e.g., different semantic tasks like question answering and text entailment may share common reasoning skills; and when the input involves a higher-order (latent) structure such as negation. It is not always the case that inductive biases help. Come with your wisest/wildest answers.
Title: (KeyNote Talk) 8-bit Methods for Efficient Deep Learning
Presenter: Tim Dettmers
Bio: Tim Dettmers is a fifth-year PhD student advised by Luke Zettlemoyer at the University of Washington in Seattle. He holds degrees in applied math and computer science and has a background in industrial automation. His primary research revolves around making neural networks more efficient, focusing on the sparsification and quantization of language models. Tim runs a blog about deep learning, GPUs, and PhD life at timdettmers.com.
Abstract: Large language models are effective tools for many tasks but are difficult to train and run inference with due to their size. Moving from 32-bit models to 16-bit models resulted in considerable efficiency gains that made training and inference of large models easier. Can we train and run inference in 8-bit to make further gains? In this talk, I will show that 8-bit inference and training can be used without degrading performance while improving efficiency. To make 8-bit methods work, it is essential to understand how quantization precision affects model performance and training stability as we scale the model size. I will talk about how these factors change with scale and how we need to adjust 8-bit methods to make them work. In particular, I will speak about 8-bit optimizers for training and Int8 inference for large language models with up to 175B parameters. These methods make training and inference more efficient and make large models more accessible to researchers.
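As background, the simplest form of 8-bit weight quantization (symmetric, row-wise absmax) looks roughly like this; the actual LLM.int8() method adds a mixed-precision decomposition for outlier features and is more involved.

```python
import torch

def quantize_rowwise_int8(w: torch.Tensor):
    """Symmetric absmax quantization of a weight matrix, one scale per row.
    Dequantize with w_int8.float() * scale[:, None]."""
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0             # (rows, 1)
    w_int8 = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return w_int8, scale.squeeze(1)

# Example: quantize, then approximately reconstruct the weights
w = torch.randn(4, 8)
w_q, s = quantize_rowwise_int8(w)
w_hat = w_q.float() * s[:, None]
```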
Title: (KeyNote Talk) Efficient Controllable Generative Models for Music and Performance Synthesis
Presenter: Anna Huang
Bio: Anna Huang is a Research Scientist at Google Brain, working on the Magenta project. She is also a Canada CIFAR AI Chair at Mila – Québec AI Institute, and an Adjunct Professor at Université de Montréal. Her research focuses on designing generative models and interfaces to support music making and, more generally, the creative process. Her work is at the intersection of machine learning, human-computer interaction, and music. She is the creator of Music Transformer and Coconet. Coconet was the ML model that powered Google's first AI Doodle, the Bach Doodle, which in two days enabled tens of millions of users around the world to co-compose with ML in their browser. She is an organizer of the international AI Song Contest, and was a guest editor for TISMIR's special issue on AI and Musical Creativity.
Abstract: How can we design generative models with structure that both improves the efficiency of the models and the controllability offered to users? In this talk, I'll give two examples to illustrate how we can achieve this goal by taking inspiration from the nonlinear and hierarchical structure that underlies the human process of creating music. Generative models of music composition typically assume music is written in a single pass from beginning to end, constraining the user to also follow this unnatural chronological process. To enable a more nonlinear creative workflow, we introduced Coconet (Huang et al., 2017), an Orderless NADE (Uria et al., 2014)-like generative model (similar to masked language and visual models) that models all permutations of orderings in which the task of composition can be broken down. This enables the model to learn more efficiently from data by traversing sequences from all directions, and lets users put down notes in any order and have the model complete any partial score. Neural audio synthesizers typically synthesize musical performance audio from MIDI end-to-end, resulting in a black box that offers few mechanisms for control. To enable detailed user control, we introduced MIDI-DDSP (Wu et al., 2022), a hierarchical model of musical performance synthesis that breaks down audio synthesis into a three-level hierarchy of notes, performance, and synthesis, analogous to how a creative process involves composers, performers, and instruments. Not only does this interpretable hierarchy allow users to intervene at each level or utilize trained priors (performance given notes, synthesis given performance) for creative assistance, it also allows models to leverage these inductive biases to learn more efficiently from data, making it possible to train high-fidelity performance synthesis models from only a few hours of recordings. We hope these examples might encourage researchers to partner with creative practitioners to innovate in modeling, interaction, and human-AI co-creativity. We could see the goal as not only designing generative models that can model and generate creative artifacts well, but also working towards generative agents that we can coordinate and collaborate with in a creative setting.
Time | Title | Presenter
07:30AM - 07:50AM | Breakfast |
07:50AM - 08:00AM | Opening Speech |
08:00AM - 08:30AM | (KeyNote Talk) Fine-grained Interactive Vision Language Pre-training | Lu Hou
08:30AM - 09:05AM | (KeyNote Talk) Efficiency Tradeoffs in the Design of Neural Search Systems | Jimmy Lin
09:05AM - 09:35AM | (KeyNote Talk) Last Advances in End-to-End Speech Recognition | Tara Sainath
09:35AM - 09:45AM | Collective Knowledge Graph Completion with Mutual Knowledge Distillation | Weihang Zhang, Ovidiu Serban, Jiahao Sun, Yike Guo
09:45AM - 09:56AM | Attribute Controlled Dialogue Prompting | Runcheng Liu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart
09:56AM - 10:05AM | Fast DistilBERT on CPUs | Haihao Shen, Ofir Zafrir, Bo Dong, Hengyu Meng, Xinyu Ye, Zhe Wang, Yi Ding, Hanwen Chang, Guy Boudoukh, Moshe Wasserblat
10:00AM - 10:30AM | Morning Break and Poster Session I |
10:30AM - 11:05AM | (KeyNote Talk) SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models | Song Han
11:05AM - 11:35AM | (KeyNote Talk) Building Language Models Based on Retrieval | Danqi Chen
11:35AM - 12:05PM | (KeyNote Talk) Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training | Yang You
12:05PM - 12:15PM | Efficient Few-Shot Learning Without Prompts | Oren Pereg, Daniel Korat, Moshe Wasserblat, Lewis Tunstall, Unso Eun Seo Jo, Luke Bates, Nils Reimers
12:15PM - 12:25PM | PCFG-based Natural Language Interface Improves Generalization for Controlled Text Generation | Jingyu Zhang, Jim Glass, Tianxing He
12:25PM - 12:35PM | PromptDA: Label-guided Data Augmentation for Prompt-based Few Shot Learners |
12:30PM - 01:30PM | Lunch Break and Virtual Poster Session |
01:30PM - 02:00PM | (KeyNote Talk) Efficient Identify Event Causality with Knowledge and Analogy | Bang Liu
02:00PM - 02:50PM | Interactive Industrial Panel | Boxing Chen, Jiahao Sun, Vikrant Singh Tomar, Marjan Ghazvininejad, Yu Cheng, Mohammad Norouzi, Rahul Gupta
02:50PM - 02:59PM | Improving the Robustness of DistilHuBERT to Unseen Noisy Conditions via Data Augmentation, Curriculum Learning, and Multi-Task Enhancement | Heitor R Guimarães, Arthur S Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Tiago H Falk
02:59PM - 03:05PM | Gradient Knowledge Distillation for Pre-trained Language Models |
03:00PM - 03:30PM | Break and Poster Session II |
03:30PM - 04:05PM | (KeyNote Talk) Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval | Graham Neubig
04:05PM - 04:35PM | (KeyNote Talk) Do we still need inductive biases after Transformer language models | Siva Reddy
04:35PM - 05:05PM | (KeyNote Talk) 8-bit Methods for Efficient Deep Learning | Tim Dettmers
05:05PM - 05:35PM | (KeyNote Talk) Efficient Controllable Generative Models for Music and Performance Synthesis | Anna Huang
05:35PM - 05:45PM | Best Paper and Poster Award & Closing |
Organizers
- Mehdi Rezagholizadeh (Huawei Noah's Ark Lab)
- Peyman Passban (BenchSci)
- Yue Dong (University of California)
- Lili Mou (University of Alberta)
- Pascal Poupart (University of Waterloo)
- Ali Ghodsi (University of Waterloo)
- Qun Liu (Huawei Noah's Ark Lab)
Volunteers
- Khalil Bibi (Huawei Noah's Ark Lab)
- Soheila Samiee (BASF)
Technical Committee
- Kevin Duh (Johns Hopkins University)
- Boxing Chen
- Vahid Partovi Nia (Huawei Noah’s Ark Lab)
- Bang Liu (University of Montreal (UdM))
- Hamidreza Mahyar (McMaster University)
- Wenhu Chen (University of Waterloo)
- Mehdi Rezagholizadeh (Huawei Noah’s Ark Lab)
- Yingxue Zhang (Huawei Noah's Ark Lab)
- Yue Dong (University of California)
- Lili Mou (University of Alberta)
- Peyman Passban (BenchSci)
- Ivan Kobyzev (Huawei Noah’s Ark Lab)
- Aref Jafari (University of Waterloo)
- Ahmad Rashid (Huawei Noah’s Ark Lab)
- Vasileios Lioutas (University of British Columbia (UBC))
- Anderson R. Avila (Huawei Noah’s Ark Lab)
- Malik H. Altakrori (McGill University & MILA)
- Ali Vahdat (Thomson Reuters)
- Prasanna Parthasarathi (McGill University & MILA)
- Shohreh Shaghaghian (Thomson Reuters)
- Ehsan Kamalloo (University of Alberta)
- Ali Saheb Pasand (University of Waterloo)
- Abbas Ghaddar (Huawei Noah’s Ark Lab)
- Marzieh Tahaei (Huawei Noah’s Ark Lab)
- Soheila Samiee (BASF)
- Habib Hajimolahoseini (Huawei Noah’s Ark Lab)
- Mohammad Salameh (Huawei Noah’s Ark Lab)
- Mohammed Senoussaoui (INRS)
- Flávio Ávila (Amazon)
- Peng Lu (Huawei Noah’s Ark Lab)
- Joao Monteiro (Service Now)
- Xiaoguang Li (Huawei Noah’s Ark Lab)
- David Alfonso Hermelo (Huawei Noah’s Ark Lab)
- Khalil Bibi (Huawei Noah’s Ark Lab)
- Can Liu (Amazon Alexa AI)
- Amina Shabbeer (Amazon)
- M. Skylar Versage (Amazon)
- Tanya Roosta (Amazon)
- Prashanth Rao (Royal Bank of Canada)
- Ankur Agarwal (Huawei Noah's Ark Lab)
- Sunyam Bagga (Huawei Noah’s Ark Lab)
- Ovidiu Serban (Imperial College London)
- Tony Tong (Royal Bank of Canada)
- Jiahao Sun (Royal Bank of Canada)
- Ryan Ong (Imperial College London)
- Weihang Zhang (Imperial College London)
- Manying Zhang (Institut National des Langues et Civilisations Orientales)
- Lianlong Wu (Oxford University)
- Mojtaba Valipour (University of Waterloo)
- Chandra Bhagavatula (Allen Institute for AI)
- Mahdi Biparva (Huawei Noah's Ark Lab)
- Jinming Zhao (Monash University)
- Khalil Slimi (ServiceNow)
- Mohammadreza Tayaranian (Huawei Noah’s Ark Lab)
- Alireza Ghaffari (Huawei Noah’s Ark Lab)
- Weiyi Lu (Amazon)
Platinum Sponsor
Gold Sponsor