Leveraging Readily Available Data for Low Resource Natural Language Processing

Seminar

Event Start

2021-02-07 - 15:00

Event End

2021-02-07 - 16:00

Location

KAUST

Natural Language Processing

Derry Wijaya, Assistant Professor, Computer Science, Boston University

Abstract

State-of-the-art Natural Language Processing (NLP) systems today are dominated by machine learning and deep learning models. However, most of these models work well only when abundant labeled training data are available. Furthermore, because these models typically have a large number of parameters, they require substantial compute resources to train. For the majority of the world's languages, and for the researchers working on them, abundant labeled data are a privilege, and so are compute resources. How can we train generalizable NLP models that remain effective when labeled data are scarce and compute resources are limited? In this talk, I will present some of our solutions, which leverage unsupervised or few-shot learning together with readily available multilingual resources or multimodal data to improve machine translation and nuanced text classification tasks, such as news framing, in these low-resource settings.

Brief Biography

Derry Wijaya is an Assistant Professor in the Computer Science Department at Boston University. Her research focuses on extending state-of-the-art machine learning and deep learning algorithms to problems involving natural language data for low-resource languages, that is, languages for which little or no training data is available or for which no automated human language technologies exist. She is interested in building generalizable approaches for these low-resource languages that leverage their monolingual data and other sources of information, such as images or readily available multilingual resources. Before joining Boston University, she did her postdoctoral research at the University of Pennsylvania with Professor Chris Callison-Burch on translating words from monolingual-only corpora. She received her Ph.D. from Carnegie Mellon University, where she worked with Professor Tom Mitchell on building and populating knowledge bases by extracting information from unstructured web text as part of the Never-Ending Language Learning (NELL) project. She received her Master's and Bachelor of Computing degrees from the National University of Singapore.

Contact Person

Peter Richtarik
