About

Project Overview

LT4CPR is a collaborative research project building the language technology infrastructure needed for effective crisis preparedness and response. Today's language technologies (machine translation, speech recognition, summarization, text classification) are powerful, but they serve only a fraction of the world's languages. The communities most vulnerable to disasters are often precisely those whose languages are least supported.

This project addresses that gap by creating two complementary corpora: speech translation datasets for under-resourced languages (led by George Mason University) and social media datasets for situation report generation (led by the University of Washington). Together, these resources form a shared infrastructure that enables NLP researchers, humanitarian organizations, and crisis responders to build and evaluate language tools that work when and where they are most needed.

Intellectual Merit

The project advances natural language processing by producing datasets grounded in real crisis scenarios and under-resourced languages. It enables research in machine translation, speech translation, summarization, and classification in linguistically diverse settings. It also conducts systematic reviews of existing language technology infrastructure for crisis response, identifying gaps and guiding future research directions.

Broader Impacts

Improved language technologies for crisis response have direct humanitarian value. Responders can triage and translate messages faster; affected communities can access information in their own language; aid agencies can communicate more effectively across language barriers. The project explicitly targets languages spoken by communities most at risk but least served by current tools, and works in partnership with NGOs that operate on the ground in crisis settings.

The Collaboration

This is an NSF Collaborative Research grant. George Mason University and the University of Washington each hold a separate NSF award, and the two awards together constitute one unified project. The institutions contribute complementary expertise: GMU in speech and low-resource NLP, and UW in social media NLP and multilingual systems.

	George Mason University	University of Washington
Award number	2346334	2346335
PI / Co-PIs	Antonios Anastasopoulos	Fei Xia, William D. Lewis
NSF award page	NSF #2346334	NSF #2346335

Grant period: October 1, 2024 to September 30, 2027 · NSF Program: CCRI (CISE Community Research Infrastructure)

NSF Abstract

Language technologies are promising and could have strong impact during disaster responses. They can help to triage text messages in a disaster to determine what aid to provide. Language technologies can translate vast amounts of data related to an ongoing pandemic. Responders can use these technologies to converse with victims during disaster responses. However, advances in language technologies to date are limited. They focus on a few dozen of the more than 6,500 languages spoken or signed in the world today. Current language technologies neglect millions of people. This especially impacts those who are most at risk for experiencing disasters. This project provides an infrastructure for language technology advancements for crisis response. The results will be useful for everyone, no matter the language they speak.

This project builds datasets of crisis communications using dedicated data collections and social media harvesting. These datasets will be applicable to curated crisis scenarios. They will use common language scenarios necessary to communicate with vulnerable populations. This approach helps people for whom language technologies are not typically developed. The project will bring together researchers from different disciplines. These include language technology researchers, experts in disaster relief, linguistics, and human-computer interaction. The project will target representatives from the local speech communities to take part. To coordinate this effort, the project will organize yearly workshops and shared tasks with the communities.