Building an inclusive NLP

Millions of standard English speakers enjoy the benefits offered by natural language processing (NLP) models every day.

But for speakers of African American Vernacular English (AAVE), technologies such as voice-enabled GPS systems, PDAs, and text-to-speech software are often problematic, because large NLP models frequently cannot understand or generate AAVE. Worse, these models are typically trained on data scraped from the web, so they are likely to absorb the racial biases and stereotypical associations that are rampant online.

When companies use these biased models to help make high-stakes decisions, AAVE speakers may find themselves unfairly excluded from social networks, inappropriately denied access to housing or loans, or treated unfairly by law enforcement and the justice system.

For the past 18 months, machine learning (ML) specialist Jazmia Henry has focused on finding a way to responsibly integrate AAVE into language models. As a fellow at the Stanford Institute for Human-Centered Artificial Intelligence (HAI) and the Center for Comparative Studies in Race and Ethnicity (CCSRE), she created an open-source corpus of more than 141,000 AAVE words to help researchers and manufacturers design models that are both more inclusive and less likely to be biased.

"My hope with this project is that social and computational linguists, anthropologists, computer scientists, social scientists, and other researchers will push and push at this corpus, research with it, wrestle with it, and test its limits so that we can turn it into a true representation of AAVE and provide feedback and information on our potential next steps algorithmically," Henry said.

In this interview, she describes the early hurdles in developing this database, its potential to help computational linguists understand the origins of AAVE, and her post-Stanford projects.

How would you describe African American Vernacular English?

For me, AAVE is a language of perseverance and elevation. It arose when African languages, thought to have been lost during the forced migration of the slave trade, were incorporated into English to create a new language used by the descendants of those African peoples.

How did you become interested in including AAVE in NLP models?

When I was a child, both of my parents sometimes spoke their mother tongues. For my West Indian dad, it was Jamaican patois, and for my mom, it was Gullah Geechee, which is spoken in the coastal areas of the Carolinas and Georgia. Each is a creole: a new language created by mixing different languages.

Everyone seemed to understand that my parents spoke a different language, and no one doubted their intelligence. But when I saw people in my community speaking AAVE, which I believe is another creole language, I could tell there was shame and stigma attached to it: a feeling that if we used that language outside our community, we would be judged as less intelligent. When I started working in data science, I wondered what would happen if I tried to collect data on AAVE and incorporate it into
