Amazon unveils its long-term goal in natural language processing
The goal: a single machine learning model that can analyze and understand input from multiple languages. The use case: people who interact with Alexa in their native language (among other commercial applications).
On April 20, 2022, Amazon announced three developments toward this goal under the name MMNLU-22, short for Massively Multilingual Natural Language Understanding (Massively Multilingual NLU).
The three developments are: the release of a dataset containing one million labeled utterances in 51 languages, along with open-source code; a competition built on this dataset (deadline: June 1, 2022); and a workshop at EMNLP 2022 (Abu Dhabi, December 7-11, 2022), one of the world’s largest natural language processing conferences.
Amazon named the dataset MASSIVE: Multilingual Amazon SLU resource package (SLURP) for Slot-filling, Intent classification, and Virtual assistant Evaluation. The dataset ships with modeling examples so that others can reproduce baseline results for two core NLU tasks — intent classification and slot filling — as described in the SLURP (Spoken Language Understanding Resource Package) paper linked above.
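To make the two tasks concrete, here is a toy illustration (not Amazon's code) of what an NLU model's output looks like: intent classification assigns one label to the whole utterance, while slot filling tags individual tokens with BIO-style slot labels. The utterance, intent name, and slot names are hypothetical examples.

```python
utterance = "wake me up at nine am on friday"

# A hypothetical model's output for this utterance:
prediction = {
    "intent": "alarm_set",  # intent classification: one label per utterance
    "slots": [              # slot filling: one BIO tag per token
        ("wake", "O"), ("me", "O"), ("up", "O"), ("at", "O"),
        ("nine", "B-time"), ("am", "I-time"),
        ("on", "O"), ("friday", "B-date"),
    ],
}

def slot_spans(tagged_tokens):
    """Collapse BIO tags into (slot_name, text) spans."""
    spans, current = [], None
    for token, tag in tagged_tokens:
        if tag.startswith("B-"):       # a new slot begins
            if current:
                spans.append(tuple(current))
            current = [tag[2:], token]
        elif tag.startswith("I-") and current:
            current[1] += " " + token  # slot continues
        else:                          # outside any slot
            if current:
                spans.append(tuple(current))
            current = None
    if current:
        spans.append(tuple(current))
    return spans

print(prediction["intent"])             # alarm_set
print(slot_spans(prediction["slots"]))  # [('time', 'nine am'), ('date', 'friday')]
```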
NLU is a sub-discipline of natural language processing (NLP), and Amazon said its focus is on NLU as a component of spoken language understanding (SLU), in which audio is first converted to text before NLU is performed. Alexa is an example of an SLU-based virtual assistant.
The MASSIVE dataset includes “one million realistic, parallel, and labeled virtual assistant text utterances spanning 51 languages, 18 domains, 60 intents, and 55 slot types.”
Amazon created the dataset “by commissioning professional translators to localize or translate the English-only SLURP dataset into 50 typologically diverse languages from 29 language genera, including low-resource languages.”
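The public MASSIVE release distributes each example as a JSON record whose annotated utterance marks slots inline with a `[slot_name : slot value]` bracket convention. The sketch below parses that convention; the field names and the example record are based on the public release but should be treated as an assumption, not a specification.

```python
import re

# Illustrative record in the style of the public MASSIVE release
# (field names and values are assumptions for this sketch).
record = {
    "locale": "en-US",
    "intent": "alarm_set",
    "utt": "wake me up at nine am on friday",
    "annot_utt": "wake me up at [time : nine am] on [date : friday]",
}

# Matches "[slot_name : slot value]" annotations.
SLOT_RE = re.compile(r"\[(\w+) : ([^\]]+)\]")

def extract_slots(annot_utt):
    """Return a {slot_name: value} dict from the bracketed annotation."""
    return {name: value for name, value in SLOT_RE.findall(annot_utt)}

print(extract_slots(record["annot_utt"]))
# {'time': 'nine am', 'date': 'friday'}
```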
Amazon is essentially trying to overcome a major hurdle for SLU-based virtual assistants like Alexa: academic and industrial NLU R&D is still limited to a handful of languages.
“One of the difficulties in creating massively multilingual NLU models is the lack of labeled data for training and evaluation, particularly data that is realistic for a given task and natural for a given language. High naturalness typically requires human-based vetting, which is often costly.”
Therefore, R&D is “limited to a small subset of the world’s more than 7,000 languages,” Amazon pointed out. “By learning a shared data representation that spans languages, the model can transfer knowledge from languages with abundant training data to those in which training data is sparse.”
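A minimal conceptual sketch of the transfer idea (not Amazon's method): if a multilingual encoder maps translations of the same utterance to nearby points in a shared vector space, a classifier fit only on English examples can label utterances in other languages zero-shot. The "encoder" below is a stand-in lookup table with hand-picked vectors, and the Swahili utterances and their glosses are illustrative assumptions; a real system would use a trained multilingual model.

```python
import numpy as np

# Stand-in for a shared multilingual representation: translation pairs
# are placed close together in the same 2-D space.
SHARED_SPACE = {
    # English training utterances
    "wake me up at nine": np.array([0.9, 0.1]),
    "turn off the lights": np.array([0.1, 0.9]),
    # Swahili utterances (never used for training below)
    "niamshe saa tatu": np.array([0.85, 0.15]),  # gloss: "wake me up at nine"
    "zima taa":         np.array([0.15, 0.85]),  # gloss: "turn off the lights"
}

def encode(utt):
    return SHARED_SPACE[utt]

# "Train" on English only: one centroid per intent.
train = {"alarm_set": ["wake me up at nine"],
         "lights_off": ["turn off the lights"]}
centroids = {intent: np.mean([encode(u) for u in utts], axis=0)
             for intent, utts in train.items()}

def classify(utt):
    """Nearest-centroid intent classification in the shared space."""
    v = encode(utt)
    return min(centroids, key=lambda intent: np.linalg.norm(v - centroids[intent]))

# Zero-shot transfer to Swahili, with no Swahili training data:
print(classify("niamshe saa tatu"))  # alarm_set
print(classify("zima taa"))          # lights_off
```

The point of the toy is only that knowledge "transfers" because both languages live in one representation; the hard part in practice is learning such a space from imbalanced data, which is what datasets like MASSIVE are meant to support.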
Hinting at where it hopes to apply these developments commercially, Amazon noted that of the more than 100 million smart speakers sold worldwide (e.g., the Echo), most use a voice interface exclusively and rely on NLU to function. The company estimates that the number of virtual assistants in use will reach eight billion by 2023, most of them on smartphones.