Deep learning provides new fundamental tools, such as contextualised word embeddings and seq2seq models, that let us build new kinds of Natural Language Understanding apps faster, better and cheaper than ever before. The advanced pattern-matching capabilities of deep learning enable a new approach to app development where the system's behaviour is learnt from training data, dramatically reducing the need for manual scripting. This talk describes how we are using this technology in the Oracle Digital Assistant, focusing especially on Conversational AI. The talk ends with a discussion of how research advances in areas such as explainability, few-shot learning, data augmentation and transfer learning can help this technology achieve its full potential.
Mark Johnson is Chief AI Scientist, Oracle Digital Assistant at Oracle Corporation, and a Professor of Language Sciences in the Department of Computing, Macquarie University. He is also an Editor-in-Chief of the Transactions of the Association for Computational Linguistics. Mark has worked on a wide range of topics in computational linguistics, but his main area of research is natural language understanding, especially syntactic parsing and semantic analysis, and their applications to text and speech processing.
There is increasing awareness that we stand on the brink of massive knowledge loss as perhaps half of the world’s languages risk not being learnt by the next generation, and of the attendant urgency of recording them in some form. Yet our conceptions of just how much we should record of each language, if we are to do justice to the intellectual richness of the oral traditions they represent, remain tragically unambitious. How much of the knowledge of English- or Chinese-speaking cultures would be captured in ten hours of text, a typical amount to be recorded in a language documentation project? Compare this to the 60 million words or so we have in corpora of Classical Greek or Sanskrit, equivalent to about 6,000 hours of recordings. Is it inconceivable for modern-day speech communities, seeking a deep, abiding record of their language, to record and transcribe that much data? After all, ten members of a speech community, each recording three hours per day, could gather this much in a year.
The real challenge, as linguists and language community members have come to realise, is the transcription bottleneck: transcribing one hour of recording typically takes from 40 to 100 hours (and in the early phases of work almost always at the upper end). The result of this bottleneck is that even if we record something like the above amount, current language documentation methods of a few people working together over three years cannot transcribe more than around 15 hours of primary material. This falls far short of a rich corpus for one language, and it does not even reach the one hundred hours normally cited as a necessary minimum for a deep-learning training corpus.
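The back-of-envelope arithmetic behind these figures can be checked in a few lines. The words-per-hour rate below is an assumption inferred from the 60-million-word / 6,000-hour equivalence mentioned above, not a figure from the talk:

```python
# Sanity-check the corpus and transcription figures (assumed rates,
# not project measurements).

WORDS_PER_HOUR = 10_000              # implied by 60M words ~= 6,000 hours (~167 wpm)

corpus_words = 60_000_000            # Classical Greek / Sanskrit corpora
corpus_hours = corpus_words / WORDS_PER_HOUR
print(corpus_hours)                  # 6000.0 hours of speech

# Ten community members recording three hours per day:
speakers, hours_per_day = 10, 3
days_needed = corpus_hours / (speakers * hours_per_day)
print(days_needed)                   # 200.0 days -- comfortably within a year

# The transcription bottleneck: 40-100 person-hours per hour of audio.
low_rate, high_rate = 40, 100
person_hours = (corpus_hours * low_rate, corpus_hours * high_rate)
print(person_hours)                  # (240000.0, 600000.0) person-hours to transcribe it all
```

The last line makes the scale of the bottleneck concrete: transcribing a 6,000-hour corpus manually would take on the order of a quarter to a half million person-hours, which is why semi-automation matters.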
In this talk we describe the TAP initiative – Transcription Acceleration Project – a joint enterprise of language documentation fieldworkers, community language users, computational linguists, software engineers and machine learning researchers, supported by the ARC-funded Centre of Excellence for the Dynamics of Language (CoEDL). This project aims to break the impasse posed by the transcription bottleneck. The goal is not fully automated transcription, which is a dangerous mirage, both because it denigrates language community members’ social and cultural roles, and because it consistently fails to pick up the things it does not yet know. Instead, TAP supports the use of semi-automated speech recognition technology in language documentation workflows. Semi-automation gives people access to cutting-edge technologies while maintaining roles for people that are critical for cultural or research purposes, as people-in-the-loop editing the output of an ASR system before using it in other processes. TAP has supported the development of Elpis and Persephone, speech recognition systems for orthographic and phonemic transcription, designed to be used by people without the technical experience typically required to install or use ASR tools.
With these tools, TAP aims to improve the transcription experience, supporting new ways of working to improve the state of language documentation globally. For Australia and its neighbours, we will be able to secure a much greater proportion of the region’s rich but often ignored linguistic cultural heritage – around a quarter of the world’s languages – for the generations to come.
Nicholas Evans is Director of the ARC Centre of Excellence for the Dynamics of Language (CoEDL), a Distinguished Professor of Linguistics at the College of Asia and the Pacific, ANU, and an ARC Laureate Professor. His contributions to linguistics include documentation of fragile and little-known languages in Australia and New Guinea based on over six years of fieldwork, studying the implications of little-known languages for linguistics, and creating a framework for linguistic typology to enable systematic comparison of languages. He has also done applied work as a linguist, anthropologist and interpreter in areas ranging from Native Title and traditional ecological knowledge to the promotion of Indigenous art in Australia and vernacular education in Australia, PNG and elsewhere.
Ben Foley is project manager of CoEDL's Transcription Acceleration Project (TAP). TAP brings cutting-edge language technology within reach of people working with some of the world's oldest languages. Ben's previous experience with Aboriginal and Torres Strait Islander language resource development has resulted in apps and websites galore, including Iltyem-Iltyem and the Gambay First Languages Map.