L2HM 2018: Language learning in humans and in machines

Between models and realities

Thursday 5 and Friday 6 July

Session: Scientific
Language: English
Audience: Academic (students, researchers, teachers)


Despite the considerable progress of artificial intelligence in recent years, mastering the subtleties of natural language remains a major challenge for these technologies. At the same time, their development has made it easier to build new cognitive models of human language learning. It is therefore worth asking whether these recent advances can be leveraged to overcome the machines' current limits. This scientific conference will accordingly explore the fundamental intersections between natural and artificial language acquisition.

Psycholinguistics offers a particularly relevant theoretical framework here, since it identifies cognitive mechanisms that can be transposed to machine learning in order to emulate human abilities. Moreover, through its experimental protocols, it provides criteria for evaluating the performance of algorithms by comparing them directly with results observed in humans.

Conversely, computational approaches refine concepts and tools that apply to the various theories of natural language learning, suited both to analyzing them and to evaluating them quantitatively.

At this meeting, internationally renowned researchers will share their most recent findings as a basis for discussion. To foster interdisciplinary exchange, each of the conference's three sessions will pair a psycholinguist with a computational linguist. The first session will address language learning "in the wild", that is, in its rawest form. The second will narrow the focus to the acquisition of syntax. The last session will focus on a smaller unit: words. Finally, the conference will close with a general discussion among all the speakers, bringing together their perspectives on the relationship between natural and artificial language learning.

Speakers

Susan Goldin-Meadow 
University of Chicago

 Afra Alishahi 
Tilburg University

Phil Blunsom
University of Oxford / DeepMind

Cynthia Fisher 
University of Illinois Urbana-Champaign

Chen Yu 
Indiana University Bloomington

 Michael Frank 
Stanford University

Discussions moderated by...

Alejandrina Cristia
CNRS

Anne Christophe
CNRS

Emmanuel Dupoux
EHESS / DeepMind

Programme

July 5 and 6 in the Salle Jaurès at the ENS

Thursday 5 July - Morning

Theme: Language learning in the wild

08:30 - 08:45 Welcome
08:45 - 09:45 Opening keynote: Susan Goldin-Meadow (U. Chicago)
09:45 - 11:00 Posters
11:00 - 12:00 Talk: Afra Alishahi (Tilburg University)
12:00 - 12:30 Panel: Alejandrina Cristia (CNRS)

Thursday 5 July - Afternoon

Theme: Learning syntax

13:30 - 14:30 Talk: Phil Blunsom (DeepMind, Oxford)
14:30 - 16:00 Discussions closed to the public
16:00 - 16:30 Coffee break
16:30 - 17:30 Talk: Cynthia Fisher (U. Illinois)
17:30 - 18:00 Panel: Anne Christophe (CNRS)

Friday 6 July - Morning

Theme: Word learning

08:30 - 09:30 Talk: Chen Yu (U. Indiana)
09:30 - 10:30 Posters
10:30 - 11:30 Talk: Michael Frank (Stanford)
11:30 - 12:30 Round table with all six speakers, moderated by Emmanuel Dupoux (EHESS/DeepMind) and Alejandrina Cristia (CNRS)

Detailed program

Theme: Language learning in the wild

Thursday 5 - Morning

8:45 - 9:45: Susan Goldin-Meadow

What small data can tell us about the resilience of language

Children learn the languages to which they are exposed. Understanding how they accomplish this feat requires that we not only know as much as we can about the linguistic input they receive, but also about the architecture of the minds that process this input. But because linguistic input has such a massive, and immediate, effect on the language children acquire, it is difficult to determine whether children come to language learning with biases and, if so, what those biases are — and getting more and more data about linguistic input won’t solve the problem. Examining children who are not able to make use of the linguistic input that surrounds them does, however, allow us to discover the biases that children bring to language learning. The properties of language that such children develop are not only central to human language, but also provide hints about the kind of mind that is driven to structure communication in this way. In this talk, I describe congenitally deaf individuals who cannot learn the spoken language that surrounds them and have not been exposed to sign language by their hearing families. Individuals in these circumstances use their hands to communicate––they gesture––and those gestures, called homesigns, take on many, but not all, of the forms and functions of languages that have been handed down from generation to generation. I first describe properties of language that are found in homesign. I next consider properties not found in homesign and I explore conditions that could lead to their development, first, in a naturally occurring situation of language emergence (Nicaraguan Sign Language) and, then, in an experimentally induced situation of language emergence (silent gesturers asked to communicate using only their hands). My goal is to shed light on the type of cognitive system that can not only pick up regularities in the input but, if necessary, also create a communication system that shares many properties with established languages.


9:45 - 11:00: Posters

These scientific posters will be displayed in the hallways, and their authors will be on hand to discuss and explain them.

(*) Posters marked with this symbol will also be presented briefly as short talks at the start of the session.

Modeling syntactic acquisition processes through language games between artificial agents. Marie Garcia, Isabelle Tellier.

A comparison of two types of automatic speech recognition systems as quantitative models of cross-linguistic phonetic perception. Thomas Schatz, Naomi Feldman.

Gender Processing in Second Language Acquisition. Manex Agirrezabal, Alice Ping Ping Tse.

Dynamics of single word production from childhood to adolescence and adulthood. Tanja Atanasova, Raphaël Fargier, Pascal Zesiger, Marina Laganaro.

Computational Word Segmentation and Code-Switching: the Chintang case. Georgia-Rengina Loukatou, Sabine Stoll, Alejandrina Cristia.

Emergence of attention in a neural model of visually grounded speech. William Havard, Jean-Pierre Chevrot, Laurent Besacier.

Using Correlated Contextual Cues to Shift Attentional Biases in Word Learning. Michelle Luna, Catherine Sandhofer.

(*) Learning and evaluating hierarchies of verb argument structure. Jesse Mu, Joshua Hartshorne, Timothy O'Donnell.

(*) The role of prediction error in linguistic generalization and item-based learning. Masa Vujovic, Michael Ramscar, Elizabeth Wonnacott.

Modeling language learning from the ground up. Christos Christodoulopoulos, Dan Roth, Cynthia Fisher.

(*) Semi-supervised learning in human infants. Sandy LaTourrette, Sandra Waxman.

Verbalization as a tool for compositional problem solving in humans and machines. Rahma Chaabouni, Armand Joulin, Alessandro Lazaric, Emmanuel Dupoux, Marco Baroni.

The representation of syntactic structures in Long Short-Term Memory networks and humans. Yair Lakretz, German Kruszewski, Dieuwke Hupkes, Theo Desbordes, Sébastien Marti, Stanislas Dehaene, Marco Baroni.

(*) The Perils of Natural Behavioral Tests for Unnatural Models: The Case of Number Agreement. Adhiguna Kuncoro, Chris Dyer, John Hale, Phil Blunsom.
 
(*) Do RNN language models induce referential information? An analysis of article prediction. Kristina Gulordava, Gemma Boleda.

Investigating the relationship between language processing efficiency, vocabulary, and working memory across development. Michelle Peter, Amy Bidgood, Samantha Durrant, Julian Pine, Caroline Rowland.


11:00 - 12:00: Afra Alishahi

Emerging representations of form and meaning in models of grounded language

Humans learn to understand speech from weak and noisy supervision: they manage to extract structure and meaning from speech by simply being exposed to utterances situated and grounded in their daily sensory experience. Emulating this remarkable skill has been the goal of numerous studies; however, researchers have often used severely simplified settings where either the language input or the extralinguistic sensory input, or both, are small-scale and symbolically represented.

We simulate this process in visually grounded models of language understanding which project utterances and images to a joint semantic space. We use variations of recurrent neural networks to model the temporal nature of spoken language, and examine how form and meaning-based linguistic knowledge emerge from the input signal. We carry out an in-depth analysis of the representations used by different components of the trained model and show that encoding of semantic aspects tends to become richer as we go up the hierarchy of layers, whereas encoding of form-related aspects of the language input tends to initially increase and then plateau or decrease.
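To make the setup concrete, here is a minimal sketch of such a joint-space model, assuming PyTorch; the module names, dimensions, and loss are illustrative assumptions, not the speaker's actual implementation:

```python
# A minimal sketch, assuming PyTorch; not the speaker's actual implementation.
import torch.nn as nn
import torch.nn.functional as F

class GroundedSpeechModel(nn.Module):
    def __init__(self, acoustic_dim=39, image_dim=2048, joint_dim=512):
        super().__init__()
        # A recurrent encoder captures the temporal nature of spoken language.
        self.speech_enc = nn.GRU(acoustic_dim, joint_dim,
                                 num_layers=2, batch_first=True)
        # Images enter the joint space via a linear projection of
        # precomputed visual features.
        self.image_proj = nn.Linear(image_dim, joint_dim)

    def forward(self, speech, image_feats):
        # speech: (batch, time, acoustic_dim); image_feats: (batch, image_dim)
        _, h = self.speech_enc(speech)
        u = F.normalize(h[-1], dim=-1)                         # utterance embedding
        v = F.normalize(self.image_proj(image_feats), dim=-1)  # image embedding
        return u, v

def contrastive_loss(u, v, margin=0.2):
    # Matching utterance-image pairs lie on the diagonal of the
    # similarity matrix; every off-diagonal entry is a mismatched pair.
    sim = u @ v.t()
    pos = sim.diag().unsqueeze(1)
    cost = (margin + sim - pos).clamp(min=0)
    return cost.fill_diagonal_(0).mean()
```

Probing the hidden states of the recurrent encoder layer by layer is what makes it possible to ask where form-related versus semantic information is encoded.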



Theme: Learning syntax

Thursday 5 - Afternoon


13:30 - 14:30: Phil Blunsom

Structure and grounding in language

Computational models of language built upon recent advances in Artificial Intelligence are able to produce remarkably accurate predictive distributions when trained on large text corpora. However, there is significant evidence that such models are not discovering and using the latent syntactic and semantic structure inherent in language. In the first part of this talk I will discuss recent work at DeepMind and Oxford University aimed at understanding to what extent current deep learning models are learning structure, and whether models equipped with a preference for memorisation or hierarchical composition are better able to discover lexical and syntactic units. In the second part of this talk I will describe initial work at DeepMind to train agents in simulated 3D worlds to ground simple linguistic expressions.
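One common way to probe whether a language model uses structure rather than surface statistics is a subject-verb agreement test (the topic of the Kuncoro et al. poster above). A minimal sketch, assuming a hypothetical score(context, word) function that returns the model's log-probability of the next word:

```python
# A sketch of an agreement probe, not DeepMind's evaluation code.
# `score(context, word)` is assumed to return a language model's
# log-probability of `word` as the next token after `context`.
minimal_pairs = [
    # (context, grammatical verb, ungrammatical verb) - the noun
    # closest to the verb is a distractor with the wrong number.
    ("The keys to the cabinet", "are", "is"),
    ("The author of the essays", "writes", "write"),
]

def agreement_accuracy(score, pairs=minimal_pairs):
    # A model tracking hierarchical structure should prefer the verb
    # that agrees with the head noun, not with the intervening noun.
    correct = sum(score(c, good) > score(c, bad) for c, good, bad in pairs)
    return correct / len(pairs)
```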


14:30 - 16:00: Parallel sessions - closed to the public


16:30 - 17:30: Cynthia Fisher

Words, syntax, and conversation: How children use sentence and discourse structure to learn about words

Children learn their native languages from notably noisy input. How do they manage this feat? One class of proposals suggests that children do so by learning in a biased system that expects (and therefore detects or builds) straightforward links between different levels of linguistic structure. Syntactic bootstrapping is an example of such a proposal, arguing that word learning (verb learning in particular) is guided by the child's growing syntactic knowledge. The structure-mapping account proposes that syntactic bootstrapping begins with universal biases (1) to map nouns in sentences onto distinct participant roles in a structured conceptual representation and (2) to represent syntactic knowledge in abstract terms. These biases make some aspects of sentence structure inherently meaningful to children (e.g., the number of nouns in the sentence), and permit children to generalize newly acquired syntactic knowledge rapidly to new verbs. In this talk, I will first review evidence for the structure-mapping account. Next, I will discuss challenges to the account arising from the existence of languages that allow verbs’ arguments to be omitted, such as Korean. I will propose and review evidence that an expectation of discourse continuity allows children to gather linguistic evidence for verbs’ arguments across sentences in a conversation. Taken together, these lines of evidence make clear that simple aspects of sentence structure can guide verb learning from the start of multi-word sentence comprehension, and even when some of the new verb’s arguments are missing due to discourse redundancy.
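As a toy illustration of the cue at issue (an assumption-laden caricature, not the structure-mapping model itself), the bare count of nouns in a sentence already constrains what kind of event a novel verb could name:

```python
# A toy caricature of the noun-count cue, not the structure-mapping model.
def guess_participant_count(tagged_sentence):
    # One noun suggests a one-participant event, two nouns a
    # two-participant event, so the count alone constrains the
    # meaning of an unknown verb.
    return sum(1 for _, pos in tagged_sentence if pos == "NOUN")

# "The bunny is gorping the duck": two nouns, so "gorping" is
# probably a transitive, two-participant verb.
sentence = [("the", "DET"), ("bunny", "NOUN"), ("is", "AUX"),
            ("gorping", "VERB"), ("the", "DET"), ("duck", "NOUN")]
print(guess_participant_count(sentence))  # 2
```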



Theme: Word learning

Friday 6 - Morning


8:30 - 9:30: Chen Yu

Statistical Word Learning: Data, Mechanisms and Models

Recent theory and experiments offer a solution as to how human learners may break into word learning, by using cross-situational statistics to find the underlying word-referent mappings. Computational models demonstrate the in-principle plausibility of this statistical learning solution and experimental evidence shows that both adults and infants can aggregate and make statistically appropriate decisions from word-referent co-occurrence data. In this talk, I will first review these empirical and modeling contributions to investigate cognitive processes in statistical word learning. Next, I will present a set of studies using head-mounted cameras and eye trackers to collect and analyze toddlers’ visual input as parents label novel objects during an object-play session. The results show how toddlers and parents coordinate momentary visual attention when exploring novel objects in free-flowing interaction, and how toddlers accumulate co-occurring statistics of seen objects and heard words through free play. I will conclude by suggesting that future research should focus on detailing the statistics in the learning environment and the cognitive processes that make use of those statistics.
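A minimal sketch of the cross-situational idea (the general mechanism, not the speaker's model): tallying word-referent co-occurrences across individually ambiguous scenes is enough to single out the consistent pairing.

```python
# A sketch of the general mechanism, not the speaker's model.
from collections import defaultdict

cooc = defaultdict(lambda: defaultdict(int))  # word -> referent -> count

def observe(words, referents):
    # One scene: several words are heard while several objects are
    # visible, so the word-referent mapping is locally ambiguous.
    for w in words:
        for r in referents:
            cooc[w][r] += 1

def best_referent(word):
    return max(cooc[word], key=cooc[word].get)

# Across scenes, only the correct pairing co-occurs consistently.
observe(["ball", "dog"], ["BALL", "DOG"])
observe(["ball", "cup"], ["BALL", "CUP"])
observe(["dog", "cup"], ["DOG", "CUP"])
print(best_referent("ball"))  # BALL (seen twice; DOG and CUP only once)
```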


9:30 - 10:30: Posters

These scientific posters will be displayed in the hallways, and their authors will be on hand to discuss and explain them.

(*) Posters marked with this symbol will also be presented briefly as short talks at the start of the session.

A Computer-Simulated Laboratory-Approach towards Language Acquisition based on Interaction between two Language Producing Automata. David P. Shakouri, Crit L. Cremers, Niels O. Schiller.

Using cognitive word games to promote lexical memory access and vocabulary retrieval in second language learners. Majed Alqahtani. 

The role of variability in linguistic generalization: evidence from a computerized language training game with 7-year-olds. Elizabeth Wonnacott, Masa Vujovic, Chantal Miller.

Word learning and the acquisition of a syntactic–semantic overhypothesis. Jon Gauthier, Roger Levy, Joshua Tenenbaum.

Modeling word segmentation and vocabulary acquisition across languages. Gladys Baudet, Elin Larsen, Emmanuel Dupoux, Alejandrina Cristia.

(*) A Framework for Lexicalized Grammar Induction Using Variational Bayesian Inference. Chris Bruno, Eva Portelance, Daniel Harasim, Leon Bergen, Timothy O'Donnell.

Are word-referent mapping and speech segmentation performed jointly by infant learners? Elin Larsen, Emmanuel Dupoux, Alejandrina Cristia.

(*) The role of iconic multi-modal cues in early language learning. Gabriella Vigliocco, Yasamin Motamedi, Margherita Murgiano, Pamela Perniss.

(*) Statistical learning in infant language acquisition: from segmenting speech to discovering structure. Rebecca Frost, Caroline Rowland, Samantha Durrant, Michelle Peter, Amy Bidgood, Padraic Monaghan.

What makes a question easy or hard? Answers to early questions in the context of shared picture book reading. Sara Moradlou, Jonathan Ginzburg. 

Semantic seed bootstraps early verb categorization. Mireille Babineau, Anne Christophe, Rushen Shi.

Recognition of Moore-paradoxical sentences in children. Szabolcs Kiss.

(*) How the input shapes the acquisition of verb and noun morphology: neural network modelling across three highly inflected languages. Felix Engelmann, Joanna Kolak, Sonia Granlund, Virve Vihman, Ben Ambridge, Julian Pine, Anna Theakston, Elena Lieven.

Interpretable Machine Learning for Predicting Brain Activation in Language Processing. Willem Zuidema, Dieuwke Hupkes, Samira Abnar.

The Fast and the Flexible: pretraining neural networks to learn from small data. Rezka A. Leonandya, German Kruszewski, Dieuwke Hupkes, Elia Bruni. 

Investigating hierarchical bias in the acquisition of English question formation with recurrent neural networks. Tom McCoy, Robert Frank, Tal Linzen.


10:30 - 11:30: Michael Frank

Variability and Consistency in Early Language Learning: The Wordbank Project

Every typically developing child learns to talk, but children vary tremendously in how and when they do so. What predicts this variability? And which aspects of early language learning are consistent across the world’s languages and cultures? We use data from tens of thousands of children learning dozens of different languages to create a data-driven picture of universals and variation in early language learning.
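As a hypothetical sketch of the kind of aggregation behind such a picture (the file and column names are assumptions, not the actual Wordbank schema), one could summarize variability in productive vocabulary by age and language like this:

```python
# A hypothetical sketch; the file and column names are assumptions,
# not the actual Wordbank schema.
import pandas as pd

# Assumed export: one row per CDI administration, with the child's
# language, age in months, and productive vocabulary size.
df = pd.read_csv("cdi_administrations.csv")

summary = (df.groupby(["language", "age_months"])["productive_vocab"]
             .quantile([0.1, 0.5, 0.9])   # spread captures variability
             .unstack())                  # quantiles become columns
print(summary.head())
```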

Registration

Registration is closed.

Getting there

Salle Jaurès,
Département d'Etudes Cognitives,
29 rue d'Ulm, 75005 Paris

Partners