Tenure Track 1

(last update 2020-05-12)

Computational psycholinguistics of sentence processing

Senior researcher: Stefan Frank
PhD: Chara Tsoukala

Research description

As details of the cognitive processes and representations underlying language use continue to be uncovered, and ever increasing amounts of behavioural and (neuro)physiological data are collected, it becomes more and more difficult to capture the immense complexity of human language processing in theories that are specified only verbally. In contrast to verbal description, implemented computational models that simulate aspects of processing are able to generate fine-grained, quantitative predictions and can thereby expose how, exactly, observed properties of language comprehension and production may emerge.

The general aim of this tenure track is the development and application of computational models of human sentence processing, bridging between linguistic and cognitive theory, psychological experimentation, and neuroimaging data; particularly in the context of multilingualism. The basic assumption behind this work is that the mind is for a large part a statistical system: It extracts (linguistic) patterns from observations and applies abstractions over these patterns when processing novel input. Any model of such a system embodies particular assumptions about the relevant processes and representations. The cognitively most plausible assumptions can then be identified by comparing how well different models’ predictions fit human processing data. Thus, statistical models of language are developed, implemented, and trained on linguistic data; and their quantitative predictions of behavioural and/or neural responses serve to evaluate the models’ value as cognitive theories. In addition, human language comprehension experiments are run to test specific (model) predictions of language use as rooted in the application of language statistics.

The development of computationally explicit models contributes to the overarching quest of LiI because it is instrumental in bridging between functional, algorithmic, and implementational (neural) levels of explanation; and thereby coming to a comprehensive understanding of observed language phenomena. More specifically, implemented statistical models of language processing form testable theories of how properties of the cognitive system interact with properties of the language, which speaks to the question of boundary conditions of language and language use. In addition, as the majority of the world’s population is multilingual, accounting for the full variability in human language use requires moving beyond the single-language case, as will be done by developing multilingual models and by running experiments that investigate cross-linguistic differences in mono- and bilingual contexts.


Highlight 1: The costs of linguistic prediction may not outweigh the gains: an analysis using information-theoretic measures of word prediction

Team members: Stefan Frank and Christoph Aurnhammer (Saarland University, Germany)

Surprisal is a well-known information-theoretic metric of the extent to which a word occurs unexpectedly. Its validity as a measure of cognitive processing load during sentence comprehension is well-established: both word-reading times and neural activity increase (linearly) with surprisal. Although this is often taken as evidence for linguistic prediction, surprisal is fundamentally backward looking. We develop and evaluate a novel metric called Lookahead Information Gain (LIG) that (like surprisal) is based in information theory and can be derived from fundamental assumptions about the cognitive processes involved in sentence comprehension, but (unlike surprisal) is truly forward looking.
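
As a minimal illustration of the surprisal measure mentioned above: the surprisal of a word is the negative logarithm of its probability given the preceding words. The sketch below assumes base-2 logarithms (bits) and uses made-up probabilities in place of estimates from a trained language model; the function and variable names are hypothetical. It does not implement LIG, whose definition is not given in this text.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal (in bits) of a word that has conditional probability `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(prob)


# Hypothetical values of P(word | preceding words). In practice these would
# come from a probabilistic language model; the numbers here are invented.
word_probs = [("the", 0.5), ("dog", 0.125), ("meowed", 0.001)]

# Less probable (less expected) words receive higher surprisal.
surprisals = {word: surprisal(p) for word, p in word_probs}
```

Because surprisal depends only on the probability of the word that has actually been encountered, it is computed after the fact, which is the sense in which the measure is backward looking.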

We evaluated to what extent LIG explains reading time and N400 size measured during naturalistic sentence reading tasks in English, over and above surprisal. Contrary to our expectations, there were no consistent effects of LIG whatsoever, casting doubt on the hypothesis that upcoming words are pre-activated according to their probability of occurrence. Moreover, we found that our LIG values increase with the quality of the probabilistic language model that estimates them (unlike surprisal values, which decrease; see Figure 1). Informally, this implies that having more accurate knowledge of the language makes prediction harder rather than easier. This means that, under a strict probabilistic prediction account of language processing, having more accurate knowledge results in more accurate predictions, but predicting more accurately requires more effort. Hence, the costs of linguistic prediction may not outweigh the gains.

Figure 1. Average surprisal and LIG measures as a function of the number of training sentences presented to the recurrent neural network model that estimates the measures. LIG1 and LIG2 refer to two versions of LIG that differ in the assumed prior probability distribution over words (image taken from Aurnhammer & Frank, 2019).

This work is interdisciplinary because it integrates computational modelling with behavioural and neuroimaging data. Its main innovativeness lies in the novel information-theoretic measure for cognitive processing load during language comprehension. The development of computationally explicit models contributes to the overarching quest of LiI because it is instrumental in bridging between functional, algorithmic, and implementational (neural) levels of explanation; and thereby coming to a comprehensive understanding of observed language phenomena. More specifically, implemented statistical models of language processing form testable theories of how properties of the cognitive system interact with properties of the language, which speaks to the question of boundary conditions of language and language use.

Progress Update 2019

In addition to the tenure track PhD project and projects in the context of BQ1, other projects are initiated or collaborated upon in the tenure track.

Projects and collaborations

The following collaborative projects were completed in 2019:

1)     Cross-linguistic differences in processing syntactically complex sentences, with Rein Cozijn (Tilburg University) and Robin Thompson (University of Birmingham). Sentences with a double-embedded structure are harder to understand in English than in Dutch or German, as evidenced by the so-called ‘grammaticality illusion’ arising in English but not in Dutch or German (Frank et al., 2016); a finding we replicated in an off-line rating task (Frank & Ernst, 2019). The cross-linguistic difference has been explained in terms of differences between the languages’ statistical word-order patterns. This eye-tracking project aimed to test this hypothesis by investigating if increased exposure to L2 English strengthens the illusion.

2)     Gated recurrent neural networks as cognitive models of sentence processing, with Christoph Aurnhammer (Saarland University). In recent years, performance on natural language processing tasks has greatly improved because of the introduction of recurrent neural networks with gated recurrent units. However, this does not imply that such networks are more accurate as models of human cognitive processing. This project investigated the cognitive validity of gated recurrent network models by comparing predictions from different network architectures to human reading-time and EEG data.

3)     Modelling bilingual and non-native sentence comprehension, with Robin Thompson (University of Birmingham). In this project, we investigate whether sensitivity to English language statistics differs between monolinguals, bilingual native English speakers, bilingual native Dutch speakers, and bilingual native British Sign Language signers. A participant’s sensitivity to language statistics is operationalized as the extent to which their word-reading times can be predicted by word probability estimates from a statistical language model.

4)     Recurrent neural network simulations of garden-path phenomena, with John Hoeks (University of Groningen). Garden-path effects are traditionally explained in terms of syntactic structure building: when readers encounter a local syntactic ambiguity and select the incorrect reading, they need to reanalyse the sentence at the disambiguating point, resulting in a reading slowdown. We investigate whether a recurrent neural network, which does not explicitly build any structure, is nevertheless able to predict the reading time effects from the eye-tracking study of Hoeks et al. (2006).

5)     Modelling eye-movements during reading with dependency parsing. Spin-off from the PhD project of Alessandro Lopopolo (CLS), co-supervised with Antal van den Bosch (CLS) and Roel Willems (CLS, Donders, MPI). This project investigates whether regressive saccades during natural narrative reading can be predicted by the output of a dependency parser. That is, is there an increased probability of making a regressive saccade from word x to word y if there is a dependency relation between x and y?

6)     Using entropy measures to investigate prediction during reading, with Christoph Aurnhammer (Saarland University). Word surprisal and next-word entropy are information-theoretic measures that have been used to investigate predictive processes during language comprehension. However, results based on these measures are controversial: effects of a word’s surprisal arise upon processing of the word, so they may not result from prediction, and the theoretical underpinning of next-word entropy appears to be flawed. In this project, we investigate alternative entropy-based measures that are based on more solid theoretical foundations.
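
For reference, the next-word entropy mentioned in project 6 is conventionally the Shannon entropy of the probability distribution over the upcoming word. The sketch below assumes base-2 logarithms and two invented toy distributions; names are hypothetical. It shows only this standard definition, not the alternative entropy-based measures the project investigates, which are not specified here.

```python
import math
from typing import Mapping


def next_word_entropy(next_word_probs: Mapping[str, float]) -> float:
    """Shannon entropy (in bits) of a distribution over the next word."""
    total = sum(next_word_probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Zero-probability words contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in next_word_probs.values() if p > 0.0)


# Invented next-word distributions after two hypothetical sentence contexts.
constraining = {"cat": 0.97, "dog": 0.01, "bird": 0.01, "fish": 0.01}
unconstraining = {"cat": 0.25, "dog": 0.25, "bird": 0.25, "fish": 0.25}

# A strongly constraining context yields low entropy (little uncertainty
# about the next word); a uniform distribution yields the maximum.
low = next_word_entropy(constraining)
high = next_word_entropy(unconstraining)
```

Unlike surprisal, this quantity is defined before the next word is seen, which is why it has been used as a forward-looking measure.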

The following on-going collaborative projects continued in 2019:

7)     Effects of semantic relatedness between words: ERPs versus reading times. Frank & Willems (2017) showed that a word’s semantic relatedness (as quantified by a distributional semantics model) to earlier words in the sentence predicts the size of the N400 ERP component during reading, over and above well-known effects of word predictability (as quantified by a probabilistic language model). Later work surprisingly showed that the same does not hold for reading times (Frank, 2017). The goal of this project is to explain why the two dependent measures diverge for semantic relatedness but not for word predictability. This is accomplished by comparing model predictions to eye-tracking and EEG data that are simultaneously recorded while participants read sentences sampled from narratives.

8)     Chunking in language processing: experimental and modelling approaches, PhD project of Jinbiao Yang (IMPRS Fellow at CLS), co-supervised with Antal van den Bosch (CLS). There is ample evidence that the mental lexicon stores not only words but also larger chunks of language. Not much is known, however, about what the chunks are and how they are used during comprehension. This project develops a computational model of language chunking during reading and validates it against data collected in eye-tracking and EEG experiments.
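
Project 7 refers to semantic relatedness as quantified by a distributional semantics model. One common way to compute such a score, assumed here for illustration and not confirmed by this text as the method used in the cited studies, is the cosine similarity between the current word's vector and the sum of the vectors of the earlier words. The vectors and function names below are hypothetical toy values.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def relatedness_to_context(
    word_vec: Sequence[float], context_vecs: Sequence[Sequence[float]]
) -> float:
    """Cosine between a word vector and the sum of earlier words' vectors."""
    if not context_vecs:
        raise ValueError("at least one context word is required")
    # Sum the context vectors dimension by dimension.
    summed = [sum(dims) for dims in zip(*context_vecs)]
    return cosine(word_vec, summed)


# Toy two-dimensional "word vectors"; real models use hundreds of dimensions.
context = [[1.0, 0.0], [0.0, 1.0]]
score = relatedness_to_context([1.0, 0.0], context)
```

A score of this kind can then be entered as a predictor of N400 size or reading time alongside word predictability.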

In addition, two new projects were initiated in 2019:

9)     Multimodal integration and bilingualism, with Robin Thompson (University of Birmingham). Recent studies have suggested monolinguals and bilinguals differ in low-level perceptual processing (Bidelman & Heath, 2019; Marian et al., 2018). In this project, we investigate whether bilinguals are more sensitive to the temporal co-occurrence of multimodal input compared to monolinguals. Participants perform a low-level perception task that requires them to indicate which of two simple stimuli (one auditory, one visual) was presented first.

10)   Language models and child-directed speech, with Gabriella Vigliocco and Beata Grzyb (University College London). Less predictable words are pronounced more slowly. Is this because they are harder for the speaker to retrieve or because the speaker tries to make it easier for the listener? To answer this question, we analyse the relation between word predictability and speech rate in child-directed speech (CDS). Predictability is estimated by computational language models that are trained on either adult-directed or child-directed speech. If speech rate is adapted to serve the listener, the model trained on CDS should predict speech rate best. Conversely, if speech rate is modulated because of speaker factors, the model trained on adult-directed speech should predict better.
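
The model comparison described in project 10 can be sketched as follows, under several assumptions that are not taken from the project itself: predictability is expressed as surprisal, speech rate is approximated by word duration, and fit is measured by the R² of a simple linear regression (which equals the squared Pearson correlation). All numbers are invented toy values and do not represent any result.

```python
from typing import Dict, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of a simple linear regression of y on x (squared Pearson r)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("x and y must both vary")
    return (sxy * sxy) / (sxx * syy)


def compare_models(
    surprisal_by_model: Dict[str, Sequence[float]], durations: Sequence[float]
) -> Dict[str, float]:
    """Fit of each model's surprisal values to the observed word durations."""
    return {name: r_squared(s, durations) for name, s in surprisal_by_model.items()}


# Invented per-word surprisal from two hypothetical language models, and
# invented word durations in seconds for the same four words.
toy_surprisal = {
    "cds": [1.0, 2.0, 3.0, 4.0],  # model trained on child-directed speech
    "ads": [1.0, 3.0, 2.0, 4.0],  # model trained on adult-directed speech
}
toy_durations = [0.21, 0.39, 0.62, 0.78]

fits = compare_models(toy_surprisal, toy_durations)
```

Whichever model yields the higher fit on real data would be the one whose predictability estimates better account for speech rate; a full analysis would also need to control for factors such as word length and frequency.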

Synergy with Big Questions

Almost all tenure track projects directly or indirectly bear upon the topics of Big Question 1 because they study the cognitive/neural validity of word vector representations, investigate the mental lexicon, and/or include language-model evaluation against human processing.

One project focuses on identifying the represented units in the mental lexicon and how they interact during word recognition, which is particularly relevant to research in BQ1. The project offers an interesting perspective by studying Chinese, a language in which the concept of “word” is less clearly defined compared to languages with an orthography that explicitly marks word boundaries (i.e., in which words can be orthographically defined). Hence, one foreseeable impact on BQ1 of the ideas developed in this project is that they help prevent BQ1’s models of the mental lexicon from remaining limited to languages with a particular orthography.

The Tenure Track PhD project, which started two years before BQ1, is not related content-wise but does share some of the methodology (neural network modelling and model evaluation against human data).

Dr. Stefan Frank is also a co-investigator in BQ5.

Innovativeness and Interdisciplinarity

The general aim of the senior researcher’s scientific work is to integrate computational cognitive modelling and human experiments in order to explain properties of sentence comprehension processes, from the level of words up to the representation of propositional content. More specifically, statistical models of language are developed, implemented, and trained on large text corpora, to obtain quantitative predictions of behavioural and neural responses to items in human experiments. In short, the objective is to explain linguistic skills by explicitly modelling how they are rooted in statistical learning and processing.