PhD Project 29

Modelling Psychological and Perceptual Aspects of the Mental Lexicon


PIs: Frank (CLS), Ernestus (CLS), and Fernandez (ILLC)
PhD-candidate: Danny Merkx
Start date: November 01, 2017

(last update 2019-07-01)

Research Content

This PhD project focuses on the psychological and perceptual aspects of modelling the mental lexicon. Its objective is to develop vector representations of phonological and orthographic word forms and to connect these to the morpho-syntactic and semantic vector representations developed in other parts of BQ1 and in research on distributional semantics more generally. This may yield answers to several relevant psycholinguistic questions: Which word forms (e.g. morphological, spelling, and pronunciation variants) are stored, and how are they represented? How do bilinguals represent words for one concept in different languages? Is analogical processing a viable alternative to abstract rule application for explaining how non-stored forms are understood? What unique constraints are posed by the properties of different input modalities (e.g. auditory versus visual presentation, static written versus dynamic spoken forms)? How does non-arbitrariness in the form-meaning mapping affect language processing and lexical representation?

Progress 2018

A neural network model that maps between images and written captions describing these images was successfully implemented. Its performance is comparable to the state of the art, even though competing models rely on large amounts of independently developed lexical semantic knowledge. This model, in contrast, has no explicit representations of words at all. It was further shown that similarities between the sentence representations that arise in the model correlate with human judgements of semantic similarity between these sentences. A journal paper about these results is under review for Natural Language Engineering. Furthermore, a version of the model that uses spoken captions as input was developed.
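
The comparison between model-internal sentence similarities and human similarity judgements can be illustrated with a minimal sketch. It is not the project's actual evaluation code. Cosine similarity as the model's similarity measure and Spearman rank correlation as the agreement measure are assumptions made for illustration, as are all function names and the toy vectors and ratings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sentence vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(embedding_pairs, human_ratings):
    """Correlate model similarities of sentence pairs with human ratings."""
    model_sims = [cosine_similarity(u, v) for u, v in embedding_pairs]
    return spearman(model_sims, human_ratings)


# Toy data (hypothetical): three sentence pairs as 2-d vectors,
# each with an invented human similarity rating.
toy_pairs = [([1.0, 0.0], [1.0, 0.0]),
             ([1.0, 0.0], [1.0, 1.0]),
             ([1.0, 0.0], [0.0, 1.0])]
toy_ratings = [5.0, 3.0, 1.0]
toy_correlation = evaluate(toy_pairs, toy_ratings)
```

A rank correlation is used in this sketch because it only assumes that higher model similarity should go with higher human ratings, not that the two scales are linearly related.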

Groundbreaking characteristics

This project is a collaboration between two research groups at the CLS and involves insights and techniques from, among others, computational linguistics, machine learning, semantics, speech comprehension research, and psycholinguistics. In its current stage, the project’s main innovations are: (1) the implementation of a neural network model that maps between written language and visual images without including any lexical representations; and (2) the evaluation of the internal representations developed in this neural network against human semantic-similarity ratings.