Smith, A.1, Monaghan, P.2 & Huettig, F.1,3
1 Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
2 Department of Psychology, Lancaster University, Lancaster, United Kingdom
3 Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, The Netherlands
Current cognitive models of spoken word recognition and comprehension are underspecified with respect to when and how multimodal information interacts. We compare two computational models, both of which permit the integration of concurrent information within linguistic and non-linguistic processing streams but whose architectures differ critically in the level at which multimodal information interacts. We compare the predictions of the Multimodal Integration Model (MIM) of language processing (Smith, Monaghan & Huettig, 2014), which implements full interactivity between modalities, with those of a model in which interaction between modalities is restricted to lexical representations, implemented as an extended multimodal version of the TRACE model of spoken word recognition (McClelland & Elman, 1986). Our results demonstrate that previous visual world data sets involving phonological onset similarity are compatible with both models, whereas our novel experimental data on rhyme similarity distinguish between the competing architectures. The fully interactive MIM correctly predicts a greater influence of visual and semantic information relative to phonological rhyme information on gaze behaviour, whereas a system that restricts multimodal interaction to the lexical level overestimates the influence of phonological rhyme, thereby providing an upper limit on when information interacts in multimodal tasks.
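To make the architectural contrast concrete, the following minimal sketch is our own illustration rather than either model's actual implementation: the layer sizes, the random untrained weights, and the function names (mim_style, lexical_only) are assumptions introduced purely to show where the modalities are allowed to interact, not how MIM or the extended TRACE model is specified or trained.

```python
# Toy contrast between early (fully interactive) and late (lexical-only)
# multimodal integration. All sizes and weights are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_PHON, N_VIS, N_SEM, N_HID, N_LEX = 20, 20, 20, 40, 10

def layer(n_in, n_out):
    """Random weight matrix standing in for learned connections."""
    return rng.normal(scale=0.1, size=(n_in, n_out))

def mim_style(phon, vis, sem):
    """Full interactivity: all modalities converge on a shared hidden layer,
    so visual and semantic input can influence processing before any
    lexical-level representation is formed."""
    hidden = np.tanh(phon @ layer(N_PHON, N_HID)
                     + vis @ layer(N_VIS, N_HID)
                     + sem @ layer(N_SEM, N_HID))   # early integration
    return hidden @ layer(N_HID, N_LEX)             # lexical activation

def lexical_only(phon, vis, sem):
    """Restricted interaction: phonology is processed on its own and meets
    visual and semantic input only at the lexical layer."""
    phon_lex = np.tanh(phon @ layer(N_PHON, N_HID)) @ layer(N_HID, N_LEX)
    vis_lex = vis @ layer(N_VIS, N_LEX)
    sem_lex = sem @ layer(N_SEM, N_LEX)
    return phon_lex + vis_lex + sem_lex             # late integration

phon, vis, sem = rng.random(N_PHON), rng.random(N_VIS), rng.random(N_SEM)
print(mim_style(phon, vis, sem).shape, lexical_only(phon, vis, sem).shape)
```

In the first sketch, any change to the visual or semantic input alters the shared hidden representation and hence all downstream lexical activity; in the second, non-linguistic input can only add to, not reshape, the phonologically driven lexical activations, which is the kind of restriction that leads such an architecture to overweight phonological rhyme overlap.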