Klein, M., Rey, A., Madec, S. & Grainger, J.
LPC, CNRS, University of Aix-Marseille
Although single letters are frequently used as input to demonstrate the capabilities of pattern recognition algorithms, the number of neural network models that try to explain human letter recognition data in terms of cognitive processes is still very limited. Furthermore, these models are often hardwired and rely on a set of input features defined by the modeler (e.g., Rey et al., 2009). In this study, we model human reaction times using neural networks that extract visual features from real images of letters. We focus on learning, and on how different factors and learning methods affect the correlation between simulated reaction times and behavioral data (Madec et al., 2012). Specifically, we study the effect of three factors on this correlation: (i) the use of an error signal during learning (supervised vs. unsupervised learning), (ii) whether or not the letter labels exert a top-down influence on the extracted features, and (iii) letter frequencies (New & Grainger, 2011). To do so, we used Restricted Boltzmann Machines (RBMs), back-propagation networks, and RBM/Perceptron hybrid architectures. We find the highest correlations (r = 0.66) for supervised models in which letter labels exert a top-down influence on the feature layer during training, but only when these models are trained using letter frequencies. These results suggest that letter frequency is the most important factor in accounting for human letter identification times. In addition, the top-down influence of letter labels on the extracted visual features also appears to be essential: it makes the difference between a significant and a non-significant correlation. The use of an error signal does not strongly affect the correlation with human reaction time data, but fully unsupervised models have more difficulty categorizing letters with very low frequencies accurately.
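The two ingredients that recur in this evaluation, frequency-weighted presentation of letters during training and the correlation between simulated and human reaction times, can be illustrated with a minimal sketch. It is not the code used in the study: the function names, the frequency dictionary, and the sampling scheme are assumptions made purely for illustration, and real letter frequencies and reaction times would have to be taken from the cited sources.

```python
import math
import random


def sample_letters_by_frequency(freqs, n, seed=0):
    """Draw n training letters with probability proportional to their frequency.

    `freqs` is a hypothetical mapping from letter to a non-negative frequency.
    """
    rng = random.Random(seed)
    letters = list(freqs)
    weights = [freqs[letter] for letter in letters]
    return rng.choices(letters, weights=weights, k=n)


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

In such a setup, `sample_letters_by_frequency` would determine how often each letter image is presented to a network during training, and `pearson_r` would be applied to the per-letter simulated and behavioral reaction times.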