Visual Search, Eye Movements and Object Recognition
Arash Fazl, Steve Grossberg, and Ennio Mingolla
Neural data and models suggest that the brain achieves invariant object recognition by learning and combining several views of a three-dimensional object.
How are such invariant codes learned when active eye movements scan a scene, given that cortical magnification introduces a large source of variability in the visual representation even for the same view of an object? How does the brain avoid erroneously classifying parts of different objects together when an eye movement shifts fixation from one object to another and thereby changes their cortical representations? How does the brain distinguish saccades within the same object from saccades between different objects?
The space-variant representation of the scene in the brain degrades information in the periphery and enhances it in the fovea: cortical magnification represents foveal stimuli at much higher resolution than peripheral ones (Eric Schwartz).
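The sketch below (Python, written only for illustration here and not part of the model code) shows one common way to approximate this mapping: resampling an image onto a log-polar grid around the fixation point, so that foveal pixels are sampled densely and peripheral pixels sparsely.

```python
# A minimal sketch (not the model's actual front end) of the log-polar
# mapping that approximates cortical magnification: pixels near the
# fixation point (fovea) are sampled densely, peripheral pixels sparsely.
import numpy as np

def log_polar_sample(image, center, n_rings=64, n_wedges=96, r_min=1.0):
    """Resample a grayscale image onto a log-polar grid around `center`.

    Rows of the output index eccentricity (log-spaced radii), columns
    index polar angle, so a disproportionate share of the output is
    devoted to the fovea -- the essence of cortical magnification.
    """
    h, w = image.shape
    cy, cx = center
    r_max = np.hypot(max(cy, h - cy), max(cx, w - cx))
    # Log-spaced radii: ring spacing grows exponentially with eccentricity.
    radii = np.geomspace(r_min, r_max, n_rings)
    angles = np.linspace(0.0, 2.0 * np.pi, n_wedges, endpoint=False)
    rr, aa = np.meshgrid(radii, angles, indexing="ij")
    ys = np.clip(np.round(cy + rr * np.sin(aa)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rr * np.cos(aa)).astype(int), 0, w - 1)
    return image[ys, xs]          # shape: (n_rings, n_wedges)

# Example: a 256x256 test image fixated at its center.
img = np.random.rand(256, 256)
cortical = log_polar_sample(img, center=(128, 128))
print(cortical.shape)             # (64, 96)
```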
The biologically inspired ARTSCAN model of visual object learning and recognition with active eye movements proposes answers to these questions. The model explains how surface attention interacts with eye-movement-generating modules and object recognition modules so that views of the same object are selectively clustered together. This interaction does not require prior knowledge of the object's identity. The modules in the model correspond to modules in the What and Where streams of the visual system. The What stream learns a spatially invariant and size-invariant representation of an object, using bottom-up filtering and top-down attentional mechanisms. The Where stream computes indices of object location and guides eye movements. Preprocessing is assumed to happen in the primary visual areas, notably log-polar compression of the periphery, contrast enhancement, and parallel processing of boundary and surface properties. The What stream is a variant of the ARTMAP classifiers designed by Gail Carpenter and colleagues.
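As a rough illustration of the category-learning core that ARTMAP-style classifiers share, the following fuzzy-ART sketch (a simplified example with assumed parameters rho, alpha, and beta, not the actual ARTSCAN or ARTMAP code) shows how successive unlabeled views can be grouped into one category when they match a learned prototype closely enough, and assigned to a new category otherwise.

```python
# A minimal fuzzy-ART style sketch (assumptions: complement coding,
# scalar vigilance rho, fast learning rate beta).  It is not the ARTSCAN
# or ARTMAP implementation, but it illustrates how successive views could
# be clustered into the same category without prior object labels.
import numpy as np

class FuzzyART:
    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho, self.alpha, self.beta = rho, alpha, beta
        self.weights = []                      # one weight vector per category

    def _code(self, x):
        x = np.asarray(x, dtype=float)
        return np.concatenate([x, 1.0 - x])    # complement coding

    def learn(self, x):
        """Present one (normalized) view; return the index of its category."""
        i = self._code(x)
        # Choice function for every committed category.
        scores = [np.minimum(i, w).sum() / (self.alpha + w.sum())
                  for w in self.weights]
        for j in np.argsort(scores)[::-1]:     # best-matching category first
            w = self.weights[j]
            match = np.minimum(i, w).sum() / i.sum()
            if match >= self.rho:              # vigilance (resonance) test
                self.weights[j] = (self.beta * np.minimum(i, w)
                                   + (1 - self.beta) * w)
                return j
        self.weights.append(i.copy())          # no resonance: new category
        return len(self.weights) - 1

art = FuzzyART()
view_a1 = [0.90, 0.10, 0.80, 0.20]            # two similar views of one object
view_a2 = [0.85, 0.15, 0.75, 0.25]
view_b  = [0.10, 0.90, 0.20, 0.80]            # a view of a different object
print(art.learn(view_a1), art.learn(view_a2), art.learn(view_b))  # 0 0 1
```

In ARTSCAN itself, the signal that keeps successive views bound to one object category comes from an attentional shroud over the object's surface in the Where stream; the sketch above only captures the bottom-up matching and learning step.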
These movies show the transformation from retina to cortex under different image transformations on the retina.
(The movies are in .avi format and can be viewed with QuickTime, Windows Media Player, or RealPlayer.)
In a parallel project with Marc Pomplun, I study patterns of eye movements while subjects are engaged in visual search tasks. We use a video-camera-based eye tracker with a high sampling frequency. More specifically, I have studied the relationship between target recognition and saccadic localization of the target under crowding conditions. Under crowding, subjects cannot recognize a target when it is surrounded by similar stimuli, although they can easily do so when the target is presented alone. In the following figure, stare at the black circle: while you have little problem telling the orientation of the left patch, it is very hard to report that of the middle patch on the right. How accurately can subjects look at the target patch? We showed that although subjects can use the location of the flankers to look at the patch, they cannot use its form information to guide their saccade.
The abstract was presented at SfN 2003:
A. Fazl, M. Pomplun. EYE MOVEMENTS TO CROWDED STIMULI: THE FIRST LANDING POSITION IS LESS ACCURATE. Program No. 386.12. 2003 Abstract Viewer, Society for Neuroscience, 2003.
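The sketch below illustrates the kind of first-landing-position analysis involved: detect the first saccade in a gaze trace and measure how far its end point falls from the target center. The data format, sampling rate, and velocity threshold are assumptions chosen for illustration and do not reflect the actual eye-tracker pipeline used in the study.

```python
# A hedged sketch of a first-landing-position analysis: the data format
# (x, y samples at a fixed rate, in degrees) and the velocity threshold
# are assumptions, not the parameters used in the reported experiments.
import numpy as np

def first_landing_error(gaze_xy, target_xy, sample_rate_hz=500.0,
                        velocity_thresh_deg_s=30.0):
    """Return the distance (in the units of the samples, e.g. degrees)
    between the end point of the first saccade and the target center."""
    gaze_xy = np.asarray(gaze_xy, dtype=float)
    # Sample-to-sample velocity; a simple threshold marks saccade samples.
    vel = np.linalg.norm(np.diff(gaze_xy, axis=0), axis=1) * sample_rate_hz
    moving = vel > velocity_thresh_deg_s
    if not moving.any():
        return None                                  # no saccade in this trial
    onset = np.argmax(moving)                        # first above-threshold sample
    offset = onset + np.argmax(~moving[onset:])      # first sample back below
    landing = gaze_xy[offset]
    return float(np.linalg.norm(landing - np.asarray(target_xy)))

# Example with synthetic samples: fixation near (0, 0), saccade toward (5, 0).
trace = np.vstack([np.zeros((50, 2)),
                   np.linspace([0.0, 0.0], [4.6, 0.3], 15),
                   np.tile([4.6, 0.3], (50, 1))])
print(first_landing_error(trace, target_xy=(5.0, 0.0)))  # landing error of 0.5
```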
Supported in part by the National Science Foundation (NSF SBE-0354378) and the Office of Naval Research (ONR N00014-01-1-0624).
Research Prior to CNS
CNS Vision Lab Research Page
Arash Personal Webpage