Visual Search, Eye Movements and Object Recognition

Arash Fazl, Steve Grossberg, and Ennio Mingolla

Neural data and models suggest that the brain achieves invariant object recognition by learning and combining several views of a three-dimensional object.

How are such invariant codes learned when active eye movements scan a scene, given that cortical magnification introduces a large source of variability in the visual representation even for the same view of an object? How does the brain avoid erroneously classifying together parts of different objects when an eye movement shifts the cortical representation from one object to another? How does the brain differentiate between saccades within the same object and saccades between different objects?

The space-variant representation of the scene in the brain degrades information in the periphery and enhances it in the fovea.

Cortical magnification represents foveal stimuli with much higher resolution than those in the periphery (Eric Schwartz).
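
One common formalization of this space-variant mapping is the complex-logarithm model associated with Schwartz, w = k log(z + a), where z is the retinal position written as a complex number. The sketch below is only an illustration under that assumption: the function names and the values of a and k are hypothetical and are not taken from the ARTSCAN model. It estimates how much cortical distance a small retinal step covers at several eccentricities.

```python
import numpy as np

def retina_to_cortex(z, a=0.5, k=1.0):
    """Map retinal position z (complex, in degrees) to a cortical position.

    Uses the complex-log form w = k * log(z + a). The values of a and k
    are illustrative assumptions, not fitted parameters.
    """
    return k * np.log(np.asarray(z, dtype=complex) + a)

def magnification(ecc, a=0.5, k=1.0, step=0.01):
    """Cortical distance per unit retinal distance along the horizontal meridian."""
    w0 = retina_to_cortex(ecc, a, k)
    w1 = retina_to_cortex(ecc + step, a, k)
    return abs(w1 - w0) / step

# Magnification falls off roughly as k / (eccentricity + a)
for ecc in (0.0, 1.0, 5.0, 20.0):
    print(f"eccentricity {ecc:5.1f} deg -> magnification {magnification(ecc):.3f}")
```

Under these assumed parameters, the same retinal step occupies far more cortical space near the fovea than in the periphery, which is the variability that the questions above refer to.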

Our biologically inspired ARTSCAN model of visual object learning and recognition with active eye movements proposes answers to these questions. The model explains how surface attention interacts with the modules that generate eye movements and the modules that recognize objects, so that the views that correspond to the same object are selectively clustered together. This interaction does not require prior knowledge of the object's identity. The modules in the model correspond to the What and Where streams of the visual system. The What stream learns a spatially invariant and size-invariant representation of an object, using bottom-up filtering and top-down attentional mechanisms. The Where stream computes indices of object location and guides eye movements. Preprocessing is assumed to happen in the primary visual areas, notably log-polar compression of the periphery, contrast enhancement, and parallel processing of boundary and surface properties. The What stream is a variant of the ARTMAP classifiers designed by Gail Carpenter and Stephen Grossberg.
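
The specific ARTMAP variant used in the model is not described on this page. As a generic illustration of ART-style category learning, the sketch below implements a basic unsupervised Fuzzy ART module (complement coding, a choice function, a vigilance test, and fast learning). It is a swapped-in textbook technique, not the ARTSCAN What stream, and the class name, parameter values, and toy "views" are all assumptions made for the example.

```python
import numpy as np

class FuzzyART:
    """Minimal Fuzzy ART category learner (illustrative, not the ARTSCAN classifier)."""

    def __init__(self, vigilance=0.75, alpha=0.001, beta=1.0):
        self.rho = vigilance   # match threshold for accepting a category
        self.alpha = alpha     # choice parameter
        self.beta = beta       # learning rate (1.0 means fast learning)
        self.weights = []      # one weight vector per committed category

    def train(self, x):
        """Present one input in [0, 1]^n and return the category index that learns it."""
        x = np.asarray(x, dtype=float)
        i = np.concatenate([x, 1.0 - x])  # complement coding
        # Rank committed categories by the choice function T_j
        scores = [np.minimum(i, w).sum() / (self.alpha + w.sum()) for w in self.weights]
        for j in np.argsort(scores)[::-1]:
            w = self.weights[j]
            match = np.minimum(i, w).sum() / i.sum()
            if match >= self.rho:
                # Resonance: this category absorbs the input
                self.weights[j] = self.beta * np.minimum(i, w) + (1 - self.beta) * w
                return int(j)
            # Otherwise the category is reset and the next-best one is tried
        self.weights.append(i.copy())  # no category matched: commit a new one
        return len(self.weights) - 1

# Hypothetical feature vectors: two similar "views" and one dissimilar one
art = FuzzyART(vigilance=0.8)
views = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9]]
labels = [art.train(v) for v in views]
print(labels)
```

In this toy run the two similar inputs fall into one category and the dissimilar input starts a new one. The vigilance parameter controls how similar an input must be to join an existing category.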

ARTSCAN

These movies show transformations from retina to cortex under different image transformations on the retina.
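
One property of a log-polar mapping that is relevant to such transformations is that scaling and rotating an image about the fixation point shift its cortical image without changing its shape. The sketch below checks this numerically for the pure complex-log map (the a = 0 limit, which approximates the mapping away from the fovea). The sample points and transformation values are hypothetical and chosen to stay clear of the branch cut of the logarithm.

```python
import numpy as np

def complex_log(z):
    """Pure complex-log map: real part is log eccentricity, imaginary part is polar angle."""
    return np.log(np.asarray(z, dtype=complex))

# Hypothetical retinal sample points, away from the origin and the negative real axis
points = np.array([2 + 1j, 3 + 2j, 1 + 4j, 5 + 0.5j])

# Scale by 1.5 and rotate by 20 degrees about the fixation point
scale, angle = 1.5, np.deg2rad(20)
transformed = points * scale * np.exp(1j * angle)

# Every point moves by the same cortical offset: log(scale) + i * angle
shift = complex_log(transformed) - complex_log(points)
print(shift)
```

Because the offset is identical for every point, the cortical pattern translates rigidly. Translations on the retina, by contrast, do distort the cortical pattern.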

(The movies are in .avi format and can be viewed with QuickTime, Windows Media Player, or RealPlayer.)

====================================================================================================================

In a parallel project with Marc Pomplun, I study patterns of eye movements while subjects are engaged in visual search tasks. We use a video-camera-based eye tracker with a high sampling frequency. More specifically, I have studied the relationship between recognition of a target and its saccadic localization under crowding. In this condition, subjects cannot recognize a target when it is surrounded by similar stimuli, although they can easily do so when the target is presented alone. In the following figure, stare at the black circle. While you have little problem telling the orientation of the left patch, it is very hard to report that of the middle patch on the right. How accurately will a subject's gaze land on the target patch? We showed that although subjects can use the location of the flankers to look at the patch, they cannot use form information to guide their saccades.

 

The abstract was presented at SfN 2003:

A. Fazl, M. Pomplun. EYE MOVEMENTS TO CROWDED STIMULI: THE FIRST LANDING POSITION IS LESS ACCURATE. Program No. 386.12. 2003 Abstract Viewer, Society for Neuroscience, 2003.

Supported in part by the National Science Foundation (NSF SBE-0354378) and ONR (ONR N00014-01-1-0624)

Research Prior to CNS

CNS Vision Lab Research Page

Arash Personal Webpage