TROLLHETTA AS
Image Understanding (IU) is the automation of visual tasks by computers.
A visual task is some activity which relies on vision. Usually the "input" to this activity is a scene or image source, and usually the "output" is some decision, description, action, or report.
There are several reasons that computers are more suitable than humans for visual tasks:
The technical problem is that of automatically deriving a sensible description from an image. We call the application within which the description makes sense the "domain" of interest. Typically, a domain has named objects and characteristics that can be used in a report or to make a decision. There is a wide gap between the nature of images (essentially arrays of numbers) and such descriptions. Bridging this gap has kept researchers busy over the last two decades in the fields of Artificial Intelligence, Scene Analysis, Image Analysis, Image Processing, and Computer Vision. Today we group these fields together as Image Understanding research.
To link image data with domain descriptions, an intermediate level of description is introduced, which generally contains geometric information. Processing usually starts with image processing, in which noise and distortion are reduced and important aspects of the imagery are emphasized. Features that characterize the information needed for description are then extracted from the image(s). Typically, these features are blobs, edges, lines, corners, regions, etc., and they are stored at the intermediate level of abstraction. Such descriptions are free of domain information: they are not specifically objects or entities of the domain of understanding, but they contain spatial and other information. It is this spatial/geometric (and other) information that can be analyzed in terms of the domain in order to interpret the images.
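As a rough illustration of the steps just described, the following Python sketch smooths an image, extracts blobs, and stores them as purely geometric descriptions. It is only one possible reading of the text: the function names (smooth, extract_blobs), the mean filter, the threshold value, and the toy image are all assumptions made for this example.

```python
# Illustrative sketch only: function names, the toy image and the threshold
# are assumptions made for this example, not taken from the text.

def smooth(image):
    """Image processing step: 3x3 mean filter to suppress isolated noise."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Border pixels average over the neighbours that actually exist.
            window = [image[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(window) / len(window)
    return out


def extract_blobs(image, threshold):
    """Feature extraction step: group pixels >= threshold into 4-connected
    blobs and describe each one geometrically, with no domain knowledge."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or image[y][x] < threshold:
                continue
            seen[y][x] = True
            stack, pixels = [(y, x)], []
            while stack:  # flood fill over 4-connected neighbours
                cy, cx = stack.pop()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            blobs.append({
                "area": len(pixels),
                "centroid": (sum(ys) / len(ys), sum(xs) / len(xs)),  # (row, col)
                "bbox": (min(ys), min(xs), max(ys), max(xs)),
            })
    return blobs


# Toy 7x10 image: a bright 3x3 square plus one isolated noise pixel.
image = [[0] * 10 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        image[r][c] = 9
image[3][7] = 9

raw_blobs = extract_blobs(image, 3.5)            # noise pixel shows up as a blob
clean_blobs = extract_blobs(smooth(image), 3.5)  # smoothing suppresses it
print(len(raw_blobs), len(clean_blobs))
print(clean_blobs[0])
```

On this toy image the unsmoothed extraction reports two blobs (the square and the noise pixel), while the smoothed one reports only the square. The resulting dictionaries hold area, centroid, and bounding box, and say nothing about what the blob is in any domain.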
Various techniques are used for interpretation. One example is "model matching", where stored geometric descriptions of domain objects are matched against features extracted from the images. Techniques are called "bottom-up" when the primary flow of processing is from lower abstraction levels (images) to higher levels (objects), and conversely "top-down" when the processing is guided by expectations from the domain.
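A minimal sketch of model matching in the bottom-up direction might look like the following. It assumes that features are corner points, that models are stored as corner positions relative to a reference corner, and that only translation is considered. The model table, the tolerance, and the names match_score and best_match are hypothetical choices for illustration, not a description of any particular system.

```python
from math import hypot

# Hypothetical domain models: corner positions (x, y) relative to a
# reference corner. Both entries are invented for this illustration.
MODELS = {
    "square": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "triangle": [(0, 0), (6, 0), (3, 5)],
}


def match_score(model, features, offset, tol=0.5):
    """Fraction of model points that fall within `tol` of some extracted
    feature after translating the model by `offset`."""
    dx, dy = offset
    hits = 0
    for mx, my in model:
        if any(hypot(mx + dx - fx, my + dy - fy) <= tol for fx, fy in features):
            hits += 1
    return hits / len(model)


def best_match(models, features, tol=0.5):
    """Bottom-up search: start from the extracted features, hypothesise that
    each one is a model's reference corner, and keep the best-scoring
    (score, model name, offset). Translation only; no rotation or scale."""
    best = (0.0, None, None)
    for name, model in models.items():
        for fx, fy in features:
            offset = (fx - model[0][0], fy - model[0][1])
            score = match_score(model, features, offset, tol)
            if score > best[0]:
                best = (score, name, offset)
    return best


# Corner features as they might come out of feature extraction: four slightly
# noisy corners of a square near (10, 20), plus one spurious corner.
features = [(10.1, 20.0), (14.0, 19.9), (13.9, 24.1), (10.0, 24.0), (30.0, 5.0)]

score, name, offset = best_match(MODELS, features)
print(name, score, offset)
```

A top-down variant would instead start from a model the domain leads us to expect and search the image only for the features that model predicts.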