In order to clarify the
terminology, we include here a brief overview of the key graphics concepts. For
example, the terms 3D graphics, visualization, virtual reality, and 3D browsers
for the internet are not interchangeable, since they refer to different
technologies used in different application markets.
Computerized 3D models (of
mechanical assemblies, architectural concepts, animated virtual worlds, human
organs, geological sites, or moving fluids) may be created using computer aided
design and animation software, reconstructed from data acquired through sensing
devices, or computed with simulation or analysis packages.
These models are stored by applications in proprietary data structures, which typically capture vertex coordinates and face topologies, or scalar values at specific locations in space. In most industrial and scientific applications, the constantly increasing size of these data structures challenges the most advanced workstations. (The
model of the F15 airplane contains over 32 million faces and still does not
contain the finest details.) Standards for exchanging such models are slowly
emerging in different application areas.
Various images of these 3D
models must be displayed to support the design and verification activities, to
enable computational steering and data exploration, to communicate with others,
or to entertain them.
The way a 3D model is rendered by an application is controlled by the particular visualization techniques employed, the state of an animation or simulation, and the user's interaction.
Visualization is the process of automatically interpreting the
model's data to derive visual representations that help the user understand
specific characteristics of the model (such as highlighting a tumor with a
different color). The techniques for performing the interpretation depend on
the domain and may involve considerable amounts of computation. The parameters
may often be dynamically adjusted by the user.
Animation is the process of automatically adjusting the
positions and shapes of the various elements in the model, so as, for example,
to simulate a natural behavior or to demonstrate how a computed model evolves
over time. Animation often requires considerable computational resources.
Interaction is the process of manually controlling the view, i.e., the position and orientation of the model with respect to the viewpoint.
Control mechanisms in general involve multi-dimensional input devices (mouse,
head tracker) and are most effective when the effect of a gesture is immediate
(real-time feedback) and can be anticipated (a natural, easy-to-use metaphor).
Given a particular
interpretation and a particular view (as dictated by the visualization,
animation and interaction states), the model must be rendered to produce the
desired image.
The rendering process starts
with the geometric descriptions captured in the model (for example the vertices
and faces of a car engine compartment) or generated by the visualization
programs (the lines showing the air flow in a wind tunnel). These descriptions
are traversed by the application software, which issues low-level commands to a standard graphics API (such as OpenGL or RenderMan). Typical commands may
specify the color or the relative position of a component of the model or the
location of the vertices of one of its faces.
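As a rough illustration, the sketch below (C++ against the OpenGL 1.x immediate-mode API; the Vertex, Face, and Model records are hypothetical stand-ins for an application's own data structures) shows how such a traversal might issue those commands:

    #include <GL/gl.h>
    #include <vector>

    struct Vertex { float x, y, z; };
    struct Face   { std::vector<int> vertexIndices; float r, g, b; };  // hypothetical face record
    struct Model  { std::vector<Vertex> vertices; std::vector<Face> faces; };

    // Traverse the application's model and issue immediate-mode OpenGL commands.
    void drawModel(const Model& model) {
        for (const Face& f : model.faces) {
            glColor3f(f.r, f.g, f.b);           // color of this component
            glBegin(GL_POLYGON);                // one face at a time
            for (int i : f.vertexIndices) {
                const Vertex& v = model.vertices[i];
                glVertex3f(v.x, v.y, v.z);      // location of each vertex of the face
            }
            glEnd();
        }
    }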
Popular rendering techniques
include shading, radiosity, raytracing, and volumetric visualization.
Shading exploits dedicated hardware and optimized software
techniques to produce shaded images of 3D objects or scenes at interactive
rates. These techniques offer limited support for shadows, transparency, or
precise light propagation, and hence yield unrealistic images.
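The lighting model evaluated per vertex by such hardware is deliberately simple; a minimal sketch of the Lambertian (diffuse) term, assuming the surface normal and the direction toward the light are given as unit vectors, is:

    #include <algorithm>

    struct Vec3 { float x, y, z; };

    static float dot(const Vec3& a, const Vec3& b) {
        return a.x * b.x + a.y * b.y + a.z * b.z;
    }

    // Diffuse intensity at a vertex: proportional to the cosine of the angle
    // between the surface normal and the direction toward the light source.
    float diffuseIntensity(const Vec3& normal, const Vec3& toLight) {
        return std::max(0.0f, dot(normal, toLight));
    }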
Radiosity techniques do not produce an image, but rather
compute, through a numerically intensive software process, the energy radiated
from all surfaces under the specified lighting conditions. Pre-computed
radiosity information is view-independent and may be combined with the graphics
data for interactive shading. Radiosity is extremely valuable for simulating
static lighting conditions in offices, shopping malls, theaters, and homes.
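The core numerical step is the solution of the radiosity equation B_i = E_i + rho_i * sum_j F_ij B_j; a sketch of one gathering iteration, with hypothetical arrays for emission, reflectance, and form factors, is:

    #include <vector>

    // One Jacobi-style gathering step of the radiosity equation.
    // emission[i]      : light emitted by patch i
    // reflectance[i]   : fraction of incident light reflected by patch i
    // formFactor[i][j] : form factor from patch i to patch j
    std::vector<float> radiosityStep(const std::vector<float>& emission,
                                     const std::vector<float>& reflectance,
                                     const std::vector<std::vector<float>>& formFactor,
                                     const std::vector<float>& radiosity) {
        std::vector<float> next(radiosity.size());
        for (size_t i = 0; i < radiosity.size(); ++i) {
            float gathered = 0.0f;
            for (size_t j = 0; j < radiosity.size(); ++j)
                gathered += formFactor[i][j] * radiosity[j];
            next[i] = emission[i] + reflectance[i] * gathered;
        }
        return next;
    }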
Raytracing is a numerically intensive process developed for photorealistic
rendering. Quality images of complex scenes may require minutes of computation on a single workstation. Computations performed for one view may only be partially exploited for another nearby view, but the process is well
suited for parallelization and hardware assist.
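The operation repeated most often in a raytracer is the ray-primitive intersection test; a minimal ray-sphere sketch (assuming a unit-length ray direction) is:

    #include <cmath>
    #include <optional>

    struct Vec3 { float x, y, z; };
    static Vec3  sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
    static float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

    // Returns the distance along the ray to the nearest hit point, if any.
    std::optional<float> intersectSphere(const Vec3& origin, const Vec3& dir,
                                         const Vec3& center, float radius) {
        Vec3  oc   = sub(origin, center);
        float b    = dot(oc, dir);
        float c    = dot(oc, oc) - radius * radius;
        float disc = b * b - c;                    // discriminant of the quadratic
        if (disc < 0.0f) return std::nullopt;      // the ray misses the sphere
        float t = -b - std::sqrt(disc);            // nearer of the two roots
        if (t < 0.0f) return std::nullopt;         // the sphere is behind the ray origin
        return t;
    }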
Radiosity and raytracing are based on global illumination computations, which require access to all parts of the 3D scene or model. They may require constructing and processing special-purpose models of the 3D geometry and of the lighting parameters, and hence use a retained mode, in which the rendering software has random access to the various parts of the 3D model.
Volumetric visualization applies
to models represented by samples of one or more functions over a 3D domain. The
process is mostly used for medical and scientific visualization. It is
computationally expensive and poorly supported by current-generation graphics hardware (although special-purpose parallel machines are under development at Stony Brook and elsewhere). A large subset of volumetric visualization may be performed by computing iso-surfaces and by using hardware-assisted shading techniques.
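The first step of such an iso-surface extraction (in the spirit of marching cubes) is simply to detect, for each cell of the sampled volume, whether the chosen iso-value is crossed; a sketch of that test over a hypothetical regular scalar grid is:

    #include <vector>

    // A scalar field sampled on a regular nx x ny x nz grid (hypothetical layout).
    struct Volume {
        int nx, ny, nz;
        std::vector<float> samples;   // nx * ny * nz values
        float at(int i, int j, int k) const { return samples[(k * ny + j) * nx + i]; }
    };

    // True if the iso-surface passes through the cell whose lowest corner is (i, j, k),
    // i.e. if its eight corner samples do not all lie on the same side of isoValue.
    bool cellCrossesIsoSurface(const Volume& v, int i, int j, int k, float isoValue) {
        bool anyBelow = false, anyAbove = false;
        for (int dk = 0; dk <= 1; ++dk)
            for (int dj = 0; dj <= 1; ++dj)
                for (int di = 0; di <= 1; ++di) {
                    if (v.at(i + di, j + dj, k + dk) < isoValue) anyBelow = true;
                    else                                         anyAbove = true;
                }
        return anyBelow && anyAbove;
    }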
Shading may be done in immediate mode, where the graphics hardware and software
utilities need only have access to each portion of the model once. This mode
accommodates procedural models well and does not require replicating, in a
graphics format, the data already available in the application's database.
However, data for shading may be cached by the graphics subsystem by invoking the retained mode. This latter solution is particularly suitable
for static scenes or for animated assemblies of rigid models, because it
relieves the application from the burden of traversing its database for each
frame and because it significantly reduces the communication between the CPU
and the graphics adapter.
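In OpenGL, the simplest retained-mode mechanism is the display list, which records a traversal once and lets the graphics subsystem replay it each frame; a sketch, reusing the hypothetical drawModel traversal shown earlier, is:

    #include <GL/gl.h>

    // Record the model's drawing commands once; the graphics subsystem retains them.
    GLuint buildDisplayList(const Model& model) {
        GLuint list = glGenLists(1);
        glNewList(list, GL_COMPILE);   // capture the commands instead of executing them
        drawModel(model);
        glEndList();
        return list;
    }

    // Each frame, replay the cached commands without re-traversing the application database.
    void drawFrame(GLuint list) {
        glCallList(list);
    }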
Commands passed to a graphics
subsystem via an API are executed to produce an image on a monitor. This
process involves two major components: the geometry processor and the
rasterizer.
The geometry processor transforms the vertices according to the
interactively specified view, computes the light intensity and color reflected
by the associated surfaces at the vertices, and clips the faces by eliminating
portions that fall outside of the display window. These computations are
usually performed in floating point for maximum accuracy in industrial
applications.
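A minimal sketch of that stage, applying a combined model-view-projection matrix in floating point and testing the result against the canonical view volume before clipping, might be:

    #include <cmath>

    struct Vec4 { float x, y, z, w; };
    struct Mat4 { float m[4][4]; };    // row-major 4x4 matrix

    // Transform a vertex by the combined model-view-projection matrix.
    Vec4 transform(const Mat4& mvp, const Vec4& v) {
        float in[4] = { v.x, v.y, v.z, v.w };
        float out[4];
        for (int r = 0; r < 4; ++r)
            out[r] = mvp.m[r][0] * in[0] + mvp.m[r][1] * in[1]
                   + mvp.m[r][2] * in[2] + mvp.m[r][3] * in[3];
        return { out[0], out[1], out[2], out[3] };
    }

    // A vertex lies inside the canonical view volume if -w <= x, y, z <= w;
    // faces straddling the boundary must be clipped against it.
    bool insideViewVolume(const Vec4& c) {
        return std::fabs(c.x) <= c.w && std::fabs(c.y) <= c.w && std::fabs(c.z) <= c.w;
    }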
The rasterizer receives the coordinates of the transformed vertices that appear in the viewing window and the associated colors. It must then scan convert the faces defined by these vertices. Scan conversion includes computing the color and depth (distance to the viewpoint) at each pixel covered by the projection of the face. The depth is used in conjunction with a z-buffer to eliminate hidden faces. Scan conversion is typically done in fixed point using dedicated integer arithmetic units and fast memory access (for the z-buffer and the color buffer). Some rasterizers support sub-pixel resolution and a deep (32-bit) z-buffer.
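The inner loop of scan conversion reduces to a per-pixel depth comparison against the z-buffer; a sketch over hypothetical color and depth buffers is:

    #include <cstdint>
    #include <vector>

    struct FrameBuffer {
        int width, height;
        std::vector<float>    depth;   // z-buffer, one entry per pixel
        std::vector<uint32_t> color;   // packed RGBA color buffer
    };

    // Write a fragment only if it is closer to the viewpoint than what is already stored.
    void writeFragment(FrameBuffer& fb, int x, int y, float z, uint32_t rgba) {
        int idx = y * fb.width + x;
        if (z < fb.depth[idx]) {       // depth test: smaller z means closer
            fb.depth[idx] = z;
            fb.color[idx] = rgba;
        }
    }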
Both geometry processing and rasterization functions may be done entirely in software. The amazing performance of current 3D graphics boards is achieved through multi-processor subsystems that integrate several geometry processors, several rasterizers, and fast color and z-buffer memory.
A large number of extensions
to the capabilities of the graphics subsystem have been offered to accelerate
rendering and to relieve the application or the visualization modules from
graphics-oriented tasks. Important examples of advanced capabilities for OpenGL include clipping planes, stencils, texture maps, solid textures, panoramic frame buffers, and programmable shaders.
Clipping planes permit the definition of planes that remove portions
of a model and thus reveal internal details or structures. They are
particularly useful in design review and in medical or scientific
visualization.
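In OpenGL, a clipping plane is given by the four coefficients (a, b, c, d) of its plane equation; for instance, a sketch that keeps only the geometry below a chosen height might be:

    #include <GL/gl.h>

    // Keep only the half-space where y <= cutHeight
    // (points satisfying 0*x - 1*y + 0*z + cutHeight >= 0 are retained).
    void enableHorizontalCut(double cutHeight) {
        GLdouble plane[4] = { 0.0, -1.0, 0.0, cutHeight };
        glClipPlane(GL_CLIP_PLANE0, plane);
        glEnable(GL_CLIP_PLANE0);
    }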
Stencils are masks that may be set independently for each pixel and combined logically to selectively inhibit pixel updates. They are important for advanced graphical effects but require additional pixel memory and logic.
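A minimal OpenGL sketch of such a mask, first writing 1s into the stencil buffer where a masking shape is drawn and then restricting later drawing to those pixels, is:

    #include <GL/gl.h>

    // Pass 1: draw a masking shape that writes the value 1 into the stencil buffer.
    void beginMaskPass() {
        glEnable(GL_STENCIL_TEST);
        glStencilFunc(GL_ALWAYS, 1, 0xFF);           // always pass, reference value 1
        glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);   // store the reference value where drawn
    }

    // Pass 2: draw the scene, updating only pixels whose stencil value equals 1.
    void beginMaskedPass() {
        glStencilFunc(GL_EQUAL, 1, 0xFF);
        glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);      // leave the mask unchanged
    }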
Texture maps provide the means for attaching digitized images onto the surfaces of 3D objects in the scene. They are used very effectively to speed up design and rendering by replacing geometric details with a picture. Supporting textures efficiently requires high bandwidth to the graphics adapter and more on-board memory.
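A minimal OpenGL sketch of loading a digitized image as a texture (the pixel buffer and its dimensions are assumed to come from the application) is:

    #include <GL/gl.h>

    // Upload an RGB image (width x height, 8 bits per channel) and enable texturing.
    GLuint createTexture(const unsigned char* pixels, int width, int height) {
        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0,
                     GL_RGB, GL_UNSIGNED_BYTE, pixels);
        glEnable(GL_TEXTURE_2D);
        return tex;
    }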
Solid textures are important for medical imaging and scientific visualization because they allow the graphics adapter to be used to display the image on a cross-section through a volumetric model. Solid textures impose even greater bandwidth and memory demands on the graphics subsystem than image textures do.
Programmable shaders make it possible to define material properties and
reflectance responses suitable for creating realistic images. They may also be
used to perform custom functions, which need not be graphics-related.
Panoramic buffers capture a 360-degree image around a given viewpoint.
They may be created by automatically stitching photographs of real scenes in
all directions or created in software by rendering (with the desired degree of
realism) complex synthetic scenes. Once created, these panoramic images may be
viewed in real time without any 3D graphics hardware support. Viewing is limited to panning (i.e., turning your head), zooming (which produces some illusion of forward motion), and selecting links to other views. Combining panoramic buffers for a realistic background with 3D rendering of simple, textured, animated foreground objects provides a degree of graphics realism and interactivity suitable for some entertainment, tourism, and commercial applications.
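Viewing a cylindrical panorama then amounts to mapping the current pan angle to a column of the stored 360-degree image; a rough sketch of that mapping is:

    #include <cmath>

    // Column of the panoramic image corresponding to a viewing direction.
    // 'yaw' is the pan angle in radians; 'imageWidth' is the width of the stored image.
    int panoramaColumn(double yaw, int imageWidth) {
        const double twoPi = 6.283185307179586;
        double t = std::fmod(yaw, twoPi);
        if (t < 0.0) t += twoPi;              // wrap into [0, 2*pi)
        return static_cast<int>(t / twoPi * imageWidth) % imageWidth;
    }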
The promise of virtual
reality technology is to enable the user to experience and interact with
computer-generated environments that seem real. Such an environment should:
change in real time, in response to a change in viewpoint or model; appear as a
3D world, whether concrete or abstract, to our senses of vision, hearing, and
touch; and respond to direct manipulation. All current VR falls far short of
the goal of full realism. Significant limitations include the time lag between
a gesture and the corresponding image update, the limited resolution and field of view of head-mounted displays, and the difficulty of providing realistic touch sensation and force feedback.
In VR applications, the
user's motions (hand, head, body) are interpreted according to some natural
metaphor (object-in-hand, look-around, drive, etc.) to produce corresponding
graphic feedback. A number of embodiments are possible, from the traditional mouse-driven desktop 3D graphics preferred by most industrial customers, to head-tracked stereo CAVE wraparound projections appropriate for marketing and design reviews, to immersive head-mounted displays, important for future home
and arcade entertainment systems. The underlying technology evolved from early
military flight simulators, and is still struggling to provide the two basic
components: affordable realtime 3D graphics with scene realism (for
entertainment) or complexity (for industrial or military applications), and
ergonomic display devices (lightweight, high resolution, wide-angle goggles).
Entertainment applications may tolerate lower-quality displays, but require high-quality 3D sound, inexpensive head and hand trackers, live video integrated with
the 3D world, and autonomous animated characters. Tactile I/O interfaces that use
robotics to provide force feedback are still expensive.