A New Definition of Multimedia Architecture
Hypertext started with text passages sensitive to mouse clicks. When the
first animation programs emerged, a similar principle was realized for
graphics: the mouse made pictures move, made characters speak; hypertext
became hypermedia. In the object-oriented world of computer programs, texts,
buttons, graphic elements, content components, and components of the user
interface are all the same in one respect: They are all objects that at
a touch can set off a script and thus an action, or send a message to other
objects. This principle even applies to digital films: more recent versions
of digital films – e.g. Apple’s QuickTime VR – also realize the object-oriented
principle. They can be navigated by mouse movements, and they have sensitive
areas which on mouse contact either provide information, or branch to
other films. Products working with this technology are called »augmented
reality«, an enriched reality in contrast to »virtual reality«,
the artificial reality [Bederson/Druin (1995)]. This is because they represent
the physical world in the form of video, allow movement in real space,
and add information to that reality, e.g. in the form of superimposed three-dimensional
graphics as an »annotation of reality« [Feiner, MacIntyre et
al (1993), 53], while virtual reality generates the navigable spaces as
completely artificial animations.
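The object-and-message principle described above can be made concrete in a few lines. The following Python fragment is a hypothetical sketch, not taken from any of the systems mentioned: every object can receive a message, a click on a sensitive object sets off its script, and that script may in turn send messages to other objects.

```python
class MediaObject:
    """Any object in the system: a text passage, a button, a graphic,
    or a sensitive area in a film."""
    def __init__(self, name):
        self.name = name
        self.handlers = {}        # message name -> script (a callable)

    def on(self, message, script):
        """Attach a script that runs when this object receives a message."""
        self.handlers[message] = script
        return self

    def send(self, message):
        """Deliver a message; the object reacts if it has a handler."""
        if message in self.handlers:
            return self.handlers[message](self)

# A sensitive area in a film branches to another film on mouse contact.
log = []
film = MediaObject("film2").on("play", lambda obj: log.append(obj.name + " playing"))
hotspot = MediaObject("hotspot").on("click", lambda obj: film.send("play"))

hotspot.send("click")   # the click sets off the script, which messages `film`
```

The point of the sketch is only that texts, buttons, and film regions are uniform in this respect: each is an object that can react and send messages.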
Object Orientation
One could understand this description as the result of modern, object-oriented
programming that does not have anything to do with multimedia at first
glance. But this most advanced variant of programming only shows particularly
clearly what had been laid out and aimed at in multimedia relatively early.
In the days when people wanted to combine a videodisc with a program, but
the program unfortunately appeared on one monitor and the video on another,
the developers were aiming at exactly this: the integration of film and
program on a single monitor, and the control of the film through the
manipulation of pictorial elements within it. They still referred to this
technology as »Interactive Videodisc« technology then, in the
mid-eighties, and they did not yet say multimedia, but it was multimedia
they were speaking of. In the following section, I would like to infer
the definition of multimedia from the most advanced stage of development,
and attempt a new definition of multimedia and hypermedia with the object-oriented
frame of mind.
Multimedia Space
Environment
The architecture of a multimedia system consists of an environment that
can comprise more than just the program in the computer, e.g. the class,
the teacher, the instruction, and the excursion. We speak of a working
or learning environment, and also imply the institutional, social, and
communicative context in which a multimedia program is used, while the
multimedia system in a more strict sense means the program in the computer.
This, again, is an environment in a special thematic context. This environment
consists of a visual representation space with graphical objects on the
display of a computer, a symbol space with multimedia objects and messages,
and an event space of user actions and program routines (user, learner,
interactivity, dialogue).
Space: Representation Space Symbol Space Event Space
Environment is a term that frequently appears in connection with multimedia
applications and in constructivist learning theory, »information
space« is a term mostly encountered in hypertext literature and literature
on networks or graphical databases [Caplinger (1986)]. I am going to use
the term »space« in the following. In choosing this term, I
am taking the exemplary realization of the multimedia concept in AthenaMuse
as a model, in which every application has spatial structures [Hodges/Sasnett
(1993), 60ff]. While most multimedia applications today are still working
in two dimensions, with static areas in the x,y coordinates of the display,
things are constructed three-dimensionally and dynamically in AthenaMuse.
I said that the multimedia space consists of a representation space, a
symbol space, and an event space: Fischer and Mandl (1990) make a very
similar distinction. They distinguish between the surface structure of
the hypermedium, its underlying relational and associative structures,
and the subjective structure added by the user. I am going to come back
to this.
Representation Space
The Representation Space is the level of representation commonly referred
to as the graphical user interface. This representation space can have
mimetic qualities (isomorphism, representation of real objects, the world,
the territory, the scene), it points to a symbol space, a deep structure,
its objects can represent abstract entities through symbolic forms, or
be purely graphical features without semantics. The multimedia representation
space has a spatial (space, location) and/or a temporal dimension (movement,
time, story). One might also call it a microworld [but see the microworld
definitions, which I will go into later]. So far, I see the distinction of representation
space and symbol space quite in agreement with the distinction of Dillenbourg
and Mendelsohn (1992), who subdivide the interaction space of intelligent
programs into a representation space and an event space, and refer to pairs
of representation and event as microworlds. The »mapping«,
the correspondence of physical and mental forms of representation, is a
quite demanding and difficult task.
Symbol Space
The symbol space appears in the representation space as a representational
metaphor for abstract or concrete worlds, for the meaning of the representation.
From the learner’s perspective, it can also assume imaginative, creative
or social, political, and psychological dimensions. Implied semantic relations
are symbolism, functionality, discontinuity, isomorphism etc. The Americans,
never shy of using known images in a new context, have used the term »rhizome«
(rootstock) for hypertext or network structures [e.g. Burnett (1992)].
Multimedia architecture has indeed some similarities to a rhizome, a rootstock
growing subterraneously at whose enlarged points the actual fruits are
produced, while the plant above the earth only shows leaves and blossoms:
the symbol space contains the plans and intentions of its designer, implicitly,
it also contains the curriculum and learning objectives for the user, and
at the same time, it consists of the user’s constructs and interpretations,
creativity and imagination. Green (1991) probably means something like
this when he distinguishes between the surface phenomena of the user interface,
and its cognitive dimensions. He criticizes the neglect of the cognitive
dimension in Human-Computer Interaction (HCI) research: »Most HCI evaluations
and descriptions focus on the surface features: they treat rendering, not
structure. Indeed, this goes so far that under the guise of ‘cognitive
modelling’ HCI researchers have generated a crop of papers about how fast
can the mouse be moved to a menu item or to a button […] Typically, no
mention is made of parsing, conversational analysis, determinants of strategy,
or many other central cognitive concerns« (298). I doubt whether
the alternative phenomena cited by Green represent cognitive dimensions
that constitute the user’s interpretations. Green chooses typically psychological
criteria for user behaviour: Viscosity, Hidden Dependencies, Premature
Commitment, Perceptual Cueing, Role-expressiveness. If we see the symbol
space as a space of information or of symbolic forms of expressions, whole
worlds of criteria are missing again.
Mayes, Kibby et al (1990) distinguish between multimedia presentations
and multimedia interfaces. It is only the communication between learner
and system that constitutes the value of the multimedia environment. Such
an assumption also explains the serious difference between presentations
and interactive multimedia programs for teaching and research, treated
separately as they are in this book. De Hoog, de Jong et al (1991) take
this one step further. They distinguish the modes conversation and model
in the »space of interface« [cf. the term of »design
space of interfaces« in Frohlich (1992)], and assign different user
interactions to them as input and output conditions.
Event Space
The event space is often referred to as interface or learning environment.
These terms are usually treated as interchangeable categories [Nesher (1989),
188]. They designate isomorphic or homomorphic environments for information
units that may vary between natural environment and formal representation
in mathematics, and whose function may be illustrative or exemplifying.
The relations between the two environments are ensured through rules of
correspondence. The underlying assumption is that knowledge is domain-specific,
while intelligence is domain-independent. In order to distinguish my definition
from earlier approaches, I refer to the interface as event space, in which
the user’s interaction with multimedia objects occurs. Two dimensions result
from this, that of the user or learner, and that of the interaction or
communication with the program. I am going to discuss the learner in a
separate section of this chapter (»The Learner’s Role in Multimedia
Systems«), and Interaction as well (»Interactivity of Multimedia
Systems«). I merely want to introduce the distinction here, which
will save us an analysis later.
In terms of programming technology, the event space is nothing more
than an »event cycle« waiting for low-level events and reacting
to them. I am not concerned with this technical level of interaction in
this book. The event space has both spatial and temporal aspects. When
Allinson (1992b) defines navigation as the »activity of moving through
an information space« (287), and calls this navigation »a sequence
of purely physical events«, he means exactly this, that from the
perspective of programming technology and computer technology the event
space is a physical intervention, which however enables navigation in the
symbol space on a higher level. Such categories are reminiscent of the classification
of the computer as a physical, logical, and abstract machine [Winograd/Flores
(1987)].
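In programming terms, such an event cycle is nothing but a loop that takes low-level events from a queue and dispatches them to handlers; the physical event then becomes navigation on a higher level. A minimal, hypothetical sketch (the event names and handlers are invented for illustration):

```python
from collections import deque

def event_cycle(events, bindings):
    """Wait for low-level events and react to them: the event space in
    programming terms. `bindings` maps event types to handler functions."""
    queue = deque(events)
    trace = []
    while queue:
        event = queue.popleft()               # e.g. ("click", "node-3")
        handler = bindings.get(event[0])
        if handler:                            # physical event ->
            trace.append(handler(*event[1:]))  # navigation in the symbol space
    return trace

# A mouse click, a purely physical event, is turned into navigation:
trace = event_cycle(
    [("click", "node-3"), ("keypress", "q")],   # "q" has no binding here
    {"click": lambda target: "navigate to " + target},
)
```

This is precisely the technical level of interaction the text sets aside: the loop itself knows nothing of meaning, it only enables the meaningful »navigating« above it.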
Symbol Space
The event space offers access to a world of data, to information, or to
the semantic level, the symbol space. In the latter, physical interaction
turns into semantic interpretation, the meaningful »navigating«
controlled by subject matter, intentions and objectives, that which is
known as »browsing«. It is this dimension of the multimedia
space that interests me above all. Interaction is the decisive element
in connecting representation space and symbol space, without it, no information,
or more precisely, no meaning is transmitted. It is only in interaction
that the meaning of the multimedia objects is realized in the interpretative
act of the user. Roth (1997) points out that »computers are systems
which – at least up to now – execute exclusively syntactical operations,
whose meanings are only constituted through the human user.« (28)
But if the meaning of multimedia objects is only constituted in the communicative
interaction of the user with the program, then the pedagogical-methodical
design of this interaction assumes a decisive importance. This not only
says something about the relevance of interaction in multimedia, but also
opens up the event space for pedagogical intentions, intervention and interpretation.
The event space is thus always also a learning space for the user.
Multimedia Objects
A multimedia object belongs to both representation space and symbol space,
and to the event space as well. It thus consists of an interactively manipulable
surface object (foreground, representation) that reacts to actions and
has methods that are triggered off by respective events, and a semantic
deep structure consisting of the qualities ascribed by the author or user.
McAleese (1992), who distinguishes surface knowledge, which is distributed
in the hypertext network, from deep knowledge, which is represented in
the nodes (14), probably has something similar in mind.
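The twofold structure of a multimedia object — an interactively manipulable surface with methods triggered by events, and a semantic deep structure of ascribed qualities — can be sketched as a small data structure. All attribute names here are hypothetical, chosen only to mirror the description above:

```python
from dataclasses import dataclass, field

@dataclass
class MultimediaObject:
    """Sketch of the twofold structure: a surface object in the
    representation space, a semantic deep structure of qualities
    ascribed by author or user, and methods bound to events."""
    surface: dict                                  # foreground, representation
    semantics: dict                                # ascribed deep structure
    methods: dict = field(default_factory=dict)    # event -> reaction

    def react(self, event):
        """Trigger the method bound to an event, if any."""
        method = self.methods.get(event)
        return method(self) if method else None

play = MultimediaObject(
    surface={"shape": "button", "label": "Play"},
    semantics={"medium": "film", "topic": "sign language"},
    methods={"click": lambda obj: "play " + obj.semantics["medium"]},
)
result = play.react("click")
```

The surface dictionary belongs to the representation space, the semantics dictionary to the symbol space, and the method table to the event space — one object spanning all three.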
Objects in multimedia are visual or acoustic objects representing the
concrete or abstract. The scientific designation of these objects varies
according to the perspective from which they are viewed. I will give only
two examples of many, in order to sketch that such versions quickly become
so specialized that they have hardly any meaning for our context:
- From the point of view of object-oriented programming [Steinmetz (1993)],
the multimedia object type consists of ‘compound multimedia objects’ (CMO),
which are in turn made up out of CMOs and ‘basic multimedia objects’ (498).
Media can be understood as classes in the spirit of object-oriented terminology
according to Steinmetz (491ff); he also understands communication-specific
metaphors as classes.
- Bornman and von Solms (1993) make use of artificial intelligence terminology
in order to characterize multimedia objects: multimedia consists of »frames«
and »slots«, and the objects can inherit and pass on their
characteristics (264). Frames may be ordered hierarchically and taxonomically,
and form classes and super-classes. Relations are formed through attributes
or operations and procedures.
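The frame-and-slot description can be illustrated with a minimal sketch of slot inheritance: a frame that lacks a slot value looks it up in its super-class, so characteristics are inherited and passed on. The frame names and slots below are invented for illustration, not taken from Bornman and von Solms:

```python
class Frame:
    """A frame with slots; frames form hierarchies, and a frame
    inherits slot values from its super-class if it lacks its own."""
    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent          # super-class frame, if any
        self.slots = slots

    def get(self, slot):
        if slot in self.slots:
            return self.slots[slot]   # own characteristic
        if self.parent:
            return self.parent.get(slot)   # inherited characteristic
        raise KeyError(slot)

medium = Frame("medium", temporal=False)
video = Frame("video", parent=medium, temporal=True)   # overrides the default
clip = Frame("clip", parent=video, duration=10)

inherited = clip.get("temporal")   # found two levels up, in `video`
```

A clip thus inherits the temporal character of video without declaring it itself — the mechanism behind »inherit and pass on their characteristics«.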
The Surface of Multimedia Objects
The surface of multimedia objects belongs to the representation space.
As a rule, it consists of graphical representations of deep objects like
text, numbers, graphics, sound, music, picture and film, but also of relations
and procedures. The surface objects have their own characteristics and
methods (e.g. moving icons), which may differ from the characteristics
and methods of the objects represented by them.
The objects in the representation space are usually text objects or
button objects, but also graphical objects like diagrams and pictures,
with paths and polygons occurring less frequently [Casey (1992)]. The surface
structure is made up of graphical objects like fields, cursor, buttons,
icons, labels, pictures, diagrams, and maps. Their graphical appearance
often has a special functionality for user navigation. Buttons may be embedded
or exist as separate visual entities, they may have labels or appear as
small icons: »If the intended meaning of a button is expressed graphically,
we speak of an icon« [Irler/Barbieri (1990), 264]. Irler and Barbieri
call the text buttons frequently used in hypermedia applications »intrusive«,
and argue in favour of »embedded menus« for hypertext applications
[cf. Koved/Shneiderman (1986)]. Vacherand-Revel and Bessière (1992)
see graphical representations as especially favourable environments for
discovery learning (62).
Deep Structure of Multimedia Objects
The objects in the representation space point to objects in the deep structure,
to media. Media may be text, picture and sound (seen from the computer’s
perspective), or language (text, digitized speech, speech synthesis),
music (synthetic, audio), picture (graphic, photo, video, visualization).
But the term medium can be applied much more widely: text may comprise
alphanumerical and numerical data, and forms of text design, such as a
table of data. Pictures may be non-manipulable figures or graphical, manipulable
objects. A spreadsheet may thus be a medium, but the interaction between
the values in a table and their representation as a scatter plot may be
a medium as well. A card or a window is a logically distinct information
unit; a button represents an event that causes a change in the currently
displayed information. The contents component of the current information
unit can be text, picture, sound, language, or program code. Leggett, Schnase
et al (1990) distinguish between »information elements« and
»abstractions«, and see cards and folders, frames, documents,
articles, and encyclopaedias as abstractions already.
Media
Media may be distinguished according to the degree of interactivity they
allow: linear media, feedback media, adaptive media, communicative media
[Jaspers (1991)]. Media may be static or dynamic [Vacherand-Revel/Bessière
(1992)]. Text, numbers, and graphics are static. Animation, speech, music,
and video introduce a dynamic, temporal dimension. Time exists in two forms
in multimedia, as sequence and as real time. And finally, media may be
differentiated according to the cognitively relevant characteristics of
their respective technology, the manner of their symbol systems, and their
processing capacities [Kozma (1991)]. In this spirit, Blattner and Greenberg
(1992) comment on the function of non-language sounds, and distinguish
as functions of music: »absolute, programmatic, social and ritual,
modifying behaviour, and communication of messages« (134). Horton
(1993) describes the visual representations of 14 figures of speech in
a similar manner.
Multimedia objects in the surface structure stand in certain relations
to one another, which can take the form of next-to-each-other, one-above-the-other,
parallelization, juxtaposition, hierarchization, or succession (temporal
relation). In the deep structure, the objects enter into other relations,
which one might call demonstration, illustration, commentary, example,
reference, causality, indication, narration, argumentation, etc. Hypertext
systems, which use so-called »typed« links, try to assign such
meanings to text links, to introduce an aspect of the symbol space into
the representation space, so to speak [e.g. Hannemann/Thüring (1992)].
The distinction of surface, meaning, and method perhaps makes Parkes’
(1992) argument that a multimedia system only knows something about the
storing of material, but nothing of its contents (98), appear in a different
light. This may be true for the individual objects as such, i.e. for the
film, the sound object, the text, which themselves have no knowledge of
their meaning, but it does not apply to the system as a whole which is
meant to represent and construct this very meaning with its structure.
But the distinction is a clear indication of the relevance of supporting
multiple representations for learning. Since learners generate their own
representations and interpretations in dealing with the computer, it makes
sense to support this process through multiple representations.
One cannot differentiate between various types of multimedia or hypermedia
applications by looking at just the surface structure. It is the shape
of directed graphs in the deep structure that is decisive here. If the
objects are networked in the form of nodes and links, we speak of hypermedia.
Links consist of the actual links and the »link anchors«. Anchors
can be represented as buttons, as a modified cursor, or as marked-up text.
The relations might also be modelled as a simple or coloured Petri net,
as a generalization of existing hypertext concepts with simple directed
graphs, or competing paths: »A hypertext consists of a Petri net
representing the document’s linked structure, several sets of human-consumable
components (contents, windows, and buttons), and two collections of mappings,
termed projections, between the Petri net, the human-consumables, and the
design mechanisms« [Stotts/Furuta (1989), 7; Stotts/Furuta (1988);
Stotts/Furuta (1990)]. The structure of the directed graphs decides the
type of multimedia or hypermedia application:
Kiosk Systems
Kiosk systems (chapter 9) merely contain lists of products
(table of contents and index), perhaps sorted according to product types,
and then branch in the form of a star. There are no further nodes branching
from the last elements of the star, so that the user must retrace the path
of the graph he has followed.
Guided Tours
Guided Tours (chapter 9) may have more complex graphs that
can also follow a ring path, but as a rule follow a structure similar to
that of Kiosk systems, i.e. a clear sequence of nodes leading into dead ends from
which the user must then retrace the graphs.
Hypertexts
Hypertexts (chapter 7) on the other hand are not limited to
the structural principle of sequentially juxtaposed nodes, but may realize
any reference structure.
Electronic Books
This is not the case for electronic books (chapter 8), which
constitute hypertext on the basis of the traditional book form, and must
limit the range of possible connections in the interest of that form.
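The structural differences between these application types can be read off the link graph itself. The following sketch — with invented node names — represents links as (source, target) pairs and counts outgoing links: in a kiosk star every leaf is a dead end from which the user must retrace the path, while a hypertext may close cycles and leave no dead ends at all.

```python
def out_degrees(links):
    """Count outgoing links per node in a directed graph
    given as (source, target) pairs."""
    degrees = {}
    for src, dst in links:
        degrees[src] = degrees.get(src, 0) + 1
        degrees.setdefault(dst, 0)     # register targets with no out-links
    return degrees

# Kiosk system: an index branching in the form of a star.
kiosk = [("index", "a"), ("index", "b"), ("index", "c")]
# Hypertext: any reference structure, here a simple cycle.
hypertext = [("a", "b"), ("b", "c"), ("c", "a")]

kiosk_dead_ends = sorted(n for n, d in out_degrees(kiosk).items() if d == 0)
hyper_dead_ends = sorted(n for n, d in out_degrees(hypertext).items() if d == 0)
```

The shape of the directed graph, not the surface, is what distinguishes a kiosk from a guided tour or a hypertext — exactly the point made above.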
The Function of Pictures
There is a whole research branch on the function of pictures in texts and
learning programs. Issing and Haack (1985) distinguish between illustrations,
analogous images, and logical images. They claim a long-term effect of
pictorial forms of coding (115). Pictures are schemata, scripts, and mental
models [Weidenmann (1994)]. They serve as
- expression of individual experience
- learning control
- illustration to lend clarity to difficult concepts
- stand-in for reality.
The Function of Sound
Sound in multimedia can remain limited to the surface level, have no semantic
function, and nonetheless be important for the acceptance of the application.
Chadwick (1992) illustrates this with an experiment in the New Mexico Museum
of Natural History: They simply detached the audio output of the multimedia
system on display for a week. It was found that the quota of visitors staying
with the program from start to finish sank rapidly. This effect would probably
not have resulted if text and sound were redundant. In order to clear up
this question, Barron and Atkins (1994) tested the influence of text and
audio redundancy (the doubling of information in two media), and concluded
that the doubling had no influence on learning success. Multiple media that
merely duplicate one another thus have no special effect. The functionality of
speech must match a specific situation; the role of each medium in multimedia
must be seen in its differentiating function.
The Methods of Multimedia Objects
The methods of multimedia objects are author-set, permanent, or user-defined,
temporary methods, by way of which the objects react to automatic or user-generated
events. The forms of manipulation can be indirect and direct, symbolic
and manual interactions. The channels used by objects to transport information
can be auditive and visual. Objects reacting to interaction exchange messages
with other objects. Frequently, the mediation of information follows a
hierarchy of objects, as for example in HyperCard, which checks whether
a message is executed directly, directed towards another object on the
same card, to the card itself, to the stack, and finally to HyperCard itself,
or even from there to a certain object in another stack. Some examples
for this:
- The button >Continue< switches to the next card. In doing that, it triggers
off a script that performs a visual effect and a sound, makes one field
invisible, and another visible.
- The button >Play< calls up a film and plays it.
- The button >Music< plays a piece of music from the CD.
- The button >Compute< sends a message to another card, collects data
from that card, inserts these data into an invisible container, a variable,
then calls up a script from the stack that computes statistical figures
from these data, and inserts the result into one or more fields on the
initial card.
Objects in the sense of multimedia configuration are also devices connected
to the computer which are responsible for input or output. More sensible
definitions of these classes as objects in the light of the object-oriented
paradigm can be found in Steinmetz (1993) and Gibbs and Tsichritzis (1994).
The Dimension of Meaning
The technical combination of the media is a necessary, but not a sufficient
condition for the definition of the term hypermedia. I prefer to use the
term integration, technical and data-technological integration, but also
the integration of the multimedia space levels. If we do not consider the
distinction between representation and information level in a purely formal
way, then the combination or integration of the media in a multimedia system
must also include a dimension of meaning for learning: a multimedia application
should show some functionality for learning, it must have a meaning, an
added value for learning. The added value can lie in factors of reception
or learning psychology, e.g. in the activation of several channels in learning,
visualization of abstract circumstances, anchoring the coding of information
using several senses, or the dynamic representation of processes and events.
But the added value can also consist in the learner’s cognitive constructs
and interpretations stimulated by the multimedia environment, in the mental
processing of imagined contents. Contextuality, and seeing the subject
matter in the wider context of environment, society, and history, and its
interpretation by the learner belong here. Only then does the term »Sociomedia«
coined by Barrett (1992) become understandable. I would like to mention
some examples for the media’s dimension of meaning, which I will discuss
in detail later on, in order to explain what I understand by multimedia’s
dimension of meaning:
SimNerv
In SimNerv, one can look at pictures of frogs and listen
to their croaking. This takes place in a program that offers students of
medicine a virtual laboratory, in which they can execute physiological
experiments with frogs’ nerves for which, fortunately, no frogs have to
be killed anymore. The laboratory and the frogs are separate parts of the
application which do not have any concrete relation to each other. The
meaning of this combination lies in the justification of the artificial
laboratory, it is meant to motivate the substitution of the frog experiments
by the artificial laboratory.
Dictionary of Computer Terms with Signs
The Dictionary of Computer Terms with Signs, which I
will discuss in detail in chapter 8, integrates encyclopaedia texts on
computer terms with the corresponding signs used by deaf people in the
form of films in a hypertext environment. The user may search for certain
terms as well as for certain features of signs, which could not have been
realized in book form.
Beethoven’s Ninth Symphony
A program accompanying an audio CD opens up Beethoven’s
Ninth Symphony (The Voyager Company) in two different ways: In one
mode, the musicological interpretation accesses certain parts of the music,
in the other mode, the explanations appear synchronized to the music playing.
Interactivity is thus introduced into something that can usually only be
experienced sequentially.
The Problem of Gestalt
In the first case, multimedia serves to motivate a simulation as surrogate
for a real experiment, i.e. to justify a form of learning, in the second
case, multimedia realizes arbitrary access to a visual language that would
otherwise be difficult to learn, in the third case, multimedia makes it
possible to experience a serial medium interactively, and networks it with
interpretations. In conclusion, I would like to go into another particular
aspect of the integration of media, the problem of the integration’s gestalt:
Should one accept only those combinations of media that make up a new whole,
a gestalt, as in the examples just mentioned, or can multimedia also include
combinations of media which are combined in a luxurious or even superfluous
manner? Is it possible to make a meaningful distinction between necessary
and non-necessary combinations? A learning program on film should be able
to access the film, a learning program on music to access the piece of
music being discussed. But does an encyclopaedia on film history really
have to show 10-second-clips of the films? Does an encyclopaedia on music
history have to play a groove of each record? In other words: Should the
integration of the media show some functionality beyond the obvious, a
functionality that lends an additional level of meaning to the subject
matter, in order to constitute multimedia? The multimedia applications
offered on the market more often than not simply represent a reconstruction
of ‘natural’ (multi-) media on a new technical level. In many cases that
which is achieved by multimedia does not go beyond anything that has already
been taking place in good instruction with a teacher and several non-integrated
media. With this problem, I have addressed a topic that does not have anything
to do with the definition of multimedia, but with social criticism of multimedia
as a technological trend and market phenomenon. These dimensions of the
necessary and the optional are probably difficult to separate in individual
cases. But they can perhaps be a useful pointer for stimulating multimedia
applications meant to enrich the learning of pupils and students.