A New Definition of Multimedia Architecture

Hypertext started with text passages sensitive to mouse clicks. When the first animation programs emerged, a similar principle was realized for graphics: the mouse made pictures move and made characters speak; hypertext became hypermedia. In the object-oriented world of computer programs, texts, buttons, graphic elements, content components, and components of the user interface are all the same in one respect: they are all objects that at a touch can set off a script and thus an action, or send a message to other objects. This principle even applies to digital films: more recent formats – e.g. Apple’s QuickTime VR – also realize the object-oriented principle. They can be navigated by mouse movements, and they have sensitive areas which on mouse contact either display information or branch to other films. Products working with this technology are called »augmented reality«, an enriched reality in contrast to »virtual reality«, the artificial reality [Bederson/Druin (1995)]. This is because they represent the physical world in the form of video, allow movement in real space, and add information to that reality, e.g. in the form of superimposed three-dimensional graphics as an »annotation of reality« [Feiner, MacIntyre et al (1993), 53], while virtual reality generates its navigable spaces as completely artificial animations.

Object Orientation

One could understand this description as the result of modern, object-oriented programming that at first glance does not have anything to do with multimedia. But this most advanced variant of programming only shows particularly clearly what had been laid out and aimed at in multimedia relatively early. In the days when people wanted to combine a videodisc with a program, but the program unfortunately appeared on one monitor and the video on another, the integration of film and program on a single monitor and the control of the film through the manipulation of pictorial elements in the film was exactly what the developers were aiming at. In the mid-eighties they still referred to this technology as »Interactive Videodisc« technology, and they did not yet say multimedia, but it was multimedia they were speaking of. In the following section, I would like to infer the definition of multimedia from the most advanced stage of development, and attempt a new definition of multimedia and hypermedia within an object-oriented frame of mind.

Multimedia Space

Environment

The architecture of a multimedia system consists of an environment that can comprise more than just the program in the computer, e.g. the class, the teacher, the instruction, and the excursion. We speak of a working or learning environment, and also imply the institutional, social, and communicative context in which a multimedia program is used, while the multimedia system in a more strict sense means the program in the computer. This, again, is an environment in a special thematic context. This environment consists of a visual representation space with graphical objects on the display of a computer, a symbol space with multimedia objects and messages, and an event space of user actions and program routines (user, learner, interactivity, dialogue).

Space: Representation Space, Symbol Space, Event Space

Environment is a term that frequently appears in connection with multimedia applications and in constructivist learning theory, »information space« is a term mostly encountered in hypertext literature and literature on networks or graphical databases [Caplinger (1986)]. I am going to use the term »space« in the following. In choosing this term, I am taking the exemplary realization of the multimedia concept in AthenaMuse as a model, in which every application has spatial structures [Hodges/Sasnett (1993), 60ff]. While most multimedia applications today are still working in two dimensions, with static areas in the x,y coordinates of the display, things are constructed three-dimensionally and dynamically in AthenaMuse. I said that the multimedia space consists of a representation space, a symbol space, and an event space: Fischer and Mandl (1990) make a very similar distinction. They distinguish between the surface structure of the hypermedium, its underlying relational and associative structures, and the subjective structure added by the user. I am going to come back to this.

Representation Space

The Representation Space is the level of representation commonly referred to as the graphical user interface. This representation space can have mimetic qualities (isomorphism, representation of real objects, the world, the territory, the scene); it points to a symbol space, a deep structure; its objects can represent abstract entities through symbolic forms, or be purely graphical features without semantics. The multimedia representation space has a spatial (space, location) and/or a temporal dimension (movement, time, story). One might also call it a microworld [but see the definitions of microworlds, which I will go into later]. So far, I see the distinction of representation space and symbol space quite in agreement with the distinction of Dillenbourg and Mendelsohn (1992), who subdivide the interaction space of intelligent programs into a representation space and an event space, and refer to pairs of representation and event as microworlds. The »mapping«, the correspondence of physical and mental forms of representation, is a quite demanding and difficult task.

Symbol Space

The symbol space appears in the representation space as a representational metaphor for abstract or concrete worlds, for the meaning of the representation. From the learner’s perspective, it can also assume imaginative, creative or social, political, and psychological dimensions. Implied semantic relations are symbolism, functionality, discontinuity, isomorphism etc. The Americans, never shy of using known images in a new context, have used the term »rhizome« (rootstock) for hypertext or network structures [e.g. Burnett (1992)]. Multimedia architecture has indeed some similarities to a rhizome, a rootstock growing subterraneously at whose enlarged points the actual fruits are produced, while the plant above the earth only shows leaves and blossoms: the symbol space contains the plans and intentions of its designer, implicitly, it also contains the curriculum and learning objectives for the user, and at the same time, it consists of the user’s constructs and interpretations, creativity and imagination. Green (1991) probably means something like this when he distinguishes between the surface phenomena of the user interface, and its cognitive dimensions. He criticizes the neglect of the cognitive dimension in Human-Computer-Interface (HCI) research: »Most HCI evaluations and descriptions focus on the surface features: they treat rendering, not structure. Indeed, this goes so far that under the guise of ‘cognitive modelling’ HCI researchers have generated a crop of papers about how fast can the mouse be moved to a menu item or to a button […] Typically, no mention is made of parsing, conversational analysis, determinants of strategy, or many other central cognitive concerns« (298). I doubt whether the alternative phenomena cited by Green represent cognitive dimensions that constitute the user’s interpretations. Green chooses typically psychological criteria for user behaviour: Viscosity, Hidden Dependencies, Premature Commitment, Perceptual Cueing, Role-expressiveness. 
If we see the symbol space as a space of information or of symbolic forms of expressions, whole worlds of criteria are missing again.

Mayes, Kibby et al (1990) distinguish between multimedia presentations and multimedia interfaces. It is only the communication between learner and system that constitutes the value of the multimedia environment. Such an assumption also explains the serious difference between presentations and interactive multimedia programs for teaching and research, treated separately as they are in this book. De Hoog, de Jong et al (1991) take this one step further. They distinguish the modes conversation and model in the »space of interface« [cf. the term »design space of interfaces« in Frohlich (1992)], and assign different user interactions to them as input and output conditions.

Event Space

The event space is often referred to as interface or learning environment. These terms are usually treated as interchangeable categories [Nesher (1989), 188]. They designate isomorphic or homomorphic environments for information units that may vary between natural environment and formal representation in mathematics, and whose function may be illustrative or exemplifying. The relation between the two environments is ensured through rules of correspondence. The underlying assumption is that knowledge is domain-specific, while intelligence is domain-independent. In order to distinguish my definition from earlier approaches, I refer to the interface as event space, in which the user’s interaction with multimedia objects occurs. Two dimensions result from this, that of the user or learner, and that of the interaction or communication with the program. I am going to discuss the learner in a separate section of this chapter (»The Learner’s Role in Multimedia Systems«), and interaction as well (»Interactivity of Multimedia Systems«). I merely want to introduce the distinction here, which will save us an analysis later.

In terms of programming technology, the event space is nothing more than an »event cycle« waiting for low-level events and reacting to them. I am not concerned with this technical level of interaction in this book. The event space has both spatial and temporal aspects. When Allinson (1992b) defines navigation as the »activity of moving through an information space« (287), and calls this navigation »a sequence of purely physical events«, he means exactly this: that from the perspective of programming technology and computer technology the event space is a physical intervention, which however enables navigation in the symbol space on a higher level. Such categories are reminiscent of the classification of the computer as a physical, logical, and abstract machine [Winograd/Flores (1987)].
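The »event cycle« mentioned above can be made concrete with a minimal sketch: a loop that waits for low-level events and dispatches them to handlers. This is an illustration of the principle only, not any particular system's implementation; all names are invented for the example.

```python
# Minimal sketch of an "event cycle": a loop that takes low-level
# events from a queue and reacts to them via registered handlers.
from collections import deque

class EventLoop:
    def __init__(self):
        self.queue = deque()
        self.handlers = {}            # event type -> handler function

    def on(self, event_type, handler):
        self.handlers[event_type] = handler

    def post(self, event_type, payload=None):
        self.queue.append((event_type, payload))

    def run(self):
        # React to queued events until none are left.
        while self.queue:
            event_type, payload = self.queue.popleft()
            handler = self.handlers.get(event_type)
            if handler:
                handler(payload)

loop = EventLoop()
log = []
loop.on("mouse_down", lambda pos: log.append(f"clicked at {pos}"))
loop.post("mouse_down", (40, 80))
loop.run()
# log is now ["clicked at (40, 80)"]
```

The physical intervention (the mouse click) is all the loop ever sees; whatever meaning that click carries in the symbol space is supplied by the handler attached to it.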

Symbol Space

The event space offers access to a world of data, to information, or to the semantic level, the symbol space. In the latter, physical interaction turns into semantic interpretation, the meaningful »navigating« controlled by subject matter, intentions and objectives, that which is known as »browsing«. It is this dimension of the multimedia space that interests me above all. Interaction is the decisive element in connecting representation space and symbol space, without it, no information, or more precisely, no meaning is transmitted. It is only in interaction that the meaning of the multimedia objects is realized in the interpretative act of the user. Roth (1997) points out that »computers are systems which – at least up to now – execute exclusively syntactical operations, whose meanings are only constituted through the human user.« (28) But if the meaning of multimedia objects is only constituted in the communicative interaction of the user with the program, then the pedagogical-methodical design of this interaction assumes a decisive importance. This not only says something about the relevance of interaction in multimedia, but also opens up the event space for pedagogical intentions, intervention and interpretation. The event space is thus always also a learning space for the user.

Multimedia Objects

A multimedia object belongs to both representation space and symbol space, and to the event space as well. It thus consists of an interactively manipulable surface object (foreground, representation) that reacts to actions and has methods that are triggered off by respective events, and a semantic deep structure consisting of the qualities ascribed by the author or user. McAleese (1992), who distinguishes surface knowledge, which is distributed in the hypertext network, from deep knowledge, which is represented in the nodes (14), probably has something similar in mind.
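The threefold membership of a multimedia object can be sketched as a small class: a surface representation (representation space), semantic qualities ascribed by author or user (symbol space), and methods triggered by events (event space). The class and all its attribute names are illustrative assumptions, not an actual multimedia API.

```python
# Sketch of a multimedia object that belongs to all three spaces:
# surface (representation space), semantics (symbol space), and
# event-triggered methods (event space). All names are illustrative.
class MultimediaObject:
    def __init__(self, surface, semantics):
        self.surface = surface        # e.g. an icon or a button label
        self.semantics = semantics    # qualities ascribed by author or user
        self.methods = {}             # event -> script

    def on(self, event, script):
        self.methods[event] = script

    def trigger(self, event):
        # The respective event sets off the corresponding method.
        script = self.methods.get(event)
        return script(self) if script else None

button = MultimediaObject(surface="frog icon",
                          semantics={"topic": "physiology"})
button.on("click", lambda obj: f"open film about {obj.semantics['topic']}")
result = button.trigger("click")      # "open film about physiology"
```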

Objects in multimedia are visual or acoustic objects representing the concrete or the abstract. The scientific designation of these objects varies according to the perspective from which they are viewed. I will give only two of many examples, in order to show that such designations quickly become so specialized that they have hardly any meaning in our context:

The Surface of Multimedia Objects

The surface of multimedia objects belongs to the representation space. As a rule, it consists of graphical representations of deep objects like text, numbers, graphics, sound, music, picture and film, but also of relations and procedures. The surface objects have their own characteristics and methods (e.g. moving icons), which may differ from the characteristics and methods of the objects represented by them.

The objects in the representation space are usually text objects or button objects, but also graphical objects like diagrams and pictures, with paths and polygons occurring less frequently [Casey (1992)]. The surface structure is made up of graphical objects like fields, cursor, buttons, icons, labels, pictures, diagrams, and maps. Their graphical appearance often has a special functionality for user navigation. Buttons may be embedded or exist as separate visual entities, they may have labels or appear as small icons: »If the intended meaning of a button is expressed graphically, we speak of an icon« [Irler/Barbieri (1990), 264]. Irler and Barbieri call the text buttons frequently used in hypermedia applications »intrusive«, and argue in favour of »embedded menus« for hypertext applications [cf. Koved/Shneiderman (1986)]. Vacherand-Revel and Bessière (1992) see graphical representations as especially favourable environments for discovery learning (62).

Deep Structure of Multimedia Objects

The objects in the representation space point to objects in the deep structure, to media. Media may be text, picture, and sound (seen from the computer’s perspective), or language (text, digitized speech, speech synthesis), music (synthesized, audio), picture (graphic, photo, video, visualization). But the term medium can be applied much more widely: text may comprise alphanumerical and numerical data, and forms of text design, such as a table of data. Pictures may be non-manipulable figures or graphical, manipulable objects. A spreadsheet may thus be a medium, but the interaction between the values in a table and their representation as a scatter plot may be a medium as well. A card or a window is a logically distinct information unit; a button represents an event that causes a change in the currently displayed information. The contents component of the current information unit can be text, picture, sound, language, or program code. Leggett, Schnase et al (1990) distinguish between »information elements« and »abstractions«, and see cards and folders, frames, documents, articles, and encyclopaedias as abstractions already.

Media

Media may be distinguished according to the degree of interactivity they allow: linear media, feedback media, adaptive media, communicative media [Jaspers (1991)]. Media may be static or dynamic [Vacherand-Revel/Bessière (1992)]. Text, numbers, and graphics are static. Animation, speech, music, and video introduce a dynamic, temporal dimension. Time exists in two forms in multimedia, as sequence and as real time. And finally, media may be differentiated according to the cognitively relevant characteristics of their respective technology, the manner of their symbol systems, and their processing capacities [Kozma (1991)]. In this spirit, Blattner and Greenberg (1992) comment on the function of non-language sounds, and distinguish as functions of music: »absolute, programmatic, social and ritual, modifying behaviour, and communication of messages« (134). Horton (1993) describes the visual representations of 14 figures of speech in a similar manner.

Multimedia objects in the surface structure stand in certain relations towards one another, which can be understood as adjacency, superposition, parallelization, juxtaposition, hierarchization, or succession (temporal relation). In the deep structure, the objects enter into other relations, which one might call demonstration, illustration, commentary, example, reference, causality, indication, narration, argumentation, etc. Hypertext systems that use so-called »typed« links try to assign such meanings to text links, to introduce an aspect of the symbol space into the representation space, so to speak [e.g. Hannemann/Thüring (1992)].

The distinction of surface, meaning, and method perhaps makes Parkes’ (1992) argument that a multimedia system only knows something about the storing of material, but nothing of its contents (98), appear in a different light. This may be true for the individual objects as such, i.e. for the film, the sound object, the text, which themselves have no knowledge of their meaning, but it does not apply to the system as a whole which is meant to represent and construct this very meaning with its structure. But the distinction is a clear indication of the relevance of supporting multiple representations for learning. Since learners generate their own representations and interpretations in dealing with the computer, it makes sense to support this process through multiple representations.

One cannot differentiate between various types of multimedia or hypermedia applications by looking at just the surface structure. It is the shape of the directed graphs in the deep structure that is decisive here. If the objects are networked in the form of nodes and links, we speak of hypermedia. Links consist of the actual links and the »link anchors«. Anchors can be represented as buttons, as a modified cursor, or as marked-up text.
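The deep structure just described – nodes, links carrying a semantic type, and the anchors by which links appear in the surface – can be sketched as a simple directed graph. The class is an illustration under invented names, not any particular hypertext system's data model.

```python
# Sketch of hypermedia deep structure as a directed graph: nodes hold
# content, links carry a semantic type ("typed" links), and an anchor
# records how the link appears in the surface structure.
class Hypermedia:
    def __init__(self):
        self.nodes = {}       # node id -> content
        self.links = []       # (source, target, link type, anchor)

    def add_node(self, node_id, content):
        self.nodes[node_id] = content

    def add_link(self, source, target, link_type, anchor):
        self.links.append((source, target, link_type, anchor))

    def links_from(self, node_id):
        # All outgoing links of a node, with their type and anchor.
        return [(t, lt, a) for (s, t, lt, a) in self.links if s == node_id]

web = Hypermedia()
web.add_node("overview", "Multimedia architecture")
web.add_node("example", "AthenaMuse")
web.add_link("overview", "example", link_type="illustration", anchor="button")
# web.links_from("overview") -> [("example", "illustration", "button")]
```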

The relations might also be modelled as a simple or coloured Petri net, as a generalization of existing hypertext concepts with simple directed graphs, or competing paths: »A hypertext consists of a Petri net representing the document’s linked structure, several sets of human-consumable components (contents, windows, and buttons), and two collections of mappings, termed projections, between the Petri net, the human-consumables, and the design mechanisms« [Stotts/Furuta (1989), 7; Stotts/Furuta (1988); Stotts/Furuta (1990)]. The structure of the directed graphs decides the type of multimedia or hypermedia application:
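The Petri net idea can be made concrete with a minimal place/transition sketch in the spirit of the Stotts/Furuta model: places stand for displayed contents, and a transition (a followed link) fires only when all of its input places hold a token. This is a toy illustration of the formalism, not the Trellis system itself.

```python
# Minimal place/transition Petri net: a transition is enabled when all
# its input places hold tokens; firing it moves tokens from inputs to
# outputs. Place and transition names are illustrative.
class PetriNet:
    def __init__(self):
        self.tokens = {}        # place -> token count
        self.transitions = {}   # name -> (input places, output places)

    def add_place(self, place, tokens=0):
        self.tokens[place] = tokens

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.tokens[p] > 0 for p in inputs)

    def fire(self, name):
        inputs, outputs = self.transitions[name]
        assert self.enabled(name), f"transition {name} not enabled"
        for p in inputs:
            self.tokens[p] -= 1
        for p in outputs:
            self.tokens[p] += 1

net = PetriNet()
net.add_place("intro", tokens=1)    # the introduction is on display
net.add_place("chapter")
net.add_transition("follow_link", inputs=["intro"], outputs=["chapter"])
net.fire("follow_link")
# now net.tokens == {"intro": 0, "chapter": 1}
```

Because a transition may have several input places, such a net can express synchronization and competing paths that a simple directed graph cannot.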

Kiosk Systems

Kiosk systems (chapter 9) merely contain lists of products (table of contents and index), perhaps sorted according to product types, and then branch in the form of a star. There are no further nodes branching from the last elements of the star, so that the user must retrace the path of the graph he has followed.

Guided Tours

Guided Tours (chapter 9) may have more complex graphs that can also follow a ring path, but as a rule follow a similar structure as Kiosk systems, i.e. a clear sequence of nodes leading into dead ends from which the user must then retrace the graphs.

Hypertexts

Hypertexts (chapter 7) on the other hand are not limited to the structural principle of sequentially juxtaposed nodes, but may realize any reference structure.

Electronic Books

This is not the case for electronic books (chapter 8), which constitute hypertext on the basis of the traditional book form, and must limit the range of possible connections in that interest.
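The claim that the shape of the directed graph decides the application type can be illustrated by constructing the shapes themselves: a kiosk branches as a star whose leaves are dead ends, a guided tour is essentially a chain (possibly closed into a ring), while a hypertext may realize any reference structure. The helper functions are invented for this sketch.

```python
# Sketch of the graph shapes behind the application types: star (kiosk),
# chain or ring (guided tour); a hypertext would allow arbitrary edges.
def star(center, leaves):
    graph = {leaf: [] for leaf in leaves}   # leaves are dead ends
    graph[center] = list(leaves)
    return graph

def chain(nodes, ring=False):
    graph = {n: [] for n in nodes}
    for a, b in zip(nodes, nodes[1:]):
        graph[a].append(b)
    if ring:
        graph[nodes[-1]].append(nodes[0])   # close the ring
    return graph

kiosk = star("index", ["product_a", "product_b"])
tour = chain(["stop_1", "stop_2", "stop_3"], ring=True)

# In the kiosk, every leaf is a dead end from which the user
# must retrace the path:
dead_ends = [n for n, out in kiosk.items() if not out]
# dead_ends == ["product_a", "product_b"]
```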

The Function of Pictures

There is a whole research branch on the function of pictures in texts and learning programs. Issing and Haack (1985) distinguish between illustrations, analogous images, and logical images. They claim a long-term effect of pictorial forms of coding (115). Pictures are schemata, scripts, and mental models [Weidenmann (1994)]. They serve as

The Function of Sound

Sound in multimedia can remain limited to the surface level, have no semantic function, and nonetheless be important for the acceptance of the application. Chadwick (1992) illustrates this with an experiment in the New Mexico Museum of Natural History: the audio output of the multimedia system on display was simply disconnected for a week. It was found that the quota of visitors staying with the program from start to finish sank rapidly. This effect would probably not have resulted if text and sound were redundant. In order to clear up this question, Barron and Atkins (1994) tested the influence of text and audio redundancy (the doubling of information in two media), and concluded that the doubling had no influence on learning success. Multiple media, simply doubled media, have no special effect then. The functionality of speech must meet a specific situation; the role of the respective media in multimedia must be seen in the media’s differentiating function.

The Methods of Multimedia Objects

The methods of multimedia objects are author-set, permanent, or user-defined, temporary methods, by way of which the objects react to automatic or user-generated events. The forms of manipulation can be indirect and direct, symbolic and manual interactions. The channels used by objects to transport information can be auditory and visual. Objects reacting to interaction exchange messages with other objects. Frequently, the mediation of information follows a hierarchy of objects, as for example in HyperCard, which checks whether a message is executed directly, directed towards another object on the same card, to the card itself, to the stack, and finally to HyperCard itself, or even from there to a certain object in another stack. Some examples: devices connected to the computer and responsible for input or output are also objects in the sense of a multimedia configuration. More sensible definitions of these classes as objects in the light of the object-oriented paradigm can be found in Steinmetz (1993) and Gibbs and Tsichritzis (1994).
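The HyperCard-style hierarchy described above can be sketched as a message travelling from the object up through card and stack until some level handles it. This is an illustration of the principle under invented names, not the actual HyperTalk mechanism.

```python
# Sketch of a message hierarchy: a message is passed one level up
# (button -> card -> stack) until a handler for it is found.
class Level:
    def __init__(self, name, parent=None, handlers=None):
        self.name = name
        self.parent = parent
        self.handlers = handlers or {}

    def send(self, message):
        level = self
        while level is not None:
            if message in level.handlers:
                return f"{level.name} handles {message}"
            level = level.parent      # pass the message one level up
        return f"{message} unhandled"

stack = Level("stack", handlers={"openStack": "script"})
card = Level("card", parent=stack, handlers={"showPicture": "script"})
button = Level("button", parent=card, handlers={"mouseUp": "script"})

button.send("mouseUp")       # "button handles mouseUp"
button.send("showPicture")   # "card handles showPicture"
button.send("openStack")     # "stack handles openStack"
```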

The Dimension of Meaning

The technical combination of the media is a necessary, but not a sufficient condition for the definition of the term hypermedia. I prefer to use the term integration, technical and data-technological integration, but also the integration of the levels of the multimedia space. If we do not consider the distinction between representation and information level in a purely formal way, then the combination or integration of the media in a multimedia system must also include a dimension of meaning for learning: a multimedia application should show some functionality for learning; it must have a meaning, an added value for learning. The added value can lie in factors of reception or learning psychology, e.g. in addressing several channels in learning, the visualization of abstract circumstances, anchoring the coding of information in several senses, or the dynamic representation of processes and events. But the added value can also consist in the learner’s cognitive constructs and interpretations stimulated by the multimedia environment, in the mental processing of imagined contents. Contextuality, seeing the subject matter in the wider context of environment, society, and history, and its interpretation by the learner belong here. Only then does the term »Sociomedia« coined by Barrett (1992) become understandable. I would like to mention some examples for the media’s dimension of meaning, which I will discuss in detail later on, in order to explain what I understand by multimedia’s dimension of meaning:

SimNerv

In SimNerv, one can look at pictures of frogs and listen to their croaking. This takes place in a program that offers students of medicine a virtual laboratory, in which they can execute physiological experiments with frogs’ nerves for which, fortunately, no frogs have to be killed anymore. The laboratory and the frogs are separate parts of the application which do not have any concrete relation to each other. The meaning of this combination lies in the justification of the artificial laboratory, it is meant to motivate the substitution of the frog experiments by the artificial laboratory.

Dictionary of Computer Terms with Signs

The Dictionary of Computer Terms with Signs, which I will discuss in detail in chapter 8, integrates encyclopaedia texts on computer terms with the corresponding signs used by deaf people in the form of films in a hypertext environment. The user may search for certain terms as well as for certain features of signs, which could not have been realized in book form.

Beethoven’s Ninth Symphony

A program accompanying an audio CD opens up Beethoven’s Ninth Symphony (The Voyager Company) in two different ways: In one mode, the musicological interpretation accesses certain parts of the music, in the other mode, the explanations appear synchronized to the music playing. Interactivity is thus introduced into something that can usually only be experienced sequentially.

The Problem of Gestalt

In the first case, multimedia serves to motivate a simulation as surrogate for a real experiment, i.e. to justify a form of learning, in the second case, multimedia realizes arbitrary access to a visual language that would otherwise be difficult to learn, in the third case, multimedia makes it possible to experience a serial medium interactively, and networks it with interpretations. In conclusion, I would like to go into another particular aspect of the integration of media, the problem of the integration’s gestalt: Should one accept only those combinations of media that make up a new whole, a gestalt, as in the examples just mentioned, or can multimedia also include combinations of media which are combined in a luxurious or even superfluous manner? Is it possible to make a meaningful distinction between necessary and non-necessary combinations? A learning program on film should be able to access the film, a learning program on music to access the piece of music being discussed. But does an encyclopaedia on film history really have to show 10-second-clips of the films? Does an encyclopaedia on music history have to play a groove of each record? In other words: Should the integration of the media show some functionality beyond the obvious, a functionality that lends an additional level of meaning to the subject matter, in order to constitute multimedia? The multimedia applications offered on the market more often than not simply represent a reconstruction of ‘natural’ (multi-) media on a new technical level. In many cases that which is achieved by multimedia does not go beyond anything that has already been taking place in good instruction with a teacher and several non-integrated media. With this problem, I have addressed a topic that does not have anything to do with the definition of multimedia, but with social criticism of multimedia as a technological trend and market phenomenon. 
These dimensions of the necessary and the optional are probably difficult to separate in individual cases. But they can perhaps be a useful pointer for stimulating multimedia applications meant to enrich the learning of pupils and students.