The hardware modules available are ATM video, ATM audio, ATM LCD tile, ATM disc brick and DEC Alpha workstations. The video module can accommodate four camera heads and can provide images at six different sizes for each. The audio module has four bidirectional channels and a range of sampling frequencies up to 48 kHz. The display tile is based on a 640x480 active matrix display. The disc brick uses RAID-3 technology and provides 8 Gbytes of storage.
The system has been deployed in the laboratory, and some two hundred modules and switches are available for experimentation. For the time being, raw video is being used to ease the development of applications that incorporate agents.
The software platform consists of two components: an object-oriented applications environment, and the applications themselves, written in a scripting language called Tcl/Tk. The applications environment is a peer-to-peer architecture which uses active objects to represent information sources, sinks, data converters and so on. Data flows from module to module over the connections between them. Connections between modules are simple, reliable and unbuffered. More complex connections are represented by special intermediate modules. Modules providing basic agent features are available, ranging from simple motion and sound observers to gesture, speech and face recognition components. Applications can be prototyped rapidly and different combinations of features evaluated with ease.
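The module-and-connection model described above can be illustrated with a minimal sketch. It is written in Python rather than Tcl/Tk, and every class and method name in it (`Module`, `Converter`, `Sink`, `connect`, `emit`, `receive`) is hypothetical, chosen for illustration and not taken from the Medusa environment. It shows only the general idea: a source passes data directly, without buffering, to a converter, which passes its result to a sink.

```python
from typing import Any, Callable, List


class Module:
    """Hypothetical active object that forwards data to connected modules."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.outputs: List["Module"] = []

    def connect(self, downstream: "Module") -> "Module":
        # A simple, unbuffered connection: data is handed over immediately.
        self.outputs.append(downstream)
        return downstream  # returned so that connections can be chained

    def emit(self, data: Any) -> None:
        for module in self.outputs:
            module.receive(data)

    def receive(self, data: Any) -> None:
        # Default behaviour: pass the data on unchanged.
        self.emit(data)


class Converter(Module):
    """Hypothetical data converter: transforms data before forwarding it."""

    def __init__(self, name: str, transform: Callable[[Any], Any]) -> None:
        super().__init__(name)
        self.transform = transform

    def receive(self, data: Any) -> None:
        self.emit(self.transform(data))


class Sink(Module):
    """Hypothetical sink: records whatever arrives."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.received: List[Any] = []

    def receive(self, data: Any) -> None:
        self.received.append(data)


def demo() -> List[Any]:
    camera = Module("camera")                            # information source
    halve = Converter("halve", lambda frame: frame[::2])  # keep every other sample
    display = Sink("display")                            # information sink
    camera.connect(halve).connect(display)
    camera.emit([1, 2, 3, 4])
    return display.received
```

In this sketch a more complex connection would be expressed in the same way as the converter: as an intermediate module placed between two others.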
Applications include a media server which simultaneously provides many channels at many sizes for viewing on a workstation or display tile.
A multi-way video phone uses four video streams and an audio stream between the corresponding parties. The cameras provide head-and-shoulders views and more general views of an office. Views of documents are available from a rostrum camera above the desk. Four microphones and speakers provide hands-free audio to any part of the office. In a conversation, all video streams are sent to the recipient, who can choose what to watch at the largest size. Additionally, the streams can be sent to an agent which suggests or controls the way sizes are allocated to views. The decision combines the amount of motion and where in the field of view the motion is taking place, together with some hysteresis to prevent flicking between scenes. It also incorporates the user's options on a per-application and per-office basis. When two corresponding parties operate in this way, a total of 30 streams are sent across the ATM network.
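The kind of decision the agent makes can be sketched as follows. This is an illustration under stated assumptions, not the agent's actual algorithm: the scoring formula, the 0.0 to 1.0 scales, the `margin` parameter and all names (`ViewActivity`, `ViewSelector`) are hypothetical. The sketch picks which view should be shown at the largest size, weighting motion by its position in the field of view and by a per-view user preference, and switches only when another view beats the current one by a margin, which is one simple way to obtain hysteresis.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ViewActivity:
    motion: float      # amount of motion, assumed to lie in 0.0 .. 1.0
    centrality: float  # assumed 1.0 for motion at the centre of the view, 0.0 at the edge


class ViewSelector:
    """Hypothetical agent choosing which view gets the largest size."""

    def __init__(self, margin: float = 0.2,
                 preferences: Optional[Dict[str, float]] = None) -> None:
        self.margin = margin                   # hysteresis margin (assumed value)
        self.preferences = preferences or {}   # per-view user weights (assumed form)
        self.current: Optional[str] = None

    def score(self, name: str, activity: ViewActivity) -> float:
        weight = self.preferences.get(name, 1.0)
        return weight * activity.motion * activity.centrality

    def update(self, activities: Dict[str, ViewActivity]) -> str:
        scores = {name: self.score(name, a) for name, a in activities.items()}
        best = max(scores, key=scores.get)
        if self.current is None or self.current not in scores:
            self.current = best
        elif scores[best] > scores[self.current] + self.margin:
            # Switch only when the challenger is clearly more active.
            self.current = best
        return self.current
```

With a margin of 0.2, a view whose score is only slightly higher than that of the current view does not displace it, so small fluctuations in motion do not cause the display to flick between scenes.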
The video mail application records all views so that the recipient can subsequently choose which (one or many) to view. This places a high load on the storage system, because each view must be recorded at maximum size: at recording time it is not known how it will subsequently be presented.
Finally, a hand tracker is shown where a two-stage algorithm starts by searching for a hand in a scene, and, having found it, draws an outline and attempts some gesture recognition.
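The two-stage structure of the hand tracker can be sketched in outline. The code below is a toy stand-in, not the tracker's algorithm: the "hand" is simply the non-zero pixels of a small binary frame, the outline is their bounding box, and the gesture rule is arbitrary. All function names and gesture labels are hypothetical. What it illustrates is only the control flow: search first, and attempt outlining and recognition only once something has been found.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def locate_hand(frame: List[List[int]]) -> Optional[Box]:
    """Stage 1 (toy stand-in): bounding box of non-zero pixels, or None."""
    points = [(x, y) for y, row in enumerate(frame)
              for x, value in enumerate(row) if value]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def recognise_gesture(box: Box) -> str:
    """Stage 2 (toy stand-in): label the region by its shape."""
    left, top, right, bottom = box
    width = right - left + 1
    height = bottom - top + 1
    return "wide" if width >= height else "tall"  # arbitrary labels


def track(frame: List[List[int]]) -> Optional[Dict[str, object]]:
    box = locate_hand(frame)
    if box is None:
        return None  # nothing found; keep searching on the next frame
    return {"outline": box, "gesture": recognise_gesture(box)}
```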
The Medusa system is currently being used to develop a variety of algorithms suitable for use by agents, ranging from simple ones, which can provide ubiquitous service to all clients, to complex ones, which are invoked as required.