SAMT 2006 - Human Language Technology for the semantic Annotation of Multimedia Material

Human Language Technology for the semantic Annotation of Multimedia Material

DFKI Saarbrücken, Germany

In the field of image/video processing, one often speaks of a "semantic gap" when it comes to annotating or indexing image/video material with high-level semantics solely on the basis of low-level features detected by automated image/video analysis. There is a need to add and merge semantics from the analysis of available associated modalities, such as speech and text.

The European Network of Excellence K-Space, which started in 2006, tackles the integration of semantics generated from the analysis of various modalities and media in a principled manner, under the umbrella of semantic web technologies and resources. In line with the agenda of this project, the tutorial shows the (semantic) multimedia community the extent to which Human Language Technology can contribute to this challenge.

Tutorial Presentation

Program

Room: Themistocles A
14:00 - 14:45	Initiatives and Projects
14:45 - 15:30	Language Technology for Multimedia Indexing
15:30 - 16:00	Coffee Break
16:00 - 16:30	Automated Multimedia Analysis
16:30 - 17:00	Text-based Indexing and the Linguistic Description Scheme
17:00 - 17:30	Adding semantics to the LDS

Summary

Initiatives and projects

The tutorial first presents past and present initiatives and projects concerned with the use of language technology for indexing multimedia material.

Language technology for multimedia indexing

In a second part, the tutorial goes into some detail on the integrated use of language and semantic web technologies, which can be considered a preliminary step toward the main topic of the tutorial.

Automated multimedia analysis

In a third part, the tutorial summarizes the current state of the art in automated multimedia analysis and presents the kind of features (the so-called low-level features) this field uses to describe multimedia content. We briefly describe the most relevant standardized representation formalism for content indexing and retrieval of multimedia material, the MPEG-7 formalism, and discuss its limitations when it comes to applying semantic web technologies.

Text-based indexing and the Linguistic Description Scheme

The fourth part of the tutorial is dedicated to integrative issues. We describe an XML-based representation formalism for the information that can be gained from text for indexing or retrieving multimedia material. This formalism, called LDS (Linguistic Description Scheme), is already integrated in MPEG-7 and supports the manual text indexing of video material in four different ways: as free text, keywords, structured annotation (the "who", "what", "why", "how" etc. of what can be seen in the image/video) or linguistic dependencies. We focus here on ways to produce this text-based indexing of image/video material automatically.
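To make three of the four annotation forms concrete, the following is a minimal sketch in Python that builds a simplified annotation record as XML. It is an illustration under stated assumptions, not the LDS schema itself: the element names (TextAnnotation, FreeTextAnnotation, KeywordAnnotation, StructuredAnnotation, Who, WhatAction, WhatObject) are modelled loosely on the MPEG-7 text annotation datatype, namespaces and schema validation are omitted, and the function name and example sentence are hypothetical. Linguistic dependencies are left out for brevity.

```python
import xml.etree.ElementTree as ET


def build_text_annotation(free_text, keywords, structured):
    """Build a simplified, non-namespaced text annotation element.

    Assumption: element names only approximate the MPEG-7 text
    annotation datatype; this output is not schema-valid MPEG-7.
    """
    root = ET.Element("TextAnnotation")

    # 1. Free text: the unanalysed description.
    ET.SubElement(root, "FreeTextAnnotation").text = free_text

    # 2. Keywords: a flat list of index terms.
    keyword_node = ET.SubElement(root, "KeywordAnnotation")
    for word in keywords:
        ET.SubElement(keyword_node, "Keyword").text = word

    # 3. Structured annotation: the "who", "what", etc. of the scene.
    structured_node = ET.SubElement(root, "StructuredAnnotation")
    for role, value in structured.items():
        slot = ET.SubElement(structured_node, role)
        ET.SubElement(slot, "Name").text = value

    return root


# Hypothetical example sentence describing a video shot.
annotation = build_text_annotation(
    free_text="The goalkeeper catches the ball.",
    keywords=["goalkeeper", "ball"],
    structured={
        "Who": "goalkeeper",
        "WhatAction": "catch",
        "WhatObject": "ball",
    },
)

print(ET.tostring(annotation, encoding="unicode"))
```

In an automatic setting, the three arguments would be filled by language analysis components (for example keyword extraction and parsing) rather than by a human annotator.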

Adding semantics to the LDS

In the last part of the tutorial, we present ways of adding "semantics" to the LDS formalism using semantic web technologies and resources. We also discuss how to integrate the results of the language and multimedia analysis steps in a principled manner and, more concretely, offer a formal model to support this integration.
