Image and Video Analysis

Research


Medial features are image regions of arbitrary scale and shape, extracted without explicit scale space construction. They rely on a weighted distance map of image gradient, computed using an exact linear-time algorithm. The corresponding weighted medial axis is then decomposed into a graph representing image structure. A duality property enables reconstruction of regions using the same distance propagation. We select features according to our shape fragmentation factor, favoring those well enclosed by boundaries.
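As a rough illustration of exact linear-time distance propagation, the sketch below computes a one-dimensional generalized (squared) distance transform of a sampled cost function. It uses the well-known lower-envelope-of-parabolas technique, swapped in here for illustration; it is not the algorithm behind medial features. The function name `dt1d` and the idea of feeding it a finite per-sample cost `f` (for instance, one derived from gradient strength) are assumptions.

```python
import math

def dt1d(f):
    """Exact 1D generalized squared distance transform.

    Returns d with d[q] = min over p of (q - p)**2 + f[p], in O(n) time.
    Assumes every f[p] is finite (use a large number instead of infinity).
    """
    n = len(f)
    d = [0.0] * n
    v = [0] * n            # positions of parabolas in the lower envelope
    z = [0.0] * (n + 1)    # boundaries between consecutive parabolas
    k = 0
    z[0], z[1] = -math.inf, math.inf
    for q in range(1, n):
        while True:
            p = v[k]
            # Intersection of the parabola rooted at q with the one rooted at p.
            s = ((f[q] + q * q) - (f[p] + p * p)) / (2 * q - 2 * p)
            if s <= z[k]:
                k -= 1     # parabola p is hidden; drop it and retry
            else:
                break
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = math.inf
    k = 0
    for q in range(n):
        while z[k + 1] < q:
            k += 1
        d[q] = (q - v[k]) ** 2 + f[v[k]]
    return d
```

Applying `dt1d` along rows and then along columns gives the two-dimensional transform for the squared Euclidean metric.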


We present a spatial matching model that is flexible and allows non-rigid motion and multiple matching surfaces or objects, yet is fast enough to perform re-ranking in large scale image retrieval. Correspondences between local features are mapped to the geometric transformation space, and a histogram pyramid is then used to compute a similarity measure based on density of correspondences. Our model imposes one-to-one mapping and is linear in the number of correspondences. We apply it to image retrieval, yielding superior performance and a dramatic speed-up compared to the state of the art.
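The density-based similarity can be sketched in simplified form as follows. Each correspondence is assumed to be already mapped to a transformation parameter vector normalized to [0, 1); the function name `pyramid_similarity`, the weighting by 2^-k per level, and the parameter layout are illustrative assumptions. The sketch does not enforce the one-to-one mapping that the model imposes.

```python
from collections import Counter

def pyramid_similarity(params, levels=4):
    """Score correspondences by how densely they cluster in parameter space.

    params: one tuple per correspondence, each coordinate in [0, 1).
    Level 0 uses the finest bins; each further level halves the resolution.
    A correspondence gains 2**-k for every new neighbour it meets at level k.
    Cost is linear in the number of correspondences for a fixed level count.
    """
    total = 0.0
    prev = [1] * len(params)          # at the start each one only "sees" itself
    for k in range(levels):
        bins = 2 ** (levels - 1 - k)  # bins per axis at this level
        keys = [tuple(min(int(x * bins), bins - 1) for x in p) for p in params]
        counts = Counter(keys)
        for i, key in enumerate(keys):
            total += 2.0 ** (-k) * (counts[key] - prev[i])
            prev[i] = counts[key]
    return total
```

Correspondences that agree on a transformation fall into the same fine bin and contribute with full weight, while those that only agree at coarse resolution contribute less.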


We propose a detector that starts from single scale edges and produces reliable and interpretable blob-like regions and groups of regions of arbitrary shape. The detector is based on merging local maxima of the binary distance transform guided by the gradient strength of the surrounding edges.


State-of-the-art data mining and image retrieval in community photo collections typically focus on popular subsets, e.g. images containing landmarks or associated with Wikipedia articles. We propose an image clustering scheme that, seen as vector quantization, compresses a large corpus of images by grouping visually consistent ones while providing a guaranteed distortion bound. This allows us, for instance, to represent the visual content of the thousands of images depicting the Parthenon in just a few dozen scene maps and still be able to retrieve any single, isolated, non-landmark image, such as a house or graffiti on a wall.
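The idea of clustering under a guaranteed distortion bound can be illustrated with a very simple greedy scheme: every item ends up within a fixed radius of its cluster center, and isolated items become singleton clusters rather than being discarded. This is a generic leader-style sketch, not the clustering scheme proposed here; `bounded_clusters`, the Euclidean metric, and the toy descriptors are assumptions.

```python
import math

def bounded_clusters(points, radius):
    """Greedy clustering with a hard distortion bound.

    Each point is assigned to the first existing center within `radius`;
    if none exists, the point becomes a new center. Hence no point is ever
    farther than `radius` from the center that represents it.
    """
    centers = []
    assignment = []
    for p in points:
        for idx, c in enumerate(centers):
            if math.dist(p, c) <= radius:
                assignment.append(idx)
                break
        else:
            centers.append(p)               # isolated point: its own cluster
            assignment.append(len(centers) - 1)
    return centers, assignment
```

For example, `bounded_clusters([(0, 0), (0.1, 0), (5, 5)], radius=1)` groups the first two points and keeps the third as a singleton, so it remains retrievable.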


We present a new approach to image indexing and retrieval, which integrates appearance with global image geometry in the indexing process, while enjoying robustness against viewpoint change, photometric variations, occlusion, and background clutter. Each image is represented by a collection of feature maps and RANSAC-like matching is reduced to a number of set intersections. We extend min-wise independent permutations and finally exploit sparseness to build an inverted file whereby the retrieval process is sub-linear in the total number of images. We achieve excellent performance on 10^4 images, with a query time in the order of milliseconds.
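For readers unfamiliar with min-wise independent permutations, the following is a minimal sketch of the basic (unextended) technique: the fraction of matching minima under random hash functions estimates the overlap of two sets. It assumes that set elements are non-negative integer IDs smaller than the prime `P`; the names `make_hashes`, `minhash_signature`, and `estimate_jaccard` are hypothetical, and the extension described above is not reproduced.

```python
import random

P = 2_147_483_647  # Mersenne prime 2**31 - 1, used as the hash modulus

def make_hashes(k, seed=0):
    """Draw k random linear hash functions x -> (a*x + b) mod P."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(k)]

def minhash_signature(items, hashes):
    """Signature of a non-empty set of integer IDs: one minimum per hash."""
    return [min((a * x + b) % P for x in items) for a, b in hashes]

def estimate_jaccard(sig1, sig2):
    """Fraction of agreeing minima, an estimate of |A & B| / |A | B|."""
    return sum(u == v for u, v in zip(sig1, sig2)) / len(sig1)
```

Because signatures are short and fixed-length, equal entries can be used as keys of an inverted file, so that only images sharing at least one minimum with the query need to be examined.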


We use saliency for spatiotemporal feature detection in videos, incorporating color and motion in addition to intensity. Saliency is computed by a global minimization process subject to purely volumetric constraints, each related to an informative visual aspect inspired by Gestalt theory.


Painters of Byzantine and post-Byzantine artworks follow specific rules and iconographic patterns when creating sacred figures. These rules make the sacred figure depicted in an artwork recognizable. In this work, we propose an automatic knowledge-based image analysis system that classifies Byzantine icons by recognizing the sacred figure they depict.


Based on established computational models of visual attention, we propose novel models and methods for both spatial (image) and spatiotemporal (video sequence) analysis. Applications include visual classification and spatiotemporal feature detection.


In this work, we propose an object detection approach that extracts a limited number of candidate local regions to guide the detection process. The basic idea is that object location can be determined by clustering points of interest and hierarchically forming candidate regions according to similarity and spatial proximity predicates. Statistical validation shows that the method is robust across a substantial range of content diversity, while its response appears comparable to that of other state-of-the-art object detectors.


Automatic segmentation of images and videos is a very challenging task in computer vision and one of the most crucial steps toward image and video understanding. In this work, we propose to include semantic criteria in the segmentation process to capture semantic properties of objects that visual features such as color or texture cannot describe.


The use of visual context information is motivated by the fact that not all human acts are relevant in all situations, and this also holds for image analysis problems. Since visual context is a difficult notion to grasp and capture, we restrict it to the notion of ontological context, defined as part of a "fuzzified" version of traditional ontologies. Typical problems include how to meaningfully readjust the membership degrees of image regions and how to use visual context to steer the overall results of knowledge-assisted image analysis towards higher performance.


The motivation of this work is to tackle high-level concept detection in image and video documents using a globally annotated training set. The goal is to determine whether a concept exists within an image, along with a degree of confidence, but not its actual position. Since this approach begins with a coarse image segmentation, the high-level concepts that it is able to tackle can be described as "materials" or "scenes". MPEG-7 color and texture features are extracted locally from regions coarsely segmented using an RSST variation. A hierarchical clustering algorithm is applied to all regions of a significantly large set of images, and a relatively small number of them is selected. These regions are called "region types". This set of region types composes a visual dictionary, which facilitates the mapping of low- to high-level features.
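A minimal sketch of how such a visual dictionary might be built and used is given below. It assumes that each region is already described by a numeric feature tuple, uses naive centroid-linkage agglomerative clustering, and takes cluster centroids as region types; these choices, along with the names `region_types` and `model_vector`, are illustrative assumptions rather than the exact procedure of this work.

```python
import math

def centroid(cluster):
    """Mean of a list of equal-length feature tuples."""
    dim = len(cluster[0])
    return tuple(sum(v[i] for v in cluster) / len(cluster) for i in range(dim))

def region_types(descriptors, n_types):
    """Agglomerative clustering: merge the two closest clusters until
    n_types remain, then return one centroid per cluster."""
    clusters = [[d] for d in descriptors]
    while len(clusters) > n_types:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]

def model_vector(image_regions, types):
    """Describe an image by its smallest distance to each region type."""
    return [min(math.dist(r, t) for r in image_regions) for t in types]
```

The resulting fixed-length vector could then be passed to any standard classifier trained on the globally annotated images, one detector per concept.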