
Visual Computing Talks

Talks held at the Institute for Computer Graphics and Vision



  • Tuesday 12. July 2016, 13:00

Title: Fixed Points of Belief Propagation -- An Analysis via Polynomial Homotopy Continuation
Speaker: Christian Knoll, Institute of Signal Processing and Speech Communication TU Graz
Location: ICG Seminar Room
Abstract: Belief propagation (BP) is an iterative method to perform approximate inference on arbitrary graphical models. Whether BP converges and whether the solution is a unique fixed point depends on both the structure and the parametrization of the model. To understand this dependence we are interested in finding all fixed points. We formulate BP as a set of polynomial equations, the solutions of which correspond to the BP fixed points. We apply the numerical polynomial-homotopy-continuation (NPHC) method to solve such systems. It is commonly believed that uniqueness of BP fixed points implies convergence to this fixed point. Contrary to this conjecture, we find graphs for which BP fails to converge, even though a unique fixed point exists. Moreover, we show that this fixed point gives a good approximation of the exact marginal distribution.

  • Tuesday 05. July 2016, 13:00

Title: Drone Augmented Human
Speaker: Okan Erat, ICG
Location: ICG Seminar Room
Abstract: Today's high demand for usability and functionality drives engineers to develop smart devices centered on human needs. From the first pocket computer to wearable mechanical bodies, the goal has always been to augment human abilities. Human-machine interaction has therefore become one of the most important aspects of technological development. In our work we propose to augment a human with the abilities of a drone and to provide this functionality intuitively through a head-mounted display. The user commands the drone in a goal-oriented manner without having to think about how the goal will be accomplished.

  • Friday 17. June 2016, 10:00

Title: Discriminative Dictionary Learning for Image Classification
Speaker: Thuy Thi Nguyen, Department of Computer Science, Faculty of Information Technology, Vietnam National University of Agriculture
Location: ICG Seminar Room
Abstract: Dictionary learning (DL), a particular sparse coding model, aims to learn a set of codewords such that a given signal can be well represented by a linear combination of a few codewords. The conventional DL approach was originally proposed for learning a codebook for signal reconstruction. It was then extended to classification by exploiting the label information to make the sparse coefficients discriminative. DL methods for classification have achieved state-of-the-art performance. In this talk I will give an overview of DL-based approaches to classification. A new technique for learning a discriminative dictionary will be presented. The question of whether DL should be sparse or non-sparse will also be discussed. Speaker's Biography: Dr. Thuy Thi Nguyen received her Ph.D. degree from Graz University of Technology, Austria, in 2009. She is currently a lecturer, researcher and head of the Department of Computer Science, Faculty of Information Technology, Vietnam National University of Agriculture. Her research interests include object recognition, visual learning, ensemble learning and applications.

  • Tuesday 14. June 2016, 13:00

Title: Generative Adversarial Networks - Properties & Applications
Speaker: David Schinagl
Location: ICG Seminar Room
Abstract: In recent years, deep learning methods gained much attention in the field of computer vision. They achieve outstanding results in discriminative tasks like image classification, where a high-dimensional input is mapped to a class label. In contrast, deep generative models did not reach this level of success until recently. Generative models capture the underlying generation process of the data and can be used to synthesize new samples. A new approach based on artificial neural networks, called generative adversarial networks (GANs), represents an attractive alternative to existing generative models based on maximum likelihood estimation and performs well on various datasets. However, the internal generation process of GANs, from the initial noise vector to the resulting image is relatively unexplored. In this work we investigate the internals of adversarial nets more deeply and demonstrate the universal usability of this model based on two applications. In the first part, GANs are trained on depth-datasets and the resulting networks are analyzed in a variety of ways. We explore the latent noise space to investigate how semantic properties of the synthesized samples are encoded within this space. Moreover, we present two methods to influence the generation process in order to synthesize depth-data with desired properties. In the second part, GANs are applied to two fundamental computer vision tasks: The first one is unsupervised feature learning where we demonstrate that the features learned by the adversarial networks are useful for classification and regression tasks when labeled data are scarce. Finally, GANs are applied to domain specific image super resolution where we show that adversarial nets can be used to significantly increase the quality of upsampled face images.

  • Tuesday 14. June 2016, 01:00

Title: Loss-Specific Training of Memory Efficient Random Forests for Super-Resolution
Speaker: Alexander Grabner
Location: ICG Seminar Room
Abstract: Super-Resolution (SR) addresses the problem of image upscaling by reconstructing high-resolution (HR) images from low-resolution (LR) images. Recently Random Forest (RF) approaches have shown state of the art accuracy in single image SR while almost achieving real time capable speed. However, existing RF approaches for SR have a large memory footprint, since complex models are required to achieve high performance. This limits the practical utilization of RFs for real world SR applications. Especially mobile devices like smartphones or tablets demand memory efficient solutions, since they are limited in resources like RAM and flash storage. In this work, we present three novel methods for constructing RFs with reduced model size: Global Refinement of Alternating Decision and Regression Forests (ADRFs+GR), Additive Global Refinement (AGR) and Intermediate Refined Random Forests (IRRFs). These methods construct RFs with low complexity under a global training objective. Due to global optimization, we achieve improved fitting power for RFs with low model size. In particular, our methods combine and extend recent approaches on loss-specific training of RFs and training of memory efficient RFs. In contrast to previous works, we train RFs with globally optimized structure and globally optimized prediction models. We evaluate our proposed methods for standard machine learning tasks and single image SR. Our methods show significantly reduced model size while achieving competitive performance compared to state of the art RF approaches. Additionally, our training approach is significantly faster than other approaches, which reduce the model size of RFs without compromising on accuracy.

  • Monday 13. June 2016, 11:00

Title: Towards HD Maps: Fine-grained Road Segmentation by Parsing Ground and Aerial Images
Speaker: Gellért Máttyus, DLR, Remote Sensing Technology Institute
Location: ICG Seminar Room
Abstract: Creating detailed road maps is important for many applications, such as infrastructure monitoring, traffic management, urban planning, vehicle navigation, realistic driving simulations, and it will be essential in the future for autonomous driving cars. An approach is presented to extract fine-grained road layout by estimating the number and width of lanes plus the presence and width of parking spots and sidewalks. Importantly, the proposed approach applies existing road maps, aerial images and ground images jointly. The problem is formulated as one of inference in a Markov Random Field (MRF) reasoning about the road layout as well as the alignment between the aerial image, the map and the ground image sequence in a joint energy function. The alignment of ground and aerial images is necessary as even when applying sophisticated GPS-IMU systems, registration errors can still occur. The MRF takes features extracted from the images by deep learning as data terms and formulates the constraints on the lane sizes and the road layout as pairwise potentials. This allows robust estimation also in case when the image evidence (e.g. lane markings) is not visible or is missing. The effectiveness of the approach is demonstrated on a new dataset, which enhances KITTI with aerial images over the city of Karlsruhe, Germany.

  • Tuesday 07. June 2016, 13:00

Title: The ICG 3D Library, an Introduction
Speaker: Markus Rumpler, ICG
Location: ICG Seminar Room
Abstract: The ICG 3D Library (I3D) is a computer vision library developed at our institute providing efficient and reliable algorithms for large-scale multi-view 3D reconstruction problems of ordered and unordered image sets. It offers out of the box tools and modules for camera calibration, offline and online Structure-from-Motion, dense matching, georeferencing and surface reconstruction from the generated point clouds. We present its data structures and core functionality and show recent results of selected applications and papers to demonstrate I3D's versatile features and performance.

  • Tuesday 24. May 2016, 13:00

Title: Grid Loss: Detecting Occluded Faces
Speaker: Michael Opitz, ICG
Location: ICG Seminar Room
Abstract: Detection of partially occluded objects is a challenging computer vision problem. Standard Convolutional Neural Network (CNN) detectors fail if parts of the detection window are occluded, since not every sub-part of the window is discriminative on its own. To address this issue, we propose a novel loss layer for CNNs, named grid loss, which minimizes the error rate on sub-blocks of a convolution layer independently rather than over the whole feature map. This results in parts being more discriminative on their own, enabling the detector to recover if the detection window is partially occluded. By mapping our loss layer back to a regular fully connected layer, no additional computational cost is incurred at runtime compared to standard CNNs. We demonstrate our method for face detection on several public face detection benchmarks and show that our method outperforms regular CNNs, is suitable for realtime applications and achieves state-of-the-art performance.

  • Tuesday 24. May 2016, 13:00

Title: Reyes Rendering of Renderman Scenes on the GPU
Speaker: Martin Sattlecker (MS thesis presentation)
Location: ICG Seminar Room
Abstract: In recent years graphics processing units have become more and more powerful. They are now capable of executing arbitrary code in a massively parallel fashion. The Reyes rendering pipeline is a commonly used method of rendering higher order surfaces in offline renderers. It is possible to execute the Reyes pipeline in parallel. In this work we show a Reyes renderer that is capable of rendering simple scenes with interactive to real time frame rates. Our implementation runs on the graphics card and uses a persistent Megakernel which performs the entire rendering process in one kernel call. The scenes for our renderer are given as Renderman scenes. Our renderer supports materials and surface displacement through the Renderman shading language. We show a compiler that translates these shaders into CUDA code. The shaders are compiled and loaded at runtime. We support texture access from the shaders. Our renderer supports bicubic Bezier patches and Catmull-Clark subdivision surfaces as input geometry. We show an algorithm for subdividing Catmull-Clark subdivision surfaces on the graphics card. This algorithm creates patches from the faces of a subdivision mesh which are then processed in parallel. It takes advantage of the fact that faces of a subdivision mesh with a regular neighborhood can be converted to Bezier patches. We show that the inclusion of this conversion can increase the performance of the rendering pipeline dramatically.

  • Tuesday 10. May 2016, 13:00

Title: Segmentation-Driven Tomographic Reconstructions
Speaker: Rasmus Dalgas Kongskov, Technical University of Denmark
Location: ICG Seminar Room
Abstract: A typical Computed Tomography (CT) scanning pipeline consists of four main stages: scanning, reconstruction, segmentation and analysis. The quality of the segmentation inherently depends on the quality of the reconstruction. Classically the reconstruction consists of a simple filtered-backprojection type approach followed by a more computationally demanding segmentation stage. In our work we seek to move computational effort to the reconstruction stage by introducing regularization when solving the inverse problem. According to application-specific prior information, we aim to regularize the reconstruction such that we facilitate the subsequent segmentation.

  • Monday 09. May 2016, 14:00

Title: New developments in minimal-case motion estimation
Speaker: Jonathan Ventura, Assistant Professor, Department of Computer Science, University of Colorado Colorado Springs, USA
Location: ICG Seminar Room
Abstract: In this talk I will describe some of my recent and ongoing work in “minimal solvers” for camera motion estimation problems. A classical problem in computer vision is determining relative rotation and translation given point correspondences between two images. The minimal number of points needed depends on the motion model, camera arrangement, and calibration assumptions. I will present recent work (co-authored with C. Arth and V. Lepetit) on an efficient minimal solution for determining the motion of a moving multi-camera rig. The efficiency of the approach comes from application of a first-order rotation approximation. I will also introduce new work on analysis of the epipolar geometry that arises when a camera moves on the unit sphere, and an efficient minimal solver for this case. In both cases, the motion estimation problem can be formulated as a system of multivariate polynomial equations which reduces to a single univariate polynomial equation.
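As a rough illustration of what a first-order rotation approximation means, the sketch below compares the exact rotation matrix from Rodrigues' formula with the linearized form I + [r]x for a small rotation vector r. This is a generic textbook construction and not the speaker's solver; all function names are hypothetical.

```python
import math

def skew(r):
    """Cross-product (skew-symmetric) matrix [r]x of a 3-vector."""
    x, y, z = r
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def rotation_exact(r):
    """Rodrigues' formula for a rotation vector r (axis times angle, radians)."""
    theta = math.sqrt(sum(v * v for v in r))
    K = skew(r)
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    if theta < 1e-12:
        a, b = 1.0, 0.5  # limits of sin(t)/t and (1 - cos(t))/t^2 as t -> 0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]

def rotation_first_order(r):
    """Linearized rotation I + [r]x, accurate only for small angles."""
    K = skew(r)
    return [[(1.0 if i == j else 0.0) + K[i][j] for j in range(3)]
            for i in range(3)]

def max_abs_diff(A, B):
    """Largest entry-wise difference between two 3x3 matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

# Assumed example: a rotation of about 1.5 degrees.
r = (0.01, -0.02, 0.015)
error = max_abs_diff(rotation_exact(r), rotation_first_order(r))
```

The error of the linearized matrix grows with the square of the rotation angle, which is why such an approximation is usually reserved for small inter-frame motion.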

  • Tuesday 03. May 2016, 13:00

Title: Learning a Variational Model for Compressed Sensing MRI Reconstruction
Speaker: Kerstin Hammernik, ICG
Location: ICG Seminar Room
Abstract: Compressed sensing techniques allow MRI reconstruction from undersampled k-space data. However, most reconstruction methods suffer from high computational costs, require the selection of adequate regularizers, and are limited to low acceleration factors for non-dynamic 2D imaging protocols. In this work, we propose a novel and efficient approach to overcome these limitations by learning a sequence of optimal regularizers that removes typical undersampling artifacts while keeping important details in the imaged objects and preserving the natural appearance of anatomical structures. We test our approach on patient data and show that we achieve better results than commonly used reconstruction methods.

  • Tuesday 03. May 2016, 13:00

Title: Learning Joint Demosaicing and Denoising Based on Sequential Energy Minimization
Speaker: Teresa Klatzer, ICG
Location: ICG Seminar Room
Abstract: Demosaicing is an important first step for color image acquisition. For practical reasons, demosaicing algorithms have to be both efficient and yield high quality results in the presence of noise. The demosaicing problem poses several challenges, e.g. zippering and false color artifacts as well as edge blur. In this work, we introduce a novel learning based method that can overcome these challenges. We formulate demosaicing as an image restoration problem and propose to learn efficient regularization inspired by a variational energy minimization framework that can be trained for different sensor layouts. Our algorithm performs joint demosaicing and denoising in close relation to the real physical mosaicing process on a camera sensor. This is achieved by learning a sequence of energy minimization problems composed of a set of RGB filters and corresponding activation functions. We evaluate our algorithm on the Microsoft Demosaicing data set in terms of peak signal to noise ratio (PSNR) and structural similarity index (SSIM). Our algorithm is highly efficient both in image quality and run time. We achieve an improvement of up to 2.6 dB over recent state-of-the-art algorithms.
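For readers unfamiliar with the PSNR metric mentioned in the abstract, here is a minimal sketch of the standard definition, assuming images are given as flat lists of intensities with a known peak value. The function name and the input layout are illustrative choices, not taken from the talk.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are flat sequences of pixel intensities; `peak` is the largest
    possible intensity (255 for 8-bit data).
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

Because PSNR is logarithmic in the mean squared error, a gain of about 3 dB corresponds to roughly halving the error.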

  • Tuesday 03. May 2016, 13:00

Title: The 3D-PITOTI Project with a Focus on Multi-Scale 3D Reconstruction using Autonomous UAVs
Speaker: Christian Mostegel (work with Georg Poier, Christian Reinbacher, Manuel Hofer, Friedrich Fraundorfer, Horst Bischof, Thomas Höll, Gert Holler, Axel Pinz)
Location: ICG Seminar Room
Abstract: In this talk, we showcase our outcome of the ambitious 3D-PITOTI project, which involves a multidisciplinary team of over 30 scientists from across Europe. The project focuses on the 3D aspect of recording, storing, processing and visualizing prehistoric rock art in the UNESCO World Heritage site in Valcamonica, Italy. The rock art was pecked into open-air rock formations thousands of years ago and has an inherent 3D nature. After a project overview, we present the results of the Graz University of Technology's contributions in 3D acquisition and processing with a focus on our novel autonomous UAV system. We elaborate the challenges of 3D reconstruction across vastly different scales, from a valley wide reconstruction down to individual peckings on the rock surface. Within this context, we first present a novel 3D scanning device with sub-millimeter accuracy. Aside from correctly scaled 3D information, the scanning device also provides the surface radiometry without the need for artificial shrouding. Additionally, we point out one application for which this highly accurate 3D data has shown to be crucial: The interactive segmentation of the individually pecked figures. Finally, we present a novel autonomous UAV system for acquiring high-resolution images at a few meters distance. The system optimizes scene coverage, ground resolution and 3D uncertainty, while ensuring that the acquired images are suitable for a specific dense offline 3D reconstruction algorithm. There are three main aspects that set this system apart from others. First, the system operates completely on-site without the need for a prior 3D model of the scene. Second, the system iteratively refines a surface mesh, predicts the fulfillment of requirements and can thus correct for initially wrong geometry estimates and imperfect plan execution. 
Third, the system uses the already acquired 2D images to predict the chances of a successful reconstruction with a specific offline 3D densification algorithm depending on the observed scene and potential camera constellations. We demonstrate the capabilities of our system in the challenging environment of the prehistoric rock art sites and then register the individual reconstructions of all scales in one consistent coordinate frame.

  • Tuesday 26. April 2016, 14:00

Title: Omnidirectional perception for automotive applications
Speaker: Prof. Pascal Vasseur, University of Rouen
Location: HS i1 (HSEG058J)
Abstract: Omnidirectional vision can be obtained with different kinds of sensors (catadioptric camera, fisheye lens, camera network, …) and is now well known and used in many applications because of its large field of view. However, it is generally necessary to develop particular treatments in order to deal with the drawbacks of these sensors, such as image distortions or the lack of synchronization between cameras. In this talk, I will present different automotive applications based on different omnidirectional vision solutions recently developed in our team, such as face detection, traffic surveillance and 3D reconstruction, with their particular treatments and some results. Short Bio: Pascal Vasseur received the M.S. degree in System Control from Université de Technologie de Compiègne (France) in 1995 and his Ph.D. in Automatic Control from Université de Picardie Jules Verne (France) in 1998. He was associate professor at the Université de Picardie Jules Verne in Amiens between 1999 and 2010. He is now a full professor at the Université de Rouen and is a member of the LITIS laboratory. His research interests are computer vision and its applications to mobile and aerial robots.

  • Monday 25. April 2016, 14:00

Title: Minimum solutions for Homography estimation and 3D lines reconstruction
Speaker: Prof. Cédric Demonceaux, University of Burgundy
Location: HS i11 "SIEMENS Hörsaal" (ICK1002H)
Abstract: In computer vision, many problems need a RANSAC procedure in order to find a solution in the presence of noise. The convergence of this approach strongly depends on the number of points needed to model and estimate the problem. Thus, to reduce the number of iterations, we have to find the minimal number of points needed to model the problem. In this talk, we will see how this number of points can be reduced using prior knowledge. First, we will talk about homography estimation, where 4 points are needed in the general case to compute the transformation, and we will try to reduce this number from 4 to 2 thanks to prior information. Second, in the same way, we will focus on 3D line reconstruction with a non-central camera. Theoretically, 4 points in a single image are sufficient to reconstruct a 3D line. We will see that this number of points can be reduced given some information about the 3D position of the camera. Short Bio: Cédric Demonceaux received the M.S. degree in Mathematics in 2001 and the PhD degree in Image Processing from the Université de Picardie Jules Verne (UPJV), France, in 2004. In 2005, he became associate professor at MIS-UPJV. From 2010 to 2014, he held a CNRS-Higher Education chair at Le2I UMR CNRS, Université de Bourgogne. Since 2014, he has been full Professor at the University of Burgundy. His research interests are in image processing, computer vision and robotics.
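To make the link between sample size and RANSAC iterations concrete, the sketch below evaluates the standard iteration bound N = log(1 - p) / log(1 - w^s), where s is the minimal sample size, w the inlier ratio and p the desired probability of drawing at least one all-inlier sample. The inlier ratio of 0.5 and the confidence of 0.99 are assumed values chosen for illustration only.

```python
import math

def ransac_iterations(sample_size, inlier_ratio, confidence=0.99):
    """Number of RANSAC draws needed to hit one all-inlier sample.

    sample_size  -- points per minimal sample (s)
    inlier_ratio -- fraction of inliers in the data (w), strictly in (0, 1)
    confidence   -- desired success probability (p), strictly in (0, 1)
    """
    if not 0.0 < inlier_ratio < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("inlier_ratio and confidence must lie in (0, 1)")
    # Probability that a single random sample contains only inliers.
    p_good = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))

# Assumed setting: half of the correspondences are outliers.
four_point = ransac_iterations(4, 0.5)  # general homography
two_point = ransac_iterations(2, 0.5)   # reduced minimal case
```

Under these assumptions the 4-point case needs 72 iterations and the 2-point case 17, which shows how quickly the cost falls as the minimal sample shrinks.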

  • Tuesday 19. April 2016, 13:00

Title: Automated Segmentation of the Walkable Area from Aerial Images for Evacuation Simulation
Speaker: Fabian Schenk, ICG
Location: ICG Seminar Room
Abstract: Computer-aided evacuation simulation is a very important preliminary step when planning safety measures for major public events. We propose a novel, efficient and fast method to extract the walkable area from high-resolution aerial images for the purpose of evacuation simulation. In contrast to previous work, where the authors only extracted streets and roads or worked on indoor scenarios, we present an approach to accurately segment the walkable area of large outdoor areas. For this task we use a sophisticated seeded region growing (SRG) algorithm incorporating the information of digital surface models, true-orthophotos and inclination maps calculated from aerial images. Further, we introduce a new annotation and evaluation scheme especially designed for assessing the segmentation quality of evacuation maps. An extensive qualitative and quantitative evaluation, where we study various combinations of SRG methods and parameter settings by the example of different real-world scenarios, shows the feasibility of our approach.
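The basic idea of seeded region growing can be sketched as a breadth-first flood fill over a height grid. The version below is a deliberately simplified stand-in, assuming a single criterion (a maximum height step between 4-connected neighbours); it is not the algorithm from the talk, which also uses orthophotos and inclination maps, and the function and parameter names are hypothetical.

```python
from collections import deque

def region_grow(height, seed, max_step):
    """Grow a region from `seed` over a 2D height grid.

    A neighbouring cell joins the region when its height differs from the
    current cell by at most `max_step` (a crude proxy for walkability).
    Returns the set of (row, col) cells in the region.
    """
    rows, cols = len(height), len(height[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(height[nr][nc] - height[r][c]) <= max_step:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region

# Toy surface model: the two cells of height 5 act as a wall.
surface = [[0, 0, 5],
           [0, 1, 5],
           [0, 0, 0]]
walkable = region_grow(surface, seed=(0, 0), max_step=1)
```

In this toy grid the region covers every cell except the two elevated ones, since the step up to them exceeds the threshold from every side.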

  • Tuesday 12. April 2016, 13:00

Title: Inertial-Optical Flow: From Fast UAV Deployment to Moving Object Detection
Speaker: Prof. Stephan Weiss, Alpen-Adria-Universität (AAU), Austria
Location: ICG Seminar Room
Abstract: Visual-inertial state estimation has gained significantly in importance over the last few years in both research and industry. With the advent of powerful computation units, even more complex approaches are now capable of running on board mobile devices. However, the latency and computational complexity of these algorithms are still an issue for closed-loop control of highly resource-constrained and agile mobile robots. This talk will discuss a visual-inertial state estimation framework that has ultra-low computational complexity but is still capable of system self-calibration and, perhaps more importantly, system self-healing upon sensor drop-out or algorithm failure. The framework seamlessly fuses visual-odometry-based position control with velocity control from inertial and optical flow cues to obtain a fast deployable platform which is robust against otherwise critical events, and we will see how this algorithm is supposed to navigate a helicopter in a truly GPS-denied and very remote location. We will also shed light on additional use cases where our inertial-optical flow algorithm makes use of information otherwise classified as outliers to detect and characterize moving objects. Biography: Stephan Weiss has been Full Professor of Robotics and head of the Control of Networked Systems Lab at the Alpen-Adria-Universität (AAU) in Austria since 2015. He received his MSc in Electrical Engineering and Information Technology in 2008 and his Ph.D. in 2012 from the Eidgenössische Technische Hochschule (ETH) Zurich, Switzerland. His Ph.D. thesis on "Vision Based Navigation for Micro Helicopters" first enabled GPS-independent navigation of small UAVs using on-board visual-inertial state estimation. 
His algorithms were the key to enable the Mars Helicopter Scout proposal and corresponding proof-of-concept technology demonstration at NASA's Jet Propulsion Laboratory where he worked from 2012 until 2015 as Research Technologist in the Mobility and Robotic Systems Section and where he lectured at the California Institute of Technology.

  • Tuesday 08. March 2016, 13:00

Title: Large scale structure from motion with control points
Speaker: Michal Polic, CMP
Location: ICG Seminar Room
Abstract: The standard approach to solving large-scale SfM is to use an approximation and create a minimal skeleton of cameras which covers all surfaces of the 3D scene. This approach reduces computation time but also increases the dependence on cumulative errors in feature positions, wrong radial distortion calibration and mismatches. Bundle adjustment is applied afterwards; it is a local optimization method and usually does not converge to the global optimum. My approach is based on the creation of a rigid graph. This rigid graph should contain 3D points and cameras with small uncertainty. The resulting scene should be computed faster, because it contains fewer points, and slightly more precisely, because of the small uncertainty. The rigid graph can also be transformed to control points more easily in a post-processing step.

  • Tuesday 23. February 2016, 13:00

Title: Direct Stereo Visual Odometry Based on Lines
Speaker: Thomas Holzmann, ICG
Location: ICG Seminar Room
Abstract: We propose a novel stereo visual odometry approach, which is especially suited for poorly textured environments. We introduce a novel, fast line segment detector and matcher, which detects vertical lines supported by an IMU. The patches around lines are then used to directly estimate the pose of consecutive cameras by minimizing the photometric error. Our algorithm outperforms state-of-the-art approaches in challenging environments. Our implementation runs in real-time and is therefore well suited for various robotics and augmented reality applications.

  • Tuesday 09. February 2016, 13:00

Title: Short-term visual tracking using hierarchical appearance models
Speaker: Luka Cehovin, PhD, University of Ljubljana
Location: ICG Seminar Room
Abstract: Efficient modeling and updating of the appearance model is crucial for successful visual tracking. In this talk we will present our work on hierarchical appearance models which structure appearance in multiple layers. The bottom layer contains the most specific information about the appearance of the object while higher layers model the appearance in a more general way. The hierarchical relations are also reflected in the update process where the higher layers guide the lower layers in their update while the lower layers provide a source for adaptation to higher layers if their information is reliable. The benefits of hierarchical appearance models are demonstrated with two hierarchical models, primarily designed to tackle tracking of non-rigid and articulated objects that present a challenge for many existing trackers. We will also briefly talk about our work on experimental evaluation methodology of visual tracking which led us to several insights about tracking and improvements of our methods and has been adopted by the Visual Object Tracking (VOT) challenges.

  • Tuesday 02. February 2016, 13:00

Title: Solving Dense Image Matching in Real-Time using Discrete-Continuous Optimization
Speaker: Alexander Shekhovtsov, Christian Reinbacher
Location: ICG Seminar Room
Abstract: Dense image matching is a fundamental low-level problem in Computer Vision, which has received tremendous attention from both discrete and continuous optimization communities. The goal of this paper is to combine the advantages of discrete and continuous optimization in a coherent framework. We devise a model based on energy minimization, to be optimized by both discrete and continuous algorithms in a consistent way. In the discrete setting, we propose a novel optimization algorithm that can be massively parallelized. In the continuous setting we tackle the problem of non-convex regularizers by a formulation based on differences of convex functions. The resulting hybrid discrete-continuous algorithm can be efficiently accelerated by modern GPUs and we demonstrate its real-time performance for the applications of dense stereo matching and optical flow.

  • Tuesday 26. January 2016, 13:00

Title: Master Thesis Presentation - Fröhlich, Neuhold, Bauernhofer
Speaker: Fröhlich, Neuhold, Bauernhofer
Location: ICG Seminar Room
Abstract: Three master's thesis presentations.

Speaker: Barbara Fröhlich
Title: Quality Assessment of Stereo Matching Algorithms for Planetary Surfaces
Abstract: Stereo vision is a widespread and actively researched field of computer vision. In order to reconstruct a 3D scene from two or more stereo images, stereo matching can be applied to compute the required disparity maps. Researchers are working intensively on stereo matching, and new types of algorithms for this task are proposed continuously. The quality of the resulting evaluation map is highly dependent on the accuracy of the disparity map that is the output of stereo matching. Nevertheless, methods to measure the accuracy of these algorithms have been investigated less. In this work, disparity map evaluation measures presented in the literature are collected and listed. Ground truth based methods as well as no-reference measures are studied. Additionally, they are tested on two stereo matching algorithms applied to different datasets with ground truth available. Moreover, different types of known errors were afterwards inserted into the disparity maps to find out how the measures behave on those types of errors. The extensive experiments show that some measures respond to types of errors that others completely ignore. The weaknesses as well as the advantages of the evaluated measures are explained, and we suggest disparity map evaluation measures that provide a reliable and meaningful output for quality assessment.

Speaker: Gerhard Neuhold
Title: Semantic Segmentation with Deep Neural Networks
Abstract: Semantic segmentation is about labeling each single pixel in an image with the category it belongs to. There are several applications in a wide range of areas, like robotics, mapping or medical image analysis, in which pixel-level labels are of primary importance. In recent years, deep neural networks have shown impressive results and have become state-of-the-art for several recognition tasks. In this thesis, we investigate the use of deep neural networks for the task of semantic image segmentation. We adjust state-of-the-art fully convolutional networks, which are designed to label general scenes, to the task of aerial image segmentation. In addition, we transfer the learned feature representation from a large-scale image database of everyday objects for classification to pixel-wise labeling of aerial images. Further, we perform a joint training of the deep neural network and a conditional random field in an end-to-end fashion to reduce the errors that are caused by both modules. Finally, we study semi-supervised learning techniques to decrease the manual labeling effort, which is necessary to transfer learned features from pre-trained classification models. Our proposed semantic segmentation approach is evaluated on a large-scale aerial dataset and improves the state-of-the-art accuracy. We are able to show promising results on two internal aerial datasets used by the Microsoft Photogrammetry team. Our experimental evaluation confirms that end-to-end training of the deep neural network and a conditional random field improves the overall performance. Finally, we show that incorporating unlabeled data to perform finetuning from pre-trained models decreases the manual labeling effort.

Speaker: Christoph Bauernhofer
Title: Dense Reconstruction On Mobile Devices
Abstract: The aim of 3D reconstruction is to infer 3D geometry of the scene from a given set of 2D images. Being one of the most fundamental problems in computer vision, many algorithms have been developed in the last years to solve this problem on desktop computers. Modern mobile devices such as tablets and smartphones, however, deliver unprecedented computational power in everyone’s pocket, making it possible to tackle this problem on mobile platforms as well. The aim of this master’s thesis is to create the fundamental building blocks, dense tracking and dense depthmap computation, for a novel reconstruction system on mobile devices. The developed tracking system operates directly on images without an intermediate representation like keypoints. Depthmaps are computed by using a dense multi-view stereo algorithm and optimized by minimizing a global spatially regularized energy functional. Using the graphics processing unit (GPU) and highly parallelized state-of-the-art algorithms on these mobile devices enables us to perform high quality, dense depthmap computations within several seconds.

  • Tuesday 19. January 2016, 13:00

Title: BaCoN: Building a Classifier from only N Samples
Speaker: Georg Waltner
Location: ICG Seminar Room
Abstract: We propose a model able to learn new object classes from a very limited number of training samples (i.e. 1 to 5), while requiring near-zero run-time cost for learning new object classes. After extracting Convolutional Neural Network (CNN) features, we discriminatively learn embeddings to separate the classes in feature space. The proposed method is especially useful for applications such as dish or logo recognition, where users typically add object classes comprising a wide variety of representations. Another benefit of our method is the low demand for computing power and memory, making it applicable for object classification on embedded devices. We demonstrate on the Food-101 dataset that even one single training example is sufficient to recognize new object classes and considerably improve results over the probabilistic Nearest Class Means (NCM) formulation.
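
For orientation, the nearest-class-mean idea that the abstract uses as its baseline can be sketched as follows. This is only an illustration under assumptions: it uses plain Euclidean distance with hard assignment rather than the probabilistic NCM formulation, it is not the BaCoN method itself, and the two-dimensional feature vectors and class names are made up.

```python
import math

def class_means(samples):
    """Compute one mean feature vector per class.

    samples: dict mapping a class label to a list of feature vectors.
    """
    means = {}
    for label, vectors in samples.items():
        dim = len(vectors[0])
        means[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return means

def classify(feature, means):
    """Return the label whose class mean is closest (Euclidean) to the feature."""
    return min(means, key=lambda label: math.dist(feature, means[label]))

# Hypothetical CNN features, reduced to two dimensions for readability.
train = {"pizza": [[1.0, 0.0]], "sushi": [[0.0, 1.0]]}
means = class_means(train)

# Adding a new class from a single sample only requires one more mean.
means.update(class_means({"ramen": [[1.0, 1.0]]}))

prediction = classify([0.9, 0.8], means)
```

Adding a class here costs a single averaging step, which is the property that makes mean-based classifiers attractive when users add classes at run time.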

  • Tuesday 12. January 2016, 13:00

Title: ICCV 2015 summary
Speaker: Gernot Riegler et al.
Location: ICG Seminar Room
Abstract: Attendees of ICCV will briefly introduce us to a selection of highlights from this year's ICCV.

  • Tuesday 15. December 2015, 13:00

Title: Training Activation Functions In Deep Neural Networks
Speaker: Matthias Freiberger
Location: ICG Seminar Room
Abstract: Image recognition is considered one of the most challenging tasks in the field of computer vision. Recently, though, convolutional neural networks (CNNs) have shown excellent results on several recognition data sets. Methods that train not only the weights between neurons but also the activation functions of a CNN currently show the best performance. Nevertheless, these approaches usually train solely steepness parameters for one or several rectifier units, or enforce hard constraints on the shape of their activation functions. In this thesis we present a framework to train a more general, more expressive family of activation functions and investigate whether even better performance can be reached this way. We do so by learning the parameters of a sum of arbitrary base functions. Furthermore, we constrain the optimization process of the parameters in a sensible way to reduce the complexity of our optimization problem as well as to keep the number of parameters low. Using our framework, we approach the performance of state-of-the-art methods and outperform rectifier units on three different data sets using two different network architectures. Nevertheless, we find that the range of suitable base functions when training deep structures is confined to functions with largely constant gradients. Therefore, it seems advisable to further pursue the approach of trainable parameters for one or several rectifier-like units in combination with the techniques shown in this thesis, in order to obtain even better performance on state-of-the-art recognition problems.
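
The phrase "a sum of arbitrary base functions" can be made concrete with a small sketch. The choice of base functions (rectifier, identity, tanh), the initial coefficients, and all function names below are assumptions for illustration; the abstract does not state which base functions or constraints the thesis uses.

```python
import math

def make_activation(coeffs, bases):
    """Build f(x) = sum_i coeffs[i] * bases[i](x)."""
    def activation(x):
        return sum(a * g(x) for a, g in zip(coeffs, bases))
    return activation

def grad_wrt_coeffs(x, bases):
    """Gradient of f(x) with respect to each coefficient: df/da_i = g_i(x)."""
    return [g(x) for g in bases]

# Assumed base functions: rectifier, identity, tanh.
bases = [lambda x: max(0.0, x), lambda x: x, math.tanh]

# Assumed initialisation: the activation starts out as a plain rectifier.
coeffs = [1.0, 0.0, 0.0]
f = make_activation(coeffs, bases)
```

Because the activation is linear in its coefficients, the gradient with respect to each coefficient is simply the value of the corresponding base function, so the coefficients can be trained by backpropagation alongside the network weights.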

  • Tuesday 01. December 2015, 13:00

Title: From TomoSAR Point Clouds To Objects
Speaker: Muhammad Shahzad
Location: ICG Seminar Room
Abstract: Synthetic aperture radar (SAR) projects a 3-D scene onto two native coordinates, i.e., “range” and “azimuth”. In order to fully localize a point in 3-D, advanced interferometric SAR (InSAR) techniques are required that process stack(s) of complex-valued SAR images to retrieve the lost third dimension (i.e., the “elevation” coordinate). Among other InSAR methods, SAR tomography (TomoSAR) is the most advanced 3-D imaging technique. By exploiting stack(s) of SAR images taken from slightly different positions, it builds up a synthetic aperture in the elevation direction that enables retrieval of the precise 3-D position of dominant scatterers within one azimuth-range SAR image pixel. Geocoding these 3-D scatterer positions from SAR geometry to world (UTM) coordinates provides 3-D/4-D point clouds of the illuminated area with a point density of around 1 million points/km² (using spaceborne TerraSAR-X datastacks). Taking into consideration the special characteristics associated with these point clouds, e.g., low positioning accuracy, a high number of outliers, gaps in the data and rich façade information due to the side-looking geometry, this presentation will demonstrate the object reconstruction potential of these point clouds using data acquired from both spaceborne and airborne platforms. Experimental results highlighting 3-D reconstruction of two object categories, i.e., buildings and individual trees, will be presented.

  • Tuesday 24. November 2015, 13:00

Title: Training a Feedback Loop for Hand Pose Estimation
Speaker: Markus Oberweger
Location: ICG Seminar Room
Abstract: We propose an entirely data-driven approach to estimating the 3D pose of a hand given a depth image. We show that we can correct the mistakes made by a Convolutional Neural Network trained to predict an estimate of the 3D pose by using a feedback loop. The components of this feedback loop are also Deep Networks, optimized using training data. They remove the need for fitting a 3D model to the input data, which requires both a carefully designed fitting function and algorithm. We show that our approach outperforms state-of-the-art methods, and is efficient as our implementation runs at over 400 fps on a single GPU.

  • Tuesday 10. November 2015, 13:00

Title: Anatomical Landmark Localization for an Automatic Multi-Factorial Age Assessment System
Speaker: Walter Unterpirker
Location: ICG Seminar Room
Abstract: Anatomical landmark localization in medical images has gained increasing research interest in recent years. One reason is that many subsequent medical image-processing algorithms benefit from an accurate and reliable preceding automatic localization step. One important application, which is considered in this work, is the automated biological age assessment of humans. This is based on the ossification and mineralization process of various anatomical structures. For this thesis, these structures are acquired by non-invasive magnetic resonance imaging, which is free of ionizing radiation. A first step towards such an automated age assessment system is to locate the age-relevant anatomical structures. In this work, Random Regression Forests (RRFs) are explored in more detail to locate structures at hand bones, wisdom teeth and clavicle bones. Firstly, a geodesic weighting scheme for hand-bone localization is proposed. This is based on the underlying idea that structures which are closer to an anatomical landmark and vary less in shape contribute more to an accurate localization. In a second contribution, the appearance of landmarks is directly incorporated into the RRF framework, which increases the confidence of a correct landmark estimation. Due to strongly varying appearance and shapes within medical images, a final contribution investigates the idea of using restricted image information around landmarks. Subsequently, the anatomical variations at the landmarks themselves are explored in more detail by the RRF.

  • Tuesday 03. November 2015, 14:00

Title: MICCAI 2015 summary session
Speaker: Martin Urschler, Philipp Kainz, Christian Payer
Location: ICG Seminar Room
Abstract: Martin Urschler, Philipp Kainz, and Christian Payer will report on the latest news from this year's MICCAI conference. Please note the change of the starting time. The session will start at 14:00 and is expected to end at 15:15.

  • Tuesday 20. October 2015, 13:00

Title: In-Flight Image Stabilization of a Flying Laser Projector
Speaker: Alexander Isop
Location: ICG Seminar Room
Abstract: Only a few design approaches for MAV-based laser-projection systems have been studied recently. Limited payload, flight time and onboard processing power result in several constraints. To date, no practical solution has been introduced. We propose a small and lightweight laser projection system enabling in-flight projection, utilizing feed-forward compensation for stabilization. While our approach is a first step towards a flying projector, we foresee interesting applications, like providing on-site instructions by projecting information in harsh environments.

  • Tuesday 13. October 2015, 13:00

Title: Learning variational models for blind image deconvolution
Speaker: Erich Kobler
Location: ICG Seminar Room
Abstract: Along with noise, image blur is probably the most widespread reason for image degradation. It originates from a vast variety of sources, including atmospheric turbulence, defocus and motion. Nowadays, fast and accurate deblurring algorithms are becoming more and more important due to the ubiquity of smartphones. The majority of recent deblurring algorithms first estimate the point spread function, also known as the blur kernel, and then perform a non-blind image deblurring. In this work we introduce a novel approach for both non-blind and blind image deblurring, which is motivated by variational models. We follow the idea of Chen et al. 2015 and derive a network structure which is related to minimizing an iteratively adapted energy functional. Moreover, we present a differentiable projection onto the unit simplex based on the Bregman divergence to constrain the blur kernels. The non-blind as well as blind deblurring networks are trained in a discriminative fashion to enhance properties of natural sharp images, because recent discriminative reconstruction approaches demonstrated their superiority in terms of quality and runtime. Both deblurring networks are qualitatively evaluated, and numerous experiments demonstrate the clear quality boost of the resulting image and blur kernel estimates. Furthermore, in contrast to neural networks, all individual parameters of the proposed networks can be easily interpreted due to the close relation to energy minimization.

  • Tuesday 29. September 2015, 13:00

Title: Automatic Artery-Vein Separation from Thoracic CT Images Using Integer Programming
Speaker: Christian Payer
Location: ICG Seminar Room
Abstract: Automated computer-aided analysis of lung vessels has been shown to yield promising results for non-invasive diagnosis of lung diseases. In order to detect vascular changes affecting arteries and veins differently, an algorithm capable of identifying these two compartments is needed. We propose a fully automatic algorithm that separates arteries and veins in thoracic computed tomography (CT) images based on two integer programs. The first extracts multiple subtrees inside a graph of vessel paths. The second labels each tree as either artery or vein by maximizing both the contact surface in their Voronoi diagram and a measure based on closeness to accompanying bronchi. We evaluate the performance of our automatic algorithm on 10 manual segmentations of arterial and venous trees from patients with and without pulmonary vascular disease, achieving an average voxel-based overlap of 94.1% (range: 85.0% - 98.7%), outperforming a recent state-of-the-art interactive method.

  • Wednesday 16. September 2015, 16:00

Title: Classifier Adaptation at Prediction Time
Speaker: Christoph Lampert, IST Austria
Location: ICG Seminar Room
Abstract: In the era of "big data" and a large commercial interest in computer vision, it is only a matter of time until we will buy commercial object recognition systems in pre-trained form instead of training them ourselves. This, however, poses a problem of domain adaptation: the data distribution in which a customer plans to use the system will almost certainly differ from the data distribution that the vendor used during training. Two relevant effects are a change of the class ratios and the fact that the image sequences that need to be classified in real applications are typically not i.i.d. In my talk I will introduce a simple probabilistic technique that can adapt the object recognition system to the test-time distribution without having to change the underlying pre-trained classifiers. I will also introduce a framework for creating realistically distributed image sequences that offer a way to benchmark such adaptive recognition systems. Our results show that the above "problem" of domain adaptation can actually be a blessing in disguise: with proper adaptation the error rates on realistic image sequences are typically lower than on standard i.i.d. test sets.

  • Tuesday 08. September 2015, 13:00

Title: Robust Bone Marrow Cell Discrimination by Rotation-Invariant Training of Multi-Class Echo State Networks
Speaker: Philipp Kainz
Location: ICG Seminar Room
Abstract: Classification of cell types in the context of the architecture of tissue specimens is the basis of diagnostic pathology, and decisions for comprehensive investigations rely on a valid interpretation of tissue morphology. Visual examination of bone marrow cells in particular takes a considerable amount of time, and inter-observer variability can be remarkable. In this work, we propose a novel rotation-invariant learning scheme for multi-class Echo State Networks (ESNs), which achieves very high performance in automated bone marrow cell classification. Based on representing static images as a temporal sequence of rotations, we show how ESNs robustly recognize cells of arbitrary rotations by taking advantage of their short-term memory capacity.

  • Tuesday 01. September 2015, 13:00

Title: Hybrid One-Shot 3D Hand Pose Estimation by Exploiting Uncertainties
Speaker: Georg Poier
Location: ICG Seminar Room
Abstract: Model-based approaches to 3D hand tracking have been shown to perform well in a wide range of scenarios. However, they require initialisation and cannot recover easily from tracking failures that occur due to fast hand motions. Data-driven approaches, on the other hand, can quickly deliver a solution, but the results often suffer from lower accuracy or missing anatomical validity compared to those obtained from model-based approaches. In this work we propose a hybrid approach for hand pose estimation from a single depth image. First, a learned regressor is employed to deliver multiple initial hypotheses for the 3D position of each hand joint. Subsequently, the kinematic parameters of a 3D hand model are found by deliberately exploiting the inherent uncertainty of the inferred joint proposals. This way, the method provides anatomically valid and accurate solutions without requiring manual initialisation or suffering from track losses. Quantitative results on several standard datasets demonstrate that the proposed method outperforms state-of-the-art representatives of the model-based, data-driven and hybrid paradigms.

  • Tuesday 14. July 2015, 13:00

Title: CVPR 2015 Summary
Speaker: Paul Wohlhart, Samuel Schulter, Alexander Shekhovtsov
Location: ICG Seminar Room
Abstract: Attendees of CVPR will briefly introduce us to a selection of highlights from this year's CVPR.

  • Tuesday 07. July 2015, 13:00

Title: Weakly-supervised learning from images and video
Speaker: Ivan Laptev, INRIA Paris, France
Location: lecture room HS i12
Abstract: Recent progress in visual recognition goes hand-in-hand with supervised learning and large-scale training data. While the amount of existing images and videos is huge, their detailed annotation is expensive and often prohibitive. To address this problem, in this talk we will focus on weakly-supervised learning methods using incomplete and noisy annotation for training. I will first address the learning of human actions from videos and corresponding textual descriptions in the form of movie scripts or narrations. I will describe our recent formulation of this problem in the form of a quadratic program with constraints and will show its successful applications to the joint learning of actions and actors from movies and to the learning of key steps from narrated instruction videos. In the second part of the talk I will focus on recognition from still images and will describe our work on weakly-supervised convolutional neural networks. I will present a network that learns to recognize and localize objects as well as human actions without using location supervision at training time. Somewhat surprisingly, our weakly-supervised method achieves state-of-the-art performance comparable to its strongly-supervised counterparts.
Bio: Ivan Laptev is a research director at INRIA Paris, France. He received a Habilitation degree in 2013 from École Normale Supérieure (ENS) in Paris and a PhD degree in Computer Science from the Royal Institute of Technology (KTH) in Stockholm. Ivan's main research interests include visual recognition of human actions, objects and interactions. He has published over 50 papers at international conferences and journals of computer vision and machine learning. He serves as an associate editor of the IJCV and TPAMI journals. He was an area chair for CVPR'10, '13, '15, ICCV'11, ECCV'12, '14 and ACCV'14, and he will be a program chair for CVPR 2018. He has co-organized several tutorials, workshops and challenges on human action recognition at major computer vision conferences. He has also co-organized a series of INRIA summer schools on computer vision and machine learning (2010-2013). Ivan was awarded an ERC Starting Grant in 2012.

  • Monday 06. July 2015, 10:00

Title: What it's like to work on Google's self-driving cars
Speaker: Andreas Wendel, Google[x]
Location: HS i11 (Inffeld)
Abstract: Self-driving cars have the potential to transform mobility: they will save lives and time, and offer mobility to those who otherwise don't have it. We've all heard the story, but what is it like to be a researcher there? Are there still challenges to be solved? In my talk, I will give an overview of our approach to solving the driving problem, point out why computer vision for autonomous robots is a challenge, and provide some personal insights into the research and engineering culture at Google. If you have questions about interviewing, internships, or research grants, use this opportunity to ask me directly. The talk will be given in English.
Short Bio: Andreas Wendel is a Robotics Researcher at Google[x] in California, working on computer vision for Self-Driving Cars. Prior to joining Google, he was the head of the Aerial Vision Group and lecturer at the Institute for Computer Graphics and Vision at Graz University of Technology, Austria, where he earned his PhD in Computer Science sub auspiciis Praesidentis in 2013.

  • Tuesday 23. June 2015, 13:00

Title: Outdoor Localization using a Particle Filter
Speaker: Christian Poglitsch
Location: ICG Seminar Room
Abstract: We propose an outdoor localization system using a particle filter. In our approach, a textured and geo-registered model of the outdoor environment is used as a reference to estimate the pose of a smartphone. The device's position and the orientation obtained from a Global Positioning System (GPS) receiver and an inertial measurement unit (IMU) are used as a first estimation of the true pose. Then, based on the sensor data multiple pose hypotheses are randomly distributed and used to produce renderings of the geo-referenced virtual model. With vision-based methods, the rendered images are compared to the image received from a smartphone, and the matching scores are used to update the particle filter. The outcome of our system improves the camera pose estimate in real time without user assistance. In contrast to previous methods, it is not necessary to move around until a baseline is found, or to rotate the smartphone. Experimental evaluation shows that the method significantly improves real-virtual alignment in the augmented camera image.
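
The hypothesise, score, and update loop described above follows the general pattern of a particle filter, which can be sketched in one dimension. Everything specific here is an assumption for illustration: the pose is a single number, the `score` function is a stand-in for comparing a rendering of the geo-referenced model with the camera image, and the noise levels and particle count are arbitrary.

```python
import math
import random

def particle_filter_step(particles, score, noise, rng):
    """One update: diffuse the hypotheses, weight them by score, resample."""
    moved = [p + rng.gauss(0.0, noise) for p in particles]
    weights = [score(p) for p in moved]
    # Resampling with replacement keeps hypotheses in proportion to their weight.
    return rng.choices(moved, weights=weights, k=len(moved))

def estimate(particles):
    """Pose estimate as the mean of all hypotheses."""
    return sum(particles) / len(particles)

rng = random.Random(0)
true_pose = 5.0      # unknown to the filter, used only inside the stand-in score
sensor_guess = 3.0   # stand-in for the initial GPS/IMU estimate

def score(p):
    # Stand-in matching score: highest when the hypothesis equals the true pose.
    return math.exp(-0.5 * (p - true_pose) ** 2)

# Spread the initial hypotheses around the sensor-based guess.
particles = [rng.gauss(sensor_guess, 2.0) for _ in range(500)]
for _ in range(10):
    particles = particle_filter_step(particles, score, 0.2, rng)

pose = estimate(particles)
```

After a few updates the hypotheses concentrate near the pose with the best matching score, even though the initial sensor-based guess was off.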

  • Tuesday 02. June 2015, 13:00

Title: Analysis of optically variable devices using a photometric light-field approach
Speaker: Daniel Soukup (AIT)
Location: ICG Seminar Room
Abstract: Diffractive Optically Variable Image Devices (DOVIDs), sometimes loosely referred to as holograms, are popular security features for protecting banknotes, ID cards, or other security documents. Inspection, authentication, as well as forensic analysis of these security features are still demanding tasks requiring special hardware tools and expert knowledge. Existing equipment for such analyses is based either on a microscopic analysis of the grating structure or a point-wise projection and recording of the diffraction patterns. We investigated approaches for an examination of DOVID security features based on sampling the Bidirectional Reflectance Distribution Function (BRDF) of DOVIDs using photometric stereo- and light-field-based methods. Our approach is demonstrated on the practical task of automated discrimination between genuine and counterfeited DOVIDs on banknotes. For this purpose, we propose a tailored feature descriptor which is robust against several expected sources of inaccuracy but still specific enough for the given task.

  • Tuesday 19. May 2015, 13:00

Title: Text Detection and Recognition in Natural Scene Images
Speaker: Michael Opitz
Location: ICG Seminar Room
Abstract: Text recognition in natural scene images has applications in computer vision tasks such as licence plate recognition, automated translation of street signs, help for visually impaired people, and image retrieval. In this work an end-to-end text recognition system is presented, which uses an AdaBoost ensemble with a modified Local Ternary Pattern (LTP) and Maximally Stable Extremal Regions (MSER) for text detection and deep Convolutional Neural Networks (CNNs) for recognition. The end-to-end system detects and recognizes approximately horizontal text in Latin fonts in unconstrained environments. Experiments show that the system presented outperforms state-of-the-art methods on the ICDAR 2005 dataset in the text detection (F-Score: 74.2%), dictionary-driven cropped-word recognition (F-Score: 87.1%) and dictionary-driven end-to-end recognition (F-Score: 72.6%) tasks.

  • Tuesday 12. May 2015, 13:00

Title: Clustering of Static-Adaptive Correspondences for Deformable Object Tracking
Speaker: Georg Nebehay, AIT
Location: ICG Seminar Room
Abstract: We propose a novel method for establishing correspondences on deformable objects for single-target object tracking. The key ingredient is a dissimilarity measure between correspondences that takes into account their geometric compatibility, allowing us to separate inlier correspondences from outliers. We employ both static correspondences from the initial appearance of the object as well as adaptive correspondences from the previous frame to address the stability-plasticity dilemma. The geometric dissimilarity measure enables us to also disambiguate keypoints that are difficult to match. Based on these ideas we build a keypoint-based tracker that outputs rotated bounding boxes. We demonstrate in a rigorous empirical analysis that this tracker outperforms the state of the art on a dataset of 77 sequences.

  • Tuesday 05. May 2015, 13:00

Title: Separation of Arteries and Veins in Pulmonary CT Images
Speaker: Christian Payer
Location: ICG Seminar Room
Abstract: Automated computer-aided analysis of lung vessels has been shown to yield promising results for non-invasive diagnosis of lung diseases. In order to detect vascular changes which affect arteries and veins differently, an algorithm capable of identifying these two compartments is needed. We propose a fully automatic algorithm that separates arteries and veins in thoracic computed tomography (CT) images based on two integer programs. The first extracts multiple distinct subtrees inside a graph of vessel paths. The second labels each vessel tree as either artery or vein by maximizing both the contact surface in their generalized Voronoi diagram and a measure based on their closeness to accompanying bronchi. We evaluate the performance of our automatic algorithm on 10 manual segmentations of arterial and venous trees from patients with and without pulmonary vascular disease, achieving an average voxel-based overlap of 94.1% (range: 85.0% - 98.7%), outperforming a recent state-of-the-art interactive method. We show the possible clinical use of this artery-vein separation algorithm by quantifying the tortuosity of vessels, whereby the tortuosity of arteries can better distinguish between patients with and without pulmonary hypertension than the tortuosity of veins.

  • Tuesday 28. April 2015, 13:30

Title: Multiuser-SLAM
Speaker: Philipp Fleck
Location: ICG Seminar Room
Abstract: We present a method to combine multiple local SLAM maps into combined maps in a client-server system. The server takes care of all clients and tries to detect overlapping regions among keyframes committed by clients. The system supports different clients with different levels of complexity, such as a thin client, which is used for image acquisition, or an autonomous SLAM client, which generates its own local map. If clients move, the combined map is refreshed to keep pace with the client’s local map. Beyond the combination of client maps, the server system can update clients to improve their local system using keyframes and poses. Allowing clients to operate in the same context will serve as a base for future AR applications. In particular, multiple clients commit their keyframes and the server generates a per-client reconstruction as well as a combined map. Afterwards, the clients receive updates in the form of new keyframes and poses.

  • Tuesday 28. April 2015, 13:00

Title: Video Frame Interpolation for Smooth Slow Motion Based on Optical Flow
Speaker: Andrej Reichmann
Location: ICG Seminar Room
Abstract: Slow motion is a widely used technique to unveil and analyze details of movements which are too fast to be accurately perceived by the human visual system. This deceleration effect is typically achieved with the overcranking technique, where a video is recorded at a high frame rate and played back at a slower speed. In this thesis, concepts for the generation of the slow motion effect which do not require a high frame rate camera are investigated. The motion compensated frame interpolation algorithm is a sophisticated basis for the computation of the slow motion effect. It inserts novel frames between every frame pair of the recorded video. These intermediate frames are computed by interpolating moving objects along their apparent motion trajectories. Therefore, the resulting slow motion sequence features smooth movements. In this work, the variational Huber-L1 optical flow algorithm is utilized to estimate the motion information. It was observed that in regions where the motion estimation is erroneous, interpolation errors are very likely to become visible. These errors especially occur at very large displacements, small moving structures, occlusions and disocclusions, which all represent weak spots of the variational optical flow model. To address these deficiencies, a two-step artifact removal strategy was developed, which corrects the optical flow errors to reduce the visible artifacts. The optical flow result was enhanced by combining the variational motion estimation model with a patch-based approach. Based on this enhancement, the wrongly interpolated objects are relocated to the proper position and inpainted with the proposed depth-order preserving algorithm. As shown in the evaluation, the slow motion algorithm yields promising results for a wide range of different scenes and movements. Furthermore, the artifact removal strategy is able to significantly reduce the number of interpolation errors.

  • Tuesday 21. April 2015, 13:00

Title: Automatic Generation of Surrogate Terminals for Shape Grammar Derivation
Speaker: David Mandl
Location: ICG Seminar Room
Abstract: Master thesis presentation.

  • Friday 10. April 2015, 10:45

Title: Efficient multi-view correspondence for unordered structure-from-motion
Speaker: Prof. Konrad Schindler (ETHZ)
Location: lecture room i11, Inffeldgasse 16b, basement
Abstract: When using unordered image sets, e.g. community photo collections, for structure-from-motion and 3D vision, the main bottleneck is to establish image correspondence. In the talk, I will give an overview of different strategies to efficiently find and match overlapping images in large, unordered data sets. I will then present two recent contributions from my group to this body of work. On the one hand, we have shown that it is possible to predict from an interest point descriptor whether the point will be part of a successful match, and filter the points accordingly. On the other hand, we show that multi-view matching can be cast as an extreme case of image indexing, so that pairwise matching is avoided altogether.

  • Tuesday 17. March 2015, 13:00

Title: Distractor-Aware Model-Free Tracking
Speaker: Horst Possegger
Location: ICG Seminar Room
Abstract: Model-free online object tracking is a highly competitive research area. Over the last decade, the research focus has shifted to trackers based on well-engineered features (e.g. HOG-based), correlation filters, and complex color features (e.g. color attributes). Considering the results of recent benchmark evaluations, such trackers typically outperform standard color representations by a large margin. In particular, color histogram based trackers often tend to drift towards regions which exhibit similar appearance compared to the object of interest. In this talk, we will present a color histogram based tracking approach that is able to overcome this drifting problem. In particular, we introduce a discriminative object model which allows us to identify and suppress potentially distracting regions in advance. Evaluations on recent benchmark challenges demonstrate the favorable performance of our color based tracking approach. Furthermore, we will present a real-world application which uses our approach to generate thousands of videos daily.

  • Tuesday 10. March 2015, 13:00

Title: Hand Gesture Recognition and Particle Visualization in the Pharmaceutical Industry
Speaker: David Almasi
Location: ICG Seminar Room
Abstract: I present a self-developed real-time application which recognises and categorises human hand gestures using a conventional Kinect sensor. It makes machines capable of recognising and tracking the movements of users at low computational cost, which opens new opportunities in the field of human-machine interaction. The recognised movements are chosen to resemble those used on touch-screen devices, so that users can learn them quickly. Covered gestures are pressing (clicking), zooming, swapping, recognition of simple shapes formed by hand movements, and differentiation between opened and closed hands.

  • Tuesday 03. March 2015, 13:00

Title: Nameplate Detection and Classification
Speaker: Karlheinz Wohlmuth
Location: ICG Seminar Room
Abstract: MA Thesis presentation

  • Tuesday 03. March 2015, 13:00

Title: A Bilevel Sparse Coding Approach for Super Resolution
Speaker: Peter Innerhofer
Location: ICG Seminar Room
Abstract: MA thesis presentation

  • Tuesday 03. March 2015, 13:00

Title: Convex Framework for 2D & 3D Image Segmentation Using Shape Constraints
Speaker: Kerstin Hammernik
Location: ICG Seminar Room
Abstract: Image segmentation is one among many difficult tasks in computer vision. Due to limitations of image modalities that lead to noise or weak boundaries, object occlusion or intensity inhomogeneities, it is beneficial to incorporate shape prior knowledge to obtain more robust segmentation results. In this thesis, we explore different kinds of shape constraints for interactive medical image segmentation embedded in a convex variational framework and show how we can decrease user interaction significantly. We introduce two types of shape constraints. First, we show how we can describe shapes globally by means of star prior and moment constraints. While the first constraint ensures one-connected star convex objects, the latter constraint ensures that first-order moments such as volume or center of gravity are fulfilled. Combining these constraints results in a powerful tool which can be applied to a variety of applications in 2D and 3D. We show the application to sinus floor augmentation segmentation and compare the results to an expert’s segmentation. The second type of constraints that we consider in this thesis are model-specific shape constraints. While simple objects can be described by means of global shape constraints, more complex objects need a specific model description which requires prior knowledge from training data. Vertebrae segmentation is suitable to evaluate model-specific shape constraints in this framework due to the complex substructures of vertebrae. A publicly available database allows for shape model estimation and provides experts’ segmentations for quantitative evaluation. We achieve promising results on this database which are comparable to the literature. All proposed shape constraints can be specified by a single point or an ellipsoid. To interact with volumetric data, we provide a user-friendly Graphical User Interface (GUI). We exploit the high parallelization potential of our variational framework and implement the algorithms on the Graphics Processing Unit (GPU) using NVIDIA CUDA.

  • Tuesday 24. February 2015, 13:00

Title: 3D Scene Flow Estimation with a Piecewise Rigid Scene Model
Speaker: Christoph Vogel
Location: ICG Seminar Room
Abstract: The joint extraction of geometry and motion from image data is known as 'scene flow' estimation in the literature. Compared to monocular motion estimation, scene flow has the advantage of producing geometrically meaningful 3D motion estimates. We show that the mutual interaction of motion and stereo data can be leveraged to achieve significantly better results for both tasks. To that end we propose a novel representation which models the scene as a collection of rigidly moving planes into which the input images can be segmented. This piecewise rigid scene model is significantly more parsimonious than conventional pixel-based representations, yet retains the ability to represent real-world scenes with independent object motion. Our 3D representation enables us to define suitable scene priors and incorporate occlusion reasoning into the energy formulation. Assuming that the rigid motion persists approximately over time additionally enables us to incorporate multiple frames into the inference. In our model each view holds its own representation, which is encouraged to be consistent across all other viewpoints and frames in a temporal window. We show that such a view-consistent multi-frame scheme significantly improves accuracy, especially in the presence of occlusions, and increases robustness against adverse imaging conditions.

  • Tuesday 03. February 2015, 13:00

Title: Continuous Hyper-parameter Learning for Support Vector Machines
Speaker: Teresa Klatzer
Location: ICG Seminar Room
Abstract: In this paper, we address the problem of determining optimal hyper-parameters for support vector machines (SVMs). The standard way for solving the model selection problem is to use grid search. Grid search constitutes an exhaustive search over a pre-defined discretized set of possible parameter values and evaluating the cross-validation error until the best is found. We developed a bi-level optimization approach to solve the model selection problem for linear and kernel SVMs, including the extension to learn several kernel parameters. Using this method, we can overcome the discretization of the parameter space using continuous optimization, and the complexity of the method only increases linearly with the number of parameters (instead of exponentially using grid search). In experiments, we determine optimal hyper-parameters based on different smooth estimates of the cross-validation error and find that only very few iterations of bi-level optimization yield good classification rates.
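
As a rough illustration of the scaling argument in this abstract, the following Python sketch counts how many settings an exhaustive grid search has to evaluate and runs such a search on a toy problem. The function names, the grid values and the smooth `toy_error` surface are assumptions made for illustration; they are not taken from the talk, and the sketch does not implement the bi-level method itself.

```python
from itertools import product


def grid_size(values_per_parameter, num_parameters):
    """Number of settings an exhaustive grid search has to evaluate."""
    return values_per_parameter ** num_parameters


def grid_search(objective, axes):
    """Return the grid point (a tuple) with the lowest objective value."""
    return min(product(*axes), key=lambda point: objective(*point))


def toy_error(c, gamma):
    # Hypothetical smooth stand-in for a cross-validation error;
    # its continuous minimum lies at c=1.3, gamma=0.25.
    return (c - 1.3) ** 2 + (gamma - 0.25) ** 2


# Discretized candidate values for two hyper-parameters (illustrative only).
axes = [[0.5, 1.0, 1.5, 2.0], [0.1, 0.3, 0.5]]
best = grid_search(toy_error, axes)

# Grid size for 2 and for 5 parameters at 10 values each.
print(grid_size(10, 2), grid_size(10, 5))
# Best grid point and its error; the continuous optimum is not on the grid.
print(best, toy_error(*best))
```

With ten values per parameter the grid grows by a factor of ten for every additional parameter, and the best grid point can only be as good as the discretization allows, which is the limitation a continuous optimization approach is meant to avoid.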

  • Tuesday 27. January 2015, 13:00

Title: EuRoC Explained - Challenges, Solutions and Possibilities
Speaker: Thomas Holzmann and colleagues
Location: ICG Seminar Room
Abstract: In this talk, we present EuRoC, the European Robotics Challenges. We describe the general motivation and objectives of the challenges and explain in more detail the challenge for Plant Servicing and Inspection using MAVs, in which we participate. We discuss our solutions for the tasks of the first stage, which include vision-based localization and reconstruction, state estimation, control and navigation. With our solutions, we achieved results comparable to those of the leading teams in the challenge. Finally, we give an outlook on the next steps and future possibilities with EuRoC.

  • Tuesday 20. January 2015, 13:00

Title: Dense Reconstructability Prediction: Learning to Predict 3D Reconstruction Failures from 2D Images
Speaker: Christian Mostegel
Location: ICG Seminar Room
Abstract: In this talk we present our research progress in learning the limitations of 3D reconstruction approaches. We demonstrate how the problem of 3D reconstructability prediction can be reduced to a 2D labeling problem without the necessity of ground truth data. The proposed approach enables, for the first time, a dense prediction of the reconstruction reliability without actually having to perform the expensive reconstruction procedure itself. Our formulation makes it possible to learn for which kind of scene a specific 3D reconstruction approach is suited and thus allows the combination of several 3D reconstruction approaches on a common basis. Further potential applications include outlier rejection, speeding up and improving the reconstruction process as well as on-site reconstruction quality assessment.

  • Tuesday 13. January 2015, 13:00

Title: Distractor-Aware Model-Free Tracking
Speaker: Horst Possegger
Location: ICG Seminar Room
Abstract: Model-free online object tracking is a highly competitive research area. Over the last decade, the research focus has shifted to trackers based on well engineered features (e.g. HOG-based), correlation filters, and complex color features (e.g. color attributes). Considering the results of recent benchmark evaluations, such trackers typically outperform standard color representations by a large margin. In particular, color histogram based trackers often tend to drift towards regions which exhibit an appearance similar to that of the object of interest. In this talk, we will present a color histogram based tracking approach that is able to overcome this drifting problem. In particular, we introduce a discriminative object model which allows us to identify and suppress potentially distracting regions in advance. Evaluations on recent benchmark challenges demonstrate the favorable performance of our color based tracking approach. Furthermore, we will present a real-world application which uses our approach to generate thousands of videos daily.

  • Tuesday 16. December 2014, 13:00

Title: Automated End-to-End Workflow for Precise and Geo-accurate Reconstructions using Fiducial Markers
Speaker: Markus Rumpler
Location: ICG Seminar Room
Abstract: Photogrammetric computer vision systems have been well established in many scientific and commercial fields during the last decades. Recent developments in image-based 3D reconstruction systems in conjunction with the availability of affordable high quality digital consumer grade cameras have resulted in an easy way of creating visually appealing 3D models. However, many of these methods require manual steps in the processing chain, and for many photogrammetric applications such as mapping, recurrent topographic surveys or architectural and archaeological 3D documentation, high accuracy in a geo-coordinate system is required which often cannot be guaranteed. Hence, in this paper we present and advocate a fully automated end-to-end workflow for precise and geo-accurate 3D reconstructions using fiducial markers. We integrate an automatic camera calibration and georeferencing method into our image-based reconstruction pipeline based on binary-coded fiducial markers as artificial, individually identifiable landmarks in the scene. Additionally, we facilitate the use of these markers in conjunction with known ground control points (GCP) in the bundle adjustment, and use an online feedback method that allows assessment of the final reconstruction quality in terms of image overlap, ground sampling distance (GSD) and completeness, and thus provides flexibility to adapt the image acquisition strategy already during image recording. We present an extensive set of experiments which demonstrate the accuracy benefits of this workflow, yielding a highly accurate and geographically aligned reconstruction with an absolute point position uncertainty of about 1.5 times the ground sampling distance.

  • Tuesday 09. December 2014, 13:00

Title: Projected Methods for Monotone Variational Inequality
Speaker: Yura Malitsky
Location: ICG Seminar Room
Abstract: Monotone variational inequality is a powerful and important tool for studying many problems that arise from optimization, equilibrium problems, PDE, and control theory. Projected methods are conceptually simple methods for the solution of a monotone variational inequality. We will review the development of the projected methods and their recent advances. Finally, we will obtain an interesting scheme for composite minimization.

  • Tuesday 02. December 2014, 13:00

Title: Improving Sparse 3D Models for Man-Made Environments Using Line-Based 3D Reconstruction
Speaker: Manuel Hofer
Location: ICG Seminar Room
Abstract: Traditional Structure-from-Motion (SfM) approaches work well for richly textured scenes with a high number of distinctive feature points. Since man-made environments often contain textureless objects, the resulting point cloud suffers from a low density in corresponding scene parts. The missing 3D information heavily affects all kinds of subsequent post-processing tasks (e.g. meshing), and significantly decreases the visual appearance of the resulting 3D model. We propose a novel 3D reconstruction approach, which uses the output of conventional SfM pipelines to generate additional complementary 3D information, by exploiting line segments. We use appearance-less epipolar guided line matching to create a potentially large set of 3D line hypotheses, which are then verified using a global graph clustering procedure. We show that our proposed method outperforms the current state-of-the-art in terms of runtime and accuracy, as well as visual appearance of the resulting reconstructions.

  • Tuesday 18. November 2014, 13:00

Title: Fast Object Tracker on a Smart Camera
Speaker: Gernot Loibner
Location: ICG Seminar Room
Abstract: This diploma thesis addresses the problem of tracking various objects in real time on a smart camera. As a starting point, we use an existing tracking-by-detection algorithm which relies on histogram of oriented gradients (HOG) features for object detection and Kanade-Lucas-Tomasi (KLT) point tracking for motion estimation to combine those detections into trajectories. As the current implementation does not include distinct object information, it tends to mix up objects passing close to each other, leading to wrong object trajectories. To overcome this drawback, we include an additional verification step in the tracking process in which object identity information can be incorporated. With respect to real-time capability, feature descriptors are reviewed and evaluated on two datasets. The most promising approaches in terms of quality (i.e., DCT, LBP, BRIEF) are included in the current algorithm. The extended tracking algorithm is evaluated using the CLEAR MOT metrics against the baseline method. In addition, we compare to results available in the literature on two datasets with different image quality. The results show a significant drop in object identity switches, whereas the overall performance of the algorithm does not improve to a satisfying degree.
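
For readers unfamiliar with the CLEAR MOT evaluation mentioned in this abstract, the sketch below computes its accuracy score (MOTA), which sums misses, false positives and identity switches over all frames and divides by the total number of ground-truth objects. The per-frame counts are made up for illustration and are not results from the thesis; the function name is likewise an assumption.

```python
def mota(misses, false_positives, id_switches, ground_truth_counts):
    """Multiple Object Tracking Accuracy computed from per-frame counts.

    MOTA = 1 - (misses + false positives + identity switches) / ground truth,
    with every term summed over all frames. The score can become negative
    when the tracker makes more errors than there are ground-truth objects.
    """
    total_gt = sum(ground_truth_counts)
    if total_gt <= 0:
        raise ValueError("need at least one ground-truth object")
    errors = sum(misses) + sum(false_positives) + sum(id_switches)
    return 1.0 - errors / total_gt


# Made-up counts for four frames with ten ground-truth objects each.
gt = [10, 10, 10, 10]
baseline = mota([2, 2, 2, 2], [1, 1, 1, 1], [2, 1, 2, 1], gt)
# Same misses and false positives, but fewer identity switches.
extended = mota([2, 2, 2, 2], [1, 1, 1, 1], [0, 1, 0, 0], gt)

print(baseline, extended)
```

Because misses and false positives enter the same sum, removing identity switches alone moves the score only as far as their share of the total error allows.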

  • Tuesday 11. November 2014, 13:00

Title: Automatic segmentation of the glottis from laryngeal high-speed videos using 3D geodesic active contours
Speaker: Fabian Schenk
Location: ICG Seminar Room
Abstract: Diagnosis and classification of voice and speech disorders have become important research topics in recent years and laryngeal high-speed videos have emerged as a state of the art method for the investigation of vocal fold vibrations. The vast amount of data produced in one recording makes a manual assessment impossible and an automated approach is required. We present a novel, fully automatic segmentation involving rigid motion compensation, salient region detection and 3D geodesic active contours.

  • Thursday 06. November 2014, 16:30

Title: Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
Speaker: Bogdan Savchynskyy
Location: ICG Seminar Room (E3.04)
Abstract: We consider energy minimization for undirected graphical models, also known as the MAP-inference problem for Markov random fields. Although combinatorial methods, which return a provably optimal integral solution of the problem, have made significant progress in the past decade, they are still typically unable to cope with large-scale datasets. On the other hand, large-scale datasets are often defined on sparse graphs, and convex relaxation methods, such as linear programming relaxations, then provide good approximations to integral solutions. We propose a novel method of combining combinatorial and convex programming techniques to obtain a global solution of the initial combinatorial problem. Based on the information obtained from the solution of the convex relaxation, our method confines application of the combinatorial solver to a small fraction of the initial graphical model, which allows much larger problems to be solved optimally. We demonstrate the efficacy of our approach on a computer vision energy minimization benchmark.

  • Tuesday 28. October 2014, 16:00

Title: Mobile 3D Sensing at DotProduct - Applications and Future
Speaker: Rafael Spring
Location: ICG Seminar Room
Abstract: 3D cameras are making their way into phones and tablets. Similar to traditional mobile sensors such as 2D cameras, GPS and IMUs, miniaturization and mass production have allowed them to go from external accessory to integrated and cheap. At the same time mobile computational power has exploded and is now approaching console-level performance in the current generation of SoCs. In this talk I will describe what all this means for computer vision, how it will affect consumer and professional markets on a high level, what the killer applications will be and how DotProduct technology helps enable them. I will also briefly talk about some upcoming trends in 3D sensing and will show some live demos of our products. BIO: Rafael Spring is an ex-Googler and computer vision engineer. He was part of the Google Visual Search Team in Santa Monica from 2009 to 2011, after his project "Enkin" (an Augmented Reality navigation system for Android) was acquired by Google in 2008. During his time at Google he developed real-time computer vision algorithms (2D) and contributed to Google Goggles, Google Glass and the early Project Tango. He is now founder and CTO of DotProduct and wants to make real-time 3D perception an everyday capability of mobile devices.

  • Tuesday 14. October 2014, 13:00

Title: MAVMAP – A lightweight structure-from-motion system
Speaker: Friedrich Fraundorfer
Location: ICG Seminar Room
Abstract: MAVMAP is a structure-from-motion system in the spirit of Bundler and VisualSFM. However, it contains several features that make it distinct from them. In my talk I will give you some details about the software, its capabilities and usage, and show you results of its successful application. I will also briefly compare it to Bundler and VisualSFM.

  • Tuesday 07. October 2014, 13:00

Title: Bi-level Optimization for Support Vector Machines
Speaker: Teresa Klatzer
Location: ICG Seminar Room
Abstract: This thesis deals with an efficient approach for learning the optimal hyper-parameters for Support Vector Machines (SVMs). The common method to determine hyper-parameters is grid search. Grid search typically involves the definition of a discretized "grid" of possible parameter values with a certain resolution and a search for the values that result in the minimal validation error of the learned model. A major limitation of grid search is that the search space grows exponentially in the number of parameters, which makes the approach practical only for determining very few hyper-parameters. Additionally, grid search operates on discrete parameter values, which leads to suboptimal solutions. In this thesis we develop an approach to use bi-level optimization for learning the optimal hyper-parameters and solve both major shortcomings of grid search in an efficient and elegant way. Bi-level learning is an optimization method where one optimization problem has another optimization problem as its constraint. The goal of the bi-level program is to find optimal hyper-parameters such that the validation error (the higher level objective) is minimized, while the optimal training problem is solved for the underlying SVM (the lower level objective). We use Lagrange multipliers to solve the bi-level problem and formulate the solution for several variants of the SVM (linear, kernel, multiple kernel). We can show that, using this method, the model selection problem (i.e. selection of hyper-parameters) can be solved also for a large number of hyper-parameters. The bi-level approach exploits the continuity of the hyper-parameters, which allows for better solutions than with grid search. In the experiments, we investigate different properties of the bi-level approach and try to give insights into the advantages of this method.
We find that highly parametrized kernel SVMs perform best compared to simpler models which is a clear advantage of bi-level optimization against grid search for model selection.

  • Tuesday 16. September 2014, 13:00

Title: Design and Development of a Modular Widget Toolkit
Speaker: Stefan Kohl
Location: ICG Seminar Room
Abstract: The purpose of this work is to design and implement a GUI toolkit for the application framework "Murl Engine", a 3D and multimedia engine that focuses on cross-platform application development for mobile devices and desktops. A GUI toolkit provides common widgets and utilities for developers to implement graphical user interfaces in applications. Most of the available GUI toolkits are based on desktop environments and do not run in a 3D context. They further require certain operating systems or third-party libraries. Therefore, a solution is required that is based only on the Murl Engine itself and that can be integrated into its scene graph structure. The first chapter introduces the engine and states the requirements for the toolkit. An evaluation of existing GUI toolkits is given in the second chapter. The results are used to derive a concept for the toolkit to be implemented, which is introduced in the third chapter. The main aspects of implementing a GUI toolkit (which traditionally lies in the 2D domain) in a scene graph oriented 3D framework are the topic of chapter four. Finally, chapter five demonstrates example applications written with the toolkit, before chapter six concludes this thesis. The realization of this work has shown that the capabilities of the Murl Engine and its scene graph system provide strong support for implementing a state-of-the-art 2D GUI toolkit, albeit with some limitations. Another challenge was to bridge the gap between the event-based paradigm of common toolkits and the tick-based polling used in 3D engines. The result covers most common features evaluated in other toolkits and provides a strong foundation for future extensions.

  • Tuesday 16. September 2014, 13:00

Title: Visual Navigation using Quadrotors in the Computer Vision Group - UPM - Madrid
Speaker: Jesús Pestana Puerta
Location: ICG Seminar Room
Abstract: I am a Robotics Engineer specialized in the field of Autonomous Unmanned Aerial Systems and a PhD candidate at the Computer Vision Group (CVG or Vision4UAV) of the Polytechnic University of Madrid. In the talk, I will introduce the CVG-UPM and present my past PhD work. The CVG is a university research group at the "Centro Automática y Robótica (C.A.R.)" of the Universidad Politecnica Madrid with a background in the Computer Vision and Automatic Control fields and with extensive experience in industry-oriented applications; it started early research on UAVs back in the 90s. The CVG's focus is to provide Unmanned Systems (U.S.) with autonomy by exploiting the most powerful sensor, i.e. vision. Most of my work has been on Micro Aerial Vehicles (MAVs) and Visual Based Navigation. I have experience utilizing the following visual based localization techniques: ground optical flow, object tracking and visual markers; I have also tried PTAM (tum_ardrone ROS package) and Stereo Vision (LIBELAS). My work in robotics has been characterized by several out-of-the-lab demonstrations, especially participation in MAV competitions, where we have achieved good results, namely: CEA2012 (Spain), IMAV2012, IMAV2013 and IARC2014.

  • Tuesday 09. September 2014, 13:00

Title: Alternating Decision/Regression Forests For Zero-Shot Learning
Speaker: Matthias Freiberger
Location: ICG Seminar Room
Abstract: Recognition is the most challenging task in computer vision. Even simple supervised recognition systems need big databases of labeled training data. Since the labeling of training data is tedious and expensive, the framework of Zero-Shot Learning has recently been proposed as a solution. Zero-Shot Learning allows new classes to be introduced at test time without a single shot of training data. Nevertheless, the proposed classifiers are very expensive to train in terms of memory and CPU time. Also, due to an attribute sampling process, the available information is not fully used. In this work we evaluate the potential of Alternating Decision Forests as a classifier stage of the Zero-Shot Learning framework. Moreover, we investigate the impact of additional information gained by estimating attributes utilizing an Alternating Regression Forest. While our proposed regression method exhibits insufficient performance, we show that Alternating Decision Forests are well suited as attribute stage classifiers. Alternating Decision Forests perform only slightly worse than the originally proposed Support Vector Machines, but are much cheaper in terms of training time and more flexible in configuration. Moreover, we demonstrate that a large, diverse and well-labeled training set is crucial to the performance of Zero-Shot classifiers.

  • Tuesday 09. September 2014, 13:00

Title: Towards Automatic Bone Age Estimation from MRI: Localization of 3D Anatomical Landmarks
Speaker: Thomas Ebner
Location: ICG Seminar Room
Abstract: Bone age estimation (BAE) is an important procedure in forensic practice which recently has seen a shift in attention from X-ray to MRI based imaging. To automate BAE from MRI, localization of the joints between hand bones is a crucial first step, which is challenging due to anatomical variations, different poses and repeating structures within the hand. We propose a landmark localization algorithm using multiple random regression forests, first analyzing the shape of the hand from information of the whole image, thus implicitly modeling the global landmark configuration, followed by a refinement based on more local information to increase prediction accuracy. We are able to clearly outperform related approaches on our dataset of 60 T1-weighted MR images, achieving a mean landmark localization error of 1.4±1.5mm, while having only 0.25% outliers with an error greater than 10mm.

  • Tuesday 09. September 2014, 13:00

Title: Vertebrae Segmentation in 3D CT Images based on a Variational Framework
Speaker: Kerstin Hammernik
Location: ICG Seminar Room
Abstract: Automatic segmentation of 3D vertebrae is a challenging task in medical imaging. In this paper, we introduce a total variation (TV) based framework that incorporates an a priori model, i.e., a vertebral mean shape, image intensity and edge information. The algorithm was evaluated using leave-one-out cross validation on a data set containing ten computed tomography scans and ground truth segmentations provided for the CSI MICCAI 2014 spine and vertebrae segmentation challenge. We achieve promising results in terms of the Dice Similarity Coefficient (DSC) of 0.93 ± 0.04 averaged over the whole data set.
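
The Dice Similarity Coefficient quoted in this abstract measures the overlap between a predicted and a reference segmentation as twice the size of their intersection divided by the sum of their sizes. A minimal Python sketch follows, assuming segmentations are given as sets of voxel coordinates; this representation, the function name and the toy voxels are illustrative assumptions, not the evaluation code used for the challenge.

```python
def dice_coefficient(predicted, reference):
    """Dice Similarity Coefficient between two sets of voxel coordinates.

    Returns 2 * |A intersect B| / (|A| + |B|), a value between 0 (no
    overlap) and 1 (identical segmentations).
    """
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        # Convention chosen here: two empty segmentations agree perfectly.
        return 1.0
    overlap = len(predicted & reference)
    return 2.0 * overlap / (len(predicted) + len(reference))


# Toy example: three voxels each, two of them shared.
pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
ref = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}
score = dice_coefficient(pred, ref)
print(score)
```

A score averaged over a data set, as reported in the abstract, would be the mean of this quantity over all scans.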

  • Tuesday 19. August 2014, 13:00

Title: Real Time High Dynamic Range Video on the GPU
Speaker: Lorenz Kellerer
Location: ICG Seminar Room
Abstract: The goal of this work was to show that real-time HDR is possible for large resolutions. We implemented algorithms related to HDR content processing because this can greatly improve the image quality of a video under difficult lighting conditions. For practical use in live video capturing it was important for our algorithms to work in real-time. Therefore we decided to implement existing algorithms in Nvidia’s CUDA. After selecting and implementing multiple algorithms we extended an existing panoramic image processing pipeline. This pipeline is used to capture panoramic videos of the playing field of a football stadium located in Tromsø, Norway. We performed multiple tests on our extension. These consisted of detailed performance analysis and an evaluation of the visual pleasantness of the output in a small user study. We proved that real-time high resolution HDR Video is possible and that it improves image quality.

  • Tuesday 24. June 2014, 13:00

Title: Straight Skeleton
Speaker: Gernot Walzl
Location: ICG Seminar Room
Abstract: This presentation gives an overview of our work on the Straight Skeleton in 3-space. The Straight Skeleton is defined by an offsetting (shrinking) process where each facet of a given polyhedron moves inwards in a self-parallel manner. At the very first moment, each vertex needs to be split into vertices of degree 3. During the shrinking process, the polyhedron undergoes combinatorial and topological changes until it vanishes completely. By tracking the vertices and edges of the shrinking polyhedron, its Straight Skeleton is created. We show that the Straight Skeleton of a polyhedron is not unique and we prove that there is always a solution. Our implementation provides visualizations to show the results.

  • Tuesday 24. June 2014, 13:00

Title: Embeddings for Random Ferns Classification
Speaker: Markus Oberweger
Location: ICG Seminar Room
Abstract: Efficient multi-class machine learning methods are a key component in many computer vision applications. In this area the Random Forest classifier is considered state-of-the-art as it provides an efficient way to learn an inherently multi-class classifier that naturally handles high dimensional, multi-modal data. Recent research focused on optimizing this classifier for different applications by learning appropriate node split criteria. In this work we present a different scheme for acquiring these node splits. By selecting feature subsets and using an ensemble of weak classifiers, we propose different linear subspace and ordinal embeddings to derive flat classifiers for leaf assignment, similar to the popular Random Ferns classifier. To this end, we use a generic framework into which we integrate all of our proposed embeddings. We evaluate our classifiers on several machine learning benchmark datasets, as well as on well-known computer vision datasets. We show that our classifiers can outperform conventional Random Ferns and even Random Forest without significantly increasing computational costs. Further, we show the applicability of our classifier for the task of planar object tracking, as well as 3D interest point recognition.

  • Friday 20. June 2014, 13:00

Title: Maximum Persistency in Energy Minimization
Speaker: A. Shekhovtsov
Location: ICG Seminar Room
Abstract: For an NP-hard discrete minimization problem (known as energy minimization in computer vision) a new method is proposed to determine a part of the optimal solution in polynomial time. Although the found part may be empty, we guarantee that it is the largest one that can be identified within a class of sufficient conditions that includes several major existing techniques as special cases.

  • Tuesday 03. June 2014, 13:00

Title: Interactive Semantic Segmentation on Aerial Images
Speaker: Patrick Knöbelreiter
Location: ICG Seminar Room
Abstract: Extracting semantic information out of images is one of the most challenging problems in computer vision. The goal is to identify all foreground and background objects visible in an image simultaneously. Every pixel in the image is assigned a logical class label, resulting in a pixel-accurate segmentation, which is referred to as a semantic segmentation. Deriving a semantic segmentation of arbitrary input images allows a computer not just to see the images with a camera, but also to understand what the content of an image actually is. This makes it possible to automate many tasks. However, viewpoint variations, occlusions and different scales make semantic segmentation a very complex task. In this thesis, a semantic segmentation of aerial images is computed using an interactive approach. This requires the classifiers to be very fast at test time, such that the user gets immediate feedback after the classifier has been updated. Random forests and random ferns are classifiers fulfilling this property and have therefore been used in this thesis for the classification task. It is shown how random forests and random ferns can be used online, such that new training data can be incorporated at any time. To keep the necessary interaction to a minimum, a concept called active user guidance has been developed. This concept allows the user to update the classifier with those samples which will have the greatest impact on performance. With the application it is possible to semantically segment complete aerial projects in 2D as well as in textured 3D. Projects in 3D allow the incorporation of additional features such as pixel-synchronous surface normals, which are highly discriminative and therefore a powerful information source for semantic segmentation.

  • Tuesday 20. May 2014, 13:00

Title: Geometric Abstraction for Noisy Image-Based 3D Reconstructions
Speaker: Thomas Holzmann
Location: ICG Seminar Room
Abstract: With state-of-the-art reconstruction methods it is possible to create scene reconstructions resulting in a point cloud representation consisting of millions of points. As such a large amount of data cannot be processed by many applications, an abstracted representation is needed. However, creating geometrically abstracted models from image-based scene reconstructions is challenging due to noise and irregularities in the underlying reconstructed model. Many state-of-the-art approaches focus on extracting geometric structures from laser scan data which usually contain less noise. Others approximate surfaces using priors like geometric primitives or other parametric representation methods without introducing different levels of detail. In our work, we present a geometric modeling method for noisy, image-based reconstructions dominated by planar horizontal and orthogonal vertical structures. At dominant horizontal structures, we partition the scene into horizontal slices. As a whole slice contains similar vertical scene properties, we create a binary inside/outside labeling represented by a floor plan for each slice by solving an energy minimization problem. Subsequently, we create an irregular discretization of the volume according to the individual floor plans and again label each cell as inside/outside by minimizing an energy function. By adjusting the smoothness parameter, different levels of detail are introduced. In our experiments, we show results with varying regularization levels using synthetically generated and real-world data.

  • Tuesday 20. May 2014, 13:00

Title: Indoor Activity Detection and Recognition for Sport Games Analysis
Speaker: Georg Waltner
Location: ICG Seminar Room
Abstract: Activity recognition in sport is an attractive field for computer vision research. Game, player and team analysis are of great interest and research topics within this field emerge with the goal of automated analysis. The very specific underlying rules of sports can be used as prior knowledge for the recognition task and present a constrained environment for evaluation. This paper describes recognition of single player activities in sport with special emphasis on volleyball. Starting from a per-frame player-centered activity recognition, we incorporate geometry and contextual information via an activity context descriptor that collects information about all players’ activities over a certain timespan relative to the investigated player. The benefit of this context information on single player activity recognition is evaluated on our new real-life dataset comprising almost 36k annotated frames containing 7 activity classes within 6 videos of professional volleyball games. Our incorporation of the contextual information improves the average player-centered classification performance of 77.56% by up to 18.35% on specific classes, proving that spatio-temporal context is an important cue for activity recognition.

  • Tuesday 13. May 2014, 13:00

Title: Sketch Recognition - Bag-of-Visual-Words for Classifying Sketches of Generic Object Categories
Speaker: Michael Scheer
Location: ICG Seminar Room
Abstract: In this thesis we address the problem of recognizing hand-drawn sketches of versatile types of object categories. This has diverse application fields in computer vision as well as computer graphics. Sketch recognition can e.g. be integrated in search engines, where the query is an image instead of text. Further, the 3D modeling of a complex scene can e.g. be supported with this technique. However, the task is difficult, as the appearance of the sketch depends on the drawer and his/her drawing skills. Further, the imagination of what a typical sketch of a certain object looks like, varies a lot. Another challenge is the visual likeness of specific categories. Therefore, most previous works in this field constrained themselves to a specific sketch domain such as e.g. recognizing primitives or symbols. In this thesis we introduce a general sketch recognition algorithm without this constraint. We investigate the relation between conventional image and sketch recognition. Further, we discuss our assumption that well studied techniques of the image recognition domain can be successfully transferred into the sketch domain. In order to do so, we perform extensive evaluations for every step involved in a Bag-of-Visual-Words based image recognition pipeline. We show that it is indeed possible to apply techniques designed for the image domain to sketch recognition, however the individual steps have to be carefully tuned in order to obtain a high performance. Our sketch recognition algorithm is designed to be both accurate as well as efficient to be able to provide immediate recognition feedback. In experiments we show that our approach achieves state-of-the-art performance on a large-scale sketch dataset with 250 categories, predicting the correct category for 63% of the sketches. Further, we show an augmented reality application, where a 3D model of the category of an online drawn sketch is augmented in a live video stream.

  • Tuesday 22. April 2014, 13:00

Title: Hierarchic representation of 3D Surfaces
Speaker: Jochen Steiner
Location: ICG Seminar Room
Abstract: Digital models of 3D surfaces, such as a planet’s terrain or the surfaces of buildings, are needed in many applications, especially in the field of remote sensing. Due to progress in acquisition methods and users’ growing demand for more detailed models, the complexity of 3D surface representations has risen along with the amount of data, including the incorporation of various levels of detail. In addition, the digital elevation models (DEMs) commonly used in remote sensing can no longer serve the users’ needs. Overhangs of rocky surfaces cannot be represented efficiently in DEMs. Large surface models, such as representations of whole tunnels, can easily exceed several gigabytes. Surface models exhibiting different characteristics and resolutions may be needed within the same project. The results of different scanning or reconstruction technologies may require the proper combination of various resolutions: close-up photogrammetric reconstructions meet airborne laser scanner macro reconstructions. In our project, we address issues in reconstructing 3D surfaces across different fields of activity in remote sensing by providing a general method to process, manage and store surface data with different levels of detail and resolution efficiently, in order to cope with modern scanning technologies used for terrain and object reconstruction in remote sensing, as well as with the rising requirements on representing and processing those surface models.

  • Tuesday 15. April 2014, 13:00

Title: Efficient user interface for accurate registration of GIS data on mobile devices
Speaker: Saif ALSAIFI
Location: ICG Seminar Room
Abstract: New technologies such as smartphones play a major role in modern application development. Combining those technologies with other powerful information technologies (IT) provides us with applications that can optimize our work processes, making them more efficient and more reliable. Utility workers use paper or digital GIS plans to have foreknowledge about what is located underground. However, AR GIS technology could supply workers with the same information the plans provide, in a more efficient and effective way. It would then be possible to visualize the underground infrastructure as an interactive 3D model, get information or descriptions about different objects in the

  • Monday 09. September 2013, 13:00

Title: Alternating Decision/Regression Forests For Zero-Shot Learning
Speaker: Matthias Freiberger
Location: ICG Seminar Room
Abstract: Recognition is the most challenging task in computer vision. Even simple supervised recognition systems need big databases of labeled training data. Since the labeling of training data is tedious and expensive, the framework of Zero-Shot Learning has recently been proposed as a solution. Zero-Shot Learning allows new classes to be introduced at test time without a single shot of training data. Nevertheless, the proposed classifiers are very expensive to train in terms of memory and CPU time. Also, due to an attribute sampling process, the available information is not fully used. In this work we evaluate the potential of Alternating Decision Forests as a classifier stage of the Zero-Shot Learning framework. Moreover, we investigate the impact of additional information gained by estimating attributes utilizing an Alternating Regression Forest. While our proposed regression method exhibits insufficient performance, we show that Alternating Decision Forests are well suited as attribute stage classifiers. Alternating Decision Forests perform only slightly worse than the originally proposed Support Vector Machines, but are much cheaper in terms of training time and more flexible in configuration. Moreover, we demonstrate that a large, diverse and well-labeled training set is crucial to the performance of Zero-Shot classifiers.

  • Tuesday 01. April 2014, 13:00

Title: Towards Automatic Bone Age Estimation from MRI: Localization of 3D Anatomical Landmarks
Speaker: Thomas Ebner
Location: ICG Seminar Room
Abstract: Bone age estimation (BAE) is an important procedure in forensic practice, previously mainly performed on 2D X-ray images, but MRI based BAE has gained importance due to its absence of radiation exposure. To reach the goal of automatic BAE from MRI, localization of the joints between hand bones is a crucial first step, which is challenging due to anatomical variations, different poses and repeating structures within the hand. We propose a landmark localization algorithm using multiple random regression forests, first analyzing the shape of the hand from information of the whole image, thus implicitly modeling the global landmark configuration, followed by a refinement based on more local information to increase prediction accuracy. We are able to clearly outperform related approaches on our dataset of 60 T1-weighted MR images, achieving a mean landmark localization error of 1.4±1.5mm, while having only 0.25% outliers with an error greater than 10mm.

  • Tuesday 01. April 2014, 13:00

Title: Fully Automatic Bone Age Estimation from Left Hand MR Images
Speaker: Darko Stern
Location: ICG Seminar Room
Abstract: There has recently been an increased demand for bone age estimation (BAE) of living individuals and human remains in legal medicine applications. A severe drawback of established BAE techniques based on X-ray images is radiation exposure, since many countries prohibit scanning involving ionizing radiation without diagnostic reasons. We propose a completely automated method for BAE based on volumetric hand MRI images. On our database of 56 male Caucasian subjects between 13 and 19 years, we are able to estimate the subjects' age with a mean difference of 0.85 ± 0.58 years compared to the chronological age, which is in line with radiologist results using established radiographic methods. We see this work as a promising first step towards a novel MRI based bone age estimation system, with the key benefits of lacking exposure to ionizing radiation and higher accuracy due to exploitation of volumetric data.

  • Tuesday 25. March 2014, 13:00

Title: Modeling Text Saliency in Human Visual Attention
Speaker: Michael Schwarz
Location: ICG Seminar Room
Abstract: Information derived from human visual attention is important for many applications which need to focus on important regions in visual scenes. Currently used computational models of human visual attention make it possible to predict these most important regions in scenarios where eye tracker studies are not feasible. Even though researchers have already identified the importance of text for human visual attention, modeling text saliency is still a largely neglected topic. We provide a large-scale database containing a combination of natural scene images and scene text images, as well as the corresponding eye movement data collected in an eye tracking user study with 15 participants. A computational attention model, based on a state-of-the-art approach for learning specific context from eye movement data, is trained on this new database to obtain an attention model with top-down text context. We further added a high-level text saliency feature to the computational attention model to test its influence on the prediction performance. The proposed model achieves good performance on our database and also compares well to other state-of-the-art models.

  • Tuesday 18. February 2014, 13:00

Title: Piecewise Rigid Scene Flow
Speaker: Prof. Konrad Schindler, ETH Zürich
Location: ICG Seminar Room
Abstract: TBA

  • Tuesday 28. January 2014, 13:00

Title: Position estimation of small unmanned aerial vehicle (UAV) from a monocular camera by combining visual odometry (VO) and V-SLAM
Speaker: Hariom Dhungana
Location: ICG Seminar Room
Abstract: This work tries to obtain drift-free, accurate position estimates from a monocular camera attached to the UAV. Image sequences of a static scene taken from a moving camera are the input of the visual positioning system. The accuracy of position estimation increases with the number of features. The update time and computational complexity of maintaining the coupled pose and scene covariance of the extended Kalman filter algorithm scale quadratically, O(N^2), with the number of features. This in turn limits the number of features for real-time applications, so position accuracy is limited in V-SLAM. A VO system based on two-frame estimates of instantaneous relative motion can work in constant time, but will inevitably exhibit drift because of the accumulation of small errors in the inter-frame motion estimates. Bundle adjustment (BA) and loop closure are two techniques to reduce drift error in VO. The computational complexity of BA increases cubically, O((N+lm)^3), as the number of position estimation parameters increases. For this reason it is not appropriate for a small UAV. Loop closure needs a large amount of memory to save all matched features. To address these limitations we try to find a better solution, called VO combined with fractional V-SLAM. Two statements on the proposed approach can be made based on the obtained results. First, the uncertainty of the position estimate in this method is lower than in V-SLAM because more features are admitted. Second, the computational cost of this approach is lower because the task of maintaining the coupled pose and scene covariance is reduced by the interleave factor.
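
The quadratic growth mentioned in the abstract can be illustrated with a short sketch. It only counts the entries of a joint pose-and-scene covariance matrix; the pose dimension of 6 and the 3 parameters per feature are assumptions made for illustration and are not taken from the talk.

```python
def ekf_covariance_entries(num_features, pose_dim=6, feature_dim=3):
    """Count the entries of a joint pose/scene covariance matrix.

    pose_dim and feature_dim are illustrative assumptions: a 6-DoF camera
    pose and one 3D point per feature.
    """
    state_dim = pose_dim + feature_dim * num_features
    # The covariance is state_dim x state_dim, hence O(N^2) in the features.
    return state_dim * state_dim


if __name__ == "__main__":
    for n in (10, 100, 200):
        print(n, ekf_covariance_entries(n))
```

Under these assumptions, doubling the number of features from 100 to 200 roughly quadruples the number of covariance entries that each filter update has to maintain.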

  • Tuesday 28. January 2014, 13:00

Title: Active Monocular Localization: Towards Autonomous Monocular Exploration for Multirotor MAVs
Speaker: Christian Mostegel
Location: ICG Seminar Room
Abstract: The main contribution of this paper is to bridge the gap between passive monocular SLAM and autonomous robotic systems. While passive monocular SLAM strives to reconstruct the scene and determine the current camera pose for any given camera motion, not every camera motion is equally suited for these tasks. In this work we propose methods to evaluate the quality of camera motions with respect to the generation of new useful map points and localization maintenance. In our experiments, we demonstrate the effectiveness of our measures using a low-cost quadrocopter. The proposed system only requires a single passive camera as exteroceptive sensor. Due to its explorative nature, the system achieves autonomous way-point navigation in challenging, unknown, GPS-denied environments.

  • Tuesday 21. January 2014, 13:00

Title: Efficient Hough Forests for Multi-Camera, Multi-Instance Object Detection and Tracking
Speaker: Georg Poier
Location: ICG Seminar Room
Abstract: Visual object tracking represents a crucial task for many computer vision applications. Moreover, continued research interest in recent years shows that there are still open issues for tracking algorithms. Especially when tracking multiple instances of the same class, occlusions are likely and can often only be resolved by observing the scene from several viewpoints. The multiplied amount of data coming from the different viewpoints demands methods of low computational complexity. Hence, simple approaches, e.g., based on background subtraction, are regularly applied to locate the objects of interest. Nevertheless, due to their simplicity, these methods often suffer from severe problems like mistaken instances or ghost detections. To overcome these problems, stronger, classifier-based models can be used. However, transferring complex models straightforwardly to multi-camera systems is not feasible due to the runtime requirements. This obvious discrepancy between the need for stronger models and short response times is addressed in this work. Our goal is the utilization of a more complex model. This is made possible by a preliminary investigation of the efficiency of such a model, revealing that the computational cost can be drastically reduced without sacrificing detection accuracy. To this end, we focus on Random Forests as an object model classifier in a tracking-by-detection setup. Random Forests received strong interest recently, which is mainly based on the fact that they allow fast predictions – regardless of the dimensionality of the feature space. A powerful and versatile extension for object detection is the recently proposed Hough Forest. However, its application is hampered in cases where a huge amount of data has to be processed, since in the original formulation the runtime is correlated with the number of training samples.
In line with this, we thoroughly analyze the efficiency of Hough Forest based object detection in order to obtain a fast detector applicable to online learning during tracking. Runtime critical parts are identified and investigated, which reveals several insights yielding a drastic improvement of the overall runtime. Moreover, scalability is achieved by removing the correlation between the runtime and the amount of training data. Most importantly, the achieved runtime reduction does not imply a significant loss in accuracy. In fact, the proposed method scores within a few percent of the baseline, while being one to two orders of magnitude faster. The gathered insights can be straightforwardly applied to tracking in a tracking-by-detection manner. To this end, we perform tracking from single as well as multiple cameras and test our method on several standard evaluation datasets. The qualitative and quantitative results show that the system performs comparably to or even better than state-of-the-art methods in terms of accuracy. At the same time, the runtime of our method is reduced significantly compared to the related work.

  • Tuesday 14. January 2014, 13:00

Title: ICCV 2013 Summary
Speaker: Christian Reinbacher et al.
Location: ICG Seminar Room
Abstract: The ICG attendees of ICCV 2013 would like to summarize the conference and present a few of the most interesting papers. The talks themselves have been recorded and can be viewed in an uncut version. Speakers: David Ferstl, Martin Köstinger, Christian Reinbacher, Samuel Schulter, Martin Urschler

  • Tuesday 10. December 2013, 13:00

Title: Automated Georeferencing and Guided Camera Orientation for High-Resolution Aerial Imagery
Speaker: Martin Öttl
Location: ICG Seminar Room
Abstract: This work describes the design of a state-of-the-art Aerial Triangulation (AT) pipeline. The distinct stages are implemented with the goal of fast processing by exploiting as much information as is available, which is crucial for dealing with high-resolution aerial images. Such prior information includes camera poses obtained from Global Positioning System (GPS) and Inertial Navigation System (INS) measurements as well as rough scene approximations from Digital Elevation Models (DEMs). These terrain data are freely available and cover nearly the entire earth. At the beginning of the pipeline, Scale-Invariant Feature Transform (SIFT) features are extracted; then the view selection stage utilizes the aforementioned prior information to decide which images display the same part of the scene and thus should be matched. During the subsequent matching step the same information is applied to predict corresponding feature locations in an image given the location in another image. After the triangulation of the obtained feature correspondences, the final bundle adjustment stage refines the 3D structure as well as the camera poses. Evaluation results on two different image sets are presented, one of which also contains oblique images.

  • Tuesday 03. December 2013, 13:00

Title: A comparison of first-order algorithms for machine learning
Speaker: Wei Yu
Location: ICG Seminar Room
Abstract: In this work, we present a comprehensive comparison of three first-order optimization algorithms for convex optimization problems in machine learning. We concentrate on several smooth and non-smooth machine learning problems consisting of a loss function plus a regularizer. For example, grouped feature selection and multi-task learning are composed of a non-smooth loss function and a non-smooth regularizer. The overall experimental results show the superiority of primal-dual algorithms for solving machine learning problems in terms of ease of construction, running time and accuracy.

  • Tuesday 26. November 2013, 13:00

Title: Real-time Anaglyph 360° Panoramic Vision Using a High-speed Rotating Dynamic Stereo Vision Sensor
Speaker: Stephan Schraml
Location: ICG Seminar Room
Abstract: This work presents a proof of concept of anaglyph 360° panoramic imaging using a high-speed rotating dynamic stereo vision sensor at up to 10 revolutions per second. The system consists of (1) a pair of biologically-inspired dual-line dynamic vision sensors generating events at high temporal resolution with on-chip time stamping (1µs resolution), having a high dynamic range and a sparse visual coding of the information, (2) a high-speed mechanical device rotating at up to 10 rotations per second (rps) on which the sensor is mounted and (3) real-time software for the reconstruction of a pair of 360° panoramic views in a contrast map (events) as generated by the sensor. Within this work, first an analysis of the geometrical representation for stereo reconstruction is made. Afterwards, a theoretical analysis is performed in order to show the best depth accuracy possible under the geometrical and optical constraints. Finally, several experiments are carried out to assess the anaglyph representation of the panoramic views for real-time 3D 360° imaging. The first results of the analysis show that this new camera concept allows real-time 3D panoramic views at 10 panoramas per second, even in challenging light conditions (below 100 lux).

  • Tuesday 29. October 2013, 13:00

Title: Interactive Scene Segmentation using 2D/3D Correspondences
Speaker: Stefan Wakolbinger
Location: ICG Seminar Room
Abstract: Reconstructing a 3D scene based on a set of images from different vantage points is a fundamental problem in computer vision. However, most methods proposed in the literature aim to reconstruct the complete 3D scene. In contrast, the goal of this work is to build a model of only one specific object of interest, which is selected by the user. The problem can therefore be described as a combination of 3D reconstruction and segmentation. The algorithms for an interactive tool are developed, in which the user is asked to mark both object and background regions by placing strokes in some of the images. Additionally, depth maps are computed for each view. The main contribution of this work is the combination of depth information with semantic information from color segmentation. The probabilities from the color models are fused with the hypotheses stemming from the depth maps, which is done directly in object space. An optimization step is performed in order to compute the most probable object surface using a smoothness prior, formulated in terms of the total variation framework. Instead of segmenting the object in each of the views and fusing the results to create a 3D model, the optimization is performed directly in object space. By exploiting the massive parallelism of modern graphics processing units, interactive computation times can be achieved, even for high volumetric resolutions. This allows the user to almost immediately view the results and conveniently interact with the tool by placing additional strokes to refine the results. As shown in various experiments, objects with arbitrary topology can be reconstructed at a high level of detail, with the performance mainly depending on the quality of the user input.

  • Tuesday 29. October 2013, 12:30

Title: Narrow Spectral Color Imaging and a Reconfigurable Camera Add-on for Snapshot Imaging Applications
Speaker: Ramon Hegedüs, MPI Saarbrücken
Location: ICG Seminar Room
Abstract: An image acquisition method and output rendering technique is presented, which can be particularly useful when optical information is only available or of special interest within a narrow spectral range. Since the difference among images captured in such a spectral window can be extremely small, in order to get meaningful color images, visualization of the acquired data requires a novel approach. To this end a color mapping method was developed that allows for projecting the input data onto the entire display color gamut with a continuous and perceptually nearly uniform mapping, while ensuring an optimally high information content for human perception. The second part of the talk covers a newly developed non-permanent camera add-on that enables plenoptic imaging with standard DSLR cameras. Its optical design is based on a physical copying mechanism that multiplies a sensor image into a number of identical copies that still carry the plenoptic information of interest. A minor modification of the design also allows for aperture subsampling and, hence, light-field imaging. Using a prototypical setup it is shown that high dynamic range, multispectral, polarization and light-field imaging can be achieved in snapshot captures.

  • Tuesday 15. October 2013, 13:00

Title: Active Monocular Localization: Towards Autonomous Monocular Exploration for Quadrotor MAVs
Speaker: Christian Mostegel
Location: ICG Seminar Room
Abstract: In this thesis we approach the topic of monocular SLAM from an active perspective. The main contribution of this work is bridging the gap between passive monocular SLAM and autonomous robotic systems. For this purpose we propose three novel measures. The first measure is called "localization quality" and allows the evaluation of the stability of the monocular localization at arbitrary virtual camera poses. This generic measure can be used with every monocular SLAM approach which is based on bundle adjustment. The purpose of the second measure, the "point generation likelihood", is to evaluate the chance of generating new map points from arbitrary virtual view points. It is based on our novel variable depth distribution (VDD), which provides a reasonable guess for the 3D position of yet unmapped 2D features without any prior knowledge of the scene. The third of our proposed measures allows for an evaluation of the navigational safety of holonomic aerial vehicles. We use these novel measures in a destination-based planning approach for multirotor MAVs which makes the resulting system capable of autonomous explorative navigation. In our experiments we demonstrate the effectiveness of our novel measures as well as the capabilities of the overall system. We achieve autonomous way-point navigation with a quadrotor MAV in challenging indoor environments. Furthermore, we demonstrate that even tasks like a full 360° turn in sparsely textured environments can be achieved through our explorative navigation approach. In all experiments our system was able to maintain the visual localization at all times.

  • Tuesday 15. October 2013, 13:00

Title: Statistical shape models
Speaker: Marc Steiner
Location: ICG Seminar Room
Abstract: Statistical shape models have gained increasing interest in the Computer Vision community over the past few decades. Awareness of an object's feasible shape variations provides insight into structural features and has been shown to improve image segmentation significantly. Further applications that benefit from shape knowledge include image based analysis, classification and tracking. An essential element of shape analysis is the choice of a proper shape representation. Implicit representations such as the signed distance function (SDF) advantageously provide independence of correspondence, parametrisation and topology. Another crucial aspect concerns the selection of a shape space. Several approaches proposed in the literature revert to the simplified assumption of a linear shape space. However, in general this assumption is invalid, resulting in poor modelling of complex variations such as bending. In order to tackle complex shape deformations, a recent trend is to consider shapes populating a non-linear space. The work at hand details a recently proposed manifold approach for non-linear shape modelling on SDFs. We provide an in-depth comparison to linear models and show how the manifold model can be incorporated into image segmentation.

  • Tuesday 15. October 2013, 13:00

Title: Collision Avoidance for Unmanned Aerial Vehicles using Monocular Vision
Speaker: Gert Hutter
Location: ICG Seminar Room
Abstract: Unmanned Aerial Vehicles (UAVs) have become considerably popular over the last years due to cheap and powerful available hardware and a widespread field of application. An important step towards autonomous flight capabilities is to detect obstacles and to avoid potential collisions during flight. In this master thesis we present a novel system for collision avoidance on UAVs based on visual input from the monocular on-board camera. Besides the image stream we utilize data from the Inertial Measurement Unit (IMU) for short-term pose estimation of the UAV. Three-dimensional information is extracted with an efficient Structure-from-Motion (SfM) approach based on sparse feature matching. A meshing technique is then utilized to fill gaps between sparse points leading to a more complete reconstruction of the ambient structure. Subsequent reconstructions are integrated into a probabilistic occupancy map that models free, occupied, and unknown space. In case of close obstacles proper reactive actions are taken to avoid a collision. The system is evaluated in terms of accuracy and real-time capabilities and we illustrate results of fully-autonomous and semi-autonomous flights.

  • Tuesday 08. October 2013, 13:00

Title: Fast, Feature-based Segmentation of Multiple Moving Objects Captured by Moving Camera
Speaker: Gerhard Schoenfelder
Location: ICG Seminar Room
Abstract: A fast separation of multiple, moving objects (foregrounds) and static background is important for many applications like image/video categorization, video coding and surveillance. In the case of a static camera/background many good approaches like frame differences have been developed, but in case of a moving camera or dynamic background the problem becomes more difficult. We present a fast and robust short-term (uses only three frames) segmentation approach which detects multiple, moving objects in a video with moving camera. Our chosen algorithm applies a frame difference method with a global motion compensation based on the Helmholtz Tradeoff Estimator as preprocessing step which yields a static video sequence. After this, we combine two adjacent error frames and apply a threshold segmentation to these frames. We evaluate the results of our chosen approach on the Berkeley Motion Segmentation Dataset which confirms that this short-term approach is a simple and effective way to detect moving foregrounds.
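
The three-frame idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: it assumes the frames have already been aligned by global motion compensation (the Helmholtz Tradeoff Estimator step is not reproduced), and both the pixel-wise minimum used to combine the two error frames and the threshold value are hypothetical choices.

```python
def abs_diff(a, b):
    """Pixel-wise absolute difference of two equally sized grayscale frames."""
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def segment_moving(prev, curr, nxt, threshold=20):
    """Return a binary mask (1 = moving) for the middle frame `curr`.

    Assumes the three frames are already motion compensated, so the
    background is static across them.
    """
    err_a = abs_diff(curr, prev)  # error frame between frames 1 and 2
    err_b = abs_diff(nxt, curr)   # error frame between frames 2 and 3
    # Assumed combination rule: a pixel is foreground only if it changed
    # in both error frames (pixel-wise minimum), then thresholded.
    return [[1 if min(ea, eb) > threshold else 0 for ea, eb in zip(ra, rb)]
            for ra, rb in zip(err_a, err_b)]


if __name__ == "__main__":
    # A bright object (200) moving one pixel to the right per frame.
    prev = [[200, 10, 10, 10, 10]]
    curr = [[10, 200, 10, 10, 10]]
    nxt = [[10, 10, 200, 10, 10]]
    print(segment_moving(prev, curr, nxt))
```

Combining two error frames in this way suppresses the "ghost" that a single frame difference leaves at the object's previous position, which is one plausible reason for using three frames rather than two.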

  • Tuesday 08. October 2013, 13:00

Title: Pose Tracking on Mobile Phones in an Outdoor 3D Scene
Speaker: Florian Strasser
Location: ICG Seminar Room
Abstract: Is it possible to use static objects in an outdoor environment as targets for pose tracking? What are the major problems when using a mobile phone camera? In order to answer these questions we implement a pose tracking algorithm. SIFT is too slow to be used unchanged for pose tracking, but there are binary descriptors which are faster to compute: ORB, BRISK and FREAK. For keypoint detection, FAST corners and Good Features to Track (Shi/Tomasi) are utilized. In order to see which keypoint detector and descriptor combination performs best during pose tracking, we define three test scenarios in outdoor scenes, each with an associated image sequence taken with a mobile phone camera. We demonstrate the initial pose estimation performance as well as the tracking performance on a previously computed reconstruction of the outdoor scene. Our tests show that changing weather conditions and the time of day influence the number of point correspondences found by the tracker. The user's motion, such as looking around or walking, also causes problems while tracking. Especially walking, which is a realistic use case for an outdoor augmented reality task, leads to a problem with the mobile phone's camera. Additionally, we evaluate the runtime of the keypoint detector and descriptor combinations on several mobile phones. This work is useful for augmented reality projects that deal with outdoor environments as well as with binary descriptors on mobile phones.

  • Tuesday 08. October 2013, 13:00

Title: Sparse 3D Reconstruction as Preprocessing Step for Markerless AR on Mobile Platforms
Speaker: Andrej Reichmann
Location: ICG Seminar Room
Abstract: Real-time Augmented Reality applications need supplemental data to perform stable pose tracking in markerless outdoor environments. We present an offline solution as a preprocessing step, which provides retrievable 3D reference points of the scene from a set of input images taken by a smartphone. Our reconstruction algorithm applies Bundle Adjustment to refine the reconstructed sparse point cloud and to estimate the intrinsic parameters of the uncalibrated camera. We tested our implementation with 12 different keypoint detector/descriptor sets to find the most appropriate combination for the reconstruction task. Evaluations show that our method is capable of computing metric reconstructions with about 3300 cloud points and a mean reprojection error of 0.61 pixels. Our tests cover scenes with single centered objects and larger scenes with multiple objects. We also define a metric to evaluate the distribution of backprojected 3D points onto the scene images, in order to identify poorly reconstructed scene parts, which can be improved by adding further images.

  • Tuesday 08. October 2013, 11:00

Title: Scalable Visualization
Speaker: Prof. Jens Krüger
Location: ICG Seminar Room
Abstract: In a steadily growing number of scientific disciplines, numerical methods of increasing complexity are now part of the standard tool set of scientists. In the course of these computations, massive data are generated at an ever-growing pace, as evidenced by the modern push for "exascale computing." However, while computational capabilities continue to increase, the same human beings must analyze growing mountains of data to transform 'ideas' into 'insight'. This requires that the interface between computation and human ingenuity -- that is, visualization -- becomes more efficient. In this talk a number of current research concepts are presented, each of which targets a number of key areas in the 'visualization pipeline'. The aim is to develop generic, scalable methods usable by a wide range of applications.

  • Tuesday 24. September 2013, 13:00

Title: Multi-frame rate augmented reality
Speaker: Philipp Grasmug
Location: ICG Seminar Room
Abstract: In this work we present a method for improving the visual quality of an augmented reality system. By combining the characteristics of two different sensors, we increase the spatial resolution of a video stream using sub-pixel accurate image registration. By decoupling the rendering process of the augmented information from the displaying frequency of the system, we can augment the scene using computationally expensive rendering techniques. We utilize image-based rendering to overcome the resulting temporal artifacts. Moreover, we evaluated our methods by comparing the achieved quality/speed ratio with conventional augmented reality methods. Finally, we explain limitations and assumptions of our algorithm and discuss further work.

  • Tuesday 13. August 2013, 13:00

Title: Self-introduction of Pedro Boechat
Speaker: Pedro Boechat
Location: ICG Seminar Room
Abstract: Pedro is a master's student from Brazil. In this short presentation he will be talking about his graduation project (distributed scenegraph on commodity computer clusters), his previous work experiences (a dozen different places he has worked at, from television to an oil company) and his latest endeavors as an indie game developer.

  • Wednesday 10. July 2013, 10:00

Title: Investigation and use of the Lytro camera
Speaker: Paul Aylward
Location: ICG Seminar Room
Abstract: This report sums up a three-month investigation of the Lytro light field camera. This device is the first consumer camera which takes "light field images", that is, images which contain the information of several photographs, focused at different depths. Such a device has several applications, from simple digital refocusing to depth mapping and non-linear refocusing. Most of this work relies on Ren Ng’s thesis [1], which highlights two main methods to compute refocused photographs out of the light field image, which can be viewed as a four-dimensional function. The first one, called the spatial approach, is basically a projection of that four-dimensional function down to 2D, along sheared lines. The second one, called the frequency approach, consists in computing the light field’s four-dimensional Fourier transform, then slicing it down to 2D, and computing the inverse 2D Fourier transform of the frequency map thus obtained. The report will first introduce light field imagery and give the necessary setting, then deal with those two approaches, after a study of the calibration problem. Two applications will finally be presented.
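
The spatial approach mentioned above can be illustrated with a small "flatland" analogue: a 2D light field (one view axis, one pixel axis) is sheared and summed along the view axis to give a 1D refocused image. This is only a sketch under simplifying assumptions (integer shear, zero padding, hypothetical names such as `refocus` and `slope`); it is not the implementation from the report or from Ng's thesis.

```python
def refocus(light_field, views, slope):
    """Shift-and-add refocusing of a flatland light field.

    light_field[i][x] is the sample recorded at pixel x from the view
    position views[i]. Each view is shifted by slope * u pixels before
    the views are averaged, which integrates along sheared lines.
    """
    width = len(light_field[0])
    image = [0.0] * width
    for u, row in zip(views, light_field):
        for x in range(width):
            src = x + slope * u  # sheared sampling position in this view
            if 0 <= src < width:  # samples outside the sensor count as zero
                image[x] += row[src]
    return [value / len(views) for value in image]
```

A scene point whose image moves by d pixels per unit of view position appears sharp when `slope` equals d and is smeared over several pixels for other values of `slope`.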

  • Tuesday 02. July 2013, 13:00

Title: Edge Distance Shadow Mapping
Speaker: Michael Kenzel
Location: ICG Seminar Room
Abstract: Shadow mapping is one of the most popular algorithms for rendering shadows in computer generated images. It offers distinctive advantages in terms of simplicity, generality and performance, while at the same time being a natural fit for the rasterization-based rendering systems commonly used today. But the technique is extremely vulnerable to artifacts caused by aliasing effects, which can greatly compromise the quality of the generated shadows. We propose a new algorithm that not only avoids the common aliasing artifacts and delivers pixel-perfect hard shadows at interactive frame rates, but can even take advantage of perspective aliasing for the generation of soft shadows.

  • Tuesday 18. June 2013, 13:00

Title: Multiple Model Fitting Revisited
Speaker: Joris Bayer
Location: ICG Seminar Room
Abstract: The task of robustly fitting multiple parametric models to a noisy set of observed data is a common necessity in computer vision. Applications range from geometric model fitting (e.g. plane fitting) through homography estimation to motion segmentation. This thesis explores existing model fitting approaches, almost all of which rely on the random sampling paradigm introduced by RANSAC in 1981. This principle solves multiple model fitting as an application-independent problem of generating candidate models from random subsets of data, from which the best models are selected based on a predefined quality measure. We mould existing approaches into a generic model fitting framework for Matlab, built from exchangeable components for sampling strategies, outlier removal, model estimation and model selection. From the multitude of possible model fitting pipelines that emerges from this framework, we evaluate a selection of algorithms on the problem of multiple line fitting in synthetic two-dimensional data, considering both the quality of fit and the correctness of the number of models returned by the algorithms. We focus on approaches that identify clusters of observations based on their likelihood of having emerged from the same structure. For this purpose, we incorporate recent developments in graph clustering such as the Graph Shift and Authority Shift algorithms. In addition to the 2D experiments, we conduct experiments on the Freiburg and NYU datasets of depth data recorded by Microsoft’s Kinect© sensor, applying the framework to the problem of plane fitting. With the second dataset, we use the included semantic labeling to extract reference planes (floors, walls, tables, etc.) as a quasi-ground truth, allowing for the evaluation of the detection precision/recall of our algorithms.
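
As background for the random sampling paradigm, the following minimal Python sketch fits a single 2D line with plain RANSAC: candidate lines are generated from random two-point subsets, and the candidate with the most inliers is kept. It is an illustration only and not the Matlab framework described in the thesis; the function names, the threshold and the iteration count are assumptions.

```python
import random


def fit_line(p, q):
    """Return (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1, or None."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2  # normal vector of the line through p and q
    norm = (a * a + b * b) ** 0.5
    if norm == 0.0:
        return None  # degenerate sample: both points coincide
    return a / norm, b / norm, -(a * x1 + b * y1) / norm


def ransac_line(points, threshold=0.1, iterations=200, seed=0):
    """Keep the two-point candidate line with the largest inlier set."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(iterations):
        line = fit_line(*rng.sample(points, 2))
        if line is None:
            continue
        a, b, c = line
        # Quality measure: number of points within `threshold` of the line.
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
    return best_line, best_inliers
```

One simple way to extend this to several models, again as an assumption rather than the thesis method, is to remove the inliers of the best line and run the same procedure on the remaining points.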

  • Tuesday 18. June 2013, 13:00

Title: Pulmonary Vessel Detection and Analysis from CT Images
Speaker: Michael Helmberger
Location: ICG Seminar Room
Abstract: Extraction, segmentation and analysis of pulmonary vessels from computed tomography (CT) images of the human chest is an important topic for a wide range of applications in medical image analysis. We present a pulmonary vessel extraction and segmentation algorithm which is fast, fully automatic and robust against noise. It uses a segmentation of the airway tree and a left and right lung labeled volume to restrict the response of an offset medialness vessel enhancement filter. We test our algorithm on phantom data as well as on the VESSEL12 challenge dataset. Our clinical focus is on the detection of pulmonary hypertension (PH), which is a chronic disorder of the pulmonary circulation, marked by an elevated mean pulmonary arterial pressure (mPAP). On a dataset containing 24 patients from a clinical pulmonary hypertension pilot study, we show that quantitative indices derived from the segmented pulmonary vessels correlate with the mPAP and are applicable to distinguish patients with and without PH.

  • Monday 17. June 2013, 09:30

Title: Game Theory in Computer Vision and Pattern Recognition
Speaker: Marcello Pelillo, University of Venice, Italy
Location: ICG Seminar Room
Abstract: The development of game theory in the early 1940’s by John von Neumann was a reaction against the then dominant view that problems in economic theory can be formulated using standard methods from optimization theory. Indeed, most real-world economic problems typically involve conflicting interactions among decision-making agents that cannot be adequately captured by a single (global) objective function, thereby requiring a different, more sophisticated treatment. Accordingly, the main point made by game theorists is to shift the emphasis from optimality criteria to equilibrium conditions. As it provides an abstract theoretically-founded framework to elegantly model complex scenarios, game theory has found a variety of applications not only in economics and, more generally, social sciences but also in different fields of engineering and information technologies. In particular, in the past there have been various attempts aimed at formulating problems in computer vision, pattern recognition and machine learning from a game-theoretic perspective and, with the recent development of algorithmic game theory, the interest in these communities around this topic is growing at a fast pace. The goal of this talk is to offer an introduction to the basic concepts of game theory and to provide an overview of recent work on the use of game-theoretic models in computer vision and pattern recognition problems. I shall assume no pre-existing knowledge of game theory by the audience, thereby making the course self-contained and understandable by a non-expert.

  • Friday 14. June 2013, 13:00

Title: Geodesic Forests for Learning Coupled Predictors
Speaker: Peter Kontschieder
Location: ICG Seminar Room
Abstract: Conventional decision forest based methods for image labelling tasks like object segmentation make predictions for each variable (pixel) independently. This prevents them from enforcing dependencies between variables and translates into locally inconsistent pixel labellings. Random field models, instead, encourage spatial consistency of labels at increased computational expense. This paper presents a new and efficient forest based model that achieves spatially consistent semantic image segmentation by encoding variable dependencies directly in the feature space the forests operate on. Such correlations are captured via new long-range, soft connectivity features, computed via generalized geodesic distance transforms. Our model can be thought of as a generalization of the successful Semantic Texton Forest, Auto-Context, and Entangled Forest models. A second contribution is to show the connection between the typical Conditional Random Field (CRF) energy and the forest training objective. This analysis yields a new objective for training decision forests that encourages more accurate structured prediction. Our GeoF model is validated quantitatively on the task of semantic image segmentation, on four challenging and very diverse image datasets. GeoF outperforms both state-of-the-art forest models and the conventional pairwise CRF.

  • Friday 14. June 2013, 13:00

Title: Probabilistic Range Image Integration for DSM and True-Orthophoto Generation
Speaker: Markus Rumpler
Location: ICG Seminar Room
Abstract: Typical photogrammetric processing pipelines for digital surface model (DSM) generation perform aerial triangulation, dense image matching and a fusion step to integrate multiple depth estimates into a consistent 2.5D surface model. The integration is strongly influenced by the quality of the individual depth estimates, which need to be handled robustly. We propose a probabilistically motivated 3D filtering scheme for range image integration. Our approach avoids a discrete voxel sampling, is memory efficient and can easily be parallelized. Neighborhood information given by a Delaunay triangulation can be exploited for photometric refinement of the fused DSMs before rendering true-orthophotos from the obtained models. We compare our range image fusion approach quantitatively on ground truth data by a comparison with standard median fusion. We show that our approach can handle a large amount of outliers very robustly and is able to produce improved DSMs and true-orthophotos in a qualitative comparison with current state-of-the-art commercial aerial image processing software.

  • Tuesday 11. June 2013, 13:00

Title: Diffusion Processes for Retrieval Revisited
Speaker: Michael Donoser
Location: ICG Seminar Room
Abstract: This talk addresses generic retrieval applications where, given an arbitrary query element, we want to identify the most similar elements within a potentially huge database. Simply returning the most similar instances, based on pairwise similarity measures, ignores the underlying data manifold, and we overcome this limitation by re-evaluating all similarities in the context of all other database elements. The most popular approaches in this field are graph diffusion processes, and the talk will first revisit the state-of-the-art (SOTA). Based on our insights on SOTA, we are able to derive a generic framework, where the related work represents specific instances of our formulation. Experiments demonstrate applicability of our diffusion framework for several retrieval tasks, e.g. achieving a 100% bullseye score on the popular MPEG-7 shape retrieval data set.
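
To make the idea of re-evaluating similarities on the graph concrete, here is a minimal Python sketch of one common diffusion variant, a random walk with restart at the query. It is not necessarily one of the formulations covered in the talk; the function name `diffuse`, the restart weight `alpha` and the iteration count are assumptions.

```python
def diffuse(affinity, query, alpha=0.8, iterations=100):
    """Random walk with restart on a pairwise affinity matrix.

    affinity[i][j] is the pairwise similarity of elements i and j.
    Returns one score per element; higher means more relevant to `query`.
    """
    n = len(affinity)
    # Row-normalize the affinities into transition probabilities.
    transition = []
    for row in affinity:
        total = sum(row)
        transition.append([w / total if total > 0 else 0.0 for w in row])
    start = [1.0 if i == query else 0.0 for i in range(n)]
    scores = list(start)
    for _ in range(iterations):
        # Spread the current scores along the graph edges, then restart.
        scores = [
            alpha * sum(transition[i][j] * scores[i] for i in range(n))
            + (1.0 - alpha) * start[j]
            for j in range(n)
        ]
    return scores
```

With a chain of three elements where only neighbors have nonzero affinity, the element at the far end of the chain receives a positive score for a query at the other end, even though their direct pairwise similarity is zero.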

  • Tuesday 28. May 2013, 13:00

Title: Marie Curie Fellowship Project: Adolescent age estimation from magnetic resonance images
Speaker: Darko Stern
Location: ICG Seminar Room
Abstract: The ability to assign accurate age estimates to living individuals and human remains has become an important element of forensic practice, due to increasing demands of the legal system in criminal prosecutions, and, increasingly important, determination of refugee status of asylum seekers. Bone age estimation (BAE) from medical images of the dentition and skeleton provides means for objective and reliable age estimation. Established in forensic practice, BAE in two-dimensional (2D) x-ray images is a challenging task due to the projective nature of radiographic images and lack of reference standard for a particular population. Moreover, although exposure to ionizing radiation during x-ray scans is not likely to cause immediate harm, it does add to the total lifetime dose of radiation received by the individual. The use of ionizing radiation without medical or criminal indication is therefore not permitted in many countries, and either way leads to ethical questions. On the other hand, three dimensional (3D) MR imaging provides means for comprehensive measurements of bones and cartilage, without the use of ionizing radiation. This may lead to more reliable and more precise BAE compared to evaluations based on 2D radiographs. The primary aim of the proposed project is to design and implement a software tool for automated age estimation based on the hand bones, clavicle and wisdom teeth in 3D MR images. By using the developed software, we will 1) investigate all of the information currently used for forensic age estimation of the same individuals, and get an insight into developmental differences between the hand bones, clavicle and wisdom teeth descriptors; 2) determine statistically relevant reference values for central Europeans of ages between 13 and 26 years. The project is expected to have a high social and economic impact by providing a novel completely non-invasive and automated method, which could become a new "gold standard" in age estimation of adolescents.

  • Friday 24. May 2013, 09:30

Title: Shedding Light on Light Fields
Speaker: Oliver Bimber
Location: ICG Seminar Room
Abstract: Images play an essential role in our life. Photography and television are technologies that influenced generations like not many other technologies did. Both would be unimaginable without images. Advanced imaging systems and image processing methods are today fundamental to many professions. Medical imaging is certainly a good example. And if nothing else, images are also the final outcome of every visualization algorithm. Digital images are two-dimensional matrices of pixels. Cameras are based on this notion: Even though 3D scene points emit varying light rays in different directions, the lens and the sensor of cameras integrate them to a single pixel. By doing this for all imaged scene points, we end up with nothing more than a 2D image -- having lost most of the scene information. Displays are based on this notion: Pixels of raster-displayed images emit (more or less) the same amount of light in all directions -- giving us nothing more than a 2D image. Visualization and image processing algorithms are based on this notion: They map complex (possibly multidimensional) data to 2D images and vice versa. What if the notion of images would change once and forever? What if instead of capturing, storing, processing and displaying only a single color per pixel, each pixel would consist of individual colors for each emitting direction? Images would no longer be two-dimensional matrices but four-dimensional ones (storing spatial information in two dimensions, and directional information in the other two dimensions). This is called a light field. Light fields have the potential to radically change everything that we relate to images -- from photography, over displays to image processing and analysis, and possibly even visualization. While first light-field display prototypes have already been introduced in scientific communities and first light-field cameras are already commercially available, many unsolved challenges remain in the processing of light fields. 
While common digital images store megabytes of data, corresponding light fields might require gigabytes. While spatial consistency is a requirement for regular image processing, directional consistency has to be ensured in addition for light-field processing. In this talk, I will shed some light on light fields and light-field processing basics with applications to imaging and visualization. I invite the audience to think about what the impact for computer vision, image processing and analysis, or visualization could be if images evolve to light fields, raster displays evolve to light-field displays, and digital cameras evolve to light-field cameras. Short CV: Oliver Bimber became head of the Institute of Computer Graphics at Johannes Kepler University Linz in October 2009. From 2003 to 2010 he served as a Junior Professor of Augmented Reality at the Media System Science Department of Bauhaus-University Weimar. He received a Ph.D. (2002) in Engineering from Darmstadt University of Technology, Germany, and a Habilitation degree (2007) in Computer Science (Informatik) at Munich University of Technology. From 2001 to 2002 Bimber worked as a senior researcher at the Fraunhofer Center for Research in Computer Graphics in Providence, RI/USA, and from 1998 to 2001 he was a scientist at the Fraunhofer Institute for Computer Graphics in Rostock, Germany. Bimber co-authored the book "Displays: Fundamentals and Applications" (2011) with Rolf R. Hainich and the book "Spatial Augmented Reality" (2005) with Ramesh Raskar (MIT). Since 2005 he has served on the editorial board of the IEEE Computer Magazine. The VIOSO GmbH was founded in his group in 2005. He and his students received several awards for their research and inventions, and have won scientific competitions, such as the ACM SIGGRAPH Student Research Competition (1st place 2006 and 2008, 2nd place 2009 and 2011), and the ACM Student Research Competition Grand Final (2006) that was presented together with the Turing award.

  • Tuesday 21. May 2013, 13:00

Title: A Convex Approach for Image Hallucination
Speaker: Peter Innerhofer
Location: ICG Seminar Room
Abstract: In this paper we propose a global convex approach for image hallucination. Altering the idea of classical multi image super resolution (SU) systems to single image SU, we incorporate aligned images to hallucinate the output. Our work is based on the paper of Tappen et al. [14] where they use a non-convex model for image hallucination. In comparison we formulate a convex primal optimization problem and derive a fast converging primal-dual algorithm with a global optimal solution. We use a database with face images to incorporate high-frequency details to the high-resolution output. We show that we can achieve state-of-the-art results by using a convex approach.

  • 14.05.2013 / ICG / 13:00h

Title: Realtime Object Separation for Industrial Glass Sorting Systems (master thesis)
Speaker: Christian Hartbauer
Abstract: Industrial glass recycling often includes optical sorting machines, which are used to sort a stream of glass cullet according to its color. A crucial problem for the performance of such systems is connected objects, which lead to wrong results in the sorting process. The goal of this thesis is to develop a solution to this problem. We formulate this problem as a partitioning problem according to the Ising model, for two partitions, or the Potts model, for k partitions. Furthermore we define approximations, such that both models are defined in a spatially continuous domain and can be solved with variational methods. In order to find global solutions we use a simple convex relaxation approach. Another main aspect of this work is the requirement that the algorithms have to be executed in real-time. Thus we provide a fast numerical approach, called the primal-dual algorithm, which allows a global solution of the problem to be found efficiently. To gain real-time performance we provide a detailed overview of efficient initializations and convergence criteria, and show how parts of the computations can be excluded earlier. Finally, we illustrate the great advantages of our models by comparing them to the well-known watershed segmentation algorithm.

  • Tuesday 30. April 2013, 13:00

Title: Visual Recognition and Failure Detection for Power Line Insulators
Speaker: Markus Oberweger
Location: ICG Seminar Room
Abstract: The inspection of high voltage power lines is an important task in order to prevent failure of the transmission system. In this work, we present a novel approach to detect insulators in aerial images and to analyze possible faults automatically. Our detection algorithm is based on discriminative learning of local gradient-based descriptors and a subsequent voting scheme for localization. Further, we introduce an automatic extraction of the individual insulator caps and check them for failures by using a descriptor with elliptical spatial support. The proposed inspection tool also clusters images of identical insulators using background features. We demonstrate our approach on an evaluation set of over 400 real-world insulator images captured from a helicopter and evaluate our results with respect to a manually created ground truth. The performance of our insulator detector is comparable to other state-of-the-art object detectors and our insulator fault detection outperforms existing methods.

  • 19.03.2013 / ICG / 13:00h

Title: Visual Links to Hidden Content (master thesis)
Speaker: Thomas Geymayer
Abstract: Visual links are lines drawn on top of an existing visualization to create connections and guidance between related regions. With modern operating systems, information is often distributed across multiple applications. As screen space is limited and applications may overlap, regions containing important information are prone to being invisible to the user.

In this thesis we present two new visualization techniques that help users find and explore important information hidden somewhere on the desktop. Visual cues and interaction methods allow for fast identification of and navigation to such hidden content.

  • 05.03.2013 / ICG / 13:00h

Title: GPU-Accelerated Panoramic Mapping and Tracking (master thesis)
Speaker: Georg Reinisch
Abstract: Creating panoramic images in real-time is an expensive operation for mobile devices. Depending on the size of the camera image and the panoramic image, the pixel-mapping is one of the most time-consuming parts. This part is the main focus of this paper and will be discussed in detail. To speed things up and to allow the handling of larger images, the pixel-mapping process is transferred from the Central Processing Unit (CPU) to the Graphics Processing Unit (GPU). The independence of pixels being projected into the panoramic image allows OpenGL shaders to do the mapping very efficiently. Different approaches to the pixel-mapping process are demonstrated and compared with an existing solution. The application is implemented for Android phones and works in real-time on current generation devices.

Title: Semantic Segmentation with Patch Priors
Speaker: Christian Rauer
Abstract: The topic of semantic segmentation is an important part of many applications in computer vision. This master's thesis presents a method for the incorporation of prior knowledge on the patch level in the inference process of a standard semantic segmentation pipeline. The presented approach uses a simple yet efficient method for the learning of prior knowledge by building a histogram of the occurrences of all image patches of a certain size in a set of training images. The implemented semantic segmentation framework extends the state-of-the-art approach by iteratively modifying the output of a graph cuts framework and reapplying the graph cuts regularization to the updated intermediate results obtained in this way. The previously constructed patch prior databases are utilized to determine the likelihood of patches in the intermediate results under the learned patch prior. In this way impossible patch configurations can be completely ruled out, and image patches that only occur infrequently in the training images are assigned a low likelihood. As a result, incorrectly labeled areas that can otherwise not be eliminated by the state-of-the-art approach can be dealt with. This thesis starts out with a survey of related work on the topic of segmentation with a special emphasis on semantic segmentation. It also deals with some of the concepts that are used as basis for these works. This is followed by an introduction of the datasets used in the evaluation phase and a detailed description of the implemented method. A discussion of the conducted experiments and the results and conclusions drawn from them demonstrates the usability of the presented method for the task of semantic segmentation.

  • 26.02.2013 / Seminar room ICG / 13:00h

Title: Image-based Measurement of Relative Motions between Railway Vehicle Carbodies (master thesis)
Speaker: Robert Hoedl
Abstract: The test and validation process of connecting components between railway vehicles comprises the determination of the performed relative motions between two adjacent carbodies during operation. Existing measurement systems have the drawback of being extensive and time-consuming regarding installation, measurement and analysis. This thesis is concerned with the feasibility study and prototype development of a robust and cost-efficient image-based measurement system, which is capable of tracking the relative motions between railway vehicle carbodies. First, the thesis defines the operational requirements of an image-based measurement system. Then a suitable marker and an accompanying target design as well as a robust tracking method are introduced. Further, two appropriate pose estimation algorithms are determined and chosen for evaluation. Moreover, an adequate measurement setup relating to the specified carbody motion model is developed and the corresponding prerequisites are described. Finally, these considerations enable the definition of an appropriate optical imaging system. The feasibility of the designed image-based measurement system is investigated by extensive experiments conducted on a laboratory scale and using a full-scale test rig. A detailed evaluation of the uncertainties is carried out, which allows important implications concerning the measurement setup to be derived. From the results, it is apparent that the proposed system meets the specified requirements and is capable of measuring the relative motions within the defined limits, but to a certain extent remains susceptible to inaccuracies in the measurement setup and changing lighting conditions. Recommendations for a specific pose estimation method and further enhancements to increase robustness are given. The suitability of the system is finally verified in the course of a test ride on board a high-speed train. 
The proposed image-based measurement system contributes a novel, genuine alternative to conventional methods applied to the particular task of measuring the relative motions between two railway vehicle carbodies. It perfectly fulfils the technical and economic requirements.

  • 26.02.2013 / Seminar room ICG / 10:00h

Title: Component Analysis for Human Sensing
Speaker: Fernando De la Torre
Abstract: Enabling computers to understand human behavior has the potential to revolutionize many areas that benefit society such as clinical diagnosis, human computer interaction, and social robotics. A critical element in the design of any behavioral sensing system is to find a good representation of the data for encoding, segmenting, classifying and predicting subtle human behavior. In this talk I will propose several extensions of Component Analysis (CA) techniques (e.g., kernel principal component analysis, support vector machines, spectral clustering) that are able to learn spatio-temporal representations or components useful in many human sensing tasks. In particular, I will show how several extensions of CA methods outperform state-of-the-art algorithms in problems such as facial feature detection and tracking, temporal clustering of human behavior, early detection of activities, non-rigid matching, visual labeling, and robust classification. The talk will be adaptive, and I will discuss the topics of major interest to the audience. Biography: Fernando De la Torre received his B.Sc. degree in Telecommunications (1994), M.Sc. (1996), and Ph.D. (2002) degrees in Electronic Engineering from La Salle School of Engineering in Ramon Llull University, Barcelona, Spain. In 2003 he joined the Robotics Institute at Carnegie Mellon University, and since 2010 he has been a Research Associate Professor. Dr. De la Torre's research interests include computer vision and machine learning, in particular face analysis, optimization and component analysis methods, and their applications to human sensing. He is Associate Editor at IEEE PAMI and leads the Component Analysis Laboratory and the Human Sensing Laboratory.

  • 29.01.2013 / Seminar room ICG / 13:00h

Title: A Compositional Approach to Shape for Visual Recognition of Objects and Abnormalities
Speaker: Prof. Dr. Björn Ommer
Abstract: Shape is a natural, highly prominent characteristic of objects that human vision utilizes every day. But despite its expressiveness, shape poses significant challenges for category-level object detection in cluttered scenes: Object form is an emergent property that cannot be perceived locally but becomes available only once the whole object has been detected and segregated from the background. Thus we address the detection of objects and the assembling of their shape simultaneously. We learn a dictionary of meaningful contours by contour co-activation, and a joint, consistent placement of all contours in an image yields a robust shape-based detection of objects in a multiple instance learning framework.

The compositional grouping of object parts can be extended to the parsing of complete scenes and videos and it provides a feasible approach to abnormality detection. Therefore, video frames are parsed by establishing a set of hypotheses that jointly explain all the foreground while, at the same time, trying to find normal training samples that explain the hypotheses. Consequently, a direct detection of abnormalities can be avoided. This is crucial since the class of all irregular objects and behaviors is infinite and thus no (or by far not enough) training samples are available.

Time permitting I will also talk about recent extensions to shape matching and multiple instance learning.

  • 25.01.2013 / Seminar room ICG / 13:00h

Title: In defense of MAP-based MRF approaches for image restoration
Speaker: Yunjin Chen
Abstract: It is now well known that Markov random fields (MRFs) are particularly effective for modeling image priors in low-level vision. Recent years have seen the emergence of two main approaches for learning the parameters in MRFs: (1) probabilistic learning using sampling-based algorithms and (2) loss-specific training based on MAP estimation. After investigating existing training approaches, it turns out that the performance of the loss-specific training has been significantly underestimated in existing work. In this paper, we revisit this approach and use techniques from bi-level optimization to solve it. We show that we can get a substantial gain in the final performance of the models by solving the bi-level optimization problem with very high accuracy. As a result, our trained models are on par with highly specialized image denoising algorithms and clearly outperform probabilistically trained MRF models. The MAP-based models come with the additional advantage that inference is extremely efficient. Our GPU-based implementation takes less than 1s to produce state-of-the-art image restoration results. Our findings suggest that MAP estimation should still be considered one of the leading approaches in low-level vision.

  • 22.01.2013 / Seminar room ICG / 13:00h

Title: Evaluation of the Softshell GPU execution model for real-time rendering and image processing applications (Master Thesis)
Speaker: Philip Voglreiter
Abstract: The recent ascent of parallel programming languages, in combination with the wide-spread availability of powerful graphics hardware, inspires the development of new programming paradigms. In this work we evaluate Softshell, which is currently among the most advanced parallel programming models. We select real-world applications and adapt them for this Application Programming Interface (API). The evaluated algorithms arise from the field of General-Purpose Computation on Graphics Processing Units (GPGPU). In GPGPU, affordable and simultaneously powerful GPUs act as massively parallel coprocessors in a consumer Personal Computer (PC). NVidia’s Compute Unified Device Architecture (CUDA) currently represents the most flexible GPGPU architecture available. Softshell extends this flexibility further by building directly on top of CUDA and extending its capabilities via a three-tier scheduling model. Synthetic test cases for Softshell already show excellent performance. In this work, we benchmark Softshell’s capability to improve performance and flexibility of “Ray Casting With Advanced Illumination” and “Particle Based Volume Rendering”. These algorithms are very well suited due to their vastly different requirements in terms of dynamic scheduling strategies. Furthermore, we discuss “Dykstra’s projection algorithm” as an example where classical parallelization as well as Softshell are incapable of providing significant performance improvements over the sequential counterpart.

Title: A Modern Graphical User Interface Framework for Air Traffic Control Information Systems (Master Thesis)
Speaker: Bernhard Roth
Abstract: Air traffic controllers are used to observing a vast number of different systems with inconsistent user interfaces. In this thesis we present the design of a client server architecture to integrate these systems into one that provides a homogeneous graphical user interface. The primary goals of the framework are adaptation flexibility, rapid prototyping capabilities that make it possible to involve controllers in early project phases, and the simple application of user interface design principles to optimize situational awareness.

Instead of using conventional toolkits for desktop application development, the graphical user interface of the presented system is built upon QtQuick, a library to create arbitrary user experiences through a declarative language, without the need for constant compilation. In this work we discuss the technology's advantages and disadvantages and explain our reasons for using it.

We explain the system's design, paired with additional implementation details, and present several prototypes created with it to demonstrate its possibilities. These prototypes are evaluated with regard to project adaptation efforts and usability impressions of controllers from different sites around the world, where the presented system will be installed in the near future.

The presented framework delivers low adaptation times and flexible capabilities to apply user interface design metaphors, which makes it well suited for the intended use. In this regard, QtQuick proved to be a solid basis for the system.

Keywords: Air Traffic Control, ATC Information System, Graphical User Interface, User Interface Design, Human Computer Interaction, Prototyping, UI Toolkits, QtQuick, QML, Automation, Systems Integration, Situational Awareness

  • 04.12.2012 / Seminar room ICG / 13:00h

Title: A system for editable AR environment
Speaker: Mashita Tomohiro
Abstract: In this project, we aim to construct an AR system which enables users to reconstruct and edit a real scene. The editing targets in this system include shape, lighting, reflectance, texture, and object motion. For example, an interior design system using this AR system can provide functions such as copying furniture, lighting simulation, and texture transfer. In this talk, I will introduce the outline of our project and our previous works for this AR system as follows:
- 3D modeling: feature-based method and texture-based method
- Interior design system using a tablet device
- Direct texture transfer interaction

  • 06.11.2012 / Seminar room ICG / 13:00h

Title: Softshell: Dynamic Scheduling on GPUs (SIGGRAPH Asia test talk)
Speaker: Markus Steinberger
Abstract: In this paper we present Softshell, a novel execution model for devices composed of multiple processing cores operating in a single instruction, multiple data fashion, such as graphics processing units (GPUs). The Softshell model is intuitive and more flexible than the kernel-based adaptation of the stream processing model, which is currently the dominant model for general purpose GPU computation. Using the Softshell model, algorithms with a relatively low local degree of parallelism can execute efficiently on massively parallel architectures. Softshell has the following distinct advantages: (1) work can be dynamically issued directly on the device, eliminating the need for synchronization with an external source, i.e., the CPU; (2) its three-tier dynamic scheduler supports arbitrary scheduling strategies, including dynamic priorities and real-time scheduling; and (3) the user can influence, pause, and cancel work already submitted for parallel execution. The Softshell processing model thus brings capabilities to GPU architectures that were previously only known from operating-system designs and reserved for CPU programming. As a proof of our claims, we present a publicly available implementation of the Softshell processing model realized on top of CUDA. The benchmarks of this implementation demonstrate that our processing model is easy to use and also performs substantially better than the state-of-the-art kernel-based processing model for problems that have been difficult to parallelize in the past.

  • 06.11.2012 / Seminar room ICG / 13:00h

Title: RoNect: Hand Mounted Depth Sensing Using a Commodity Gaming Sensor (ICPR test talk)
Speaker: Christian Reinbacher
Abstract: In this work, we investigate the applicability of the Kinect depth camera as a robot mounted measurement unit. In contrast to traditional head mounted robot sensors, Kinect is small, cheap and delivers robust depth measurements on a variety of scenes. In the course of applying it on a robot arm, we solve a number of problems: we reduce the sensor working distance to a few centimeters, replace the laser projector unit by a focusable projector, and calibrate this sensor unit. We further exploit the motion capabilities of the robot arm to integrate multiple depth maps at 30 Hz in a volumetric fusion approach. We show how this method considerably improves completeness of the scanned models, even under severe reflections and difficult surface properties. We employ our approach in a classical bin picking setting, where the robot scans the object during its approaching motion, and picks it afterwards.

  • 30.10.2012 / Seminar room ICG / 13:00h

Title: Data Warehouse F&T (Master Thesis)
Speaker: Mario Johansson
Abstract: Discoverer, a product of the Oracle company, has been used by TU Graz for reporting for some years. Since this product has become outdated, it makes sense to consider new technologies for creating the desired information. Advantages would be better support and a number of new features and opportunities. The Research and Technology House (R&T House) is a key user of such analysis-related queries. Since the old reporting tool is to be retired, the department needs a new solution for creating reports.

Two different approaches are pursued to find a solution that is acceptable to the R&T House and ensures continued consistent, quality-assured report production. On the one hand, a relational approach is implemented which resembles the original system. On the other hand, an attempt is made to handle the specifications of the R&T House with the more future-proof approach of a Data Warehouse.

The goal of this thesis is to implement the two different solutions and to evaluate them afterwards. Based on the results, a recommendation should be made to the R&T House as to which system should replace Oracle Discoverer.

Title: Interactive 4D Overview and Detail Visualization in Augmented Reality
Speaker: Stefanie Zollmann
Abstract: In this paper we present an approach for visualizing time-oriented data of dynamic scenes in an on-site AR view. Visualizations of time-oriented data have special challenges compared to the visualization of arbitrary virtual objects. Usually, the 4D data occludes a large part of the real scene. Additionally, the data sets from different points in time may occlude each other. Thus, it is important to design adequate visualization techniques that provide a comprehensible visualization. In this paper we introduce a visualization concept that uses overview and detail techniques to present 4D data in different detail levels. These levels provide, first, an overview of the 4D scene; second, information about the 4D change of a single object; and third, detailed information about object appearance and geometry for specific points in time. Combining the three levels of detail with interactive transitions such as magic lenses or distorted viewing techniques enables the user to understand the relationship between them. Finally, we show how to apply this concept for construction site documentation and monitoring.

Title: Real-Time Photometric Registration from Arbitrary Geometry
Speaker: Lukas Gruber
Abstract: Visually coherent rendering for augmented reality is concerned with seamlessly blending the virtual world and the real world in real-time. One challenge in achieving this is the correct handling of lighting. We are interested in applying real-world light to virtual objects and computing the interaction of light between virtual and real. This implies the measurement of the real-world lighting, also known as photometric registration. So far, photometric registration has mainly been done through capturing images with artificial light probes, such as mirror balls or planar markers, or by using high dynamic range cameras with fish-eye lenses. In this paper, we present a novel non-invasive system, using arbitrary scene geometry as a light probe for photometric registration, and a general AR rendering pipeline supporting real-time global illumination techniques. Based on state of the art real-time geometric reconstruction, we show how to robustly extract data for photometric registration to compute a realistic representation of the real-world diffuse lighting. Our approach estimates the light from observations of the reconstructed model and is based on spherical harmonics, enabling plausible illumination such as soft shadows, in a mixed virtual-real rendering pipeline.

  • 23.10.2012 / Seminar room ICG / 13:00h

Title: 3D Morphable Models (Master Thesis)
Speaker: Christoph Gratl
Abstract: In photographs, the three-dimensional world is reduced to a two-dimensional image. In this master's thesis, we try to restore some of this lost information for the special case of human faces. Using the prior knowledge of human heads incorporated in a 3D Morphable Face Model, it is possible to recover the shape and texture of the displayed face as well as its pose and lighting in the photograph to a certain extent. First we show the building process of this 3D Morphable Model. The base for the model is formed by a large set of unregistered human 3D head scans. With the Non-Rigid ICP algorithm, the head scans are brought into correspondence. The result is a high-dimensional Face Space with a lot of redundancy. On this Face Space a Principal Component Analysis is applied to reduce dimensionality and extract the more significant base vectors. As a result, we obtain a statistical three-dimensional face model that can describe human faces in an elegant and compact way by only a few parameters. To find the parameters of the 3D Morphable Face Model for a given face in a photograph, an Analysis-by-Synthesis algorithm is used. Starting from a reasonable point (e.g. the mean face), the model coefficients are adjusted incrementally. The randomized residual between input and adjusted model as well as the derivative of the residual influence those adjustments. The algorithm stops when the projection of the 3D model matches the face in the input photograph closely enough and returns the model and rendering parameters. The whole framework described in this master's thesis is evaluated both with regard to pose estimation and 3D face reconstruction on several image databases.

Title: Linking various image domains to enhance web-based mapping services
Speaker: Michael Kröpfl
Abstract: In this talk I will cover several components of my thesis.

The broad availability of modern web-based mapping services, as well as community photo collections (CPC), has led to a huge amount of geographically relevant image data. Mapping services usually provide aerial and "human scale" imagery taken by systematically "scanning" a certain region from the air or the ground with specialized sensors. CPC, on the other hand, provide unsystematically captured data provided by users with consumer devices such as cell phones, often augmented by meta data such as tags, descriptions, user identification or location.

In a previous talk at ICG, I showed that by providing links between images through image matching using a cloud based system, as well as analyzing the spatial distribution of CPC and their meta data, the underlying geography or geographically relevant features can be extracted automatically. Furthermore, I presented a dynamic version of this system already populated with a basic set of imagery, which allowed live user interaction from mobile phones via photographs.

This talk will cover more details of the algorithms used for the dynamic version of the image matching system, and also describe the data sets and metrics used to evaluate and improve its performance. Additionally, I will present a sample geospatial application (insurance claim documentation) making use of the above system, which was developed within 30 labor hours, including the development of both the client software and a cloud based web-service. Finally, I will describe a method to automatically align the point cloud generated by structure-from-motion from a CPC image collection (Photosynth™) to an aerial image of the respective area.

  • 16.10.2012 / Seminar room ICG / 10:00h

Title: Continuous optimization methods on arbitrary graphs. Applications to inverse problems in imaging.
Speaker: Hugues Talbot, Universite Paris Est
Abstract: In the last decade, we have witnessed much progress in convex and combinatorial optimization methods, which have become fast enough and flexible enough to be applied to actual imaging problems. Novel applications have emerged as a result, such as compressive sensing for instance. The tools to perform the modeling and the optimization have also become more versatile. In this talk, I will present recent work in this area, allowing the use of recent continuous variational methods on arbitrary graphs. Applications range from segmentation to mesh denoising via deblurring and denoising.

  • 09.10.2012 / Seminar room ICG / 13:00h

Title: Wide-Area Scene Mapping for Mobile Visual Tracking
Speaker: Jonathan Ventura
Abstract: We propose a system for easily preparing arbitrary wide-area environments for subsequent real-time tracking with a handheld device. Our system evaluation shows that minimal user effort is required to initialize a camera tracking session in an unprepared environment. We combine panoramas captured using a handheld omnidirectional camera from several viewpoints to create a point cloud model. After the offline modeling step, live camera pose tracking is initialized by feature point matching, and continuously updated by aligning the point cloud model to the camera image. Given a reconstruction made with less than five minutes of video, we achieve below 25 cm translational error and 0.5 degrees rotational error for over 80% of images tested. In contrast to camera-based simultaneous localization and mapping (SLAM) systems, our methods are suitable for handheld use in large outdoor spaces.

  • 02.10.2012 / Seminar room ICG / 13:00h

Title: Exemplar-based Inpainting On Videos (Master Thesis)
Speaker: Manuel Hofer
Abstract: Exemplar-based inpainting is a common patch-based method for the purpose of object removal in natural images. In order to adapt this method for video sequences, correspondences between the frames have to be determined to enable time-consistent inpainting and tracking of the unwanted object. To conquer the issue of the increased amount of data to be processed, an optimization scheme is necessary as well. In our work, we present a local approach based on optical flow estimation, which extends the basic exemplar-based inpainting approach for video sequences under unconstrained camera motions. Our method is able to perform time-consistent inpainting by using only two frames at a time, thus making the frame-by-frame run-time independent from the total number of frames in the sequence. In order to achieve a higher performance, the crucial parts of the algorithm are evaluated in parallel on the GPU. An evaluation performed on various sequences, taken from motion pictures, shows our promising results.

  • 11.09.2012 / ICG / 13:00h

Title: A bilevel optimization approach for parameter learning in variational models
Speaker: Thomas Pock
Abstract: We consider the problem of parameter learning for variational image denoising models. The learning problem is formulated as a bilevel optimization problem, where the lower level problem is given by the variational model and the higher level problem is expressed by means of a loss function that penalizes errors between the solution of the lower level problem and the ground truth data. We consider a class of image denoising models incorporating p-norm based analysis priors using a fixed set of linear operators. We devise semi-smooth Newton methods to solve the resulting non-smooth bilevel optimization problems and show that the optimized image denoising models can achieve state-of-the-art performance.

Joint work with K. Kunisch, Uni Graz
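
As a reading aid, the bilevel structure described in the abstract can be written out roughly as follows. This is only a sketch under assumptions: the symbols are not given in the abstract, so f (noisy image), g (ground truth), K_i (the fixed linear operators), the weights \vartheta_i as the learned parameters, and the quadratic data term are all illustrative choices.

```latex
% Upper level: loss L between the lower-level solution and the ground truth g.
% Lower level: variational denoising model with p-norm analysis priors.
\min_{\vartheta \ge 0} \; L\bigl(u^*(\vartheta),\, g\bigr)
\quad \text{s.t.} \quad
u^*(\vartheta) \in \arg\min_{u} \;
  \frac{1}{2}\,\lVert u - f \rVert_2^2
  + \sum_{i=1}^{N} \vartheta_i \,\lVert K_i u \rVert_p^p
```

A typical choice for the upper-level loss would be L(u, g) = \frac{1}{2}\lVert u - g \rVert_2^2, again as an assumption.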

  • 24.08.2012 / ICG / 13:00h

Title: Synergy-based Learning of Facial Identity
Speaker: Martin Köstinger, ICG, TU-GRAZ
Abstract: In this paper we address the problem that most face recognition approaches neglect that faces share strong visual similarities, which can be exploited when learning discriminative models. Hence, we propose to model face recognition as a multi-task learning problem. This enables us to exploit both shared common information and individual characteristics of faces. In particular, we build on Mahalanobis metric learning, which has recently shown good performance for many computer vision problems. Our main contribution is twofold. First, we extend a recent efficient metric learning algorithm to multi-task learning. The resulting algorithm supports label-incompatible learning, which allows us to tap the rather large pool of anonymously labeled face pairs also for face identification. Second, we show how to learn and combine person specific metrics for face identification, improving the classification power. We demonstrate the method for different face recognition tasks where we are able to match or slightly outperform state-of-the-art multi-task learning approaches.
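
For readers unfamiliar with the term, the quantity that Mahalanobis metric learning optimizes can be sketched as follows. This is a minimal illustration, not the method of the talk: the matrices and feature vectors are hypothetical, and in practice M would be learned from labeled face pairs.

```python
# Minimal sketch: squared Mahalanobis distance d_M(x, y) = (x - y)^T M (x - y).
# M is a hypothetical, hand-picked positive semi-definite matrix here.

def mahalanobis_sq(x, y, M):
    """Return (x - y)^T M (x - y) for vectors x, y and a square matrix M."""
    d = [xi - yi for xi, yi in zip(x, y)]
    # Matrix-vector product M d, computed row by row.
    Md = [sum(m_ij * d_j for m_ij, d_j in zip(row, d)) for row in M]
    return sum(d_i * Md_i for d_i, Md_i in zip(d, Md))

identity = [[1.0, 0.0], [0.0, 1.0]]    # reduces to squared Euclidean distance
stretched = [[4.0, 0.0], [0.0, 1.0]]   # weights the first feature more strongly

a, b = [1.0, 2.0], [3.0, 5.0]
print(mahalanobis_sq(a, b, identity))
print(mahalanobis_sq(a, b, stretched))
```

With the identity matrix the distance is the ordinary squared Euclidean distance; a learned M reweights and correlates feature dimensions so that pairs of the same person end up closer than pairs of different people.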

  • 21.08.2012 / ICG / 13:00h

Title: Image Labeling by Hierarchical Segment Support
Speaker: Michael Donoser, ICG, TU-GRAZ
Abstract: This talk introduces a method for using unsupervised, hierarchical image segmentation results as powerful spatial support to define priors for the categorical image labeling task. Our foundation is a unique segment hierarchy that is directly obtained from contrast features, normally used to define the contrast-sensitive Potts model. We show how to effectively infer a solution by analyzing the label certainty of segments in the obtained hierarchy using a graph theoretical method denoted as Maximum Weight Independent Set (MWIS). The MWIS algorithm yields a labeling result that partitions the image into a set of non-overlapping segments with uniquely assigned labels. Segments may come from different layers of the hierarchy, i.e., the final segmentation may locally have very different granularity, which is important to obtain accurate labeling results. Experimental results show competitive labeling accuracy compared to related discrete, continuous and filtering approaches in important vision applications like pixel-wise class recognition and interactive segmentation.
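
The idea of selecting non-overlapping segments from different hierarchy layers can be illustrated with a small sketch. It assumes that the hierarchy is a tree, that two segments conflict exactly when one is an ancestor of the other, and that weights (standing in for label certainty) are non-negative; under these assumptions MWIS reduces to a simple recursion. The node names and weights are hypothetical, and the talk's actual formulation may differ.

```python
# Sketch: pick non-overlapping segments of maximum total weight from a segment
# tree. Node names, weights, and the tree itself are hypothetical.

def best_cut(node, children, weight):
    """Return (total_weight, selected_segments) for the subtree rooted at node.

    Either the node itself is selected (which excludes all of its descendants,
    since they overlap it), or the best selections of its children are combined.
    """
    kids = children.get(node, [])
    if not kids:
        return weight[node], [node]
    child_total, child_sel = 0.0, []
    for kid in kids:
        w, sel = best_cut(kid, children, weight)
        child_total += w
        child_sel += sel
    if weight[node] >= child_total:
        return weight[node], [node]
    return child_total, child_sel

children = {"root": ["A", "B"], "A": ["A1", "A2"]}
weight = {"root": 1.0, "A": 0.9, "A1": 0.3, "A2": 0.4, "B": 0.8}
total, segments = best_cut("root", children, weight)
print(total, segments)
```

In this toy hierarchy the coarse segment A is kept instead of its two children, while the root is split, so the selected segments come from different layers and still cover the image exactly once.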

  • 14.08.2012 / ICG / 13:00h

Title: Unsupervised segmentation of moving objects by long term video analysis
Speaker: Peter Ochs, University of Freiburg
Abstract: Motion is a strong cue for unsupervised object-level grouping. In this work, we demonstrate that motion is exploited most effectively if it is considered over larger time windows. In contrast to classical two-frame optical flow, point trajectories that span hundreds of frames are less susceptible to short term variations that hinder separating different objects. As a positive side effect, the resulting groupings are temporally consistent over a whole video shot, a property that requires tedious post-processing in the vast majority of existing approaches. We suggest working with a paradigm that starts with semi-dense motion cues first and that fills up non-labeled areas afterwards based on color.

  • 16.07.2012 / ICG / 13:00h

Title: Visual Mapping of Unknown Space Targets for Relative Navigation and Inspection (Master thesis)
Speaker: Elias Müggler, ETH Zürich, CSAIL/MIT
Abstract: We describe an approach for visual mapping of unknown and noncooperative space targets. The generated maps can be used for relative navigation, inspection, and other tasks. A calibrated stereo camera is used as sensor. Two different types of maps were investigated: feature-based maps and point cloud models. A reliable way of feature matching is presented that takes the spatial position of the features into account. The trajectory is first optimized using a pose graph. Feature-based maps are further optimized using bundle adjustment. The proposed architecture is flexible and can be extended to multiple inspectors and online map generation. The approach was implemented on the SPHERES/VERTIGO hardware and results from ground testing of textured, single-colored, and specular targets are presented.

  • 29.06.2012 / ICG / 14:00h

Title: Video analysis tools towards unsupervised learning
Speaker: Prof. Dr. Thomas Brox, University of Freiburg
Abstract: There is strong indication that motion plays a major role in visual learning of infants. In contrast to our computer based systems, infants do not require bounding boxes of selected examples to learn what is a face, a person, or a dog. Motion can help decisively in solving the hard problem of object segmentation, which - in my view - is a grouping step needed for unsupervised learning. I will present a couple of video analysis tools that in the end allow for segmentation of moving objects. Based on optical flow we generate high quality point trajectories, which allow for a long term analysis of video shots. Defining pair-wise distances between these trajectories allows us to cluster them and yields temporally consistent segmentations of moving objects. In contrast to multi-body factorization, points and even whole objects may appear or disappear during the shot. While these trajectory clusters are still sparse, a variational interpolation approach can turn them into dense segmentations.

  • 14.06.2012 / HSi12 / 13:00h

Title: 6D-Vision: Cars Learn to See
Speaker: Dr. Uwe Franke (Daimler AG, Böblingen Germany)
Abstract: The performance of future driver assistance systems depends on precision, robustness and completeness of their environment perception. The urban scenario in particular poses high demands on the sensors, since dangerous situations have to be recognized quickly and with high confidence. The dream of a car perceiving its environment with human-like performance in order to realize accident-free driving can only be reached if the car has two eyes working in stereo – that is my firm belief.

The talk will present the state-of-the-art in space-time computer vision at Daimler Research. This covers real-time dense stereo analysis (running on an FPGA) as well as dense optical flow analysis (running on a GPU). Most known stereo systems concentrate on single image pairs. This prohibits the recognition of moving objects like pedestrians if they are close to other dominant obstacles or partially hidden. A smart fusion of stereo vision and motion analysis is the key to overcoming this deficiency. The 6D-Vision principle tracks points with depth known from stereo over two or more consecutive frames and fuses the spatial and temporal information using Kalman filters. Thus, 6D-Vision simultaneously estimates depth and 3D-motion for every tracked pixel. Using dense optical flow, it is possible to derive this information for every pixel of the image. This Dense6D approach clearly outperforms the results of scene flow.
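
To make the Kalman-filter fusion more concrete, here is a heavily simplified sketch: a constant-velocity filter for a single tracked point along the depth axis only. It is not the 6D-Vision implementation, which estimates 3D position and 3D motion; the function name, noise values, and the simulated measurements are all hypothetical.

```python
# Sketch: constant-velocity Kalman filter for one tracked point along the depth
# axis. State = (depth z, velocity v). All noise values are hypothetical.

def kalman_step(z, v, P, meas, dt=1.0, q=1e-4, r=1e-2):
    """One predict/update cycle; P is the 2x2 state covariance as nested lists."""
    # Predict with the constant-velocity model: z <- z + dt * v, v unchanged.
    z_pred = z + dt * v
    v_pred = v
    p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1] + q
    # Update with a depth measurement (only z is observed, e.g. from stereo).
    innovation = meas - z_pred
    s = p00 + r
    k0, k1 = p00 / s, p10 / s
    z_new = z_pred + k0 * innovation
    v_new = v_pred + k1 * innovation
    P_new = [[(1 - k0) * p00, (1 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
    return z_new, v_new, P_new

# A point approaching at 0.5 units per frame, observed without noise here.
# The filter starts from a vague prior (zero state, large covariance).
z, v, P = 0.0, 0.0, [[100.0, 0.0], [0.0, 100.0]]
for t in range(50):
    z, v, P = kalman_step(z, v, P, meas=40.0 - 0.5 * t)
print(z, v)
```

Although only depth is measured, the filter recovers the velocity from the sequence of frames, which is the sense in which spatial and temporal information are fused.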

The high-quality spatio-temporal information is successfully used to model the world and to detect and track moving obstacles from the moving car. Pedestrian recognition significantly benefits from the dense stereo and motion information. For oncoming vehicles, besides speed and acceleration even the turn-rate can be determined reliably.

Next-generation cameras will have imagers with up to 3 million pixels. In order to handle the huge amount of 3 million 3D points per frame efficiently, the so-called Stixel-World has been introduced as a medium level representation. It turns out that this layer significantly reduces the complexity of image understanding tasks in complex scenes.

Real-world experiments illustrate the high performance available in the experimental car – hopefully paving the way towards safer driving.

  • 12.06.2012 / ICG / 13:00h

Title: Interactive Vision Based Reconstruction of Statistical 3D Structure of Lithium-Ion-Electrodes
Speaker: Christian Mischitz
Abstract: In the field of lithium-ion-cells the investigation of the inner 3D microstructure of electrode materials is a current research topic. It has been shown that the electrical characteristics of these materials depend on their composition and their microstructure. The manufacturers of the cells provide statistical data about the composition, but to obtain the microstructure of the materials expensive procedures are needed, which involve acquiring hundreds of consecutive microscope-images. This master's thesis provides an interactive segmentation tool which reconstructs a statistically identical 3D structure of lithium-ion-electrodes based on just one image. To realize this, the arbitrarily shaped particles are approximated by touching ellipsoids. The obtained 3D structure is used as a base for accurate modeling of lithium-ion-cells. In this master's thesis the framework of ferns, originally proposed for keypoint recognition, is adapted and used for a classification task. The user provides seed points which cover the characteristics of the different materials; the system uses the ferns to obtain a classification; a subsequent segmentation based on the discrete Potts model is performed. An ellipse fitting algorithm, tuned for the special purpose of separating particle shapes which are common in lithium-ion-electrodes, approximates the segmented areas with ellipses. Based on these, ellipsoids defined by their sizes, rotations and relative frequencies are reconstructed and arranged in a 3D volume. In the end, comprehensive evaluations of the adaptations of the fern framework and the reconstructing algorithm are given. In addition, the performance of the proposed system is also evaluated on natural images and compared to other up-to-date segmentation frameworks.

  • 01.06.2012 / ICG / 13:00h

Title: Monocular 3D Estimation With Deformable Object Models
Speaker: Prof. Dr. Konrad Schindler, ETH Zürich
Abstract: Monocular 3D reconstruction is a geometrically ill-defined problem. Still, it can be accomplished when prior knowledge about the observed objects is available. We revisit ideas from the early days of computer vision, namely, 3D geometric representations of semantically defined object categories. These representations can recover detailed geometric object hypotheses, including the relative 3D positions of object parts. In combination with recent robust techniques for local shape description and inference, such representations can be applied to real-world images. We analyze this approach in detail, and demonstrate novel applications enabled by the geometric object class representation, such as fine-grained categorization of cars according to their 3D geometry, and ultra-wide baseline matching.

  • 23.05.2012 / ICG / 13:00h

Title: 3D data collection for 3D City Modelling at IGN: methods, applications and trends
Speaker: Prof. Nicolas Paparoditis (Directeur de Laboratoire IGN - Laboratoire MATIS)
Abstract: The MATIS research laboratory of IGN, the French national mapping agency, has been working in the field of automated data collection for 3D city modeling for more than twenty years. After a brief introduction on IGN and its missions, we will present on the one hand the 3D imaging acquisition systems we have developed to collect multi-source infrastructure data and on the other hand the photogrammetric computer vision and image analysis techniques which have been developed for information extraction on very large data sets for integrated building, road, and vegetation modelling. We will present our achievements, the remaining obstacles and some trends in the field of geographic information systems for new applications (autonomous navigation, multimedia, etc.) of city models.

  • 22.05.2012 / ICG / 13:00h

Title: Sparse Coding For Video-Based Analysis
Speaker: Max Stricker
Abstract: An important step for each object recognition task is to find an adequate representation of the information contained in an image. This representation can be generated through analyzing the object's appearance or geometry, and depending on the task needs to fulfill certain requirements. Recently, research has shown that sparse feature descriptors perform well for many object recognition applications. Therefore Sparse Coding, a sparse feature descriptor representing images as sparse linear combinations of a few basis elements, has been investigated in this work in order to emphasize its strengths and drawbacks for different computer vision tasks. The first goal has been to theoretically analyze Sparse Coding and its main parameters. Various experiments have been conducted to reason about different parameter settings and their implications. The evaluation of these experiments made it possible to derive the importance of various parameters for the feature descriptor's expressiveness. The second part of this master's thesis addresses the applicability of Sparse Coding for the analysis of video sequences based on two exemplary application scenarios. The first application deals with the problem of needing many labeled images when training classifiers for object recognition. This work proposes a general method, based on Sparse Coding, to automatically label individual frames within natural video sequences. These labels are generated by analyzing the slow driving force within the sparse feature descriptor. Experiments on two datasets have been performed to evaluate the proposed system. Based on the results of the analysis of Sparse Coding, a framework for detecting unusual events and irregularities in videos has been developed. This unsupervised task has been chosen to identify and assess the strengths and weaknesses of Sparse Coding compared to other state-of-the-art feature descriptors. This work emphasized benefits when using Sparse Coding for different computer vision tasks dealing with video sequences.

  • 15.05.2012 / ICG / 13:00h

Title:Hand Gesture Recognition using Time-of-Flight Imaging
Speaker:David Ferstl
Abstract: Time-of-Flight (ToF) and other depth sensing IR-based cameras are becoming more and more affordable in consumer electronics. These cameras open up completely new ways of interacting with multimedia devices. In this project we are working on a real-time hand gesture recognition system that works freely in a 3D desktop environment. The use of ToF sensors not only improves the segmentation, but also delivers additional depth information at high frame rates.

  • 08.05.2012 / ICG / 15:00h

Title:Computational 3D Photography: Extracting Shape, Motion and Appearance from Images
Speaker:Prof. Marc Pollefeys (ETH Zürich)
Abstract: TBA

  • 08.05.2012 / ICG / 13:00h

Title:Geo–Referenced 3D Reconstruction: Fusing Public Geographic Data and Aerial Imagery (Video Session)
Speaker:Michael Maurer
Abstract: We present an image–based 3D reconstruction pipeline for acquiring geo–referenced semi–dense 3D models. Multiple overlapping images captured from a micro aerial vehicle platform provide a highly redundant source for multi-view reconstructions. Publicly available geo–spatial information sources are used to obtain an approximation to a digital surface model (DSM). Models obtained by the semi–dense reconstruction are automatically aligned to the DSM to allow the integration of highly detailed models into the original DSM and to provide geographic context.

  • 24.04.2012 / ICG / 13:00h

Title:Computer-Aided Classification of Butterflies using Shape and Appearance (Master Thesis)
Speaker:Philipp Paier
Abstract: The Natural History Museum in Vienna is currently working on assembling digital image databases of their large insect collections. The purpose of this thesis is to provide and evaluate a system that makes use of such databases to ease the work of entomologists during the process of identifying new specimens. The work is divided into two smaller projects, each concerning itself with the identification of butterflies.

For the first project, microscope scans of male genital organs are used to compare owl moths. Relational indices - measurements that are currently determined manually by entomologists - motivate a semi-automated shape-based classification approach. Structural measurement descriptors or Shape Context are used to describe certain parts of the genital organ, and a candidate list of species is calculated according to the distance between the description of a query and a groundtruth specimen.

The second project concerns itself with the classification of butterflies based on the inner appearance of their wings. Color histograms and SIFT descriptors are used with a variety of region of interest detectors to extract image features. The use of spatial pyramids is proposed to incorporate spatial information and classification is done using vocabulary trees.

Evaluation for both projects is based on two different datasets respectively and recognition rates up to 90% are achieved, depending on the specific task.

  • 27.03.2012 / ICG / 13:00h

Title:Depth Coded Shape from Focus
Speaker:Martin Lenz
Abstract: We present a novel shape from focus method for high-speed shape reconstruction in optical microscopy. While the traditional shape from focus approach heavily depends on the presence of surface texture, and requires a considerable amount of measurement time, our method is able to perform 3D reconstruction from only two images. Our method relies on the rapid projection of a binary pattern sequence, while an object is continuously moved through the camera focus range and a single image is continuously exposed. Deconvolution of the integral image allows a direct decoding of the binary pattern and its associated depth. Experiments on a synthetic dataset and on real scenes show that a depth map can be reconstructed at only 3% of memory costs and a fraction of the computational effort compared with traditional shape from focus.

  • 27.03.2012 / ICG / 13:00h

Title:Video Registration and Depth Propagation for Changing 3D Environments (Master Thesis)
Speaker:Stephan Berer
Abstract: As a consequence of the enormous progress made in the last few years, modern structure-from-motion (SfM) techniques now allow the reconstruction of large-scale 3D models from hundreds of images in just a few hours. Input images, which are required for the reconstruction process, are easily accessible through Internet photo collection platforms such as Flickr, Google Images and Panoramio. Additionally, recently evolved 3D reconstruction databases, such as Photosynth and Photofly, contain a vast amount of freely available reconstructed 3D scenes. Furthermore, registration and localisation of images from digital cameras are well explored; that being said, using video cameras rather than images provides additional temporal information. The known geometry together with the video information can be exploited to render realistic 3D videos from arbitrary viewpoints. Such 3D videos bring about additional knowledge to the already existent visual information. Occlusion handling, foreground and background extraction and change detection are only a few of the applications that are simplified with this additional depth information. This master’s thesis proposes a novel approach, combining scene models and 2D videos to render such 3D videos. Additionally, a self-calibration of the camera is implicitly incorporated into the registration process; hence, one can easily calibrate any camera by taking a couple of images or a video from an existing 3D model. For this purpose, a standard pinhole camera with a radial distortion model is used. The radial distortion model is estimated with a non-linear optimiser through minimising the re-projection error. Finally, a comprehensive evaluation of the applied techniques is given on the basis of real self-captured datasets.
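
The camera model named in this abstract can be sketched as follows. This is only an illustrative sketch, not the thesis implementation: the two-coefficient polynomial distortion (`k1`, `k2`), the function names `project` and `reprojection_error`, and all numeric values are assumptions made for the example. A non-linear optimiser would search for the parameters that minimise the returned error.

```python
def project(point, f, cx, cy, k1, k2):
    """Project a 3D point (camera coordinates) to pixel coordinates
    using a pinhole model with polynomial radial distortion."""
    X, Y, Z = point
    x, y = X / Z, Y / Z                    # normalized image coordinates
    r2 = x * x + y * y                     # squared distance from the optical axis
    d = 1.0 + k1 * r2 + k2 * r2 * r2       # radial distortion factor (assumed form)
    return (f * d * x + cx, f * d * y + cy)


def reprojection_error(points3d, observations, params):
    """Sum of squared pixel distances between projected and observed points."""
    total = 0.0
    for p, (u_obs, v_obs) in zip(points3d, observations):
        u, v = project(p, *params)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total


# Hypothetical data: three 3D points observed by a camera with known parameters.
points = [(0.5, 0.2, 2.0), (-0.4, 0.3, 1.5), (0.1, -0.6, 3.0)]
true_params = (800.0, 320.0, 240.0, -0.2, 0.05)   # f, cx, cy, k1, k2
observed = [project(p, *true_params) for p in points]

# The error vanishes at the true parameters and grows when distortion is ignored.
print(reprojection_error(points, observed, true_params))
print(reprojection_error(points, observed, (800.0, 320.0, 240.0, 0.0, 0.0)))
```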

  • 26.03.2012 / ICG / 13:00h

Title:Human-Inspired Visual Servoing for Automatic Take-Off, Hovering and Landing of MAVs (Master Thesis)
Speaker:Mario Kartusic
Abstract: In this thesis we present a human-inspired visual servoing approach for automatic take-off, hovering, and landing of Micro Aerial Vehicles (MAVs), suitable for indoor and outdoor applications. Our approach is based on a Position-Based Visual Servoing (PBVS) technique. Therefore, we use only a monocular camera looking in flight direction as an input sensor. Based on a state-of-the-art monocular Simultaneous Localization And Mapping (SLAM) approach, we estimate the position of the MAV in the environment. For the initialization of the map, we extend the algorithm by exploiting an artificial marker to obtain the correct scale. Additionally, we incorporate a fuzzy logic design in order to get a robust position control without the need of a mathematical model of the MAV. We show that this controller tolerates noisy pose estimates without incorporating additional sensor measurements. In the experiments, we demonstrate that our approach achieves a performance comparable to several state-of-the-art approaches for hovering and during trajectory flights, but without the need for sensor fusion or specific mathematical models. Furthermore, we demonstrate that even low camera resolutions deliver a pose estimate which can be used for autonomous MAV navigation tasks. Finally, we discuss how to detect system failures and how to react in indoor as well as outdoor environments. Our approach is useful for a variety of MAV applications, including the autonomous inspection of power pylons where take-off, hovering, and landing of MAVs is essential.

  • 16.03.2012 / ICG / 13:30h

Title:From Analysis to Communication: The Future of Visualization is Storytelling
Speaker:Robert Kosara, Associate Professor, Department of Computer Science, University of North Carolina at Charlotte. Currently on sabbatical with Tableau Software
Abstract: Visualization and visual analytics are often considered to support three types of activity: exploration, analysis, and presentation. While the former two are covered well in the literature, there has been very little interest in the academic community so far in the latter. On the web, however, the vast majority of work is about presentation, often with very little or no analysis (or even real data). Recently, interest in presentation has been growing, in particular with regard to visual storytelling. I will survey some recent work and discuss where I think the field is heading.
Bio: Robert Kosara is Associate Professor of Computer Science at University of North Carolina, Charlotte, currently on sabbatical with Tableau Software in Seattle, WA. His academic work covers the visualization of categorical data, perception and cognition in visualization, and the development of more rigorous theoretical foundations for the field. Robert's interests span all of visual communication, however: photography, design, painting, as well as any kind of visual storytelling. He runs a blog on visualization and visual communication at

  • 15.03.2012 / ICG / 17:00h

Title:Data Analysis and Visualization for The Cancer Genome Atlas
Speaker:Nils Gehlenborg, Center for Biomedical Informatics, Harvard Medical School Cancer Program, Broad Institute
Abstract: The Cancer Genome Atlas (TCGA) project is a large-scale effort to identify and catalogue genomic alterations in common cancer types. Hundreds of scientists at over a dozen sites in the United States and Canada participate in this project and are collecting and analyzing a wide range of genomic data types (e.g. measurements of gene expression, mutations, copy number alterations and DNA methylation) and clinical data for approximately 10,000 cancer patients representing over 20 cancer types. I will begin my talk with an overview of the TCGA project and the Firehose analysis pipeline developed to perform comprehensive and reproducible analyses of TCGA data sets. I will then focus on two projects aimed at making the results of these analyses accessible for interpretation and exploration by domain experts in cancer biology and medicine. First, I will discuss the development and deployment of Nozzle, a report generation toolkit for the Firehose analysis pipeline. Second, I will present StratomeX, an interactive visualization tool designed to support domain experts in characterizing tumor subtypes based on Firehose analysis results. I will conclude the talk with an outline of ongoing work that will give domain experts even more comprehensive access to Firehose results.

  • 12.03.2012 / ICG / 13:00h

Title:Self-similarity and points of interest
Speaker:Jasna Maver (visiting scientist)
Abstract: In this work, I present a new approach to interest point detection. Different types of features in images are detected by using a common computational concept. The total sum of squares computed on the intensity values of a local circular region is divided into three different components. These three components normalized by the total sum of squares represent three new saliency measures, namely, radial, tangential, and residual. The saliency measures are computed for regions with different radii and scale-spaces are built in this way. Local extrema in scale-space of each of the saliency measures are located. They represent features with complementary image properties: blob-like features, corner-like features, and highly textured points. Results obtained on image sets of different object classes and image sets under different types of photometric and geometric transformations show high robustness of the method to intra-class variations as well as to different photometric transformations and moderate geometric transformations and compare favourably with the results obtained by the leading interest point detectors from the literature. The proposed approach gives a rich set of highly distinctive local regions that can be used for object recognition, image reconstruction, and image matching.

  • 06.03.2012 / ICG / 13:00h

Title:Multiform Visualization of Heterogeneous Data Spaces
Speaker:Christian Partl
Abstract: Investigators from different domains are frequently confronted with large volumes of data originating from multiple sources that need to be analyzed simultaneously. Inhomogeneities within datasets thereby often obstruct the analysis process, but can also be the source of potentially interesting relationships in the data. Most existing visualization techniques do not take such inhomogeneities into account and represent the data in a one-view-fits-all fashion, thus being of limited use in the course of an analysis. The proposed approach considers the inhomogeneous nature of datasets and the need for a differentiated visualization of their subsets. It is based on VisBricks, an interactive multiform visualization that is made up of basic building blocks called bricks. Each brick visualizes a homogeneous subset of a dataset in a way that suits the subset's characteristics best. Visual linking of bricks is used to indicate relationships between different homogeneous subsets. VisBricks also provides drill-down features that allow a detailed examination of individual subsets. An important issue when dealing with several homogeneous subsets in a setup of multiple datasets is their configuration and management. For that purpose we developed the Data-View Integrator, which combines views and datasets in an abstract graph representation. It provides an overview of present datasets and their relationships, and can be used to configure and assign homogeneous subsets to VisBricks or other views for an in-depth analysis. The proposed visualization techniques were evaluated in case studies by domain experts in the context of multiple genomic datasets used for the characterization of cancer subtypes. They were able to quickly reproduce known results from the literature and also gained new insights into the data.


  • 28.02.2012 / Seminar room CGV / 13:00h

Title:Realtime 3D-Reconstruction (Master Thesis)
Speaker:Gottfried Graber
Abstract: Reconstruction of 3D geometry from 2D images is one of the most fundamental challenges in computer vision. In the past decade, numerous algorithms have been developed to solve this problem in an offline fashion. Only recently, the availability of cheap processing power in the form of GPUs and appropriate parallel algorithms made it possible to tackle this problem in a novel way and present results to the user in realtime. The aim of this thesis is to create a system that is capable of interactively reconstructing dense geometry from a single moving camera. Building on a high quality realtime tracking system, depthmaps of the scene are computed by means of a dense multiview stereo algorithm. A volumetric representation of geometry is used, where the surface is given implicitly by the zero level-set of an underlying truncated signed distance function. Reconstruction is based on robust depthmap fusion using a total variation formulation. The resulting convex energy functional is solved to global optimality using a fast primal-dual algorithm.

The application developed in this thesis is able to reconstruct arbitrary geometry on-the-fly with minimal user interaction thanks to a fully automated pipeline. The dense geometry can serve as starting point for sophisticated AR applications, where pixel-accurate interaction between computer generated content and the real world is possible.
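
The implicit surface representation mentioned in this abstract can be illustrated with a small sketch. It only shows how a truncated signed distance function along a single camera ray encodes the surface as its zero crossing; it does not reproduce the total variation fusion or the primal-dual solver of the thesis. The function names `tsdf` and `zero_crossing`, the sign convention (positive in front of the surface) and the sample values are assumptions.

```python
def tsdf(depth, voxel_z, trunc):
    """Truncated signed distance of a voxel at depth voxel_z along a camera ray.
    Positive in front of the measured surface, negative behind it, clamped to [-1, 1]."""
    sd = (depth - voxel_z) / trunc
    return max(-1.0, min(1.0, sd))


def zero_crossing(zs, values):
    """Linearly interpolate the depth at which the values change from positive to non-positive."""
    for i in range(len(zs) - 1):
        v0, v1 = values[i], values[i + 1]
        if v0 > 0.0 >= v1:
            return zs[i] + (zs[i + 1] - zs[i]) * v0 / (v0 - v1)
    return None  # no surface found along this ray


# Hypothetical ray: voxels every 0.5 units, one depth measurement at 1.2.
zs = [0.0, 0.5, 1.0, 1.5, 2.0]
values = [tsdf(1.2, z, trunc=0.5) for z in zs]
print(zero_crossing(zs, values))  # close to the measured depth
```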

  • 21.02.2012 / ICG / 13:00h

Title:Web-based Augmented Reality
Speaker:Christoph Oberhofer
Abstract: Augmented Reality (AR) applications are usually built on top of dedicated visual tracking pipelines implemented in a high performance compiled programming language, such as C++, or executed inside optimized runtime-environments, like Adobe Flash. Such implementations tie applications to specific platforms and vendors, making it difficult to provide a single solution for multiple systems. Today's cross-platform AR solutions are mainly based upon proprietary web-technologies lacking mobile device support or open standards.

Within this thesis, the development, implementation and evaluation of an AR tracking pipeline using natural features is presented. The whole pipeline, starting from camera access to final 3D real-time rendering, is solely based on standard web technologies including HTML5, JavaScript and WebGL. The novelty lies within the completely plugin-free manner of the solution, running in virtually every modern web browser on the PC and even on mobile phones.

An extensive evaluation shows that real-time framerates are achieved on entry-level PCs, while an interactive experience is feasible on high-end smartphones.

  • 17.02.2012 / ICG / 10:30h

Title:Fast Scalable Dense Reconstruction of the World from Photos and Videos
Speaker:Jan-Michael Frahm
Abstract: In recent years photo and video sharing web sites like Flickr and YouTube have become increasingly popular. Nowadays, millions of photos are uploaded every day. These photos survey the world. Given the scale of the data, we face significant challenges in processing them within a short time frame with limited resources. In my talk I will present my work on the highly efficient organization and reconstruction of 3D models from city-scale photo collections (millions of images per city) on a single PC in the span of a day, as well as my work on real-time scene reconstruction from video. The approaches address a variety of the current challenges to achieve a concurrent 3D model from these data. For reconstruction from photo collections these challenges are selecting the data of interest from the noisy datasets and efficient, robust camera motion estimation. Shared challenges of photo-collection-based 3D modeling and 3D reconstruction from video are high-performance stereo estimation from multiple views and image-based location recognition for topology detection. In the talk I will discuss the details of our appearance- and geometry-based image organization method and explain our efficient stereo technique for determining scene depths from photo collection images. It allows scene depth estimation at multiple frames per second from a large set of views with considerable variation in appearance. Additionally, I will discuss some of the lessons learned for how to approach these large-scale challenges in the future.

Jan-Michael Frahm is a Research Assistant Professor at University of North Carolina at Chapel Hill. He received his Ph.D. in computer vision in 2005 from the Christian-Albrechts University of Kiel, Germany. His Diploma in Computer Science is from the University of Lübeck. Jan-Michael Frahm's research interests include a variety of computer vision problems. He has worked on structure from motion for single/multi-camera systems for static and dynamic scenes to create 3D models of the scene; real-time multi-view stereo to create a dense scene geometry from camera images; use of camera-sensor systems for 3D scene reconstruction with fusion of multiple orthogonal sensors; improved robust and fast estimation methods from noisy data to compensate for highly noisy measurements in various stages of the reconstruction process; high performance feature tracking for salient image-point motion extraction; and the development of data-parallel algorithms for commodity graphics hardware for efficient 3D reconstruction. He has published over 60 peer reviewed papers in international conferences and journals. Jan-Michael Frahm is Editor in Chief of the Elsevier journal of Image and Vision Computing.

  • 31.01.2012 / ICG / 13:00h

Title:Work Toward Thesis Proposal: Online Reconstruction for Augmented Reality (AR)
Speaker:Thanh Nguyen
Abstract: Part I: Revisit Marker Based Tracking and Marker-less Tracking in AR (20min) Among tracking solutions in AR, visual tracking seems to be a key and prominent approach. In our last work, we proposed solutions for two niche scenarios: marker-based tracking and marker-less tracking for indoor environments (which can easily be extended to outdoor environments under certain assumptions). These solutions are, respectively, random-dot-uniform-line and a fast-flexible-friendly SLAM. Part II: Building Toward A New Tracking Solution (15min) We have recently been developing an online localization and tracking system that uses a 3D point cloud (outdoor environment) produced from sparse reconstruction. The system is intended to run on a typical mobile device and to perform self-relocalization in real time. We also foresee that our system may suffer from poor localization results because of obstacles in wide-baseline feature matching. Therefore, a fast affine-invariant feature detection method has been proposed. We will present the current status and our vision for the thesis proposal.

  • 24.01.2012 / ICG / 13:00h

Title:Online Model-Based Multi-Scale Pose Estimation
Speaker:Thomas Kempter
Abstract: In this thesis we propose a novel model- and point-based pose estimation approach, which is able to operate on multiple scales in real time. We build our work on a state-of-the-art visual Simultaneous Localization And Mapping (SLAM) approach and extend it to exploit metric prior knowledge about the geometry of a single object in the scene. Using keypoint-based localization, which actually tracks the object by its surroundings, we are robust to occlusions and can even determine a pose when the object vanishes completely. Additionally, we refine this pose based on edge information to increase accuracy when the object occupies a certain amount of the image. In our experiments we show an improvement of the mean translational localization error compared to a state-of-the-art SLAM system from 6.1 cm to 1.7 cm for solid objects, and from 6.9 cm to 2.6 cm for wiry objects. Furthermore, when tracking is lost due to a lack of distinctive features, a purely model-based tracking component takes over. This reduces the number of frames for which the pose cannot be estimated by more than 20%. Our approach delivers a metrically correct pose estimate relative to a known object solely based on visual input from a single camera, which is useful for different robotic applications such as the autonomous inspection of a power pylon by an Unmanned Aerial Vehicle (UAV).

  • 17.01.2012 / ICG / 13:00h

Title:Short review on Fractal & Chaos Game Theory
Speaker:Mahdi Jampour

  • 13.01.2012 / ICG / 13:00h

Title:Convex relaxation of a class of vertex penalizing functionals
Speaker:Thomas Pock
Abstract: We investigate a class of variational problems that incorporate in some sense curvature information of the level lines. The functionals we consider incorporate metrics defined on the orientations of pairs of line segments that meet in the vertices of the level lines. We discuss two particular instances: One instance that minimizes the total number of vertices of the level lines and another instance that minimizes the total sum of the absolute exterior angles between the line segments. In case of smooth level lines, the latter corresponds to the total absolute curvature. We show that these problems can be solved approximately by means of a tractable convex relaxation in higher dimensions. In our numerical experiments we present preliminary results for image segmentation, image denoising and image inpainting.

(Joint work with Kristian Bredies, Benedikt Wirth)

  • 10.01.2012 / ICG / 13:00h

Title:Probabilistic Joint Image Segmentation and Labeling
Speaker:Adrian Ion
Abstract: We present a joint image segmentation and labeling model (JSL) which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales, constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag, followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure. The partition function over tilings and labelings is increasingly more accurately approximated by including incorrect configurations that a not-yet-competent model rates probable during learning. We show that the proposed methodology matches the current state of the art in the Stanford dataset, as well as in VOC2010, where 41.7% accuracy on the test set is achieved.

References: Probabilistic Joint Image Segmentation and Labeling. A. Ion, J. Carreira, C. Sminchisescu. Neural Information Processing Systems (NIPS), 2011 Image Segmentation by Figure-Ground Composition into Maximal Cliques. A. Ion, J. Carreira, C. Sminchisescu. International Conference on Computer Vision (ICCV), 2011

  • 20.12.2011 / ICG / 13:00h

Speaker:Vladimir Kanchev

  • 13.12.2011 / ICG / 13:00h

Title:Introduction to Straight Skeletons
Speaker:Gernot Walzl
Abstract: The Straight Skeleton is an internal structure of polygons. It partitions the interior of a simple polygon with n vertices into n monotone polygons. The talk will investigate the problems of computing the straight skeleton and will give an overview of known algorithms. Can it be used as a replacement for the medial axis?

  • 06.12.2011 / ICG / 13:00h

Title:Free Viewpoint Virtual Try-On With Commodity Depth Cameras
Speaker:Stefan Hauswiesner
Abstract: We present a system that allows users to interactively control a 3D model of themselves at home using a commodity depth camera. It augments the model with virtual clothes that can be downloaded. As a result, users can enjoy a private, virtual try-on experience in their own homes. As a prerequisite, the user needs to enter or pass through a multi-camera setup that captures him or her in a fraction of a second. From the captured data, a 3D model is created. The model is transmitted to the user's home system to serve as a realistic avatar for the virtual try-on application. The system provides free-viewpoint high quality rendering with smooth animations and correct occlusion, and therefore improves the state of the art in terms of quality. It utilizes cheap hardware and therefore is affordable for and accessible to a wide audience.

  • 29.11.2011 / ICG / 13:00h
  • 22.11.2011 / ICG / 13:00h

Title:ISMAR Recap
Speaker:VR/AR group

  • 16.11.2011 / ITI Geb. 16 1 OG / 13:00h

Title:Physical Qualities of Interaction
Speaker:Prof. Andreas Butz, LMU
Abstract: While there are well established and widely accepted interaction concepts for the personal computer world, a comprehensive interaction concept for the often predicted era of ubiquitous computing is still missing. When computers and digital artifacts mix with our physical environment, physicality seems to be a promising candidate to form the basis of such an interaction concept. In this talk, I will show some of our investigations into the design space between physical and digital worlds, and we can speculate together where this may lead.

  • 15.11.2011 / ICG / 13:00h

Title:Recovery of Depth Information Using Paired Optical and Thermal Images
Speaker:Peter Pinggera
Abstract: This thesis deals with the recovery of dense depth information from thermal (far infrared spectrum) and optical (visible spectrum) images using computational stereo techniques. Systems which originally employ optical and thermal cameras separately could benefit from the obtained depth information based on the inherent stereo setup and without the need for additional hardware. However, the large differences in the characteristics of cross-spectral images make this task significantly more difficult than for the common optical stereo case. As a result, no method has been proposed in previous work which is able to solve the considered problem. In this work we therefore investigate if a solution can be achieved by utilizing novel approaches as well as methods suggested in literature. A modular framework based on a common taxonomy of stereo algorithms is implemented as a basis for the conducted experiments. The most crucial aspect is the definition of robust matching cost measures which are able to describe local similarities between the cross-spectral images. Furthermore, powerful optimization techniques prove to be essential for the computation of valid depth estimates. We implement, test and evaluate state-of-the-art robust matching cost methods and compare their performance with novel approaches. The influence of combinations with different types of optimization techniques is also investigated. Tests are performed on simulated as well as real cross-spectral stereo data, including both still images and video sequences. A qualitative evaluation and a comparison with standard optical stereo results show that through the introduced approaches very coarse but largely valid dense depth estimates can indeed be achieved. We obtain best results by using distances between dense descriptors based on histograms of unsigned oriented image gradients (HOG and DAISY descriptors) as a matching cost in combination with semi-global matching optimization.
In all our experiments this approach outperforms methods which have previously been suggested for use in such a scenario, such as mutual information or dense local self-similarity descriptors.

  • 08.11.2011 / CGV / 13:00h

Speaker:Prof. Anders Hast

  • 25.10.2011 / ICG / 13:00h

Title:Spatio-Temporal Video Processing
Speaker:Manuel Werlberger
Abstract: Part I: The ability to generate intermediate frames between two given images in a video sequence is an essential task for video restoration and video post-processing. In addition, restoration requires robust denoising algorithms, must handle corrupted frames and recover from impaired frames accordingly. In this paper we present a unified framework for all these tasks. In our approach we use a variant of the TV-L1 denoising algorithm that operates on image sequences in a space-time volume.

Part II: There is a general trend to use space-time volumes for video processing. As motion is the essential feature for almost any video processing task, it is favourable to incorporate the temporal information already at the motion estimation stage. We demonstrate an approach to directly compute trajectories of arbitrary ordering.

  • 18.10.2011 / ICG / 13:00h

Title:Indoor Navigation with Mixed-Reality Views and Sparse Localization
Speaker:Alessandro Mulloni
Abstract: We present our recent work on supporting indoor navigation with Mixed Reality when continuous localization is not possible. We combine activity-based instructions with sparse localisation at selected info points in the building. Based on localisation accuracy the interface adapts the visualisation by changing the density and quality of information shown. We refine and validate our designs through user involvement in a series of user studies. Our results validate our design and show that info points act both as confirmation points and as overview points.

  • 11.10.2011 / CGV / 13:00h

Title:Dynamic Illumination for Robust Microscopic 3D Metrology
Speaker:David Ferstl
Abstract: Traditional microscopic shape from focus reconstruction is often limited by the surface dynamic and the texture of the analyzed specimen. In many real-world applications, surfaces have strongly varying reflectance, leading to saturated image parts, or lack detectable texture. In such cases, shape from focus generates incorrect and sparse depth maps. We present a novel method to eliminate these vulnerabilities without additional reconstruction time. Beyond that, we propose a novel method to further reduce the computational costs of traditional shape from focus to a minimum. To overcome the problems of high reflectance differences and lack of texture, we use a projector-camera system to compensate for the reflectance variations and additionally project measurable texture. The surface reflection is compensated by a local adaptation of the illumination for every acquisition. To reduce measurement time, the compensation pattern is tracked through the image stack and is updated in a prediction-correction step. The exact projector pattern to create additional texture is determined through a detailed analysis of the focus measure operator and the optical effects during the projection. The additional reduction in measurement time is achieved with a novel focus measure which calculates the focus through a comparison of an estimated all-in-focus image and the stack images by normalized cross correlation. With this measure, the depth estimation of each surface point in the shape from focus algorithm stops as soon as a local focus maximum beyond a predefined threshold is found. The experiments show that our method outperforms the traditional shape from focus algorithm and also improves on comparable methods like high dynamic range imaging in terms of speed and accuracy.
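
The normalized cross correlation used as a comparison measure in this abstract can be sketched as below. This is a generic textbook formulation on flattened patches, not the focus measure of the talk; the function name `ncc`, the convention of returning 0 for constant patches and the sample patch are assumptions.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches given as flat lists.
    Returns a value in [-1, 1]; constant patches yield 0.0 by convention here."""
    if len(a) != len(b) or not a:
        raise ValueError("patches must be non-empty and of equal size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]           # zero-mean patch a
    db = [y - mean_b for y in b]           # zero-mean patch b
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


# Hypothetical 2x2 patch, flattened row by row.
patch = [10.0, 50.0, 20.0, 40.0]
print(ncc(patch, patch))                          # identical patches
print(ncc(patch, [2.0 * x + 5.0 for x in patch])) # gain and offset do not change the score
print(ncc(patch, [-x for x in patch]))            # inverted contrast
```

Because the score is invariant to gain and offset, it compares image structure rather than absolute brightness.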

  • 04.10.2011 / ICG / 13:00h

Title:Video-Based Human Body Action Recognition for Games
Speaker:Markus Murschitz
Abstract: If a person performs an action (like walking) and is captured by a camera, the captured video contains patterns corresponding to the action performed. This is exploited by the field of Human Body Action Recognition, where videos are classified using computer vision algorithms. Human Body Action Recognition is an active research topic. As with various modern-day human-computer-interface technologies, one of the first applications is the gaming industry. The aim of this work is to utilize action classifications to control a computer game, where a specific action corresponds to a specific keystroke pressed in the game. While many published works address the topic of video-based action recognition, few are able to perform it in real time, which is a crucial requirement for any gaming application. Another requirement is to be able to detect the number of repetitions of an action. In this work these two requirements are addressed. The real-time requirement is addressed by exploiting the massively parallel computing capabilities of modern graphics cards for dense feature extraction and actor detection. The features are Histograms of Oriented Gradients (HOG) and Local Binary Patterns (LBP) on appearance, and the Histogram of Flow-Orientations (HOF) and Local Binary Patterns on Flow-magnitude (LBFP). The motion and flow information of the actor are transformed into a prototype per frame. The sequence of prototypes is analyzed by subsequence-matching to known action-specific prototype sequences. This results in an action classification and information about their temporal alignment, which is used to perform repetition detection. The repetition detection and action classification are performed by utilizing Dynamic Time Warping (DTW) as a distance measure. Evaluations are performed on several publicly available datasets and compared to results of other works. It turns out that the system leads to comparable (but slightly inferior) results which are accomplished in real time. Finally a proof of concept is given by incorporating the full evaluation pipeline with a game, where the evaluation pipeline works as a substitute for a keyboard.
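
For readers unfamiliar with Dynamic Time Warping, a minimal sketch of the classic DTW distance between two sequences is given here. It assumes scalar sequence elements and an absolute-difference local cost; the talk's prototype sequences and its subsequence-matching variant are not reproduced, and the function name is hypothetical.

```python
import math

def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic time warping cost between two sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Because elements may be repeated during alignment, a sequence and a time-stretched copy of it have distance zero, which is what makes the measure attractive for actions performed at different speeds.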

Title:Automatically Generated Transfer Function Galleries for High Dimensional Multivariate Data
Speaker:Markus Muchitsch
Abstract: This thesis deals with the design of transfer functions for the visualization of volumetric data sets by means of direct volume rendering. We focus on the visualization of medical data sets. Transfer functions define which structures of a data set are visible and how they appear. To this end, voxels are assigned optical properties such as color and opacity depending on their data values. A scalar data set is frequently insufficient for an unambiguous classification of different structures, as they are often defined by overlapping value ranges. Better discriminability can be achieved using multivariate data sets in which each voxel is described by multiple parameters. Their visualization requires multi-dimensional transfer functions. The design of simple one-dimensional transfer functions by means of primitive editors already entails a time-consuming, error-prone and inefficient trial-and-error approach. When creating multi-dimensional transfer functions, this issue intensifies drastically. In any case users require knowledge about the technical background of transfer functions as well as the data set to be visualized. Furthermore, a lot of experience in handling the chosen editor is needed to obtain suitable results. This work presents a system with the primary goal of allowing a simple, intuitive and efficient creation of transfer functions. Using special histogram calculations, value ranges of structures in the transfer function space are detected. Based on them, initial transfer functions are created. This method spares the tedious trial-and-error approach of the manual editor. In addition, only minimal knowledge of the data set to be visualized is required. Thumbnails created from the initial transfer functions are arranged in a clear, simple and efficiently navigable gallery. For their implementation two proven user interface concepts, known for their good usability, have been selected and adapted to enable efficient transfer function design. By interacting with the thumbnails depicted in the gallery, the transfer functions associated with them can be adapted and combined within a reasonably constrained scope. Since interactions are not performed directly in the transfer function space, immediate knowledge about the impact of transfer functions on the visualization of a data set is not necessary. Even for users experienced in handling primitive editors, this technique entails a significant acceleration of the design work and improvement of the created results. The possibility of further adaptation and combination of transfer functions is important to create visualizations according to the current task. For the distinction of structures with overlapping value ranges, the presented system works with multi-dimensional separable transfer functions. Using conventional multi-dimensional transfer functions, a reasonable interaction is only possible with up to three dimensions. In contrast, separable transfer functions allow the application of an arbitrary number of dimensions. These can be adapted separately and are combined according to a certain scheme. The complexity, which increases with the dimensionality of separable transfer functions, is hidden by the user interface of the developed system. Erroneous classifications, which can occur when applying separable transfer functions, are prevented by means of a color test performed in the render system used. Weaknesses identified in the color test based evaluation are addressed with a preliminary prototype for improved evaluation of multi-dimensional separable transfer functions.

  • 27.09.2011 / ICG / 13:00h

Title:Variational Multiview Range Estimation
Speaker:Gottfried Graber
Abstract: Variational methods have been shown to be very effective in computing dense depthmaps from stereo images. Most algorithms aim at computing the disparity field, which makes the integration of multiple views hard. In this work we propose to compute depth directly, which not only simplifies the extension to multiple views significantly, but also increases the robustness and quality of the resulting depthmaps. We provide results on both synthetic and real world data, where our method performs comparably to current state of the art multiview stereo algorithms.

Title:Bildgestützte Qualitätsprüfung von Feuerbohnen
Abstract: The presented work deals with the sorting of food. In detail, the sorting of scarlet runners is addressed using an image-based method. Since the work cannot cover all necessary topics, it focuses on the implementation of the image processing part. For this purpose, food sorting is introduced in general; some information about scarlet runners and the project requirements forms the starting point of the work. To give an overview, sorting concepts and software-related issues are discussed. This also helps to define exact specifications for the image processing system. Fundamentals of image processing are introduced afterwards, including image acquisition and technologies, preprocessing and feature extraction. Further required subjects are machine learning and real-time systems. Methods to implement suitable software are described and analyzed in terms of complexity. The practical usability of the solutions is demonstrated via experiments that evaluate performance and timing behavior. The final chapter gives some conclusions and perspectives.

  • 13.09.2011 / ICG / 13:00h

Title:Transforming image completion
Speaker:Alex Mansfield, ETH Zürich
Abstract: Image completion is an important photo-editing task which involves synthetically filling a hole in the image such that the image still appears natural. State of the art image completion methods work by searching for patches in the rest of the image that fit well in the hole region. Our key insight is that image patches remain natural under a variety of transformations (e.g. scale, rotation and brightness change), and this should be exploited. We extend the image completion model of Wexler et al. 2007 and investigate numerous optimisation techniques. We show how to achieve results that outperform previous state of the art with reasonable efficiency.

  • 26.08.2011 / ICG / 13:00h

Speaker:Paul Wohlhart
Abstract: TBA

  • 25.07.2011 / ICG / 13:00h

Title:Identifying social norms in virtual agent societies
Speaker:Bastin Tony Roy Savarimuthu
Abstract: Social norms are expectations of an agent (human or software) about the behaviour of other agents in the society. Examples of social norms followed in the human agent society are the obligation norm of gift exchange during Christmas and the prohibition norm against littering a public place. Social norms are simple constructs that are used to facilitate cooperation in human societies. In the field of computer science, normative multi-agent system researchers study how norms can be used to facilitate cooperation and collaboration among software agents. In virtual environments such as Second Life, avatars embody software agents (i.e. they are proxies to humans). It is important for a software agent operating under the open world assumption to be endowed with computational mechanisms to identify norms that may govern its behaviour and its interactions with others. Otherwise, social sanctions may ensue for not following the norms. In this talk, I will first provide an overview of the internal agent architecture for norm identification which a software agent can use to identify the norms of a society. Second, I will discuss how one particular type of norm, the prohibition norm (e.g. don't litter the park), can be identified. Third, I will briefly discuss how obligation norms can be identified. The mechanisms developed in this work are applicable to software entities in virtual environments such as Second Life and massively multiplayer online games.

Biography: Bastin Tony Roy Savarimuthu received his Master of Engineering (ME) degree in Software Systems from Birla Institute of Technology and Science, Pilani, India. He is currently a Lecturer in Information Science at the University of Otago, Dunedin, New Zealand. His primary area of research is normative multi-agent systems. His recently submitted PhD work focuses on how norms emerge in artificial agent societies and how agents can identify norms in open agent societies. His other research interests include mobile computing, social networking and software engineering. More details can be found at

  • 28.06.2011 / ICG / 13:00h

Title:On-line Data Analysis Based on Visual Codebooks
Speaker:Vítězslav Beran, Department of Computer Graphics and Multimedia, Brno University of Technology
Abstract: This work introduces a new adaptable method for on-line video searching in real time based on a visual codebook. The new method addresses high computational efficiency and retrieval performance when used on on-line data. The method originates in procedures utilized by static visual codebook techniques. These standard procedures are modified to be able to adapt to changing data. The procedures that improve the adaptability of the new method are dynamic inverse document frequency, adaptable visual codebook and flowing inverted index. The developed adaptable method was evaluated, and the presented results show that it outperforms the static approaches on video searching tasks. The new adaptable method is based on the introduced flowing window concept, which defines how data is selected, both for system adaptation and for processing. Together with the concept, the mathematical background is defined to find the best configuration when applying the concept to a new method. The practical application of the adaptable method is particularly in video processing systems where significant changes of the data domain, unknown in advance, are expected. The method is applicable in embedded systems that monitor and analyze broadcast on-line TV signals in real time.

  • 21.06.2011 / ICG / 13:00h

Title:3D Object Categorization and Pose Estimation using 3D Gaussian Mixture Contour Models
Speaker:Kerstin Pötsch
Abstract:In this talk I give an overview of my thesis which is concerned with the problem of object categorization based on 3D shape models. 2D shape models based on 2D contour fragments are powerful cues for object categorization but these methods are restricted to one aspect. Therefore, we are going towards object categorization based on 3D shape models. 3D shape is modelled by 3D contour fragments (1D embedded in 3D). We use an approach based on Gaussian Mixture Models for learning 3D category models which we further use for pose estimation in 2D images. I will demonstrate our 3D categorization system on our own dataset and I will demonstrate our pose estimation approach on seventeen poses of the ETH80 database.

  • 14.06.2011 / ICG / 13:00h

Title:Efficient Structure from Motion with Weak Position and Orientation Priors
Speaker:Arnold Irschara
Abstract: In this paper we present an approach that leverages prior information from global positioning systems and inertial measurement units to speed up structure from motion computation. We propose a view selection strategy that advances vocabulary tree based coarse matching by also considering the geometric configuration between weakly oriented images. Furthermore, we introduce a fast and scalable reconstruction approach that relies on global rotation registration and robust bundle adjustment. Real world experiments are performed using data acquired by a micro aerial vehicle equipped with GPS/INS sensors. Our proposed algorithm achieves orientation results that are sub-pixel accurate and the precision is on a par with results from incremental structure from motion approaches. Moreover, the method is scalable and computationally more efficient than previous approaches.

Title:Artifact-free JPEG decompression as constrained optimization problem
Speaker:Martin Holler, Institute for Mathematics and Scientific Computing, Uni Graz
Abstract: The problem of artifact-free decompression of a given JPEG compressed image is addressed. This is done by formulating a constrained optimization problem involving a data fidelity term and a regularization term. The main focus is put on the regularization term. First, the choice of the well-known Total Variation (TV) functional for regularization is discussed. Then, the recently introduced Total Generalized Variation (TGV) functional is considered as the regularization term, and the results obtained are compared with results from the TV based model. Finally, the TGV based model is extended to handle color JPEG images where color sub-sampling has been applied. For both models, the resulting minimization problem is solved using a primal-dual algorithm. Computation times of graphics processing unit (GPU) based implementations are presented.

  • 07.06.2011 / ICG / 13:00h

Title:ISMAR 2011 Submissions
Speaker:Gerhard Reitmayr, et al.
Abstract: Review of ISMAR 2011 submissions. Approximately 10 short presentations.

  • 25.05.2011 / ICG / 13:30h

Title:Talk of Markus Hadwiger
Speaker:Markus Hadwiger
Abstract: Dr. Markus Hadwiger is Assistant Professor of Computer Science in the Division of Mathematical and Computer Sciences and Engineering at KAUST. He assumed his duties in October 2009.

Prior to his appointment at KAUST, Dr. Hadwiger was a Senior Researcher at the VRVis Research Center for Virtual Reality and Visualization in Vienna, Austria. During this time, he conducted extensive basic and applied research in scientific visualization, especially volume visualization and medical visualization, as well as research on GPU-based algorithms.

Dr. Hadwiger’s research interests are in scientific visualization, especially petascale visualization and scientific computing, volume visualization, medical visualization, interactive segmentation and image processing, GPU-based algorithms, and general-purpose computations on GPUs.

He is a co-author of the book “Real-Time Volume Graphics” published by A K Peters in 2006, and has been involved in many courses and tutorials about volume rendering and visualization at ACM SIGGRAPH, ACM SIGGRAPH Asia, IEEE Visualization, and Eurographics. Dr. Hadwiger has co-authored more than 30 refereed articles.

In 2008, Dr. Hadwiger was awarded a multi-year ICT basic research grant from the Vienna Science and Technology Fund WWTF for research on scalable semantic petascale visualization. Also, he was a co-recipient of the Best Application Paper award at IEEE Visualization 2007. Dr. Hadwiger is a member of the IEEE and Eurographics.

Dr. Hadwiger received his doctoral and master’s degrees in Computer Science from Vienna University of Technology, Austria.

  • 24.05.2011 / ICG / 13:00h

Title:Vision-Based Quality Inspection in Robotic Welding
Speaker:Markus Heber
Abstract: In this work we present a novel method for assessing the quality of a robotic welding process. While most conventional automated approaches rely on non-visual information like sound or voltage, we introduce a vision-based approach. Although the weld seam appearance changes, we exploit only the information from error-free reference data, and assess the welding quality through the number of highly dissimilar frames. In our experiments we show that this approach enables an efficient and accurate separation of defective from error-free weldings, as well as detection of welding defects in real time by exploiting the spatial information provided by the welding robot.

Title:Learning Face Recognition in Videos from Associated Information Sources
Speaker:Paul Wohlhart
Abstract: Videos are often associated with additional information that could be valuable for the interpretation of their content. This especially applies to the recognition of faces within video streams, where cues such as transcripts and subtitles are often available. However, this data is not completely reliable and might be ambiguously labeled. To overcome these limitations, we propose a new semi-supervised multiple instance learning algorithm, where the contribution is twofold. First, we can transfer information on labeled bags of instances, thus enabling us to weaken the prerequisite of knowing the label for each instance. Second, we can integrate unlabeled data, given only probabilistic information in the form of priors. The benefits of the approach are demonstrated for face recognition in videos on a publicly available benchmark dataset.

  • 17.05.2011 / ICG / 13:00h

Title:Multi-View Stereo: Redundancy Benefits for 3D Reconstruction
Speaker:Markus Rumpler
Abstract: This work investigates the influence of using multiple views for 3D reconstruction with respect to depth accuracy and robustness. In particular we show that multiview matching not only contributes to scene completeness, but also improves depth accuracy through improved triangulation angles. We start with synthetic experiments on a typical aerial photogrammetric camera network and investigate how baseline (i.e. triangulation angle) and redundancy affect the depth error. Our evaluation also includes a comparison of combined pairwise triangulated and fused stereo pairs with true multiview triangulation. By analyzing the 3D uncertainty ellipsoid of triangulated points we demonstrate the clear advantage of a multiview approach over fused two-view stereo algorithms. We propose an efficient dense matching algorithm that utilizes pairwise optical flow followed by a robust correspondence chaining approach. We provide evaluation results of the proposed method on ground truth data and compare its performance to a multiview plane sweep method.
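
The dependence of depth error on baseline can be illustrated with the standard first-order error model for a rectified two-view setup, sigma_z ≈ z² / (f · b) · sigma_d. This is a textbook approximation and not the evaluation used in the talk; the 1/sqrt(n) scaling for n redundant measurements additionally assumes independent errors, and all function and parameter names are hypothetical.

```python
import math

def stereo_depth_sigma(depth, baseline, focal_px, disparity_sigma_px=1.0):
    """First-order depth uncertainty of a rectified stereo pair.

    depth and baseline share one length unit (e.g. metres); focal length
    and disparity noise are given in pixels. The result is in the same
    length unit as depth.
    """
    return depth ** 2 / (focal_px * baseline) * disparity_sigma_px

def fused_sigma(single_sigma, n_measurements):
    """Uncertainty after averaging n independent, equally noisy estimates."""
    return single_sigma / math.sqrt(n_measurements)
```

Under this simplified model, doubling the baseline halves the depth uncertainty, while four independent measurements are needed to achieve the same reduction through redundancy alone.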

Title:Large-Scale Robotic SLAM through Visual Mapping
Speaker:Christof Hoppe
Abstract: Keyframe-based visual SLAM systems perform reliably and fast in medium-sized environments. Currently, their main weaknesses are robustness and scalability in large scenarios. In this work, we propose a hybrid, keyframe-based visual SLAM system, which overcomes these problems. We combine visual features of different strength, add appearance-based loop detection and present a novel method to incorporate non-visual sensor information into standard bundle adjustment frameworks to tackle the problem of weakly textured scenes. On a standardized test dataset, we outperform EKF-based solutions in terms of localization accuracy by at least a factor of two. On a self-recorded dataset, we achieve a performance comparable to a laser scanner approach.

  • 13.05.2011 / HSi5 / 14:30h

Title:Incremental learning with import vector machines
Speaker:Wolfgang Förstner
Abstract: Incremental learning addresses the adaptation of the learned model necessary due to changes in appearance, shape, the set of features used for identification, and the complexity of the model. Incremental learning needs a model with a generative and a discriminative component, which allows it to handle a large variety of object classes while being efficient at distinguishing similar object classes. Among the many approaches used for incremental learning, import vector machines (IVM) appear to have a great potential, which has not been exploited yet. The IVM is a sparse, kernel-based discriminative model, similar to the well-known support vector machine (SVM). Due to the similarity of the two models, namely depending on an affine combination of the features, IVMs can be used for the same type of problems as SVMs. However, IVMs provide a probabilistic output, are sparser, and appear to contain a generative component though not constructed this way. The talk presents the basic idea of import vector machines, especially addresses the differences to SVMs, and, based on recent work of Ribana Roscher, shows how to arrive at an efficient incremental multi-class learning scheme and illustrates the potential using classical datasets and an object tracking approach.

  • 12.05.2011 / HSi11 / 16:00h

Title:On the role of order of training examples for incremental learning
Speaker:Susanne Wenzel
Abstract: We address the problem of learning classes where it is impossible to capture the huge variability of examples with one dataset at one time, and these examples are instead obtained over time. A continuous learning system would be able to improve already learned models using new examples. There exist a number of incremental learning methods approaching this problem. But one can easily show that the performance of these methods depends on the order of examples for training, a problem which is not addressed in most publications. This talk points out the role of the order of samples for training an incremental learning method. We define characteristics of incremental learning methods to describe the influence of sample ordering on the performance of a learned model. We sketch different types of experiments to evaluate these properties. Based on the estimation of Bayes error bounds, we show how to find sequences of classes for training, based on the data alone, that always obtain the best possible error rates.

Title:Spatial and Hierarchical Structures for Interpreting Images of Man-made Scenes
Speaker:Michael Ying Yang
Abstract: Classification of various image components (pixels and regions) in meaningful categories is a challenging task due to ambiguities inherent to image data. Images of man-made scenes, e.g. building facade images, exhibit strong contextual dependencies in the form of spatial interactions among components. For example, neighboring pixels tend to have similar class labels, and different regions appear in restricted spatial configurations. Modeling these interactions is crucial to achieve good classification accuracy. Graphical models provide a consistent framework for the statistical modeling. In this talk, we present a conditional random field (CRF) to model the spatial structures in the image. The unary potentials are built on the probability output of an efficient randomized decision forest classifier which acts on the region level. The pairwise potentials are introduced to enforce spatial consistency between neighboring regions. To exploit different levels of contextual information in images, a hierarchical conditional random field (HCRF) is described as an extension of CRF. The hierarchical structure of the regions is integrated into pairwise potentials. The model is built on multi-scale image analysis in order to aggregate evidence from local to global level.

Title:Real-time reconstruction of road-surfaces and curbs from stereo-image-sequences
Speaker:Jan Siegemund
Abstract: Robust registration and modeling of the ego vehicle's free driving space provides the basis for many high-level driving assistance applications, such as path planning and collision avoidance. In this context, curbs are an important delimiting structure of this free space, usually representing the boundary between the driving lane and the sidewalk. However, most existing systems for obstacle detection classify curbs as road inliers, due to their low height. Concerning the sensor aspect, stereo cameras are becoming affordable and provide several advantages, such as a high data rate and a low space requirement inside the vehicle. In my talk I will present a method to employ Conditional Random Fields for real-time reconstruction of road-surfaces and curbs from stereo-image-sequences.

  • 10.05.2011 / ICG / 13:00h

Title:Human Action Recognition using Multiple Instance Learning: A Comparative Study
Speaker:Gerald Fritz
Abstract: This thesis is situated in the scope of human action recognition and is concerned with two major objectives. First, it presents a comparative study of five different multiple instance learning (MIL) approaches and relates the results to those reported for state-of-the-art approaches in this field. Second, this work considers whether a sparse, part-based representation is able to support the consecutive classification process.

We investigate a non-negative matrix factorization with sparseness constraints and determine how such a representation contributes to performance improvements. Furthermore, we analyse the impact of a structured initialization towards a better part-based representation and present results for two different nearest neighbour approaches in a face recognition experiment. In the main part of this thesis we investigate whether a MIL concept is suitable for an action recognition task. We perform a thorough and detailed evaluation of different MIL approaches on the Weizmann action dataset and the KTH benchmark.

Results on the ORL database of faces demonstrate that a sparse, part-based representation beneficially supports the subsequent classifier. In particular, if the level of sparseness is significantly greater than that obtained by an unconstrained matrix factorization, then both classifiers achieve a performance increase of approximately 1.5%. Results on the Weizmann dataset show that three out of five MIL methods achieved competitive or better accuracies compared to a linear SVM classifier. Evaluations on the KTH benchmark demonstrate that the best MIL approach (miGraph) performed equally well up to a moderate level of noise. Finally, a solid comparison with recent approaches in the field of human action recognition complements the discussion of both datasets.

Title:Efficient SLAM: Ideas and theoretical analysis
Speaker:Handa Ankur
Abstract: Visual SLAM is the process by which a robot builds a map of the environment and computes its own position at the same time, using only a camera as the sensor. It uses various landmarks to build the map of the environment. These landmarks can be point features, edges, or a fully dense map. Building on the work on the classic MonoSLAM, which maps sparse point features, we explore the possibilities of making very efficient SLAM algorithms that scale well with the number of features. In the end we briefly delve into the theoretical side, asking very simple questions, e.g. what frame rate to use, how many features to use and which image resolution to use to obtain an anytime SLAM algorithm working under hard computational constraints.

  • 03.05.2011 / ICG / 13:00h

Title:Efficiently Locating Photographs in Many Panoramas
Speaker:Michael Kroepfl / Microsoft Research Redmond

Efficiently Locating Photographs in Many Panoramas

We present a method for efficient and reliable geo-positioning of images. It relies on image-based matching of the query images onto a trellis of existing images that provides accurate 5-DOF calibration (camera position and orientation without scale). As such it can handle any image input, including old historical images, matched against a whole city. On such a scale, care needs to be taken with the size of the database. We deviate from previous work by using panoramas to simultaneously reduce the database size and increase the coverage. To reduce the likelihood of false matches, we restrict the range of angles for matched features. Furthermore, we enhance the RANSAC procedure to include two phases. The second phase includes guided feature matching to increase the likelihood of positive matches. Hence, we devise a matching confidence score that separates between true and false matches. We demonstrate the algorithm on a large scale database covering a whole city in order to show its usefulness for a vision-based augmented reality system.

Read/Write World – Location Matching API

Based on our previous work on location based image matching, we developed a cloud-based image matching service, which allows matching of an arbitrary set of input images (panoramas and regular images) to a large index of geo-tagged images from different sources on the web. Through image matching, we can create a match-graph which can link media from different sources with other media and associated meta-data. We recently announced the availability of this web-service to developers through a publicly available API, allowing developers to create their own applications using the service. In this talk, I will provide an overview of the functionality showcased on the project’s web-page, including demonstrations of the individual service endpoints.

  • 19.04.2011 / ICG / 13:00h

Title:Natural Landmark–based Monocular Localization for MAVs
Speaker:Andreas Wendel
Abstract:Highly accurate localization of a micro aerial vehicle (MAV) with respect to a scene is important for a wide range of applications, in particular surveillance and inspection. Most existing approaches to visual localization focus on indoor environments, while such tasks require outdoor navigation. Within this work, we introduce a novel algorithm for monocular visual localization for MAVs based on the concept of virtual views in 3D space. Under the assumption that significant parts of the scene do not alter their geometry and serve as natural landmarks, the accuracy of our visual approach outperforms consumer grade GPS systems. In an experimental setup we compare our approach to a state–of–the–art visual SLAM algorithm and evaluate the performance by geometric validation from an observer's view. As our method directly allows global registration, it is neither prone to drift nor bias. This makes it well suited for long–term autonomous navigation.

Title:Facade Segmentation in a Multi-View Scenario
Speaker:Andreas Wendel
Abstract:We examine a new method of façade segmentation in a multi-view scenario. A set of overlapping, thus redundant street-side images exists and each image shows multiple buildings. A semantic segmentation identifies primary areas in the image such as sky, ground, vegetation, and facade. Subsequently, repeated patterns are detected in image segments previously labeled as "facade areas" and are applied to separate specific facades from each other. Experimentation is based on an industrial street-view dataset captured from a moving car with well-designed, calibrated, automated cameras. High overlap images define a multi-view scenario. We achieve 97% pixel-wise segmentation effectiveness, outperforming current state-of-the-art methods.

  • CANCELED !!! 12.04.2011 / ICG / 13:00h CANCELED !!!

Title:Image-Based Quality Control of Runner Beans
Speaker:Heinz Fleischhacker
Abstract:This thesis deals with the sorting of food products; specifically, it presents a solution for the image-based sorting of runner beans. Since the thesis cannot cover the entire range of technologies required for this, the focus is on the implementation of the image processing. To this end, the tasks of food sorting are explained first, and brief information on the runner bean as a special crop introduces the project. This is followed by the fundamentals of image processing in the areas of image acquisition and technologies, preprocessing, and feature extraction. Next, elements of machine learning and real-time systems are introduced, which are also needed for the subsequent discussion. Sorting concepts are then presented, from which the requirements for image processing software capable of classifying runner beans are derived. Once these requirements are defined, algorithms designed for continuous data processing are presented. In the analysis of runner beans, this need arises from the use of line-scan sensors. The algorithms are analyzed and tested in detail using a concrete implementation. From this, the usability of the developed solutions is argued. Concluding remarks give a further outlook.

  • 05.04.2011 / ICG / 13:00h

Title:Realtime Incremental Image Stitching For Industrial Quality Inspection
Speaker:Robert Lanner
Abstract:The task of image stitching is to create a high-quality panorama from a set of partially overlapping input images. In order to align images, image registration is applied. A blending method is finally used to stitch the aligned images and to create smooth transitions between them without visible artifacts. Hence, the two main steps for the generation of a panorama are image registration and image blending. This thesis presents an image stitching system that creates a weld seam survey image for visual quality inspection of welding processes. The image registration is based on salient keypoint extraction and robust motion estimation. By incorporating available tracking data, the system guarantees a successful mosaic generation. Furthermore, the system includes an incremental blending strategy to provide an online generation of image mosaics, as well as noise filtering methods to cope with, e.g., smoke or sparks that typically occur during welding processes.

  • 15.03.2011 / ICG / 13:00h

Title:Enforcing topological constraints in random field image segmentation
Speaker:Chao Chen / IST Austria
Abstract:We introduce a new way to integrate knowledge about topological properties (TPs) into random field image segmentation models. Instead of including TPs as additional constraints during minimization of the energy function, we devise an efficient algorithm for modifying the unary potentials such that the resulting segmentation is guaranteed to have the desired properties. Our method is more flexible in the sense that it handles more topology constraints than previous methods, which were only able to enforce pairwise or global connectivity. In particular, our method is very fast, making it possible for the first time to enforce global topological properties in practical image segmentation tasks.

  • 04.03.2011 / ICG / 13:00h

Title: Visualization of Multimodal Volume Data to Diagnose Cardiac Disease

Speaker:Gitta Domik, Stephan Arens, Nico Bredenbals, Research Group “Computer Graphics, Visualization and Image Processing”, Department of Computer Science, University of Paderborn, Germany
Abstract: In cooperation with the Heart and Diabetes Centre of North Rhine-Westphalia, we are developing the software package Volume Studio to support the diagnosis of cardiac disease using medical volume data. Our flexible, GPU-based framework is able to combine several volume data sets in the form of different modalities (e.g. CT and PET), different metrics (e.g. size, curvature, vesselness, texture) and segmentations. Hence, complex compositing of multimodal volumes, defined by a pipeline of transfer functions, sampling elements, and logical operations, is possible at runtime. We also present work in progress on a heart model to increase the effectiveness of diagnosis and on the use of controlled experiments to test the effectiveness of various visualizations (e.g. transfer functions, Curved Planar Reformations).

  • 14.02.2011 / ICG / 16:00h

Title: Minimal representations for the estimation of uncertain projective entities
Speaker: Prof. Dr.-Ing. Wolfgang Förstner, Institut für Geodäsie und Geoinformation, Universität Bonn.

Abstract: Estimation using homogeneous entities has to cope with obstacles such as singularities of covariance matrices and redundant parameterisations, which do not allow an immediate definition of maximum likelihood estimation and lead to estimation problems with more parameters than necessary. The talk presents a representation of the uncertainty of all types of geometric entities, together with estimation procedures for geometric entities and transformations which (1) only require the minimum number of parameters, (2) are free of singularities, (3) allow for a consistent update within an iterative procedure, (4) make it possible to exploit the simplicity of homogeneous coordinates to represent geometric constraints and (5) can handle geometric entities which are at infinity or at least very far away, avoiding the use of concepts like the inverse depth.

We discuss the concept and show its usefulness for bundle adjustment, estimating vanishing points or 3D lines from 3D points and for determining 3D lines from observed image line segments in a multi view setup.

  • 11.02.2011 / ICG / 13:00h

Title:A Short Overview of Work on “Interactive 3D Graphics and Games” at the University of Paderborn
Speaker:Gitta Domik, Research Group “Computer Graphics, Visualization and Image Processing”, Department of Computer Science, University of Paderborn, Germany
Abstract:In this (very short) talk I will give an overview of our work in the area of “Interactive 3D Graphics & Games”, which has established itself in two main areas:

(a) Real-time graphics in medicine, where we concentrate on multi modality imaging and visualization to diagnose cardiac disease through volume visualization (e.g. through transfer functions)

(b) Serious Games to support exposure therapy for children traumatized by traffic accidents.

A third area we work in is the development of competency for transdisciplinary collaboration in graduate students.

  • 08.02.2011 / ICG / 11:00h

Title:Coherent Image-Based Rendering of Real-World Objects
Speaker:Stefan Hauswiesner
Abstract:Many mixed reality systems require the real-time capture and re-rendering of the real world to integrate real objects more closely with the virtual graphics. This includes novel viewpoint synthesis for virtual mirror or telepresence applications. For real-time performance, the latency between capturing the real world and producing the virtual output needs to be as low as possible. Image-based visual hull (IBVH) rendering is capable of rendering novel views from segmented images in real time. We improve upon existing IBVH implementations in terms of robustness and performance by reformulating the tasks of major components. Moreover, we enable high resolutions and low latency by exploiting view and frame coherence. The suggested algorithm includes image warping between successive frames under the constraint of redraw volumes. These volumes form a boundary of the motion and deformation in the scene, and can be constructed efficiently by describing them as the visual hull of a set of bounding rectangles which are cast around silhouette differences in image space. As a result, our method can handle arbitrarily moving and deforming foreground objects and free viewpoint motion at the same time, while still being able to reduce workload by reusing previous rendering results.

Title:Context, social aspects and the use of sensors in mobile computing
Speaker:Mariusz Nowostawski
Abstract:During the talk Mariusz will briefly introduce the Information Science department of the University of Otago, faculty research interests and ongoing projects. The talk will present a number of context-aware mobile computing projects and the use of sensors in different areas. Mariusz will briefly discuss past projects such as virtual stickies, fall detection, smoke alarm notification and Parkinson's disease tremor studies. The talk will finish with an outlook on ongoing projects in the area of human activity tracking and life analytics.

  • 25.01.2011 / ICG / 13:00h

Title:Mobile Augmented Reality Campus Guide
Speaker:Claus Degendorfer
Abstract:Smartphones have become increasingly interesting as a mobile Augmented Reality (AR) platform over the past few years because of improved hardware resources, including processing power, memory capabilities, built-in cameras and GPS sensors. With such devices it is possible to create mobile AR information systems which provide augmented reality anywhere, at any time. Some limitations of current AR systems, which we want to address in this work, are a lack of appropriate AR content, the inaccuracy of current sensor-based annotation matching approaches and the poor matching rates of vision-based approaches under changing environment conditions.

We therefore developed a system to enable end users to create textual AR annotations which provide information about the surrounding environment on a global scale. Furthermore, we investigated the possibilities of vision-based annotation matching and implemented three different improvements to annotation matching. Tests showed that a combination of these improvements can increase the annotation matching rate under difficult lighting situations by up to 50%. This work was therefore one step in the evolution of mobile AR information systems.

Title:A Convex Approach to Layered Motion and Stereo
Speaker:Markus Unger
Abstract:Currently there is a trend towards layered motion models for optical flow estimation. Many of the top performers on the Middlebury database already use some form of layers. In this talk we discuss advantages and disadvantages of a layered motion model. We present a novel approach that can handle a large number of layers (more than 100 times as many as current approaches). Our model consists of the Potts model on top of parametric layers with an additional layer for occlusion. We show that we can realize two common occlusion models by means of convex constraints in the Potts model. This allows us to jointly optimize for the layers and occlusions. We present some preliminary applications and results. As this is ongoing work, lively discussion is welcome!

  • 18.01.2011 / ICG / 13:00h

Title:Learning Transformation Invariant Representations from weakly-related Videos for Tracking and Detection
Speaker:Samuel Schulter
Abstract: For current computer vision systems, object detection and tracking are very challenging tasks, whereas humans perform very well on both of them. This comes from the fact that these systems have to cope with all variations and transformations that occur in natural scenes, such as shape, appearance, different illuminations and occlusions. In general, machine learning algorithms learn hypotheses based on labeled training data to correctly separate unseen test data according to their class, i.e., positive or negative. In computer vision, this principle also holds for detection algorithms as well as for tracking approaches that are based on classifiers. However, the amount of labeled training data available is often too small to capture all possible object transformations and intra-class variations, which makes generalization a hard task. In contrast, unlabeled data exist in large amounts and are typically easy to collect. But the extraction of useful information from unlabeled data is difficult in practice, as they often stem from different distributions than the labeled data, i.e., they are only weakly related to the target class. Therefore, we exploit videos as a source of unlabeled data, because they comprise an underlying structure given by real-world constraints, namely the space-time coherence of naturally moving objects. This fact makes them more informative than a heterogeneous collection of single images. That is, observing objects that undergo natural transformations allows for learning representations that are more transformation invariant, although the object labels are unknown. The main intent of this Master's Thesis is to incorporate video data into a state-of-the-art object detection and tracking system with the goal of learning more invariant object representations and achieving better generalization performance.
Based on a Random Forest framework, we define an optimization problem, which also involves data containing local transformations of naturally moving objects extracted from video sequences. We gathered real-world video sequences from the web and applied a dense optical flow, in order to extract useful motion information from video data. The evaluation of our methods shows that we can improve the generalization performance in object detection and tracking.

Speaker:Kerstin Pötsch
Abstract:We present a probabilistic framework for learning 3D contour-based category models represented by Gaussian Mixture Models. This idea is motivated by the fact that even small sets of contour fragments can carry enough information for a categorization by a human. Our approach represents an extension of 2D shape-based approaches towards 3D to get a pose-invariant 3D category model. We reconstruct 3D contour fragments and generate what we call "3D contour clouds" for specific objects. The contours are modeled by probability densities, which are described by Gaussian Mixture Models. Thus, we obtain a probabilistic 3D contour description for each object. We introduce a similarity measure between two probability densities which is based on the probability of intra-class deformations. We show that a probabilistic model allows for flexible modeling of shape by local and global features. We show that even with small inter-class differences it is possible to learn one 3D Category Model against another category and thus demonstrate the feasibility of 3D contour-based categorization.

Title:Robust Multi-View Reconstruction from Highly Redundant Aerial Images
Speaker:Markus Rumpler
Abstract:This thesis investigates and presents robust multi-view matching methods to produce dense depth maps from highly redundant imagery. In several experiments, we investigate the influence of different cost functions and cost aggregation schemes on the results of multi-view depth matching in a plane sweep framework. The evaluation includes local and global optimization methods. The main contribution of this thesis is an extension of the highly efficient TV-L1 optical flow algorithm that includes the epipolar constraint. While correspondence computation is still performed between pairs of images, we present a method for correspondence linking between nearby views. This enables the use of measurements from all neighboring views used for matching and provides wider baselines for robust and accurate triangulation. We provide evaluation results of the proposed method and compare its performance to a standard plane sweep approach. The benefits include lower computation time and memory costs, continuous results instead of discrete depth estimates, and comparable but in most cases even better accuracy. It requires little or no user guidance, so our design is suitable for integration into a fully automatic reconstruction pipeline.

  • 11.01.2011 / ICG / 13:00h

Title:Learning Potentials for Game Theory Based Graph Matching and Applications to Object Localization
Speaker:Michael Donoser
Abstract: This talk focuses on the graph matching problem of finding correspondences between two point sets using unary and pairwise potentials which analyze local descriptors and geometrical compatibility. Recently it was shown that optimal parameters for the features used in the unary potentials can be learned, which significantly improves results in supervised and unsupervised settings. It was demonstrated that even linear assignments (not considering geometry) with well-learned potentials may improve over state-of-the-art quadratic assignment solutions. In this work we focus on two extensions of such methods. First, we show that it is also possible to directly learn pairwise potentials in terms of kernel functions for pairs of points in a supervised setting (using a statistical shape model), which significantly improves matching quality (up to 25%) in a priori known scenarios. Second, we describe a graph matching optimization formulation based on finding an evolutionarily stable strategy which provides accurate assignments even in cases of a large number of outliers (outperforming related spectral approaches). Experiments on synthetic point sets, face alignment datasets and an application in the area of object localization demonstrate the broad applicability of the method.

  • 15.12.2010 / ICG / 14:30h

Title:A Bayesian Approach to Variational Methods
Speaker:Rene Ranftl
Abstract: Variational models are among the most successful methods for low-level Computer Vision tasks today. While such models can be derived and formulated in a completely deterministic setting, they nonetheless have a deep connection to the probabilistic framework of Bayesian inference. This thesis highlights this connection and the advantages that a probabilistic approach to variational methods can have. A fundamental question in variational models is the formulation of an appropriate image model. An especially popular image model is given by the Total Variation prior due to its edge-preserving properties. It will be shown that the usually employed energy minimization approach is not able to fully exploit the properties of the underlying models if such an image prior is used. An alternative approach that is based on Bayesian estimation is introduced and the connections to energy minimization are highlighted. The proposed estimator is defined by a very high-dimensional integral that cannot be solved with deterministic numerical integration algorithms. To tackle this problem, the framework of Markov Chain Monte Carlo (MCMC) integration is introduced and refined into an algorithm that is specifically tailored to the needs of image processing. To speed up the computations, a parallelization scheme and an implementation on graphics processing hardware are proposed. We show the advantages of the proposed algorithm over the energy minimization approach on convex image reconstruction models. For non-convex models the MCMC approach allows for global optimization. Our experiments on different models for motion estimation and stereo reconstruction show that such a global optimization approach is not only feasible but also provides superior results.
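
As an illustrative aside (not taken from the thesis), the difference between energy minimization and a Bayesian estimate computed by MCMC can be seen on a single unknown. The sketch below assumes a toy energy E(x) = (lam/2)(x - f)^2 + |x|, i.e. a Gaussian data term with a Laplace prior as a scalar stand-in for a Total Variation prior. The function names, the random-walk Metropolis sampler and all parameter values are assumptions chosen for this example. For f = 0.5 and lam = 1 the energy minimizer is exactly 0 (soft thresholding), while the posterior mean estimated by sampling is positive.

```python
import math
import random


def energy(x, f=0.5, lam=1.0):
    """Toy energy: quadratic data term plus an absolute-value (Laplace) prior."""
    return 0.5 * lam * (x - f) ** 2 + abs(x)


def map_estimate(f=0.5, lam=1.0):
    """Minimizer of the toy energy, given in closed form by soft thresholding."""
    return math.copysign(max(abs(f) - 1.0 / lam, 0.0), f)


def posterior_mean_mcmc(steps=50000, step_size=1.0, seed=0):
    """Estimate E[x] under p(x) ~ exp(-energy(x)) with random-walk Metropolis."""
    rng = random.Random(seed)
    x = 0.0
    e = energy(x)
    total = 0.0
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        e_new = energy(proposal)
        # Accept with probability min(1, p(proposal) / p(x))
        if rng.random() < math.exp(min(0.0, e - e_new)):
            x, e = proposal, e_new
        total += x
    return total / steps


if __name__ == "__main__":
    print("energy minimizer:", map_estimate())
    print("MCMC posterior mean:", posterior_mean_mcmc())
```

In an imaging problem the unknown would be an entire image and the integral correspondingly high-dimensional, which is where a specialized and parallelized sampler such as the one described in the abstract becomes necessary.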

  • 07.12.2010 / ICG / 13:00h

Title:Large-Scale Robotic SLAM through Visual Mapping
Speaker:Christof Hoppe
Abstract:Simultaneous Localization and Mapping (SLAM) in a three-dimensional environment is an essential requirement for autonomous mobile robots to accomplish high-level tasks. An emerging sensor for SLAM is the digital camera, because it is cheap, small, has low weight and can be applied in many different application areas like marine, aerial or land robotics. Today's camera-based solutions, called visual SLAM, are limited to small environments like desktop or office scenes because of geometric error propagation and limited scalability.

In this master's thesis, we developed a SLAM system that allows us to handle large-scale environments using a stereo camera mounted on a wheeled robot. Our approach extends a keyframe-based method for augmented reality applications by adding appearance-based loop detection and correction. Furthermore, we propose a method for incorporating other sensor information like odometry into the visual SLAM framework. We are thereby able to preserve connectivity between camera poses even if visual features are absent. To maintain map accuracy without sacrificing excessive computation time, we combine feature descriptors of different strength for data association.

In the experiments, we show that our approach is able to handle trajectories of several hundred meters containing several thousand visual features. The resulting three-dimensional maps have correct metric scale. The absolute trajectory error is below one percent. On a standardized benchmark dataset providing ground-truth trajectories, our system outperforms other visual SLAM algorithms by a factor of two.

  • 02.12.2010 / ICG / 11:00h

Title:The Narcissistic Robot: Robot Calibration Using a Mirror
Speaker:Matthias Rüther
Abstract:We present a novel method for calibration of a robotic manipulator. The robot kinematic chain and its tool are observed by a hand-mounted camera through a mirror. We demonstrate that this enables hand-eye, hand-tool, and kinematic robot calibration without incorporating accurate external references, except the mirror. Using this particularly simple setup, hand-eye calibration becomes independent of the kinematic chain, and parameter observability constraints in kinematic calibration become more relaxed, which makes pose planning for robot calibration more convenient.

  • 26.11.2010 / ICG / 09:30h

Title:Real-Time Monocular SLAM and Dense Reconstruction
Speaker:Andrew Davison and Richard Newcombe
Abstract:Recent advances in probabilistic Simultaneous Localisation and Mapping (SLAM) algorithms, together with modern computer power, have made it possible to create practical systems able to perform real-time estimation of the motion of a single camera in 3D purely from the image stream it acquires. This is of interest in robotics, but also in other fields like wearable computing and augmented reality. We will review our research on visual SLAM over the past few years, and present new developments aimed at the challenges of estimating camera motion for very rapid or large scale motion. In particular we will highlight new work which harnesses GPGPU processing power and variational algorithms in order to recover dense scene models in real-time as a camera browses a natural scene.

  • 16.11.2010 / ICG / 13:00h

Title:On a first-order primal-dual algorithm
Speaker:Thomas Pock
Abstract:Variational methods have proven to be particularly useful to solve a number of ill-posed inverse imaging problems. In particular, variational methods incorporating total variation regularization have become very popular for a number of applications. Unfortunately, these methods are difficult to minimize due to the non-smoothness of the total variation. The aim of this paper is therefore to provide a flexible algorithm which is particularly suitable for non-smooth convex optimization problems in imaging. In particular, we study a first-order primal-dual algorithm for non-smooth convex optimization problems with known saddle-point structure. We prove convergence to a saddle-point with rate O(1/N) in finite dimensions for the complete class of non-smooth problems we are considering in this paper. We further show accelerations of the proposed algorithm to yield improved rates on easier problems. In particular, we show that we can achieve O(1/N^2) convergence on problems where the primal or the dual objective is uniformly convex, and we can show linear convergence, i.e. O(w^N), w<1, on problems where both are uniformly convex. The wide applicability of the proposed algorithm is demonstrated on several imaging problems such as image denoising, image deconvolution, image inpainting, motion estimation and image segmentation.
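
As a rough illustration of what a first-order primal-dual iteration of this kind looks like, here is a minimal sketch for 1D total variation denoising, min_u TV(u) + (lam/2)||u - f||^2, written as a saddle-point problem over a dual variable y with |y_i| <= 1. The function names, step sizes, test signal and iteration count are assumptions made for this example and are not taken from the talk. This is the basic, unaccelerated scheme with fixed steps.

```python
def forward_diff(u):
    """Apply D: (Du)[i] = u[i+1] - u[i]."""
    return [u[i + 1] - u[i] for i in range(len(u) - 1)]


def forward_diff_adjoint(y, n):
    """Apply D^T (adjoint of forward_diff) to y of length n-1; returns length n."""
    out = [0.0] * n
    for i, yi in enumerate(y):
        out[i] -= yi
        out[i + 1] += yi
    return out


def total_variation(u):
    """Sum of absolute differences between neighbouring samples."""
    return sum(abs(d) for d in forward_diff(u))


def tv_denoise_primal_dual(f, lam=2.0, iterations=300):
    """Minimize TV(u) + (lam/2)*||u - f||^2 with a basic primal-dual iteration."""
    n = len(f)
    tau = sigma = 0.45  # tau * sigma * ||D||^2 < 1, since ||D||^2 <= 4 in 1D
    u = list(f)
    u_bar = list(f)
    y = [0.0] * (n - 1)
    for _ in range(iterations):
        # Dual ascent step followed by projection onto the interval [-1, 1]
        du = forward_diff(u_bar)
        y = [max(-1.0, min(1.0, yi + sigma * di)) for yi, di in zip(y, du)]
        # Primal descent step; the quadratic data term has a closed-form prox
        dty = forward_diff_adjoint(y, n)
        u_new = [(ui - tau * gi + tau * lam * fi) / (1.0 + tau * lam)
                 for ui, gi, fi in zip(u, dty, f)]
        # Over-relaxation of the primal variable
        u_bar = [2.0 * a - b for a, b in zip(u_new, u)]
        u = u_new
    return u


if __name__ == "__main__":
    noisy_step = [0.0, 0.2, -0.1, 0.1, 0.0, 1.1, 0.9, 1.0, 1.2, 1.0]
    denoised = tv_denoise_primal_dual(noisy_step)
    print([round(v, 3) for v in denoised])
    print(total_variation(noisy_step), total_variation(denoised))
```

The non-smooth total variation term is never differentiated: it only enters through the projection in the dual step, which is why this kind of scheme suits non-smooth convex problems.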

  • 05.11.2010 / ICG / 09:00h

Title:An Omnidirectional Time-of-Flight Camera and its Application to Indoor SLAM
Speaker:Katrin Pirker
Abstract:Photonic mixer devices (PMDs) are able to create reliable depth maps of indoor environments. Yet, their application in mobile robotics, especially in simultaneous localization and mapping (SLAM) applications, is hampered by the limited field of view. Enhancing the field of view by optical devices is not trivial, because the active light source and the sensor rays need to be redirected in a defined manner. In this work we propose an omnidirectional PMD sensor which is well suited for indoor SLAM and easy to calibrate. Using a single sensor and multiple planar mirrors, we are able to reliably navigate in indoor environments to create geometrically consistent maps, even on optically difficult surfaces.

Title:Interactive Multi-Label Segmentation
Speaker:Jakob Santner
Abstract:This paper addresses the problem of interactive multi-label segmentation. We propose a powerful new framework using several color models and texture descriptors, Random Forest likelihood estimation as well as a multi-label Potts-model segmentation. We perform most of the calculations on the GPU and reach runtimes of less than two seconds, allowing for convenient user interaction. Due to the lack of an interactive multi-label segmentation benchmark, we also introduce a large publicly available dataset. We demonstrate the quality of our framework with many examples and experiments using this benchmark dataset.

  • 02.11.2010 / ICG / 13:00h

Speaker: Martin Mörth
Abstract:A semantic building model combines all data about a building that are relevant to the user in a single model. Such a model would be an excellent basis for many applications in building services engineering. In addition to visualizations for security and building control centers, higher-level building control tasks could also be implemented more easily. Making a building model the central source of building-specific information brings many advantages. When the system changes, the error-prone updating of decentralized data stores would no longer be necessary.

Many scientific works address the problem of automatically acquiring building models from digital and analog CAD plans. Building on this work, an approach for acquiring semantic building models from CAD plans is developed. In a cleanup phase, topological errors in the geometric representation of the plans are corrected first. Symbols drawn in special layers of the plans to represent doors and windows are used to semantically enrich the plan data. Areas spanned by the geometric elements of the plan are determined and classified on the basis of these semantic attributes.

The resulting semantic area model of a building is subsequently stored in a relational database. A database system is used that supports the representation of geometric features in its tables.

Special functions can be used in queries to gain added value from the geometric representation of the objects.

The last part of the thesis is intended to illustrate the practical benefit of the building model. In a sample application, 3D models for visualizing the represented building are generated from the data in the relational database.

  • 19.10.2010 / ICG / 13:00h

Title: Comparative Analysis of Multidimensional, Quantitative Data
Speaker: Alexander Lex
Abstract:When analyzing multidimensional, quantitative data, the comparison of two or more groups of dimensions is a common task. Typical sources of such data are experiments in biology, physics or engineering, which are conducted in different configurations and use replicates to ensure statistically significant results. One common way to analyze this data is to filter it using statistical methods and then run clustering algorithms to group similar values. The clustering results can be visualized using heat maps, which show differences between groups as changes in color. However, in cases where groups of dimensions have an a priori meaning, it is not desirable to cluster all dimensions combined, since a clustering algorithm can fragment continuous blocks of records. Furthermore, identifying relevant elements in heat maps becomes more difficult as the number of dimensions increases. To aid in such situations, we have developed Matchmaker, a visualization technique that allows researchers to arbitrarily arrange and compare multiple groups of dimensions at the same time. We create separate groups of dimensions which can be clustered individually, and place them in an arrangement of heat maps reminiscent of parallel coordinates. To identify relations, we render bundled curves and ribbons between related records in different groups. We then allow interactive drill-downs using enlarged detail views of the data, which enable in-depth comparisons of clusters between groups. To reduce visual clutter, we minimize crossings between the views. This paper concludes with two case studies. The first demonstrates the value of our technique for the comparison of clustering algorithms. In the second, biologists use our system to investigate why certain strains of mice develop liver disease while others remain healthy, informally showing the efficacy of our system when analyzing multidimensional data containing distinct groups of dimensions.

  • 07.10.2010 / ICG / 16:00h

Title: Regression Forests for Efficient Anatomy Detection and Localization in CT Studies
Speaker: Antonio Criminisi
Abstract: Paper

  • 05.10.2010 / ICG / 13:00h

Title: Image-based Ghostings for Single Layer Occlusions in Augmented Reality
Speaker: Stefanie Zollmann
Abstract: In augmented reality displays, X-Ray visualization techniques make hidden objects visible through combining the physical view with an artificial rendering of the hidden information. An important step in X-Ray visualization is to decide which parts of the physical scene should be kept and which should be replaced by overlays. The combination should provide users with essential perceptual cues to understand the relationship of depth between hidden information and the physical scene. In this paper we present an approach that addresses this decision in unknown environments by analyzing camera images of the physical scene and using the extracted information for occlusion management. Pixels are grouped into perceptually coherent image regions and a set of parameters is determined for each region. The parameters change the X-Ray visualization for either preserving existing structures or generating synthetic structures. Finally, users can customize the overall opacity of foreground regions to adapt the visualization.

Title: The City of Sights: Design, Construction, and Measurement of an Augmented Reality Stage Set
Speaker: Lukas Gruber
Abstract:We describe the design and implementation of a physical and virtual model of an imaginary urban scene—the “City of Sights”— that can serve as a backdrop or “stage” for a variety of Augmented Reality (AR) research. We argue that the AR research community would benefit from such a standard model dataset which can be used for evaluation of such AR topics as tracking systems, modeling, spatial AR, rendering tests, collaborative AR and user interface design. By openly sharing the digital blueprints and assembly instructions for our models, we allow the proposed set to be physically replicable by anyone and permit customization and experimental changes to the stage design which enable comprehensive exploration of algorithms and methods. Furthermore we provide an accompanying rich dataset consisting of video sequences under varying conditions with ground truth camera pose. We employed three different ground truth acquisition methods to support a broad range of use cases. The goal of our design is to enable and improve the replicability and evaluation of future augmented reality research.

  • 21.09.2010 / ICG / 13:00h

Title: Monitoring Social Expectations in Second Life
Speaker: Stephen Cranefield
Abstract: An active topic in multi-agent systems (MAS) research is the adaptation of social constructs from human society, such as reputation, trust, norms and commitments, to enable autonomous software agents to gain an awareness of the social context of their interactions and to help preserve order in open societies of agents.

At the same time, human interaction within online virtual communities has become increasingly popular due to the advent of social networking Websites and online virtual worlds. However, while these technologies provide the middleware to enable interaction, they generally provide little support for users to maintain an awareness of the social context of their interactions.

There is therefore an opportunity for techniques developed in MAS research for maintaining social awareness, which were inspired by human society, to be applied in the context of electronically mediated human interaction, as well as in their original context of software agent interaction. This talk will discuss one application of this idea to the Second Life virtual world. I will describe an approach allowing individual Second Life users or communities to define conditional rules of social expectation and subscribe to a monitor that checks for the fulfilment and violation of these rules.


  • 24.08.2010 / ICG / 13:00h

Title: Content Creation for Augmented Reality on Mobile Devices
Speaker: Stefan Mooslechner
Abstract: Augmented Reality (AR) is becoming more and more attractive for different areas of application. In particular, the increasing number of smartphones with large displays, built-in cameras and fast wireless connections extends this area. Most applications in this case deal with pre-assembled content, and the user is a simple consumer. It is possible for users to create their own content, but in most cases this requires specialized knowledge of different software solutions, so the user has to invest time in learning these applications. The possibility to create and share AR content in a smart and easy way would increase the number of users in this field of application. In this work, we present a prototype for creating AR content directly on mobile devices. The user can create new 3D objects as well as 2D drawings. We provide different options to color and texture the objects. Scenes are built and manipulated directly at the location where they will be shown, so a fast and exact adjustment to the environment is possible. We deliver an easy way to build virtual models out of the real environment or to generate totally new objects. Additionally, we use an existing infrastructure to distribute the content to a large number of users. This could be a further step towards greater acceptance of AR applications by end users.

  • 29.06.2010 / ICG / 13:00h

Title: Information-theoretic database building and querying for augmented reality applications
Speaker: Pawan Baheti
Abstract:Recently, there has been tremendous interest in the area of mobile Augmented Reality (AR) with applications including navigation, social networking, gaming and education. Current generation mobile phones are equipped with camera, GPS and other sensors, e.g., magnetic compass, accelerometer in addition to having ever increasing computing/graphics capabilities and memory storage. Mobile AR applications process the output of one or more sensors to augment the real world view with useful information. In this work the focus is on the camera sensor output, and a server-client framework is introduced to enable AR applications on mobile phones. The main focus of this talk is to present information-theoretic techniques for the server to build and maintain an image (feature) database based on reference images, and for the client to query the captured input images against this database. The database building on the server involves pruning the descriptor set obtained from reference images with respect to statistical and entropy based measures. Performance results using standard image sets are provided demonstrating superior recognition performance even with dramatic reductions in feature database size. Further extensions in terms of client feedback are considered to improve the database optimization on the server side.

  • 15.06.2010 / ICG / 13:00h

Title: Robust Aerial Image Matching in Temporal Variant Regions
Speaker: Gernot Margreitner
Abstract: This thesis deals with the problem of finding correspondences between images of the same scene taken at different times. The time differences can vary from a few minutes up to several months, which makes it even harder to reliably find correspondences. Local features have proved to be a powerful way to find such correspondences because they are robust to background clutter, occlusions, and changes of the viewpoint. Even though numerous comprehensive feature evaluations have been published, none of these works focused on performance evaluation in the presence of temporal variations. This is a major drawback because in many applications multi-temporal image matching is a crucial component in solving the posed problem. Consequently, this thesis presents a multi-temporal performance evaluation of selected local detectors and descriptors for non-planar aerial imagery. The primary goal of this work is to develop a temporally insensitive image matching workflow that is robust to temporal changes in aerial imagery and achieves highly accurate correspondence alignments. Such a matching algorithm may serve as a fundamental component of a broad range of applications. For example, the demonstrated algorithm prototype can be used to enhance existing photogrammetric workflows, where labor-intensive manual intervention is usually required in order to correctly match images in the presence of temporal changes.

  • 01.06.2010 / ICG / 13:00h

Title: Describing Buildings by 3-Dimensional Details Found in Aerial Photography
Speaker: Philipp Meixner
Abstract: A description of real properties is of interest in connection with location-based services and urban resource management. The advent of Internet maps and location-aware Web search motivates producing such descriptions automatically and at very little incremental cost from aerial photography and its associated data products. The buildings are a very important part of each real property. We describe how one can recognize and reconstruct buildings in 3 dimensions with the purpose of extracting the building size, its footprint, the number of floors, the roof shapes, the number of windows, and the existence or absence of balconies. A key to success in this task is the availability of aerial photography at a greater overlap than has been customary in traditional photogrammetry, as well as a Ground Sampling Distance (GSD) exceeding the traditional values. We use images at a pixel size of 10 cm and with an overlap of 80% in the direction of flight and 60% across the flight direction. Such data support a robust determination of the number of floors and windows. Initial tests with data from the core of the City of Graz (Austria) produced an accuracy of 90% for the count of the number of floors and an accuracy of 80% for the detection of windows.
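
As a rough illustration of what the stated overlaps imply, the sketch below estimates in how many images a single ground point appears. It assumes an idealized, regular flight pattern with constant overlap, which the abstract does not state, and the function name is hypothetical.

```python
def images_per_ground_point(forward_overlap_pct, side_overlap_pct):
    """Average number of images covering one ground point.

    Assumption (not from the abstract): an idealized regular flight
    pattern in which each new image advances by (100 - overlap) percent
    of the image footprint, along and across the flight direction.
    """
    along = 100 / (100 - forward_overlap_pct)   # images along the flight line
    across = 100 / (100 - side_overlap_pct)     # images across flight lines
    return along, across, along * across


# Overlaps quoted in the abstract: 80% forward, 60% across.
along, across, total = images_per_ground_point(80, 60)
print(along, across, total)
```

Under these assumptions a ground point is seen in about 5 images along the flight direction and 2.5 on average across it, which is the redundancy that multi-view reconstruction of facade details can draw on.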

Title: Highly accurate Multiresolution Isosurface Rendering using compactly supported Spline Wavelets
Speaker: Markus Steinberger
Abstract: We present an interactive rendering method for isosurfaces in a voxel grid. The underlying trivariate function is represented as a spline wavelet hierarchy, which allows for adaptive (view-dependent) selection of the desired level-of-detail by superimposing appropriately weighted basis functions. Different root finding techniques are compared with respect to their precision and efficiency. Both wavelet reconstruction and root finding are implemented in CUDA to utilize the high computational performance of Nvidia's hardware and to obtain high quality results. We tested our methods with datasets of up to 512³ voxels and demonstrate interactive frame rates for a viewport size of up to 1024x768 pixels.

  • 25.05.2010 / ICG / 13:00h

Title: Bridging the gap between 3D computer graphics and video/film
Speaker: Philippe Bekaert
Abstract: 3D computer graphics often looks unnatural and cartoon-like. Although it is entirely possible to create highly realistic computer graphics models and renderings, the cost of doing so is generally very large. Video technology, on the other hand, preserves realism from scene to screen by construction, but it does not allow the fantastic freedom of computer graphics in creating or modifying models, or in navigating and interacting within them. In this presentation, I will (re)discuss image and video based modeling and rendering in this light, discuss the particular case of the maturing technology of omni-directional video including its application in performance art, and argue that we are only at the beginning of the development of a new visual medium with its own grammar and applications.

  • 05.05.2010 / ICG / 16:00h

Title: Quantitative lung image analysis
Speaker: Reinhard Beichel
Abstract: Lung diseases are a major health problem. State-of-the-art volumetric imaging modalities like multi-detector computed tomography (MDCT) allow us to depict lung diseases in unprecedented detail, which enables us to use imaging as a biomarker. In the first part of the talk, the current challenges in quantitative lung image analysis will be discussed. In the second part, methods for lung shape analysis and robust segmentation of diseased lungs will be presented.

  • 29.04.2010 / ICG / 13:45h

Title: Multi-Frame Rate Volume Rendering
Speaker: Stefan Hauswieser
Abstract: This is the test talk of my paper for the Eurographics Symposium on Parallel Graphics and Visualization. It presents multi-frame rate volume rendering, an asynchronous approach to parallel volume rendering. The workload is distributed over multiple GPUs in such a way that the main display device can provide high frame rates and little latency to user input, while one or multiple backend GPUs asynchronously provide new views. The latency artifacts inherent to such a solution are minimized by forward image warping. Volume rendering, especially in medical applications, often involves the visualization of transparent objects. Former multi-frame rate rendering systems addressed this poorly, because an intermediate representation consisting of a single surface lacks the ability to preserve motion parallax. The combination of volume raycasting with feature peeling yields an image-based representation that is simultaneously suitable for high quality reconstruction and for fast rendering of transparent datasets. Moreover, novel methods for trading excess speed for visual quality are introduced, and strategies for balancing quality versus speed during runtime are described. A performance evaluation section provides details on possible application scenarios.

  • 27.04.2010 / ICG / 13:00h

Title: MCMC Sampling for urban scene analysis
Speaker: Florent Lafarge
Abstract: This talk presents a family of probabilistic tools, the Markov Chain Monte Carlo (MCMC) samplers, which are efficient at minimizing non-convex energies in spaces of high dimension. These optimization algorithms have several interesting properties. For example, they allow us to deal with energies of any form which, in turn, enables us to introduce complex interactions between the objects of interest. They are also well suited to exploring large configuration spaces of variable dimension.

In this talk, we first detail the principle of these samplers and present the various possibilities offered such as the birth and death of objects in a scene, the switching of object types from a library of models, and coupling with diffusion dynamics. We then propose some applications for urban scene analysis. In particular, we present models for representing natural textures, extracting objects of interest from aerial images such as road networks, reconstructing buildings from DEMs, and modelling facades from multi-view stereo images.

  • 23.04.2010 / ICG / 13:00h

Title: Learning Object Detectors from Multiple Cameras by Centralized Information Fusion
Speaker: Armin Berger
Abstract: Automated object detection is an important task in computer vision and in visual surveillance in particular. It is difficult to train accurate detectors that perform well on a wide variety of scenes. For this reason, multi-camera networks have recently attracted interest in surveillance for training scene-specific detectors that improve the detection performance and decrease the false positive rate, since there are many problems that cannot be tackled with single-camera approaches (e.g. occlusion handling).

This thesis introduces a novel centralized approach to simplify information fusion within a multi-camera network by learning an object detector from multiple cameras. This approach allows information to be collected from an arbitrary number of cameras. Given calibrated cameras, where the calibration has to be performed only once for each camera, the centralized approach projects each camera's detection information to a central (virtual) camera. A mean-shift algorithm extracts local maxima from the fused information. This location information is back-projected to the single camera views to extract additional examples for training. The approach is demonstrated for the task of person detection within an on-line boosting framework. A detailed analysis of the learning behavior is given, and it is shown that the performance of state-of-the-art detectors can be achieved on single camera views although only a small number of labeled training examples is used.
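
The mode-extraction step mentioned above can be sketched generically as follows. This is a plain Gaussian-kernel mean shift on 2D points, not the thesis implementation; the kernel, the bandwidth, the merging rule and all names are assumptions made for illustration.

```python
import math


def mean_shift_modes(points, bandwidth, iters=50, tol=1e-6):
    """Find local density maxima (modes) of 2D points with mean shift.

    Generic sketch: Gaussian kernel, every point used as a start, and
    converged estimates closer than bandwidth / 2 merged into one mode.
    """
    modes = []
    for x, y in points:
        for _ in range(iters):
            wsum = xs = ys = 0.0
            for px, py in points:
                # Gaussian weight of each point relative to the estimate
                d2 = (px - x) ** 2 + (py - y) ** 2
                w = math.exp(-d2 / (2 * bandwidth ** 2))
                wsum += w
                xs += w * px
                ys += w * py
            nx, ny = xs / wsum, ys / wsum
            shift = math.hypot(nx - x, ny - y)
            x, y = nx, ny          # move to the weighted mean
            if shift < tol:
                break
        # keep the estimate only if it is not close to a known mode
        if all(math.hypot(mx - x, my - y) >= bandwidth / 2 for mx, my in modes):
            modes.append((x, y))
    return modes


# Hypothetical fused detections forming two clusters on the ground plane.
detections = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2),
              (10.0, 10.0), (10.1, 9.8), (9.9, 10.2)]
modes = mean_shift_modes(detections, bandwidth=1.0)
print(modes)
```

Each resulting mode stands for one candidate object location, which in the described pipeline would then be back-projected into the individual camera views.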

  • 13.04.2010 / ICG / 13:00h

Title: Planar Features for Visual SLAM
Speaker: Tobias Pietzsch
Abstract: In simultaneous localisation and mapping (SLAM), we are concerned with estimating the pose of a mobile robot and simultaneously building a map of the environment it is navigating. Visual SLAM, tackling the problem using a camera as the only sensor, has made astonishing progress in recent years, regarding both scalability and robustness of the devised solutions. The majority of existing systems focus on building sparse maps of point features to enable reliable camera pose tracking. However, the usefulness of sparse maps is limited for many other interesting scenarios. Examples include path planning in robotics or occlusion of artificial objects in augmented reality. These tasks would require maps representing dense structure which allow geometric reasoning. I argue that in order to achieve denser maps, we should go beyond point features to more descriptive features such as line or surface segments.

In this talk, I discuss the application of planar surface segments as features in EKF-based visual SLAM. These planar features are measured directly using the intensities of individual pixels in the camera images. In this way, the information provided by changes in feature appearance due to changing view-point is directly used to improve the state estimate. I will discuss several issues that arise from using intensity measurements and propose solutions. In particular I will address the cubic cost of the EKF update step in the dimension of the measurement vector, because planar feature measurements usually comprise thousands of individual pixel intensities. Finally, I will present experimental results that show robust camera tracking using planar features and increased accuracy in comparison to traditional point features.

  • 30.03.2010 / ICG / 13:00h

Title: Rapid 3D modeling from live video
Speaker: Qi Pan
Abstract: ProFORMA is a system capable of real-time 3D reconstruction of textured objects rotated by a user's hand. Partial models are rapidly generated from the live video and displayed to the user, as well as used by the system to robustly track the object's motion. The system works by calculating the Delaunay tetrahedralisation of a point cloud obtained from on-line structure from motion estimation which is then carved using a recursive and probabilistic algorithm to rapidly obtain the surface mesh. This talk will look at the techniques used in the system as well as future work we plan to conduct in rapid 3D modelling.

  • 23.03.2010 / ICG / 13:00h

Title: Computer-Vision based Pharmaceutical Pill Recognition on Mobile Phones
Speaker: Andreas Hartl
Abstract: In this work we present a mobile computer vision system which simplifies the task of identifying pharmaceutical pills. A single input image of pills on a special marker-based target is processed by an efficient method for object segmentation on structured background. Estimators for the object properties size, shape and color deliver parameters that can be used for querying an on-line database about an unknown pill. A prototype application is constructed using the Studierstube ES framework, which allows the entire procedure of pill recognition to be carried out on off-the-shelf mobile hardware. For the purpose of pill retrieval, an additional piece of software is introduced which runs on an ordinary web server. It can deliver preprocessed pill information from an arbitrary database to the mobile device and serves as an interface for arbitrary sources of information. The performance of the estimators as well as their runtime is subsequently evaluated under conditions that resemble typical environments of use. The retrieval performance on the Identa database, which is used as an example, confirms that the system can facilitate the task of mobile pill recognition in a realistic scenario.

  • 18.03.2010 / ICG / 13:00h

Title: Rigid Body Reconstruction for Motion Analysis of Giant Honeybees Using Stereo Vision
Speaker: Michael Maurer
Abstract: Zoologists are interested in the defense waves of giant honeybees. The movement of every single bee during the defense wave is of particular interest. Currently they are only able to measure the movement of a single bee using a laser vibrometer. A single measurement does not provide any information on speed, intensity and the starting point of a wave. They are interested in a sensor that enables a 3D reconstruction of the individuals while performing a defense wave. In order to solve this problem, a vision-based measurement system is proposed. A portable stereo setup using two high-resolution cameras with high frame rates is designed in this thesis to acquire the image sequences of the defense wave in an outdoor environment. The functionality of the acquisition setup has also been proven on an expedition to Nepal. Additionally, a framework to segment and reconstruct the single bees is presented. For the segmentation three different methods are proposed and evaluated. The correspondence problem is addressed using reduced graph cuts to get accurate matches in the presence of repetitive patterns. The evaluation has been done by comparison to manually labeled data.

Title: Defense Strategies of Giant Honeybees
Speaker: Gerald Kastberger
Abstract: As part of a research project funded by the FWF, the communication capabilities of giant honeybees (Apis dorsata) are being investigated. In this context, a cooperation with Graz University of Technology was also sought. Michael Maurer of the Institute for Computer Graphics and Vision (ICG), together with Horst Bischof and Matthias Rüther, developed a portable stereo tracking system. It was used in Chitwan (Nepal) in an attempt to measure, in three spatial dimensions, a defensive behavior at the free-hanging, sheet-like single-comb nests that is probably unique to giant honeybees. This behavior is the so-called shimmering, a cascading movement pattern of the surface bees hanging on the nests, which proceeds similarly to Mexican waves in football stadiums. The wave character and the speed of this collective behavior serve to repel predatory wasps. In addition, it is suspected that shimmering also has colony-intrinsic significance, informing the members of the colony about the current defense status. The stereo tracking method, which Michael Maurer adapted to this application, has the advantage that the position of the individual bees on the surface of the nest could be measured non-invasively in all three spatial dimensions at a frame rate of 60 Hz. This opens up new ways to study such collective behaviors, which unfold in fractions of a second and show an extraordinarily high degree of synchronization. In the talk I would like to present the unique biology of giant honeybees, describe the field-ready application of the experimental method mentioned, and briefly present the first results obtained with it.

  • 09.03.2010 / ICG / 13:00h

Title: GPU-based reconstruction and visualization of needles in X-ray images
Speaker: Matthias Scharrer
Abstract: This work presents an algorithm to detect rigid, straight biopsy needles in multi-view C-arm X-ray images and to reconstruct their tip and orientation in three-dimensional space. Several well-known computer vision techniques are applied to achieve this goal. The processing pipeline consists of several stages: denoising and preprocessing with image filtering; computing a norm for better distinction of needle and background; applying the Radon transform to determine the orientation; refining the orientation using the random sample consensus paradigm; needle tip detection; reconstructing the data in three-dimensional space with the Direct Linear Transform; and improving robustness by determining the deviation in back projection. Afterwards, the results are visualized by a recently developed application, which facilitates the display of volumetric data together with polygonal geometry and their intersection. The processing steps are described in detail, a short overview of the surrounding application is given, and finally the evaluation results of the experiments on real X-ray imaging data are presented and discussed.

  • 02.02.2010 / ICG / 13:00h

Title: Novel Applications of Electrocorticographic Signals (ECoG)
Speaker: Peter Brunner

  • 26.01.2010 / ICG / 13:00h

Title: New Approaches to Airway Segmentation in CT Data
Speaker: Christian Bauer
Abstract: In this talk, we present and compare two different methods for the segmentation of airways in CT datasets. The first method utilizes a multi-scale tube detection filter for the identification of tubular objects followed by a reconstruction of the airway tree. During the reconstruction step, prior knowledge about airway trees is utilized to identify and link tubular objects that are part of the airway tree. This approach enables robust handling of disturbances like tumors or emphysema. The second method utilizes the Gradient Vector Flow (GVF) field for the identification of airways and extraction of their centerlines. The centerlines found are used in a second step to initialize the actual GVF-based segmentation. The performance of both methods has been evaluated on a set of 20 chest CT scans with available reference segmentations. We present the results of this evaluation, discuss properties of the two methods, and compare them to the results of 13 other methods from different research groups.

  • 19.01.2010 / ICG / 13:00h

Title: Simultaneous localisation and mapping for mobile robots with recent sensor technologies
Speaker: Elmar Rueckert
Abstract: Autonomous mobile robots need a map of the environment for navigation. Simultaneous Localisation and Mapping (SLAM) is essential for autonomous navigation, path planning and obstacle avoidance. SLAM describes the process of building a map of an unknown environment while simultaneously computing the current robot position. Both steps depend on each other: a good map is necessary to compute the robot position, and only an accurate position estimate yields a correct map. Several popular SLAM packages, such as DP-SLAM, GMapping or GridSLAM, are available for research purposes and allow a meaningful comparison between sensors and algorithms that has not been available so far. The aim of the work is to find a robust method to generate 2D or 3D maps with recent sensor technologies. We compare a grid-based method with two implementations of geometric feature-based SLAM algorithms. All methods rely on a probabilistic estimate of the robot state realised with a particle filter. Three recent sensor technologies, laser range finders, sonar sensors and time-of-flight cameras, are evaluated with respect to accuracy and robustness. Laser-based sensors yield the most exact results and are commonly used. Because of the low price of sonar sensors, ambitious efforts are being made to build cheap household robots with them. Time-of-flight cameras are the newest of the three technologies and allow 3D scans of the environment. The experiments take place in indoor environments, and a quantitative evaluation of the results is performed with the recently published RawSeeds datasets.

Title: Visual Analytics for Gene Expression Data
Speaker: Bernhard Schlegl
Abstract: The analysis of biomolecular gene expression data sets is a research area whose results are used by a wide range of life science experts. Biologists and geneticists are only two examples of users who are interested in analyzing gene expression profiles. To support users in terms of visualization, the Caleydo InfoVis framework provides several visualization techniques for gene expression data and pathways. The combination of both data sources is a major field of interest because it provides better insights into the biological processes occurring inside a cell in the context of patients. Visual analytics is sometimes used to help experts extract information from the data. In this thesis we apply data mining methods and visualize the results. We focus on clustering algorithms because they are well suited for finding co-expressed genes. Co-expressed genes are relevant because experts assume that they are responsible for similar functions inside a cell.

  • 12.01.2010 / ICG / 13:00h

Title: Convex Approximation for Matching Subgraphs in Computer Vision
Speaker: Christian Schellewald
Abstract: I will present a convex approximation approach to the combinatorial problem of matching subgraphs -- which represent object views -- against larger graphs which represent scenes. Starting from a linear programming formulation for computing optimal matchings in bipartite graphs, we extend the linear objective function in order to take into account the relational constraints given by both graphs. The resulting quadratic combinatorial optimisation problem is approximately solved by a (convex) semidefinite program. Some results with respect to view-based object recognition will be shown.

Title: Motion Estimation with Physical Prior Knowledge
Speaker: Annette Stahl
Abstract: We introduce a regularisation term for variational motion estimation approaches that exploits physical prior knowledge and is new in the field of image sequence processing. Using one of the motion estimation approaches along with an appropriate transport process, we also propose a new reconstruction approach for missing data in image sequences, also known as video inpainting. We exploit and extend the existing framework of standard variational optical flow approaches, which we use to recover optical flow fields from image sequences by minimising an appropriate energy functional. A partial differential equation is employed in order to obtain a physically plausible regularisation term for dynamic image motion modelling. The resulting distributed-parameter approach incorporates a spatio-temporal regularisation in a recursive online fashion, in contrast to previous variational approaches, which are designed to evaluate entire spatio-temporal image volumes in a batch processing mode.

  • 07.01.2010 / ICG / 13:00h

Title: Learning Components for Human Sensing
Speaker: Fernando De la Torre
Abstract: Providing computers with the ability to understand human behavior from sensory data (e.g. video, audio, or wearable sensors) is an essential part of many applications that can benefit society such as clinical diagnosis, human computer interaction, and social robotics. A critical element in the design of any behavioral sensing system is to find a good representation of the data for encoding, segmenting, classifying and predicting subtle human behavior. In this talk I will propose several extensions of Component Analysis (CA) techniques (e.g. kernel principal component analysis, support vector machines, and spectral clustering) that are able to learn spatio-temporal representations or components useful in many human sensing tasks.

In the first part of the talk I will give an overview of several ongoing projects in the CMU Human Sensing Laboratory, including our current work on depression assessment and deception detection from video, as well as hot-flash detection from wearable sensors. In the second part of the talk I will show how several extensions of the CA methods outperform state-of-the-art algorithms in problems such as temporal alignment of human behavior, temporal segmentation/clustering of human activities, joint segmentation and classification of human behavior, and facial feature detection in images. The talk will be adaptive, and I will discuss the topics of major interest to the audience.

  • 22.12.2009 / ICG / 13:00h

Title: Wavelet based real-time deformable objects
Speaker: Antonio Rella
Abstract: The calculation of deformations of 3-D models is a well-known area of analytic mathematics. Algorithms for computing such deformations were developed as early as a century ago. In general, computations of physically correct and accurate deformations are cumbersome and time-consuming. In recent years computers have become more powerful and are now capable of computing deformations of complex models in a reasonable amount of time. Besides these physically correct deformations, realistic and intuitive deformations have been described by less time-consuming algorithms, which can also be computed in real time. However, these deformations are not precise and can only satisfy an observer at first sight. This thesis is concerned with the boundary element computation algorithm for accurate deformation descriptions of three-dimensional models. For this method the underlying coefficient matrix is fully populated and the system is therefore more difficult to solve. The lazy wavelet approach, in contrast, removes less relevant geometrical information while accepting the resulting error, which yields a sparser coefficient matrix and eases the calculation.

  • 15.12.2009 / ICG / 13:00h

Title: Articulated Tracking With Few and With Many Parameters
Speaker: Prof. Konrad Schindler
Abstract: Estimating the 3D pose of articulated objects from image data is a notoriously difficult problem in computer vision. One reason is that articulated objects such as the human body have many degrees of freedom, and hence a huge pose space. The talk will present two very different ways of simplifying inference in the pose space. In the first approach, strong a-priori assumptions about the expected motion pattern are imposed, which reduce the effective number of unknowns and allow one to perform pose estimation in a space of much lower dimension, found with (non-linear) dimensionality reduction. In the second approach, the relations between parts of the articulated structure are relaxed to soft constraints. This increases the total number of parameters, but allows one to independently estimate the poses of the parts (each with a small number of unknowns) and afterwards enforce the constraints between them by message passing. Examples will be shown of tracking walking people (for the "few parameters" setting) and of hand tracking during object manipulation (for the "many parameters" setting).

  • 10.12.2009 / ICG / 13:00h

Title: Efficient Ray Casting of Volumetric Datasets with Polyhedral Boundaries on Manycore GPUs
Speaker: Bernhard Kainz
Abstract: We present a new system for hardware-accelerated ray casting of multiple volumes. Our approach supports a large number of volumes, complex translucent and concave polyhedral objects, as well as CSG intersections of volumes and geometry in any combination. It is implemented as a software renderer in CUDA without any fixed-function portions, which allows full control over the use of memory bandwidth. High depth complexity, which is problematic for conventional approaches based on depth peeling, can be successfully handled. As far as we know, our approach is the first framework for multi-volume rendering which provides interactive frame rates when concurrently rendering more than 50 arbitrarily overlapping volumes on current graphics hardware.

  • 09.12.2009 / ICG / 13:00h

Title: Graphical Models for Object Detection and Pose Estimation
Speaker: Prof. Stefan Roth
Abstract: In my talk I will present two rather different approaches for object detection. The first is focused on detecting generic classes of objects in images, a problem that is made challenging by the large intra-class appearance variation of typical object categories as well as viewpoint changes and occlusions. Recent work has pointed to advantages of combining several features, particularly global and local ones. But it remained difficult to choose appropriate combinations of features. Our work addresses this problem using a conditional random field (CRF). To find the feature couplings that yield the best discriminative power, we automatically learn the graph structure of the CRF in a discriminative fashion. The resulting approach yields state-of-the-art performance on the challenging PASCAL 2007 dataset.

In the second part of my talk, I will focus on detecting people in particular, as well as estimating their 2D pose. While for people detection monolithic, global features and discriminative learning are widely used, pose estimation has often remained focused on simple image features such as silhouettes that lack discriminative power. Our work shows that combining a simple tree-structured graphical model for modeling admissible body part configurations with powerful discriminative features in a so-called pictorial structures framework enables excellent detection as well as pose estimation performance. We demonstrate results for a variety of challenging scenes including TV footage.

This is joint work with P. Schnitzspan, M. Andriluka and B. Schiele.

  • 01.12.2009 / ICG / 13:00h

Title: Towards a Collaborative Information Visualization System in a Multi-Display Environment
Speaker: Werner Puff
Abstract: Visual data analysis often operates on a vast amount of data. The size of data sets, as well as the knowledge related to them, raises, among others, two problems. On the one hand, display areas and resolutions of standard workplaces are often insufficient to visualize the data in a proper manner. On the other hand, the knowledge and collaboration of experts from multiple domains is needed for an efficient data analysis process, but the collaborative tasks are not supported by the utilized systems. This work proposes an approach to counter these problems, based on the information visualization software Caleydo. The approach introduces extensions to the existing software in order to connect multiple Caleydo applications for the collaborative analysis of one data set. In addition, Caleydo is integrated into the Multi-Display Environment Deskotheque. This integration provides larger and more flexible display areas and also enables co-located collaboration for small groups. The extensions presented here include a communication layer to provide synchronization and data exchange between the applications. Visual Links are used to make users aware of changes, an important aspect in a setup consisting of multiple displays and multiple users.

Title: Real-Time 3D Rendering for the Atronic EGD Framework: A Hybrid Approach
Speaker: Christopher Dissauer
Abstract: The Austrian company Atronic is engaged in the development and manufacturing of video-based gaming machines and display systems for the worldwide casino market. Graphical content on these machines is displayed via the company's proprietary EGD scene graph framework; historically, the original design of this framework was strongly influenced by the comparatively low GPU performance of the available hardware platform, resulting in a necessarily CPU-bound, 2D-only implementation. This thesis presents an approach to augment the framework with GPU-accelerated real-time 3D rendering, suitable for operation on Atronic's upcoming next-generation hardware platforms. Necessary goals and requirements were defined; during this process, it was decided to aim for a hybrid approach that represents a novel way of seamlessly integrating 3D functionality into the existing 2D system. Based on these drafts, a preliminary study was carried out to ensure overall technical feasibility within the given hardware and software limits. Based on the results of this study, a concrete design was laid out that eventually led to the implementation of a working prototype. Corner test cases have been prepared to evaluate the hybrid prototype system with regard to extensibility, stability and performance, yielding overall satisfactory results. Nevertheless, certain inadequacies were revealed through this evaluation; most prominently, the techniques used for parallelization of CPU-bound and GPU-bound tasks showed considerable room for further optimization towards a technically mature product.
