Research Projects
| Title | Abstract | Start | End |
|---|---|---|---|
|
CONSTRUCT: Construction Site Monitoring and Change Detection using UAVs
(details) |
The goal of the project is to develop methods for modeling and surveying large construction sites. The project will make use of unmanned aerial vehicles and existing stationary or pan-tilt-zoom cameras at the construction site. The aim is to provide accurate 3D models of the whole site on a regular basis, which yields a 4D data set (3D + time). This data can then be used for documentation, visualization (we will use a mobile augmented reality system to overlay, e.g., the plan or a model of the building), and measurement (e.g., how much material has been transported). From a scientific point of view, we will have to solve the following tasks:
|
2011 | 2014 |
|
HOLISTIC: Holistic Aerial Scene Understanding Using Highly Redundant Data
(details) |
The aim of this research project is holistic scene understanding in large aerial datasets consisting of thousands of massively redundant high-resolution images. Holistic scene understanding is one of the major problems in computer vision and photogrammetry and has recently received considerable attention. It comprises two fundamental tasks: 3D scene reconstruction and semantic interpretation of the imaged content at the pixel level. State-of-the-art aerial image processing workflows often ignore the tight interaction between semantic classification and 3D reconstruction, owing to a lack of computational power, the absence of efficient algorithms, or the enormous effort of manual intervention. However, these tasks are mutually informative and should be solved jointly. For instance, a correct classification is a valuable source of information for reconstruction in regions where dense matching methods fail (e.g., bodies of water and reflecting windows or facades), and 3D information can be used as a prior to improve classification (e.g., building and road detection). The high resolution and the redundancy caused by large overlaps between aerial images require massive processing power, which will be provided by graphics processing units that have been shown to give a significant speedup over single-core machines. In particular, we will focus on algorithms based on variational methods, which lend themselves to a high degree of parallelization. To reduce cost-intensive manual interaction, we will further exploit publicly available user data from the Internet to improve both interpretation and 3D reconstruction.
In the HOLISTIC project we will provide a flexible framework for scene classification and 3D reconstruction from aerial images that outperforms the current state of the art and delivers interpretable models at the highest possible accuracy. To achieve this goal, we will focus on two research subjects: (i) the joint optimization of geometry and semantic classification from aerial images in a unified framework, and (ii) the exploitation of existing geographic information systems and web data to support these two sub-tasks. In addition, we will use web-based standards to represent the obtained results efficiently for fast modeling and data parsing. |
2011 | 2014 |
|
Mobi-Trick
(details) |
The focus of the project is outdoor mobile computer vision with all of its challenges. Mobile systems need to be compact and energy efficient, and they change location frequently. They must therefore be autonomous and perform processing locally. These requirements raise a number of challenges for which the project aims to provide solutions. Because the system is compact, there is little space for a large number of sensors such as laser scanners or radar antennas. The work in this project will focus on stereo vision, but with two different types of cameras. Often a second camera is already available, and stereo information increases detection accuracy. Each time the system moves, it needs to adapt to the changed situation, which requires adaptive calibration and online learning. Mobile systems often run on batteries, and there is little space for intricate cooling systems. The system must therefore be designed to be very energy efficient, and new approaches to dynamic power management will be explored in the project. To put the work into context, several applications from the area of traffic surveillance and toll enforcement will be implemented and tested in an application-oriented setting. Current traffic enforcement solutions are either very large and costly (section control, toll enforcement) or offer little in terms of image processing (radar speed control). The technological output of Mobi-Trick makes it possible to design mobile solutions for traffic monitoring, vehicle identification and classification, intelligent incident detection, and observation of driver behavior. Mobile devices are also more efficient in enforcement because their transient nature makes them less predictable. Mobile systems can also react more flexibly to changing road situations such as construction sites. |
2010 | 2013 |
|
HD-VIP: High Definition Video Processing
(details) |
The growth of information today is enormous and has reached a level never seen before. We currently produce almost more data in one year than was produced in the entire history of mankind so far. In particular, the trend towards full digitization of audiovisual content is contributing to this explosion of available material. The exponential growth of online video, most notably on YouTube among the many prominent video portals, is just one example. Even if international studies do not arrive at exactly the same results, the figures are impressive: digital production in 2006 was approximately 160 exabytes and is predicted to rise to 990 exabytes in 2010. Any video processing or editing software has to keep pace with these extraordinary data rates, which requires special effort from both hardware and software. Fortunately, processing power is also increasing at an extraordinary rate, especially in recent graphics cards (GPUs). These cards offer massive parallelism, which is ideally suited to video processing, at a rather modest price, and this makes them an ideal candidate for the task. To make full use of the hardware, however, the algorithms have to be highly parallel. Typical tasks encountered in video processing (which will also be tackled by the proposed project) are: Superresolution: With the advent of HDTVs in many homes, there is an increasing need to produce HDTV content. To make use of existing low-resolution material, one can use so-called superresolution algorithms. These methods generate a high-resolution image from a sequence of low-resolution frames by exploiting the high interframe redundancy. Denoising: There are many sources of noise in a video; either the material is historic, or noise is added during production, compression, and so on. A basic task is to remove the noise while preserving all fine-scale details.
Interactive video editing: For post-production purposes, one wants to mark objects in a video (the object should of course be marked in a single frame only and then segmented automatically in all subsequent frames) and either remove them (which requires inpainting methods to fill the holes with meaningful content), place them somewhere else in the video, or replace them with different objects. Since these tasks are performed interactively, they require interactive frame rates. Fortunately, all of these tasks can be addressed by so-called variational methods. The basic idea is to formulate the task as the minimization of a suitable energy functional. Besides other desirable properties, these methods can be implemented in a highly parallel fashion, which makes them ideal candidates for implementation on modern GPUs. |
2010 | 2012 |
|
OUTLIER
(details) |
The ever-increasing number of cameras in surveillance systems requires automatic video analysis in order to spot critical situations and alert the monitoring personnel in a timely manner. While most current approaches in this area aim to detect a large number of specific events across a large set of complex application scenarios, the goal of this project is to go far beyond the state of the art by developing novel online learning methods that detect unusual situations in a camera-specific scenario. We will exploit the huge amount of data available for a specific camera to reliably learn usual and unusual situations. In particular, the OUTLIER project will carry out basic research in the following areas:
These generic learning algorithms will be applied to the detection of unusual situations in public places and traffic scenarios. Examples are the detection of unusual crowd behavior (incipient panic, blocked emergency exits, or persons who have fallen), suspicious behavior of pedestrians (e.g., going from one car to another, loitering), vehicles or persons moving in unusual locations, unusual types of moving objects, and unusual situations such as accidents, clashes, and collisions. Unlike other approaches, we do not want to model these situations explicitly and individually; instead, we will rely on learning to discriminate the usual situation from the unusual one. The research partners in the project are JRS and TUG for basic and applied research, and Siemens for industrial exploitation of the project results. |
2009 | 2011 |
|
Multimedia Documentation Lab
(details) |
The potential for integrating multimedia content into the analysis of security-relevant affairs is being researched for the first time within the scope of Austrian security research efforts. The project's goal is to harvest audio-visual information from specified open multimedia sources such as TV broadcasts and to allow its integration into existing environments at user sites. The system is intended to let experts efficiently generate more realistic, high-quality situation reports in critical situations. These reports can subsequently be used to communicate with the population of Austria and to increase its security and sense of security, both target goals of the KIRAS framework. A prototype implementation will be installed at the Zentraldokumentation of the Austrian Armed Forces. The audio processing builds upon existing technologies of the industrial partner, while the visual processing is investigated by ICG as academic partner and will mainly deal with person and face detection, tracking, and recognition methods. |
2009 | 2011 |
|
Ludwig Boltzmann Institut für Klinisch-Forensische Bildgebung
(details) |
Clinical forensic medicine has gained considerable importance in recent years owing to heightened public awareness of domestic and sexual violence, violence against children, and suspected cases of medical malpractice. To date, however, the forensic examination of living persons has been limited to an external inspection of the body. The new Ludwig Boltzmann Institute (LBI) for Clinical-Forensic Imaging aims to develop methods for recording internal injury findings as a basis for forensic expert reports. Computed tomography (CT) and magnetic resonance imaging (MRI), both well established in clinical practice, can provide additional, objectively verifiable findings of internal injuries that allow a better assessment of the violence inflicted on the examined person. These methods, however, are geared towards clinical diagnostics, so forensically important findings are either not shown at all or not shown optimally. The Institute for Computer Graphics and Vision (Institut für Maschinelles Sehen und Darstellen) cooperates with the LBI to develop new image processing and computer graphics methods for imaging purposes. |
2008 | 2015 |
|
CityFit: High-Quality Urban Reconstructions by Fitting Shape Grammars to Images and derived Textured Point Clouds
(details) |
The generation of realistic 3D models of whole cities has become a vibrant and highly competitive market through the recent activities of, most notably, Google Earth and Microsoft Virtual Earth. While the first generation of these systems only delivered high-quality zoomable images of the ground, the current trend is heavily geared towards 3D: users can access three-dimensional height fields of the terrain, and even 3D models of individual buildings. Simple building models, basically extruded polygons with different types of roofs, can be generated today from aerial images completely automatically. This is a solved problem. Far from solved, however, is the problem of automatically generating detailed buildings with façades. Input data for this problem are registered range maps obtained by stereo matching and sequences of highly overlapping, and thus redundant, images (taken from a car driving on the road) in which each pixel has not only a color but also a depth, a z-value. Although range maps can in principle be rendered directly, the data size is huge and, more importantly, the pixels have no semantics: a priori there is no difference between a pixel on the floor, on the wall, or on a door. But these shape semantics are required by all downstream applications using the city model. Shape grammars, on the other hand, have recently become (again) a popular method in research for representing 3D buildings. Their great advantage is that they allow buildings to be parameterized, which can be used for populating virtual cities with believable architecture, e.g., for 3D games. The goal of the CITYFIT project is, given highly redundant input imagery and range maps of an arbitrary building in Graz, to synthesize a shape grammar that, when evaluated, creates a clean, CAD-quality reconstruction of that building that fits the original data very closely and makes the semantics of all major architectural features explicit.
These shape semantics can even be transferred back to inform the original data, so each of these “semantically enriched” data points can tell whether it belongs to ground, wall, or door. |
2008 | 2010 |
|
vdQA: Video Quality Analysis
(details) |
Automatic and efficient quality analysis of audiovisual content has become a crucial step before storing the material for later use. While most approaches in this area deal only with low-level signal analysis, the goal of this project is to go far beyond state-of-the-art procedures. On the basis of novel as well as proven computer vision methods, we will attempt to incorporate high-level knowledge in the analysis step, thus achieving significantly better and faster results than current methods, with a reliability comparable to that of a human operator. In particular, the vdQA project will carry out research in the following areas: • Improvement of optical flow field methodologies to deal with multi-frame information. • Application of novel segmentation methods in order to enable semantic quality analysis. • Knowledge-assisted artefact assessment and classification. • Novel methods for fast and robust detection of difficult impairments like unsteadiness, flicker, freeze frames, test patterns and lost frames. • Research into methodologies that are particularly well suited for implementations taking advantage of GPU hardware. The grand challenge in the end is the combination of robustness, speed and integration of human knowledge. The research and industrial partners have dedicated roles in the work programme to achieve those goals. The industrial partners have excellent knowledge of the market and will provide user requirements as well as extensive test material. The academic partners will do research in their respective fields, namely the development of basic algorithms for optical flow, tracking, segmentation, and classification, the usage of GPUs, and algorithms for content-based quality analysis and semantic technologies to represent knowledge. Towards the end of the project, the industrial partners will evaluate and test the developments together with pilot end users. Project Partners: Joanneum Research, Institute of Information Systems & Information Management |
2008 | 2010 |
|
Doctoral Program for the Confluence of Graphics and Vision
(details) |
Computer vision and computer graphics constitute two closely related areas of research: though both fields rely on the same physical and mathematical principles and on a common set of representations, they differ mainly in how these representations are built. Traditionally, the two fields have been treated as separate academic disciplines. Exploiting the commonalities between vision and graphics turns out to be a scientifically profitable endeavour. There are many examples of fruitful combinations of graphics and vision, but there is no systematic education of students in this area (especially in Austria). Therefore, the goal of this doctoral program Confluence of Vision and Graphics is to educate highly talented PhD students in this interdisciplinary field and to teach them a common view of this challenging topic from the start. All proposed topics require a significant amount of both vision and graphics. The students will be co-supervised by one professor with vision expertise and one professor with graphics expertise. The proposed educational program will ensure that the students are trained to become future leading scientists who will face the challenges of research excellence in the interdisciplinary area of graphics and vision, academic leadership, and social competence as members of a particular research group as well as part of the global research network. |
2007 | 2019 |
|
EVis: Autonomous Traffic Monitoring by Embedded Vision
(details) |
The world will witness a tremendous increase in the number of vehicles in the near future. Future traffic monitoring systems will therefore play an important role in improving the throughput and safety of roads. Current monitoring systems capture (usually vision-based) traffic data from a large sensor network; however, they require continuous human supervision, which is extremely expensive. In the proposed EVis research project we investigate the scientific and technological foundations for future autonomous traffic monitoring systems. Autonomy is achieved by a novel combination of three approaches. First, vision-based detection and classification methods are augmented by self-learning and scene adaptation mechanisms, which will significantly reduce the effort of manual configuration. Second, visual data is fused with data from other sensors such as radar, infrared or inductive loop sensors. Sensor fusion helps to improve robustness and confidence, to extend the spatial and temporal coverage, and to reduce the ambiguity and uncertainty of the processed sensor data. Finally, the developed vision and fusion methods are implemented on a distributed embedded platform, which makes them more widely applicable and supports real-time operation. Our autonomous traffic monitoring system will be evaluated using real-world traffic data. The evaluation will be conducted in three different case studies: offline testing using recorded data, online testing on a traffic test site, and a test installation on a public road. |
2007 | 2010 |
|
A Low-Cost System for Automatic People Tracking in a Labyrinth
(details) |
After medical treatment of visually impaired people, it is desirable to evaluate the benefit of the treatment for the patient. Of particular interest is the patient's ability to orient themselves in a three-dimensional environment, to navigate, and to recognize obstacles. For a clinical evaluation under controlled circumstances, a labyrinth has been built through which the patient has to navigate. Obstacles may be placed randomly in the labyrinth. A multi-camera system keeps track of the patient's movements and extracts parameters such as position, speed, and head rotation. |
2006 | 2007 |
|
Computer Vision Methods for the Automatic Analysis of Fibrous Structures in Biological Soft
(details) |
Soft tissues such as tendons, arteries, veins, and skin are important biological materials. A greater understanding of the foundations and interactions of structure and function in soft tissue, and in particular of the associated mechanobiology, is of great interest in the field. A thorough understanding of the complex interrelations between mechanical factors and the associated biological responses may help to improve diagnostics, allowing disease and injury to be treated earlier. The research proposed here will develop a fully automatic system for analyzing macroscopic structures obtained from histological images of arteries by means of modern computer vision techniques. Besides being interesting from the mechanobiological point of view, the structural analysis of images of collagen fibers also poses several challenging questions from a computer vision point of view. In particular, the wide variety of appearances of collagen fibers in images makes this task nontrivial. The main task of this research is the development of novel segmentation techniques for robustly segmenting individual fibril bundles and estimating their parameters, such as location and shape, fibril density, mean fibril orientation, and wriggling of fibrils. This will be achieved by developing novel perceptual grouping methods that operate on the extracted orientation data of fibrils. Another major challenge of this research is to extend the structural analysis from 2D to 3D. |
2003 | 2005 |
|
FSP/JRP Cognitive Vision
(details) |
We envision a scenario in which every person will interact in a natural way with artificial devices as an aid in daily life situations such as orientation, search and information retrieval. We refer to this long-term vision as the Personal Assistance (PA) scenario, where a combination of mobile devices and distributed ambient spaces unobtrusively support users by being aware of the present situation and by responding to user requests. Subprojects at ICG: |
2003 | 2009 |
|
CONEX
(details) |
Robust and Adaptive Approaches to Scene and Object Recognition: The goal of this joint project is to investigate new robust and adaptive approaches in the area of object and scene recognition. Object and scene recognition is a necessary requirement for developing truly cognitive systems as well as for the development of advanced and novel multimodal interfaces leading to ambient intelligence. The following applications will benefit greatly from a robust object and scene recognition system: novel user interfaces that understand human activities, intelligent surveillance, indexing of multimedia databases and content analysis of images, autonomous mobile systems and robotics, industrial inspection, etc. The goal is to develop computer vision based systems that can recognize objects and, in the context of their environment, perform localization and navigation. The major challenge is to develop systems and methods that can work under realistic unconstrained conditions (i.e., outside the lab). The three partners proposing this project are the Center for Machine Perception, Czech Technical University Prague (CMP); the Computer Vision Lab, Faculty of Computer and Information Science, University of Ljubljana (CVL); and the Institute for Computer Graphics and Vision, Graz University of Technology (ICG). They have considerable expertise in this area and have developed complementary methods and techniques. The goal of the project is to join these efforts and combine the expertise. In particular, we carry out the following activities:
|
2003 | 2005 |
|
Entrance Surveillance
(details) |
The aim of this project is the development of an imaging system for an industrial partner. The system should monitor entrances using video images and register the people passing through the entrance. It has to operate under outdoor conditions (sunlight, fog, etc.). |
2002 | 2002 |
|
Uncalibrated Euclidean Scene Reconstruction in Scanning Electron Microscopy Using the Trifocal Tensor
(details) |
The scanning electron microscope (SEM) is an important tool for examining very small structures. Its large magnification, combined with good contrast and a large depth of field, makes it possible to view and characterize microscopic structures at the sub-micron scale. In recent years, dense surface reconstruction from multiple SEM images has been a research topic at this institute, and reconstruction approaches such as shape from stereo and shape from photometric stereo have been evaluated. This work presents a framework for automatic scene reconstruction from three images acquired by a scanning electron microscope. The basic assumption is that the specimen is tilted eucentrically in front of the camera; the camera geometry is assumed to be unknown but constant over all views. It is shown that methods for estimating the trifocal tensor, as well as modern auto-calibration approaches, can be adapted to the imaging conditions in the SEM, and that Euclidean scene structure can be retrieved from three uncalibrated views. The performance of the proposed framework is evaluated on synthetic data as well as real images. It is shown that Euclidean scene structure can be retrieved robustly under varying image noise and inaccurate initialization. |
2002 | 2003 |
