Research Projects
Keywords: 3D Computer Vision; 3D reconstruction; Aerial Vision; Augmented Reality; Augmented Video; Best Paper Award; Biometrics; Caleydo; Computer Graphics; Computer Vision; Convex Optimization; Coordinate transformations; detection; face; Fingerprint; Georeferencing; GPU; GUI; HOG; Human Computer Interaction; Image Labelling; Industrial Applications; Information Visualization; integral imaging; Interaction; Interaction Design; Machine Learning; Medical computer vision; Medical Visualization; Mixed Reality; Mobile computing; Mobile phone; Model; Multi-Display Environments; Multiple Perspectives; Object detection; Object recognition; Object reconstruction; Object Tracking; On-Line Learning; Robotics; Segmentation; Shape analysis; shape from focus; SLAM; Software Projects; Structure from Motion; Surveillance; SVM; Symmetry; Tracking Fusion; Tracking, Action Recognition; User Interfaces; Variational Methods; Virtual reality and augmented reality; Visual Tracking; Visualization
| Title | Abstract | Start | End |
|---|---|---|---|
| CONSTRUCT: Construction Site Monitoring and Change Detection using UAVs | The goal of the project is to develop methods for modeling and surveying large construction sites. The project will make use of unmanned aerial vehicles (UAVs) and of existing stationary or pan-tilt-zoom cameras at the construction site. The aim is to provide accurate 3D models of the whole site on a regular basis, which together form a 4D data set (3D + time). These data can then be used for documentation, for visualization (a mobile augmented reality system will overlay, e.g., the construction plan or a model of the building) and for measurement (e.g., how much material has been transported). From a scientific point of view, we will have to solve the following tasks: | 2011 | 2014 |
| HOLISTIC: Holistic Aerial Scene Understanding Using Highly Redundant Data | The aim of this research project is holistic scene understanding in large aerial datasets consisting of thousands of massively redundant high-resolution images. Holistic scene understanding is one of the major problems in computer vision and photogrammetry and has recently received a lot of attention. It comprises two fundamental tasks: 3D scene reconstruction and semantic interpretation of the imaged content at the pixel level. The tight interaction between semantic classification and 3D reconstruction is often ignored by state-of-the-art aerial image processing workflows, owing to a lack of computational power, the absence of efficient algorithms, or the enormous effort of manual intervention. However, the two tasks are mutually informative and should be solved jointly: a correct classification is a valuable source of information for reconstruction in regions where dense matching methods fail (e.g., sheets of water and reflecting windows or facades), and 3D information can serve as a prior to improve classification (e.g., building and road detection). The high resolution of aerial images and the redundancy caused by their large overlaps require massive processing power, which will be provided by graphics processing units (GPUs); these have been shown to give a significant speedup compared to single-core machines. In particular, we will focus on algorithms based on variational methods, which offer a high degree of parallelism. To reduce cost-intensive manual interaction, we will further exploit publicly available user data from the Internet to improve both interpretation and 3D reconstruction. The HOLISTIC project will provide a flexible framework for scene classification and 3D reconstruction from aerial images that outperforms the current state of the art and delivers interpretable models at the highest possible accuracy. To achieve this goal, we will focus on two research subjects: (i) the joint optimization of geometry and semantic classification from aerial images in a unified framework, and (ii) the exploitation of existing geographic information systems and web data to support these two sub-tasks. In addition, we will use web-based standards to represent the results efficiently for fast modeling and data parsing. | 2011 | 2014 |
| Managed Volume Processing (MVP) | Volumetric data is very common in medicine, geology or engineering, but the high complexity of the data and algorithms has prevented widespread use of volume graphics. Recently, however, 3D image processing and visualization algorithms have been parallelized and ported to graphics processing units (GPUs). This proposal is concerned with new ways of designing volume graphics algorithms for the GPU that can cope interactively with these huge problems by making better use of GPU capacity. Unfortunately, only certain parts of common image or volume processing algorithms can be mapped to the standard GPU stream processing model, and for most real-world problems, writing programs for this architecture is a tedious task. As a result, most algorithms use the available processing power only for small subtasks, namely the number crunching in inner loops. For example, direct volume rendering (DVR) methods send rays into a volumetric object, accumulate intensities, divide rays into sub-rays, scatter rays in materials and/or extract certain features. All GPU implementations of DVR use one processing unit per pixel, regardless of whether that pixel requires very complex calculations or not, which frequently leads to strong load imbalances. A particular problem is that interactive applications such as volume graphics are not traditional number-crunching tasks, which only require optimal computational throughput and have relaxed or no latency constraints. On the contrary, interactive applications must meet real-time deadlines to ensure interactive response. This is a classical real-time resource scheduling problem, which can only be solved by adaptive algorithms that make complex flow control and memory management decisions during parallel execution. Both capabilities are currently available only on the CPU, where the operating system provides access to privileged mode. On the GPU, components for high-level scheduling involving latency hiding and memory management are missing or inaccessible, so full utilization of the GPU is very difficult to achieve for complex graphics algorithms with real-time demands. The main goal of this proposal is to build a toolset that harvests the full GPU power for a general class of real-time volume graphics algorithms. We propose a managed volume processing system that incorporates the missing components. Its key modules are a task model, a workload scheduler with real-time capabilities, and a virtual memory management system, executed in tandem on the GPU and CPU. We will rely on the most recent hardware developments and use OpenCL as the standardized interface to access them. | 2011 | 2014 |
| AR4DOC - Augmented Reality for Document Inspection | Smartphones have evolved considerably in processing power over the last years. They now feature multi-core CPUs as well as GPUs and consumer-quality cameras with up to HD resolution. This makes them an interesting platform for graphics and vision and opens new opportunities for research. The aim of AR4DOC is to facilitate document inspection by a human operator. This task requires detailed knowledge about the nature of a document, which may be outdated or even unavailable at the time of inspection. We seek to provide this information interactively using mobile augmented reality (AR), so that a well-grounded decision on the validity of a document is possible. This involves several tasks, such as document localization, recognition, tracking, presentation and interaction. | 2010 | 2013 |
| Mobi-Trick | The focus of the project is outdoor mobile computer vision with all of its challenges. Mobile systems need to be compact and energy efficient, and they frequently change location. Therefore, they must be autonomous and perform processing locally. These requirements raise a number of challenges for which the project aims to provide solutions. Because the systems are compact, there is little space for a large number of sensors such as laser scanners, radar antennas and the like. The work in this project will focus on stereo vision, but with two different types of cameras: often a second camera is already available, and stereo information increases detection accuracy. Each time the system moves, it needs to adapt to the changed situation, which requires adaptive calibration and online learning. Mobile systems often run on batteries, and there is little space for intricate cooling systems, so the system must be designed to be very energy efficient. New approaches for dynamic power management will be explored in the project. To put the work into context, several applications from the area of traffic surveillance and toll enforcement will be implemented and tested in an application-oriented setting. Current traffic enforcement solutions are either very large and costly (section control, toll enforcement) or offer little in terms of image processing (radar speed control). The technological output of Mobi-Trick makes it possible to design mobile solutions for traffic monitoring, vehicle identification and classification, intelligent incident detection and observation of driver behavior. Mobile devices are also more efficient for enforcement: their transient nature makes them less predictable, and they can react more flexibly to changing road situations such as construction sites. | 2010 | 2013 |
| HD-VIP: High Definition Video Processing | Information is currently growing at an unprecedented rate: we now produce almost as much data in a single year as in the entire previous history of mankind. The trend towards full digitization of audiovisual content contributes substantially to this explosion of available material; the exponential growth of online video, most notably on YouTube, is just one example. Although international studies do not arrive at exactly the same results, the figures are impressive: digital production in 2006 was approximately 160 exabytes and is predicted to rise to 990 exabytes in 2010. Any video processing or editing software has to keep pace with these extraordinary data rates, which requires special efforts in both hardware and software. Fortunately, processing power has also increased dramatically, especially in recent graphics cards (GPUs). These cards offer massive parallelism, which is ideally suited for video processing, at a rather modest price, making this hardware an ideal candidate for video processing. To make full use of the hardware, however, the algorithms have to be highly parallel. Typical video processing tasks, which the project will also tackle, are the following. Superresolution: with HDTVs now in many homes, there is an increasing need for HDTV content. To make use of existing (low-resolution) material, so-called superresolution algorithms can be used; these generate a high-resolution image from a sequence of low-resolution frames by exploiting the high inter-frame redundancy. Denoising: video noise has many sources; the material may be historic, or noise may be added during production, compression, etc. The basic task is to remove the noise while preserving all fine-scale details. Interactive video editing: for post-production, one wants to mark objects in a video (the object should be marked in a single frame only and then segmented automatically in all subsequent frames) and then remove them (which requires inpainting methods to fill the holes with meaningful content), place them elsewhere in the video, or replace them with different objects. Because these tasks are performed interactively, they require interactive frame rates. Fortunately, all of these tasks can be addressed by so-called variational methods, whose basic idea is to formulate the task as the minimization of a suitable energy functional. Among other desirable properties, these methods can be implemented in a highly parallel fashion, which makes them ideal candidates for implementation on modern GPUs. | 2010 | 2012 |
| Higher Order Variational Methods | This research project is devoted to the study of higher-order convex variational methods for problems in computer vision. First-order methods, i.e., methods that take first-order derivatives into account, have been very successful for a variety of inverse problems in computer vision. This success is mostly due to the introduction of total variation methods by Rudin, Osher and Fatemi in 1992. Total variation methods have the important property of preserving sharp discontinuities in the solution while the associated optimization problem remains convex. This yields robust solutions that are independent of the initialization. However, total variation methods also have some disadvantages. First, they favor piecewise constant solutions, which leads to staircasing artifacts in image restoration problems and to a preference for fronto-parallel structures in stereo problems. Second, they introduce a shrinking bias in shape optimization problems. The aim of this project is therefore to study higher-order convex variational methods that overcome these shortcomings of first-order methods. We propose to investigate two approaches. The first is based on total generalized variation (TGV), recently introduced by Bredies, Kunisch and Pock, which provides a framework for recovering piecewise polynomial functions based on a convex functional. We expect this method to lead to significant improvements in stereo and motion estimation problems. The second approach is based on the roto-translation space introduced by Citti and Sarti in 2006, which allows functionals incorporating curvature regularity to be rewritten as convex first-order functionals in higher dimensions. We expect this approach to significantly improve the performance of various shape optimization problems. | 2010 | 2013 |
| Highly accurate range computation in driver assistance systems | In this project, we study variational methods for computing highly accurate range data in driver assistance systems. | 2010 | 2011 |
| Image Processing and Statistical Learning | The goal of this project is to study statistical learning methods, in particular boosting and random forests, for computer vision tasks. We are especially interested in on-line learning. | 2009 | 2010 |
| KIRAS - SECRET | Authorities such as the Ministry of the Interior often need to find certain events or behavior patterns in large video archives. This "forensic" search is computationally extremely expensive and, due to restricted storage permissions, often not even possible. As a result, security-critical events often cannot be prevented or followed up. To overcome these problems, the project investigates algorithms, methods and processes that ease the work of security staff in searching for and following up on events in video archives and that allow these tasks to be performed more efficiently. Based on the requirements of the Ministry of the Interior and the possibilities of an infrastructure operator, these issues will be examined and a research prototype will be created. The work is carried out by AIT and ICG (Graz University of Technology) as research partners and ASE as an industrial partner. Essential research subjects are (i) detection and segmentation of people, (ii) comparison and retrieval of events across different video streams, and (iii) analysis and learning of behavior patterns. In addition, the research institute of the Red Cross (FRK) will conduct social-scientific acceptance research. Based on its results, recommendations will be compiled, from a social-scientific point of view, for optimizing use and minimizing potential problems. | 2009 | 2012 |
| Narkissos - Virtual Dressing Room | The main goal of NARKISSOS is to develop the next-generation “magic mirror” to be installed in the dressing room of a fashion store. The magic mirror is a technical multimedia system in which customers can see themselves on a video wall wearing clothes that they have either chosen on a touch panel or registered via the RFID tag embedded in the clothing at an RFID reader next to the video wall of the virtual dressing room. Users can interactively change the shape and appearance of the clothing in the mirror image without actually having to change clothes. Customers can also instantly observe themselves (i.e., their avatar) from every side. | 2009 | 2012 |
| OUTLIER | The ever-increasing number of cameras in surveillance systems requires automatic video analysis in order to spot critical situations and alert the monitoring personnel in a timely manner. While most current approaches in this area aim to detect a large number of specific events in a large set of complex application scenarios, the goal of this project is to go far beyond the state of the art by developing novel online learning methods that detect unusual situations in a camera-specific scenario. We will exploit the huge amount of data available for a specific camera to reliably learn usual and unusual situations. In particular, the OUTLIER project will carry out basic research in the following areas: These generic learning algorithms will be applied to the detection of unusual situations in public places and traffic scenarios. Examples are the detection of unusual crowd behavior (incipient panic, blocked emergency exits, or fallen persons), suspicious behavior of pedestrians (e.g., going from one car to another, loitering), vehicles or persons moving in unusual locations, unusual types of moving objects, and unusual situations such as accidents, clashes and collisions. Unlike other approaches, we do not model these situations explicitly and individually; instead, we learn to discriminate the usual situation from unusual ones. The project partners are JRS and TUG for basic and applied research and Siemens for the industrial exploitation of project results. | 2009 | 2011 |
| Multimedia Documentation Lab | This project is the first within Austrian security research to investigate the potential of integrating multimedia content into the analysis of security-relevant affairs. The project's goal is to harvest audio-visual information from specified open multimedia sources such as TV broadcasts and to allow its integration into existing environments at user sites. The system is intended to allow experts to efficiently generate more realistic, high-quality situation reports in critical situations. These reports can then be used to communicate with the population of Austria and to increase its security and sense of security, both target goals of the KIRAS framework. An exemplary prototype will be installed at the Zentraldokumentation of the Austrian Armed Forces. For audio processing, the project builds upon existing technologies of the industrial partner, while visual processing is investigated by ICG as academic partner and will mainly deal with person/face detection, tracking and recognition methods. | 2009 | 2011 |
| Person Re-Identification | The goal of this project is to develop an interactive visual search method that efficiently finds a given pedestrian in a large archive of views from other cameras. A user-selected pedestrian image or sequence is used to obtain initial discriminative features and an initial ranked list of hypothetical matches. A discriminative pedestrian recognition model is then learned on-line: the user assigns positive and negative labels to the initially retrieved results, and on-line boosting performs feature selection. This ensures that the most discriminative features for the current query are selected. | 2008 | 2010 |
| Ludwig Boltzmann Institut für Klinisch-Forensische Bildgebung (Ludwig Boltzmann Institute for Clinical Forensic Imaging) | In recent years, clinical forensic medicine has gained considerably in importance because of increased public awareness of domestic and sexual violence, violence against children, and suspected cases of medical malpractice. To date, however, the forensic examination of living persons has been limited to an external inspection of the body. The new Ludwig Boltzmann Institute (LBI) for Clinical Forensic Imaging aims to develop methods for documenting internal injury findings as a basis for forensic expert reports. Computed tomography (CT) and magnetic resonance imaging (MRI), which are well established in clinical practice, can provide additional, objectively verifiable findings of internal injuries that allow a better assessment of the violence inflicted on the examined person. However, these methods are geared towards clinical diagnostics, so forensically important findings are depicted poorly or not at all. The Institut für Maschinelles Sehen und Darstellen (Institute for Computer Graphics and Vision) cooperates with the LBI to develop new image processing and computer graphics methods for this imaging purpose. | 2008 | 2015 |
| CityFit: High-Quality Urban Reconstructions by Fitting Shape Grammars to Images and derived Textured Point Clouds | The generation of realistic 3D models of whole cities has become a vibrant and highly competitive market through the recent activities of, most notably, Google Earth and Microsoft Virtual Earth. While the first generation of these systems only delivered high-quality zoomable images of the ground, the current trend is heavily geared towards 3D: users can access three-dimensional height fields of the terrain and even 3D models of individual buildings. Simple building models, basically extruded polygons with different types of roofs, can today be generated completely automatically from aerial images; this is a solved problem. Far from solved, however, is the problem of automatically generating detailed buildings with façades. The input data for this problem are registered range maps obtained by stereo matching and sequences of highly overlapping, and thus redundant, images (taken from a car driving along the road), where each pixel has not only a color but also a depth (a z-value). Although range maps can in principle be rendered directly, the data size is huge and, more importantly, the pixels have no semantics: a priori, there is no difference between a pixel on the floor, on a wall, or on a door. But these shape semantics are required by all downstream applications that use the city model. Shape grammars, on the other hand, have recently (again) become a popular method in research for representing 3D buildings. Their great advantage is that they allow buildings to be parameterized, which can be used to populate virtual cities with believable architecture, e.g., for 3D games. The goal of the CITYFIT project is, given highly redundant input imagery and range maps of an arbitrary building in Graz, to synthesize a shape grammar that, when evaluated, creates a clean, CAD-quality reconstruction of that building that fits the original data very closely and makes the semantics of all major architectural features explicit. These shape semantics can even be transferred back to the original data, so that each of these “semantically enriched” data points can tell whether it belongs to the ground, a wall, or a door. | 2008 | 2010 |
| vdQA: Video Quality Analysis | Automatic and efficient quality analysis of audiovisual content has become a crucial step before the material is stored for later use. While most approaches in this area deal only with low-level signal analysis, the goal of this project is to go far beyond state-of-the-art procedures. On the basis of novel as well as proven computer vision methods, we will attempt to incorporate high-level knowledge in the analysis step, thus achieving significantly better and faster results than current methods, with reliability comparable to that of a human operator. In particular, the vdQA project will carry out research in the following areas: (i) improvement of optical flow field methodologies to deal with multi-frame information; (ii) application of novel segmentation methods to enable semantic quality analysis; (iii) knowledge-assisted artefact assessment and classification; (iv) novel methods for fast and robust detection of difficult impairments such as unsteadiness, flicker, freeze frames, test patterns and lost frames; and (v) research into methodologies that are particularly well suited for implementations taking advantage of GPU hardware. The grand challenge is ultimately the combination of robustness, speed and integration of human knowledge. The research and industrial partners have dedicated roles in the work programme to achieve these goals. The industrial partners have excellent knowledge of the market and will provide user requirements as well as extensive test material. The academic partners will do research in their respective fields, namely the development of basic algorithms for optical flow, tracking, segmentation, classification and the use of GPUs, as well as algorithms for content-based quality analysis and semantic technologies to represent knowledge. Towards the end of the project, the industrial partners will evaluate and test the developments together with pilot end users. Project partners: Joanneum Research, Institute of Information Systems & Information Management | 2008 | 2010 |
| Doctoral Program for the Confluence of Graphics and Vision | Computer vision and computer graphics are two closely related areas of research: although both fields rely on the same physical and mathematical principles and on a common set of representations, they mainly differ in how these representations are built. Traditionally, the two fields have been treated as separate academic disciplines, yet exploiting the commonalities between vision and graphics turns out to be a scientifically profitable endeavour. There are many examples of fruitful combinations of graphics and vision, but there is no systematic education of students in this area (especially in Austria). Therefore, the goal of the doctoral program Confluence of Vision and Graphics is to educate highly talented PhD students in this interdisciplinary field and to teach them a common view of this challenging topic from the start. All proposed topics require a significant amount of both vision and graphics. Each student will be co-supervised by one professor with vision expertise and one with graphics expertise. The proposed educational program will ensure that the students are trained to become future leading scientists who can face the challenges of research excellence in the interdisciplinary area of graphics and vision, of academic leadership, and of social competence as members of a particular research group and as part of the global research network. | 2007 | 2019 |
| ICAO Face Normalization and Analysis | The goal of this project is the research and development of state-of-the-art computer vision and object recognition algorithms to analyze face portrait images according to the ICAO (International Civil Aviation Organization) standards and specifications. To this end, the project cooperates closely with the Siemens IT Solutions and Services Biometric Center in Graz, whose Biometry group is developing a software solution for this purpose. Current passports issued in the European Union contain biometric data, e.g., digital photographs and fingerprints, in order to uniquely identify their owners. So that passports can be read all over the world, the ICAO has specified a number of guidelines and requirements for the structure of these biometric features. For face portrait images, examples of these requirements are a neutral appearance, open eyes, closed mouth, frontal pose, eyes looking straight ahead, properly positioned eyeglasses, and an uncovered face. Since these analysis steps have to be performed automatically, each requirement poses specific computer vision research challenges, which are tackled in this research project. Examples of the topics involved are model-based segmentation using active shape and active appearance models, fast and robust AdaBoost-based machine learning algorithms for face and face component detection, and classification of facial expressions using multi-classifier fusion approaches. | 2007 | 2009 |
| EVis: Autonomous Traffic Monitoring by Embedded Vision | The world will witness a tremendous increase in the number of vehicles in the near future. Future traffic monitoring systems will therefore play an important role in improving the throughput and safety of roads. Current monitoring systems capture (usually vision-based) traffic data from a large sensor network; however, they require continuous human supervision, which is extremely expensive. In the proposed EVis research project, we investigate the scientific and technological foundations for future autonomous traffic monitoring systems. Autonomy is achieved by a novel combination of three approaches. First, vision-based detection and classification methods are augmented by self-learning and scene adaptation mechanisms, which will significantly reduce the effort of manual configuration. Second, visual data is fused with data from other sensors such as radar, infrared or inductive loop sensors. Sensor fusion helps to improve robustness and confidence, to extend spatial and temporal coverage, and to reduce the ambiguity and uncertainty of the processed sensor data. Finally, the developed vision and fusion methods are implemented on a distributed embedded platform, which makes them more widely applicable and supports real-time operation. Our autonomous traffic monitoring system will be evaluated using real-world traffic data in three case studies: offline testing using recorded data, online testing on a traffic test site, and a test installation on a public road. | 2007 | 2010 |
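
The MVP entry attributes strong load imbalances to assigning work to processing units without regard to how expensive each pixel's ray is. The toy simulation here is an illustrative assumption, not the MVP scheduler: it compares the completion time (makespan) of a static partition, in which each processing unit receives one fixed, contiguous block of rays, with a shared work queue from which any idle unit pulls the next ray (greedy list scheduling). The function names and ray costs are hypothetical, and the static partition is a simplification of per-pixel assignment on a real GPU.

```python
import heapq


def static_makespan(costs, units):
    """Completion time when each unit gets one fixed, contiguous block of rays."""
    block = -(-len(costs) // units)  # ceiling division: rays per unit
    return max(sum(costs[i:i + block]) for i in range(0, len(costs), block))


def dynamic_makespan(costs, units):
    """Completion time when idle units pull the next ray from a shared queue
    (greedy list scheduling)."""
    free_at = [0.0] * units  # min-heap: time at which each unit becomes idle
    for cost in costs:
        t = heapq.heappop(free_at)         # the earliest idle unit takes the next ray
        heapq.heappush(free_at, t + cost)  # and is busy until t + cost
    return max(free_at)


# Hypothetical per-ray costs: cheap rays through empty space, expensive rays
# through a dense structure that happens to fall into one image region.
ray_costs = [1, 1, 1, 1, 10, 10, 10, 10]
print(static_makespan(ray_costs, units=4), dynamic_makespan(ray_costs, units=4))
# prints: 20 11.0
```

With four units and the expensive rays clustered in one image region, the static partition finishes after 20 cost units, while the shared queue finishes after 11, which equals the total work (44) spread evenly over four units. A real-time scheduler of the kind the abstract proposes would additionally have to meet latency deadlines and manage memory, which this sketch ignores.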

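The HD-VIP and Higher Order Variational Methods entries both rely on formulating a task as the minimization of an energy functional, with the total variation model of Rudin, Osher and Fatemi (ROF) as the first-order baseline. As a minimal sketch, and not the projects' own implementation, the 1D ROF denoising energy E(u) = Σ|u[i+1] - u[i]| + (λ/2) Σ(u[i] - f[i])² can be minimized with the first-order primal-dual algorithm of Chambolle and Pock. The function name `tv_denoise_1d`, the step sizes `tau` and `sigma`, the weight `lam` and the test signal are assumptions chosen for illustration.

```python
def tv_denoise_1d(f, lam, iters=5000, tau=0.45, sigma=0.45):
    """Approximately minimize the 1D ROF energy
        E(u) = sum_i |u[i+1] - u[i]| + lam/2 * sum_i (u[i] - f[i])**2
    with the Chambolle-Pock primal-dual iteration.
    The forward-difference operator D has ||D||^2 <= 4, so tau*sigma*4 < 1 is required."""
    n = len(f)
    u = [float(x) for x in f]
    u_bar = list(u)
    p = [0.0] * (n - 1)  # dual variable: one entry per forward difference
    for _ in range(iters):
        # Dual step: ascend along D u_bar, then project each entry onto [-1, 1].
        for i in range(n - 1):
            q = p[i] + sigma * (u_bar[i + 1] - u_bar[i])
            p[i] = max(-1.0, min(1.0, q))
        # Primal step: move by -tau * D^T p, then apply the closed-form prox
        # of the quadratic data term.
        u_old = u
        u = []
        for j in range(n):
            dtp = (p[j - 1] if j > 0 else 0.0) - (p[j] if j < n - 1 else 0.0)
            v = u_old[j] - tau * dtp
            u.append((v + tau * lam * f[j]) / (1.0 + tau * lam))
        # Extrapolation step of the primal-dual scheme.
        u_bar = [2.0 * a - b for a, b in zip(u, u_old)]
    return u


noisy_step = [0.0, 0.1, 0.0, 0.1, 0.0, 1.0, 0.9, 1.0, 0.9, 1.0]
print([round(x, 3) for x in tv_denoise_1d(noisy_step, lam=4.0)])
```

Within each step, every dual entry and every primal entry is updated independently of the others, which is the property that makes such schemes attractive on GPUs. For the noisy step used here, the exact minimizer (obtainable from the optimality conditions) consists of five samples at 0.09 followed by five at 0.91: the small wiggles are flattened into piecewise constant plateaus and the edge is kept, although its height drops from about 0.92 to 0.82, a small instance of the contrast loss of first-order total variation.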