
BibTeX BibTeX citation in raw text format (you can copy-and-paste this into your own BibTeX files).
PDF PDF file, usually containing a draft or technical report corresponding to the cited publications (for copyright reasons, we cannot put up the actually published version--you can get that at your library).
Link A link to the publisher's version of the paper.
Web A web version of the paper--a web of HTML pages containing PNG-rendered pages of the paper.
Slides Slides belonging to a conference talk corresponding to the paper.
Archive Whatever else was saved.

2008

Document Signature Using Intrinsic Features for Counterfeit Detection
Joost van Beusekom, Faisal Shafait, Thomas M. Breuel
Proceedings of the International Workshop on Computational Forensics

BibTeX     PDF     Web    

Abstract  Document security plays an important role not only in specific domains, e.g. passports, checks, and degrees, but also in everyday documents, e.g. bills and vouchers. Using special high-security features for this class of documents is not feasible due to the cost and the complexity of these methods. We present an approach for detecting falsified documents using a document signature obtained from its intrinsic features: bounding boxes of connected components are used as a signature. Using the model signature learned from a set of original bills, our approach can identify documents whose signature significantly differs from the model signature. Our approach uses globally optimal document alignment to build a model signature that can be used to compute the probability of a new document being an original one. Preliminary evaluation shows that the method is able to reliably detect faked documents.


Automated OCR Ground Truth Generation
Joost van Beusekom, Faisal Shafait, Thomas M. Breuel
Proceedings of DAS 2008 Accepted for publication

BibTeX     PDF     Web    

Abstract  Most optical character recognition (OCR) systems need to be trained and tested on the symbols that are to be recognized. Therefore, ground truth data is needed. This data consists of character images together with their ASCII code. Among the approaches for generating ground truth of real-world data, one promising technique is to use the electronic version of the scanned documents. Using an alignment method, the character bounding boxes extracted from the electronic document are matched to the scanned image. Current alignment methods are not robust to different similarity transforms. They also need calibration to deal with non-linear local distortions introduced by the printing/scanning process. In this paper we present a significant improvement over existing methods that makes the calibration step unnecessary and yields a more accurate alignment under all similarity transforms. Our method finds a robust, pixel-accurate, scanner-independent alignment of the scanned image with the electronic document, allowing the extraction of accurate ground truth character information. The accuracy of the alignment is demonstrated using documents from the UW3 dataset. The results show that the mean distance between the estimated and the ground truth character bounding box position is less than one pixel.


Navidgator - Similarity Based Browsing for Image & Video Databases
Damian Borth, Christian Schulze, Adrian Ulges, Thomas M. Breuel
KI 2008

BibTeX     PDF     Web    

Abstract  A main problem with the handling of multimedia databases is the navigation through and the search within the content of a database. The problem arises from the difference between the possible textual description (annotation) of the database content and its visual appearance. Overcoming the so-called "semantic gap" has been a focus of research for some time. This paper presents a new system for similarity-based browsing of multimedia databases. The system aims at decreasing the semantic gap by using a tree structure built on balanced hierarchical clustering. Using this approach, operators are provided with an intuitive and easy-to-use browsing tool. An important objective of this paper is to describe not only the database organization and retrieval structure, but also how the illustrated techniques might be integrated into a single system. Our main contribution is the direct use of a balanced tree structure for navigating through the database of keyframes, paired with an easy-to-use interface, offering a coarse-to-fine similarity-based view of the grouped database content.


The OCRopus Open Source OCR System
Thomas M. Breuel
Proceedings IS&T/SPIE 20th Annual Symposium 2008

BibTeX     PDF     Web    

Abstract  OCRopus is a new, open source OCR system emphasizing modularity, easy extensibility, and reuse, aimed at both the research community and large scale commercial document conversions. This paper describes the current status of the system, its general architecture, as well as the major algorithms currently being used for layout analysis and text line recognition.


Binary Morphology and Related Operations on Run-Length Representations
Thomas M. Breuel
Proceedings VISAPP 2008

BibTeX     PDF     Web    

Abstract  Binary morphology on large images is compute-intensive, in particular for large structuring elements. Run-length encoding is a compact and space-saving technique for representing images. This paper describes how to implement binary morphology directly on run-length encoded binary images for rectangular structuring elements. In addition, it describes efficient algorithms for transposing and rotating run-length encoded images. The paper evaluates and compares run-length morphological processing on page images from the UW3 database with an efficient and mature bit blit-based implementation and shows that the run-length approach is several times faster than bit blit-based implementations for large images and masks. The experiments also show that complexity decreases for larger mask sizes. The paper also demonstrates running times on a simple morphology-based layout analysis algorithm on the UW3 database and shows that replacing bit blit morphology with run-length based morphology speeds up performance approximately two-fold.
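
As a rough illustration of the idea in the abstract above, the following sketch dilates and erodes a single run-length encoded row with a horizontal structuring element of width 2*r + 1. It is a minimal example and not the paper's implementation: the function names, the (start, end) run representation with an exclusive end, and the assumption that pixels outside the row are background are all choices made here for illustration.

```python
def encode_row(bits):
    """Run-length encode one binary row as (start, end) pairs, end exclusive."""
    runs, start = [], None
    for i, b in enumerate(bits):
        if b and start is None:
            start = i                      # a foreground run begins
        elif not b and start is not None:
            runs.append((start, i))        # the run ended just before i
            start = None
    if start is not None:
        runs.append((start, len(bits)))    # run reaches the end of the row
    return runs


def dilate_row(runs, r, width):
    """Dilate by a horizontal element of width 2*r + 1: grow runs, then merge."""
    out = []
    for s, e in runs:
        s, e = max(0, s - r), min(width, e + r)
        if out and s <= out[-1][1]:
            # Grown run touches or overlaps the previous one: merge them.
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def erode_row(runs, r):
    """Erode by the same element: shrink each run, drop runs that vanish.

    Assumes pixels outside the row count as background.
    """
    return [(s + r, e - r) for s, e in runs if e - s > 2 * r]
```

The work per row depends on the number of runs rather than on the number of pixels or on r, which is one intuition for why larger masks need not cost more in a run-length representation.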


Segmentation of Curled Text Lines using Active Contours
Syed Saqib Bukhari, Faisal Shafait, Thomas M. Breuel
DAS

BibTeX     PDF     Web    

Abstract  Segmentation of curled textlines from warped document images is one of the major issues in document image dewarping. Most of the curled textline segmentation algorithms present in the literature today are sensitive to the degree of curl, direction of curl, and spacing between adjacent lines. We present a new algorithm for curled textline segmentation which is robust to the above-mentioned problems at the expense of high execution time. We will demonstrate this insensitivity in a performance evaluation section. Our approach is based on the state-of-the-art image segmentation technique: Active Contour Model (Snake) with the novel idea of several baby snakes and their convergence in a vertical direction only. Experiments on the publicly available CBDAR 2007 document image dewarping contest dataset show a textline segmentation accuracy of 97.96% for our algorithm.


Bayes Optimal DDoS Mitigation by Adaptive History-Based IP Filtering
Markus Goldstein, Christoph Lampert, Matthias Reif, Armin Stahl, Thomas M. Breuel
Proceedings of the Seventh International Conference on Networking (icn 2008), pages 174-179

BibTeX     PDF     Web    

Abstract  Distributed Denial of Service (DDoS) attacks are today the most destabilizing factor in the global internet and there is a strong need for sophisticated solutions. We introduce a formal statistical framework and derive a Bayes optimal packet classifier from it. Our proposed practical algorithm "Adaptive History-Based IP Filtering" (AHIF) mitigates DDoS attacks near the victim and outperforms existing methods by at least 32% in terms of collateral damage. Furthermore, it adjusts to the strength of an ongoing attack and ensures availability of the attacked server. In contrast to other adaptive solutions, firewall rulesets used to resist an attack can be precalculated before an attack takes place. This ensures an immediate response in a DDoS emergency. For evaluation, simulated DDoS attacks and two real-world user traffic datasets are used.


Automatic Image Tagging using Community-Driven Online Image Databases
Marius Renn, Joost van Beusekom, Daniel Keysers, Thomas M. Breuel
Proceedings of 6th International Workshop on Adaptive Multimedia Retrieval

BibTeX     PDF     Web    

Abstract  Automatic image tagging is becoming increasingly important to organize large amounts of image data. To identify concepts in images, these tagging systems rely on large annotated image training sets. In this work we analyze image sets taken from online community-driven image databases, such as Flickr, for use in concept identification. Real-world performance is measured using our flexible tagging system, Tagr.


Evaluation of Graylevel-Features for Printing Technique Classification in High-Throughput Document Management Systems
Christian Schulze, Marco Schreyer, Armin Stahl, Thomas Breuel
IWCF 2008

BibTeX     PDF     Web    

Abstract  The detection of altered or forged documents is an important tool in large scale office automation. Printing technique examination can therefore be a valuable source of information to determine a questioned document's authenticity. A study of graylevel features for high-throughput printing technique recognition was undertaken. The evaluation included printouts generated by 49 different laser and 13 different inkjet printers. Furthermore, the extracted document features were classified using three different machine learning approaches. We were able to show that, under the given constraints of high-throughput systems, it is possible to determine the printing technique used to create a document.


Efficient Implementation of Local Adaptive Thresholding Techniques Using Integral Images
Faisal Shafait, Daniel Keysers, Thomas M. Breuel
Document Recognition and Retrieval XV

BibTeX     PDF     Web    

Abstract  Adaptive binarization is an important first step in many document analysis and OCR processes. This paper describes a fast adaptive binarization algorithm that yields the same quality of binarization as the Sauvola method [1], but runs in time close to that of global thresholding methods (like Otsu's method [2]), independent of the window size. The algorithm combines the statistical constraints of Sauvola's method with integral images [3]. Testing on the UW-1 dataset demonstrates a 20-fold speedup compared to the original Sauvola algorithm.
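
To make the combination described in the abstract above concrete, here is a small pure-Python sketch: integral images of the pixel values and of their squares give the local mean and standard deviation of any window in four lookups each, and these feed the Sauvola threshold T = m * (1 + k * (s / R - 1)). It is an illustrative sketch rather than the authors' code; the function names, the window radius, and the parameter values k = 0.2 and R = 128 are assumptions chosen for the example.

```python
import math


def integral(img, power=1):
    """Summed-area table of img**power, padded with a zero row and column."""
    h, w = len(img), len(img[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x] ** power
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def window_sum(table, y0, x0, y1, x1):
    """Sum over rows y0..y1-1 and columns x0..x1-1 using four lookups."""
    return table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0]


def sauvola_binarize(img, radius=1, k=0.2, R=128.0):
    """Return 1 for pixels above the local Sauvola threshold, else 0.

    radius, k and R are illustrative defaults, not values from the paper.
    """
    h, w = len(img), len(img[0])
    s1, s2 = integral(img, 1), integral(img, 2)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Clip the window at the image border.
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            n = (y1 - y0) * (x1 - x0)
            mean = window_sum(s1, y0, x0, y1, x1) / n
            var = max(0.0, window_sum(s2, y0, x0, y1, x1) / n - mean * mean)
            threshold = mean * (1 + k * (math.sqrt(var) / R - 1))
            row.append(1 if img[y][x] > threshold else 0)
        out.append(row)
    return out
```

Because each window statistic costs a constant number of lookups, the per-pixel work does not grow with the window size, which matches the "independent of the window size" claim in the abstract.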


GREC 2007 Arc Segmentation Contest: Evaluation of Four Participating Algorithms
Faisal Shafait, Daniel Keysers, Thomas M. Breuel
Graphics Recognition: Recent Advances and New Opportunities (GREC 2007 post-proceedings) Accepted for publication

BibTeX     PDF     Web    

Structural Mixtures for Statistical Layout Analysis
Faisal Shafait, Joost van Beusekom, Daniel Keysers, Thomas M. Breuel
Proc. 8th Int. Workshop on Document Analysis Systems (DAS) Accepted for publication

BibTeX     PDF     Web    

Performance Evaluation and Benchmarking of Six Page Segmentation Algorithms
Faisal Shafait, Daniel Keysers, Thomas M. Breuel
IEEE Transactions on Pattern Analysis and Machine Intelligence 30(6), pages 941--954

BibTeX     PDF     Web    

Abstract  Informative benchmarks are crucial for optimizing the page segmentation step of an OCR system, frequently the performance-limiting step for overall OCR system performance. We show that current evaluation scores are insufficient for diagnosing specific errors in page segmentation and fail to identify some classes of serious segmentation errors altogether. This paper introduces a vectorial score that is sensitive to, and identifies, the most important classes of segmentation errors (over-, under-, and mis-segmentation) and what page components (lines, blocks, etc.) are affected. Unlike previous schemes, our evaluation method has a canonical representation of ground truth data and guarantees pixel-accurate evaluation results for arbitrary region shapes. We present the results of evaluating widely used segmentation algorithms (x-y cut, smearing, whitespace analysis, constrained text-line finding, docstrum, and Voronoi) on the UW-III database and demonstrate that the new evaluation scheme permits the identification of several specific flaws in individual segmentation methods.


Background Variability Modeling for Statistical Layout Analysis
Faisal Shafait, Joost van Beusekom, Daniel Keysers, Thomas M. Breuel
Proc. 19th Int. Conf. on Pattern Recognition (ICPR) Accepted for publication

BibTeX     PDF     Web    

Rapid Prototyping of CBR Applications with the Open Source Tool myCBR
Armin Stahl, Thomas Roth-Berghofer
Proceedings of the 9th European Conference on Case-Based Reasoning (ECCBR 2008)

BibTeX     PDF     Web    

Abstract  Although Case-Based Reasoning (CBR) claims to reduce the effort required for developing knowledge-based systems substantially compared with more traditional Artificial Intelligence approaches, the implementation of a CBR application from scratch is still a time-consuming task. In this paper we present a novel, freely available tool for rapid prototyping of CBR applications that focus on the similarity-based retrieval step, such as case-based product recommender systems. By providing easy-to-use model generation, data import, similarity modeling, explanation, and testing functionality together with comfortable graphical user interfaces, the tool enables even CBR novices to rapidly create their first CBR applications. At the same time, it ensures enough flexibility to enable expert users to implement advanced CBR applications.


A Local Discriminative Model for Background Subtraction
Adrian Ulges, Thomas M. Breuel
DAGM 2008

BibTeX     PDF     Web    

Abstract  Conventional background subtraction techniques that update a background model online have difficulties with correctly segmenting foreground objects if sudden brightness changes occur. Other methods that learn a global scene model offline suffer from projection errors. To overcome these problems, we present a different approach that is local and discriminative, i.e. for each pixel a classifier is trained to decide whether the pixel belongs to the background or foreground. Such a model requires significantly less tuning effort and shows a better robustness, as we will demonstrate in quantitative experiments on self-created and standard benchmarks. Finally, segmentation is improved by 18% by integrating the probabilistic evidence provided by the local classifiers with a graph cut segmentation algorithm.


Identifying Relevant Frames in Weakly Labeled Videos for Training Concept Detectors
Adrian Ulges, Christian Schulze, Daniel Keysers, Thomas Breuel
CIVR

BibTeX     PDF     Web    

Abstract  A key problem with the automatic detection of semantic concepts (like 'interview' or 'soccer') in video streams is the manual acquisition of adequate training sets. Recently, we have proposed to use online videos downloaded from portals like youtube.com for this purpose, where tags provided by users during video upload serve as ground truth annotations. The problem with such training data is that it is weakly labeled: annotations are only provided on video level, and many shots of a video may be "non-relevant", i.e. not visually related to a tag. In this paper, we present a probabilistic framework for learning from such weakly annotated training videos in the presence of irrelevant content. The relevance of keyframes is modeled as a latent random variable that is estimated during training. In quantitative experiments on real-world online videos and TV news data, we demonstrate that the proposed model leads to a significantly increased robustness with respect to irrelevant content, and to a better generalization of the resulting concept detectors.


A System that Learns to Tag Videos by Watching Youtube
Adrian Ulges, Christian Schulze, Daniel Keysers, Thomas Breuel
International Conference on Computer Vision Systems

BibTeX     PDF     Web    

Abstract  We present a system that automatically tags videos, i.e. detects high-level semantic concepts like objects or actions in them. To do so, our system does not rely on datasets manually annotated for research purposes. Instead, we propose to use videos from online portals like youtube.com as a novel source of training data, where tags provided by users during upload serve as ground truth annotations. This allows our system to learn autonomously by automatically downloading its training set. The key contribution of this work is a number of large-scale quantitative experiments on real-world online videos, in which we investigate the influence of the individual system components, and how well our tagger generalizes to novel content. Our key results are: (1) Fair tagging results can be obtained by a late fusion of several kinds of visual features. (2) Using more than one keyframe per shot is helpful. (3) To generalize to different video content (e.g., another video portal), the system can be adapted by expanding its training set.


A Branch and Bound Algorithm for Finding the Modes in Kernel Density Estimates
Oliver Wirjadi, Thomas Breuel
Int. J. Computational Intelligence and Applications

BibTeX     PDF     Web    

Abstract  Kernel density estimators are established tools in non-parametric statistics. Due to their flexibility and ease of use, these methods are popular in computer vision and pattern recognition for tasks such as object tracking in video or image segmentation. The most frequently used algorithm for finding the modes in such densities (the mean shift) is a gradient ascent rule, which can converge to local optima. We propose a novel, globally optimal branch and bound algorithm for finding the modes in kernel densities. We show in experiments on datasets up to dimension five that the branch and bound method is faster than local optimization and observe linear scaling of our method with sample size. Quantitative experiments on simulated data show that the new method gives statistically significantly more accurate solutions than the mean shift algorithm. The mode localization accuracy is about 5 times more precise than that of the mean shift for all tested parameters. Applications to color image segmentation on an established benchmark test set also show measurably improved results when using global optimization.
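
The local-optimum behavior of the mean shift mentioned in the abstract above can be seen in a few lines. The sketch below implements only the standard Gaussian-kernel mean shift baseline in one dimension, not the branch and bound method of the paper; the function names, the bandwidth h, and the sample values in the usage note are assumptions made for illustration.

```python
import math


def kde(x, samples, h):
    """Gaussian kernel density at x, up to a constant normalization factor."""
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / len(samples)


def mean_shift(x, samples, h, tol=1e-9, max_iter=1000):
    """Iterate the Gaussian mean shift update from x until it stops moving.

    Assumes x starts close enough to the data that the weights do not
    all underflow to zero.
    """
    for _ in range(max_iter):
        weights = [math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples]
        new_x = sum(w * s for w, s in zip(weights, samples)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x
```

With samples [0.0, 0.1, -0.1, 0.05, 5.0, 5.1] and h = 0.5, a start at 1.0 climbs to the larger mode near 0, while a start at 4.0 stops at the smaller mode near 5. Which mode is found depends on the starting point, which is the limitation a globally optimal search is meant to remove.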


2007

Example-Based Logical Labeling of Document Title Page Images
Joost van Beusekom, Daniel Keysers, Faisal Shafait, Thomas M. Breuel
Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR 2007), pages 919-923

BibTeX     PDF     Web     Slides    

Abstract  This paper presents a flexible and effective example-based approach for labeling title pages which can be used for automated extraction of bibliographic data. The labels of interest are “Title”, “Author”, “Abstract” and “Affiliation”. The method takes a set of labeled document layouts and a single unlabeled document layout as input and finds the best matching layout in the set. The labels of this layout are used to label the new layout. The similarity measure for layouts combines structural layout similarity and textural similarity on the block-level. Experimental results yield accuracy rates from 94.8% to 99.6% obtained on the publicly available MARG dataset. This shows that our lightweight method has equivalent and partially better performance when compared to other more complex labeling methods known from the literature.


Image-Matching for Revision Detection in Printed Historical Documents
Joost van Beusekom, Faisal Shafait, Thomas M. Breuel
DAGM 2007, Pattern Recognition, 29th DAGM Symposium. LNCS Vol. 4713

BibTeX     PDF     Web    

Abstract  In the research area of historical documents it is of high interest to reconstruct the process by which a historical typeset document emerged. To this end, the chronological order of the different versions of a typeset document has to be reconstructed. This is done by manually finding differences between two versions and then deciding on the order of these two versions. In this paper we present an approach to automate the search for differences between the two images. This approach uses a globally optimal image matching technique to overlay both images and colors the differences accordingly. We also present a real-world application for this approach on digitized versions of a historical book.


The hOCR Microformat for OCR Workflow and Results
Thomas M. Breuel
ICDAR (accepted for publication)

BibTeX     PDF     Web    

Abstract  Large scale scanning and document conversion efforts have led to a renewed interest in OCR systems and workflows. This paper describes a new format for representing both intermediate and final OCR results, developed in response to the needs of a newly developed OCR system and ground truth data release. The format is defined as a microformat on top of the HTML and CSS standards and therefore can represent a wide range of linguistic and typographic phenomena with already well-defined, widely understood markup and can be processed using widely available and known tools. The format is based on a new, multi-level abstraction of OCR results based on logical markup, common typesetting models, and OCR engine-specific markup, making it suitable both for the support of existing workflows and the development of future model-based OCR engines.


Testing and Benchmarking Large-Scale Machine Learning Systems
Thomas M. Breuel
Snowbird Learning Workshop

BibTeX     PDF     Web    

Abstract  (extended abstract)


Musical Alignment Using Globally Optimal Short-Time Dynamic Time Warping
Hagen Kaprykowsky, Xavier Rodet
DAGA 2007 (invited)

BibTeX     PDF     Web    

Abstract  Dynamic Time Warping (DTW) aligns two sequences by time warping them optimally. Global optimization is done using whole sequences. This can be very demanding in terms of calculation costs and memory requirements, which limits the length of the sequences that can be aligned. In this paper a novel algorithm, Short-Time Dynamic Time Warping (STDTW), is presented, which requires much less memory because optimization is done iteratively on smaller portions of the sequences. The particularly remarkable characteristic of the algorithm is that, under some weak hypotheses, it finds the same globally optimal solution as the classical DTW algorithm. As an example, STDTW is applied to Musical Alignment, which links events in a musical score and points on an audio performance time axis. It also provides an interesting insight into the structure of the sequences to be aligned.
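
For readers unfamiliar with the baseline that the abstract above refers to, the following is a minimal sketch of classical DTW on two numeric sequences. It is not STDTW; the function name, the absolute-difference local cost, and the three allowed steps are assumptions of this example. The full (n + 1) x (m + 1) cost table it builds is the memory requirement that the abstract describes as the limiting factor.

```python
def dtw(a, b):
    """Classical DTW: return (total cost, alignment path of index pairs)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local distance (assumed)
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    # Backtrack from the end of both sequences to recover one optimal path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path
```

For example, aligning [1, 2, 3] with [1, 2, 2, 3] has cost 0, with the repeated 2 in the second sequence mapped to the single 2 in the first.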


Optimal Geometric Matching for Patch-Based Object Detection
Daniel Keysers, Thomas Deselaers, Thomas M. Breuel
Electronic Letters on Computer Vision and Image Analysis 6(1), pages 44-54

BibTeX     PDF     Web    

Abstract  We present an efficient method to determine the optimal matching of two patch-based image object representations under rotation, scaling, and translation (RST). This use of patches is equivalent to a fully-connected part-based model, for which the presented approach offers an efficient procedure to determine the best fit. While other approaches that use fully connected models have a high complexity in the number of parts used, we achieve linear complexity in that variable, because we only allow RST-matchings. The presented approach is used for object recognition in images: by matching images that contain certain objects to a test image, we can detect whether the test image contains an object of that class or not. We evaluate this approach on the Caltech data and obtain very competitive results.


Improving Accessibility of HTML Documents by Generating Image-Tags in a Proxy
Daniel Keysers, Marius Renn, Thomas M. Breuel
Ninth International ACM SIGACCESS Conference on Computers and Accessibility (In press)

BibTeX     PDF     Web    

Abstract  Many web-pages have reduced accessibility for the visually impaired due to missing alternative textual image tags. We present a novel system that combines a web proxy with an automatic image tagger based on content-based image retrieval technology that alleviates this problem. The obtained results show successful generation of meaningful tags.


Variabilitätsmodellierung für die Bilderkennung
Daniel Keysers
Ausgezeichnete Informatikdissertationen 2006 (to appear)

BibTeX    

Abstract  Enabling computers to understand images is a great challenge. Appropriate treatment of typical variations in images can improve the recognition of objects in many cases. The dissertation presented here investigates models that describe variability in images for the appearance-based classification of objects. The models determine the similarity between two given images, which is then used for classification. As a theoretical result, it is shown for the first time that determining the best flexible mapping between two images for a two-dimensional model belongs to the class of NP-hard problems. In practical application, on the other hand, taking a suitable context of the image pixels into account in the mapping turns out to be decisive for low error rates. By adding context, very good error rates are achieved even with less complex models. The application of the presented methods is studied primarily for the classification of handwritten characters and the categorization of medical images; in both cases, the results obtained compare very well with those of other research groups.


Deformation Models for Image Recognition
Daniel Keysers, Thomas Deselaers, Christian Gollan, Hermann Ney
IEEE Transactions on Pattern Analysis and Machine Intelligence

BibTeX    

Abstract  We present the application of different nonlinear image deformation models to the task of image recognition. The deformation models are especially suited for local changes as they often occur in the presence of image object variability. We show that among the discussed models there is one approach that combines simplicity of implementation, low computational complexity, and highly competitive performance across various real-world image recognition tasks. We show experimentally that the model performs very well for four different handwritten digit recognition tasks and for the classification of medical images, thus showing high generalization capacity. In particular, an error rate of 0.54% on the MNIST benchmark is achieved, as well as the lowest reported error rate, specifically 12.6%, in the 2005 international ImageCLEF evaluation of medical image categorization.


Document Image Zone Classification - A Simple High-Performance Approach
Daniel Keysers, Faisal Shafait, Thomas M. Breuel
VISAPP 2007, pages 44-51

BibTeX     PDF     Web    

Abstract  We describe a simple, fast, and accurate system for document image zone classification -- an important sub-problem of document image analysis -- that results from a detailed analysis of different features. Using a novel combination of known algorithms, we achieve a very competitive error rate of 1.46% (n = 13811) in comparison to (Wang et al., 2006) who report an error rate of 1.55% (n = 24177) using more complicated techniques. The experiments were performed on zones extracted from the widely used UW-III database, which is representative of images of scanned journal pages and contains ground-truthed real-world data.


Gestural Interaction for an Automatic Document Capture System
Christian Kofler, Daniel Keysers, Andres Koetsier, Jasper Laagland, Thomas M. Breuel
CBDAR 2007

BibTeX     PDF     Web    

Abstract  The amount of printed documents used today is still very large despite increased use of digital formats. To bridge the gap between analog paper and digital media, paper documents need to be captured. We present a prototype that allows for cost-effective, fast, and robust document capture using a standard consumer camera. The user's physical desktop is continuously monitored. Whenever a document is detected, the system acquires its content in one of two ways. Either the entire document is captured or a region of interest is extracted, which the user can specify easily by pointing at it. In both modes a high resolution image is taken and the contained information is digitized. The main challenges in designing and implementing such a capturing system are real-time performance, accurate detection of documents, reliable detection of the user's hand and robustness against perturbations such as lighting changes and shadows. This paper presents approaches that address these challenges and discusses the integration into a robust document capture system with gestural interaction.


Bibliographic Meta-Data Extraction Using Probabilistic Finite State Transducers
Martin Krämer, Hagen Kaprykowsky, Daniel Keysers, Thomas Breuel
ICDAR 2007

BibTeX     PDF     Web    

Abstract  We present the application of probabilistic finite state transducers to the task of bibliographic meta-data extraction from scientific references. By using the transducer approach, which is often applied successfully in computational linguistics, we obtain a trainable and modular framework. This results in simplicity, flexibility, and easy adaptability to changing requirements. An evaluation on the Cora dataset that serves as a common benchmark for accuracy measurements yields a word accuracy of 88.5%, a field accuracy of 82.6%, and an instance accuracy of 42.7%. Based on a comparison to other published results, we conclude that our system performs second best on the given data set using a conceptually simple approach and implementation.


Page Frame Detection for Marginal Noise Removal from Scanned Documents
Faisal Shafait, Joost van Beusekom, Daniel Keysers, Thomas M. Breuel
SCIA 2007, Image Analysis, Proceedings. LNCS Vol. 4522, pages 651-660

BibTeX     PDF     Web    

Abstract  We describe and evaluate a method to robustly detect the page frame in document images, locating the actual page contents area and removing textual and non-textual noise along the page borders. We use a geometric matching algorithm to find the optimal page frame, which has the advantages of not assuming the existence of whitespace between noisy borders and actual page contents, and of giving a practical solution to the page frame detection problem without the need for parameter tuning. We define suitable performance measures and evaluate the algorithm on the UW-III database. The results show that the error rates are below 4% for each of the performance measures used. In addition, we demonstrate that the use of page frame detection reduces the OCR error rate by removing textual noise. Experiments using a commercial OCR system show that the error rate due to elements outside the page frame is reduced from 4.3% to 1.7% on the UW-III dataset.


Document Image Dewarping Contest
Faisal Shafait, Thomas M. Breuel
2nd Int. Workshop on Camera-Based Document Analysis and Recognition (CBDAR)

BibTeX     PDF     Web    

Abstract  Dewarping of documents captured with hand-held cameras in an uncontrolled environment has triggered a lot of interest in the scientific community over the last few years and many approaches have been proposed. However, there has been no comparative evaluation of different dewarping techniques so far. In an attempt to fill this gap, we have organized a page dewarping contest along with CBDAR 2007. We have created a dataset of 102 documents captured with a hand-held camera and have made it freely available online. We have prepared text-line, text-zone, and ASCII text ground-truth for the documents in this dataset. Three groups participated in the contest with their methods. In this paper we present an overview of the approaches that the participants used, the evaluation measure, and the dataset used in the contest. We report the performance of all participating methods. The evaluation shows that none of the participating methods was statistically significantly better than any other participating method.


Retrieving Relevant Experiences
Armin Stahl
KI Zeitschrift(4), pages 30-33

BibTeX     PDF     Web    

Abstract  In order to enable the efficient reuse of collected experience knowledge, the identification of the most relevant experiences with respect to the current reuse need is one of the most crucial issues. One approach for supporting this process with AI methods is the application of similarity-based retrieval techniques developed in the area of Case-Based Reasoning. We describe the basic idea of this approach and present a novel open source tool which simplifies the development of knowledge-based retrieval functionality in experience management applications.


Content-Based Video Tagging for Online Video Portals
Adrian Ulges, Christian Schulze, Daniel Keysers, Thomas M. Breuel
MUSCLE/Image-CLEF Workshop

BibTeX     PDF     Web    

Abstract  Despite the increasing economic impact of the online video market, search in commercial video databases is still mostly based on user-generated meta-data. To complement this manual labeling, recent research efforts have investigated the interpretation of the visual content of a video to automatically annotate it. A key problem with such methods is the costly acquisition of a manually annotated training set. In this paper, we study whether content-based tagging can be learned from user-tagged online video, a vast, public data source. We present an extensive benchmark using a database of real-world videos from the video portal youtube.com. We show that a combination of several visual features improves performance over our baseline system by about 30%.


Motion Interpretation using Adaptive Search of Transformation Space
Adrian Ulges
IUPR Research Group

BibTeX     PDF     Web    

Abstract  This report addresses the extraction of a parametric global motion from a motion field, a task with several applications in video processing. We present two probabilistic formulations of the problem and carry out optimization using the RAST algorithm, a geometric matching method novel to motion estimation in video. RAST uses an exhaustive and adaptive search of transformation space and thus gives, in contrast to local sampling optimization techniques used in the past, a globally optimal solution. Among other applications, our framework can thus be used to generate ground truth for benchmarking motion estimation. Our main contributions are: first, the novel combination of a state-of-the-art quality criterion for dominant motion estimation with a search procedure that guarantees global optimality. Second, experimental results that illustrate the superior performance of our approach on synthetic flow fields as well as real-world video streams. Third, a significant speedup of the search achieved by extending a basic model with an additional smoothness prior.


Global Modes in Kernel Density Estimation: RAST Clustering
Oliver Wirjadi, Thomas Breuel
Proc. 7th International Conference on Hybrid Intelligent Systems, pages 314 - 319

BibTeX    

Abstract  The mean shift algorithm is a widely used method for finding local maxima in feature spaces. Mean shift algorithms have been shown in the literature to be equivalent to a gradient ascent optimization of a kernel density estimate. This paper describes a novel, globally optimal optimization method and compares the suboptimal mean shift solutions with the globally optimal solutions derived by the new algorithm. Experimental results on both simulated and real data show that the new algorithm yields solutions that are often significantly better than the suboptimal solutions identified by the mean shift algorithm, and that it scales better to large sample sizes and is more robust to noise levels.
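As a rough illustration of the local behaviour this abstract contrasts against, the following minimal sketch runs plain mean shift on a one-dimensional Gaussian kernel density estimate. It is not the globally optimal RAST clustering method of the paper; the function name `mean_shift_1d`, the bandwidth, and the sample values are assumptions chosen only for the example.

```python
import math

def mean_shift_1d(samples, start, bandwidth=1.0, tol=1e-6, max_iter=500):
    """Move `start` uphill on a Gaussian kernel density estimate of `samples`.

    Hypothetical helper for illustration: each step replaces the current
    position by the kernel-weighted mean of the samples, which is the
    classic mean shift update and only finds a local maximum.
    """
    x = float(start)
    for _ in range(max_iter):
        # Gaussian kernel weight of every sample relative to the current position.
        weights = [math.exp(-0.5 * ((s - x) / bandwidth) ** 2) for s in samples]
        total = sum(weights)
        if total == 0.0:
            break  # all weights underflowed; no sample is close enough to pull x
        new_x = sum(w * s for w, s in zip(weights, samples)) / total
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x

# Two well-separated clusters: the result depends on the starting point.
samples = [0.9, 1.0, 1.1, 4.8, 5.0, 5.1, 5.2]
mode_a = mean_shift_1d(samples, start=0.0, bandwidth=0.5)
mode_b = mean_shift_1d(samples, start=6.0, bandwidth=0.5)
print(mode_a, mode_b)
```

Starting near the smaller cluster ends at its mode, while starting near the larger cluster ends at a different mode, which is the kind of start-dependent, possibly suboptimal solution the abstract refers to.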


2006

Satellite Tracks Removal in Astronomical Images
Haider Ali, Christoph H. Lampert, Thomas M. Breuel
Progress in Pattern Recognition, Image Analysis and Applications, 11th Iberoamerican Congress on Pattern Recognition, CIARP 2006 accepted for publication

BibTeX     PDF     Web    

Abstract  This paper describes a new system for "Finding Satellite Tracks" in astronomical images based on a modern geometric approach. There is an increasing need for methods with solid mathematical and statistical foundations in astronomical image processing. As computational methods serve all disciplines of science, they are becoming popular in the field of astronomy as well. Currently, different computational systems must be numerically optimized before they can be applied to astronomical images, so at present there is no single system that solves the problems of astronomers using computational methods based on modern approaches. The system "Finding Satellite Tracks" is based on the geometric matching method "Recognition by Adaptive Subdivision of Transformation Space (RAST)".


Distance Measures for Layout-Based Document Image Retrieval
Joost van Beusekom, Daniel Keysers, Faisal Shafait, Thomas M. Breuel
International Conference on Document Image Analysis for Libraries (DIAL 2006), pages 232-242

BibTeX     PDF     Web    

Abstract  Most methods for document image retrieval rely solely on text information to find similar documents. This paper describes a way to use layout information for document image retrieval instead. A new class of distance measures is introduced for documents with Manhattan layouts, based on a two-step procedure: First, the distances between the blocks of two layouts are calculated. Then, the blocks of one layout are assigned to the blocks of the other layout in a matching step. Different block distances and matching methods are compared and evaluated using the publicly available MARG database. On this dataset, the layout type can be determined successfully in 92.6% of the cases using the best distance measure in a nearest neighbor classifier. The experiments show that the best distance measure for this task is the overlapping area combined with the Manhattan distance of the corner points as block distance together with the minimum weight edge cover matching.


Round-trip HTML Rendering and Analysis for Testing, Indexing, and Security
Thomas M. Breuel, Daniel Keysers
7th IAPR Workshop on Document Analysis Systems (DAS) Extended abstract

BibTeX     PDF     Web    

Abstract  The widespread adoption of HTML, DHTML, and web technologies has had many benefits, but a number of undesirable uses and problems have emerged as well. Some of these problems are unreliable cross-platform rendering of web pages, attempts to create web pages that deceive either web users or search engines, and lack of accessibility of some web pages by users with vision impairments or users with small screen devices. Standard approaches to addressing these problems rely on syntactic and semantic analysis of the web page source; for example, to determine whether a page is likely to render correctly, a style checker may check for the absence of certain tags or constructs known to cause problems on some browsers. Source based methods are fast, conceptually easy to implement, and can be built using standard parsing and text analysis tools, but they also have significant limitations. For example, the presence of style sheets, JavaScript, and other HTML and plug-in features makes it hard to make statements about the final, rendered form of a web page based on an analysis of its source text. Cross-platform browser problems can only be detected by such methods if the cause of the problem is understood and known, and if appropriate patterns have been formulated that can detect these problems in web page sources; such rules are likely to remain incomplete and their coverage spotty given the evolution of web standards. Similarly, detecting phishing or search engine spam is a co-evolutionary process between adversaries and tool creators: phishers and spammers will develop new attacks in response to each countermeasure. As part of the image based personal computing project in our laboratory, we are developing round-trip rendering and analysis methods for addressing these problems. The foundation of our approach is the observation that the image presented to the end user is ultimately what determines the meaning of a piece of HTML (see also Breuel, 2004, Lopresti, 2005).
In this talk, we report on on-going work in our laboratory on developing systems that address cross-platform browser and web page design testing, efforts for fighting phishing and search engine spam, and for improving accessibility.


Arrangements of Planar Curves
Younis Hijazi
In H. Hagen, A. Kerren, P. Dannenmann (Eds.), Visualization of Large and Unstructured Data Sets Proceedings of the first workshop of DFG

BibTeX     PDF     Web    

Abstract  Computing arrangements of curves is a fundamental and challenging problem in computational geometry, with many practical applications in a wide range of fields, especially in robot motion planning and computer vision. In this survey paper we present the state of the art for computing the arrangement of planar curves, considering various classes of curves, from lines to arbitrary curves.


Computing Arrangements using Subdivision and Interval Arithmetic
Younis Hijazi, Thomas Breuel
To appear in the Proceedings of the Sixth International Conference on Curves and Surfaces Avignon

BibTeX     PDF     Web    

Abstract  Computing arrangements of curves is a fundamental and challenging problem in computational geometry, with many practical applications in a wide range of fields, including robot motion planning and computer vision. This paper describes a method for computing arrangements of implicitly defined curves. Our method for computing arrangements is an adaptation of methods successfully used for the exploration of large, higher dimensional, non-algebraic arrangements in computer vision. While broadly similar to subdivision methods in computational geometry, its design and philosophy are different; for example, it replaces exact computations by subdivision and interval arithmetic computations and prefers data-independent subdivisions. It can be used (and is usually used in practice) to compute well-defined approximations to arrangements, but can also yield exact answers for specific problem classes.
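The general idea of combining subdivision with interval arithmetic can be sketched for a single implicit curve. The example below is an assumption-laden illustration, not the paper's method: it uses the unit circle x^2 + y^2 - 1 = 0 as the curve, a fixed quadtree subdivision, and plain floating-point interval bounds without outward rounding. All function names are hypothetical.

```python
def interval_square(lo, hi):
    """Range of t*t for t in [lo, hi]."""
    if lo >= 0.0:
        return lo * lo, hi * hi
    if hi <= 0.0:
        return hi * hi, lo * lo
    return 0.0, max(lo * lo, hi * hi)

def circle_range(x_lo, x_hi, y_lo, y_hi):
    """Interval enclosure of f(x, y) = x^2 + y^2 - 1 over a box."""
    xs_lo, xs_hi = interval_square(x_lo, x_hi)
    ys_lo, ys_hi = interval_square(y_lo, y_hi)
    return xs_lo + ys_lo - 1.0, xs_hi + ys_hi - 1.0

def cells_containing_curve(box, depth):
    """Recursively subdivide `box`, keeping cells that may contain f = 0."""
    x_lo, x_hi, y_lo, y_hi = box
    f_lo, f_hi = circle_range(x_lo, x_hi, y_lo, y_hi)
    if f_lo > 0.0 or f_hi < 0.0:
        return []  # the enclosure excludes zero, so the curve cannot cross this box
    if depth == 0:
        return [box]
    x_mid = 0.5 * (x_lo + x_hi)
    y_mid = 0.5 * (y_lo + y_hi)
    quadrants = [
        (x_lo, x_mid, y_lo, y_mid), (x_mid, x_hi, y_lo, y_mid),
        (x_lo, x_mid, y_mid, y_hi), (x_mid, x_hi, y_mid, y_hi),
    ]
    cells = []
    for quadrant in quadrants:
        cells.extend(cells_containing_curve(quadrant, depth - 1))
    return cells

cells = cells_containing_curve((-2.0, 2.0, -2.0, 2.0), depth=4)
print(len(cells), "candidate cells out of", 4 ** 4)
```

The subdivision here is data-independent (always at the midpoint), and the surviving cells form an approximation that is guaranteed to cover the curve, up to the floating-point rounding this sketch ignores.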


Color Image Dequantization by Constrained Diffusion
Daniel Keysers, Christoph H. Lampert, Thomas M. Breuel
SPIE Electronic Imaging 2006, pages 6058.03.01-6058.03.10

BibTeX     PDF     Web    

Abstract  We propose a simple and effective method for the dequantization of color images, effectively interpolating the colors from quantized levels to a continuous range of brightness values. The method is designed to be applied to images that either have undergone a manipulation like image brightness adjustment, or are going to be processed in such a way. Such operations often cause noticeable color bands in the images that can be reduced using the proposed Constrained Diffusion technique. We demonstrate the advantages of our method using synthetic and real life images as examples. We also present quantitative results using 8 bit data that has been obtained from original 12 bit sensor data and obtain substantial gains in PSNR using the proposed method.
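For readers unfamiliar with the PSNR figure mentioned in this abstract, the standard definition can be sketched as follows. This is only the generic formula, 10 * log10(MAX^2 / MSE), applied to flat lists of pixel values; the function name `psnr` and the 8-bit peak value of 255 are assumptions for the example and say nothing about how the paper computed its results.

```python
import math

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally long pixel lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)

original = [10, 10, 10, 10]
distorted = [11, 9, 11, 9]
print(round(psnr(original, distorted), 2))
```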


Optimal Line and Arc Detection on Run-Length Representations
Daniel Keysers, Thomas M. Breuel
Proceedings Graphics Recognition Workshop

BibTeX     PDF     Web    

Abstract  The robust detection of lines and arcs in scanned documents or technical drawings is an important problem in document image understanding. We present a new solution to this problem that works directly on run-length encoded data. The method finds globally optimal solutions to parameterized thick line and arc models. Line thickness is part of the model and directly used during the matching process. Unlike previous approaches, it does not require any thinning or other preprocessing steps, no computation of the line adjacency graphs, and no heuristics. Furthermore, the only search-related parameter that needs to be specified is the desired numerical accuracy of the solution. The method is based on a branch-and-bound approach for the globally optimal detection of these geometric primitives using runs of black pixels in a bi-level image. We present qualitative and quantitative results of the algorithm on images used in the 2003 and 2005 GREC arc segmentation contests.


Modeling of Image Variability for Recognition
Daniel Keysers
PhD thesis, RWTH Aachen University

BibTeX     PDF     Web    

Abstract  This thesis presents the application of different models of image variability to visual recognition problems using the paradigm of appearance-based recognition. We first discuss linear models of variability and relate them to the use of Gaussian distributions. This allows us to use well-understood estimation methods to determine the vectors representing the variability. We also relate the discriminative maximum entropy approach to the Gaussian case and use the relationship to derive the novel maximum entropy linear discriminant analysis. Secondly, we investigate discrete deformation models -- that map pixels onto pixels -- of order zero, one, and two, where the order is determined by the constraints imposed on the two-dimensional image distortion. We prove for the first time that the determination of the best match for the second order model belongs to the class of NP-hard problems. We show that it is important to include a suitable context for each pixel to achieve low error rates, which is then possible using the less complex models of lower order. We furthermore discuss the use of local patches for visual object categorization as a model allowing high image variability and show how the use of discriminative training leads to very competitive results. Finally, we describe a model for holistic scene analysis that allows us to determine a visual representation of objects present in a set of images. The methods are primarily applied to the tasks of handwritten character recognition and medical image categorization, yielding excellent results in both cases. In particular, we achieve an error rate of 0.52% on the well-known MNIST benchmark and 12.6% on the IRMA-10,000 database, the lowest within the 2005 ImageCLEF evaluation. We show that the models of image variability also improve the recognition performance of appearance-based sign language and gesture recognition systems. This emphasizes the models' broad applicability.


Comparison and Combination of State-of-the-art Techniques for Handwritten Character Recognition: Topping the MNIST Benchmark
Daniel Keysers
IUPR Research Group, DFKI and TU Kaiserslautern

BibTeX     PDF     Web    

Abstract  Although the recognition of isolated handwritten digits has been a research topic for many years, it continues to be of interest for the research community and for commercial applications. We show that despite the maturity of the field, different approaches still deliver results that vary enough to allow improvements by using their combination. We do so by choosing four well-motivated state-of-the-art recognition systems for which results on the standard MNIST benchmark are available. When comparing the errors made, we observe that the errors made differ between all four systems, suggesting the use of classifier combination. We then determine the error rate of a hypothetical system that combines the output of the four systems. The result obtained in this manner is an error rate of 0.35% on the MNIST data, the best result published so far. We furthermore discuss the statistical significance of the combined result and of the results of the individual classifiers.


Bibliographic Meta-Data Extraction Using Probabilistic Finite State Transducers
Martin Krämer
TU Kaiserslautern & DFKI

BibTeX     PDF     Web    

Abstract  In this paper we present the application of probabilistic finite state transducers to the task of bibliographic meta-data extraction from research paper references. Although finite state techniques have been utilized on various tasks of computational linguistics before, they have not yet been used for the recognition of bibliographic references. In particular, the simplicity and flexibility of modelling as well as the easy adaptability to changing requirements turn out to be beneficial. An evaluation on the Cora dataset, which serves as a common benchmark for accuracy measurements and represents quite “hard” cases, yields a word accuracy of 88.5%, a field accuracy of 82.6% and an instance accuracy of 42.7%. Our system thus performs second best on the given test set among the published results of similar projects.


Printing Technique Classification for Document Counterfeit Detection
Christoph H. Lampert, Lin Mei, Thomas M. Breuel
Computational Intelligence and Security (CIS) 2006, Guangzhou, China

BibTeX     PDF     Web    

Abstract  The detection of counterfeit in printed documents is currently based mainly on built-in security features or on human expertise. We propose a classification system that supports non-expert users to distinguish original documents from PC-made forgeries by analyzing the printing technique used. Each letter in a document is classified using a support vector machine that has been trained to distinguish laser from inkjet printouts. A color coded visualization helps the user to interpret the per-letter classification results.


Anisotropic Gaussian Filtering using Fixed Point Arithmetic
Christoph H. Lampert, Oliver Wirjadi
Proceedings of the 2006 International Conference on Image Processing (ICIP 2006), pages 1565-1568

BibTeX     PDF     Web    

Abstract  Gaussian filtering in one, two or three dimensions is among the most commonly needed tasks in signal and image processing. Finite impulse response filters in the time domain with Gaussian masks are easy to implement in either floating or fixed point arithmetic, because Gaussian kernels are strictly positive and bounded. But these implementations are slow for large images or kernels. With the recursive IIR-filters and FFT-based methods, there are at least two alternative methods to perform Gaussian filtering in a faster way, but so far they are only applicable when floating-point hardware is available. In this paper, a fixed-point implementation of recursive Gaussian filtering is discussed and applied to isotropic and anisotropic image filtering by making use of a non-orthogonal separation scheme of the Gaussian filter.


Machine Learning for Video Compression: Macroblock Mode Decision
Christoph H. Lampert
18th International Conference on Pattern Recognition (ICPR 2006), Hong Kong (submitted version)

BibTeX     PDF     Web    

Abstract  Video Compression currently is dominated by engineering and fine-tuned heuristic methods. In this paper, we propose to instead apply the well-developed machinery of machine learning in order to support the optimization of existing video encoders and the creation of new ones. Exemplarily, we show how by machine learning we can improve one encoding step that is crucial for the performance of all current video standards: macroblock mode decision. By formulating the problem in a Bayesian setup, we show that macroblock mode decision can be reduced to a classification problem with a cost function for misclassification that is sample dependent. We demonstrate how to apply different machine learning techniques to obtain suitable classifiers and we show in detailed experiments that all of these perform better than the state-of-the-art heuristic method.


Objective Quality Measurement for Geometric Document Image Restoration (extended abstract)
Christoph H. Lampert, Thomas M. Breuel
7th IAPR Workshop on Document Analysis Systems (DAS)

BibTeX     PDF     Web    

Abstract  Many algorithms to remove distortion from document images have been proposed in recent years, but so far there is no reliable method for comparing their performance. In this paper we propose a collection of methods to measure the quality of such restoration algorithms for document images which show a non-linear distortion due to perspective or page curl. For the results from these measurements to be meaningful, a common data set of ground truth is required. We therefore started with the buildup of a document image database that is meant to serve as a common data basis for all kinds of restoration from images of 3D-shaped documents. The long term goal would be to establish this database and following extensions in the area of document image dewarping as a tool as fruitful and indispensable as e.g. the NIST database is for OCR, or the Caltech database is for object and face recognition.


Performance Comparison of Six Algorithms for Page Segmentation
Faisal Shafait, Daniel Keysers, Thomas M. Breuel
Proc. Document Analysis Systems (DAS), LNCS Vol. 3872 (An extended version of this paper is published in the June 2008 issue of IEEE TPAMI), pages 368-379

BibTeX     PDF     Web    

Abstract  This paper presents a quantitative comparison of six algorithms for page segmentation: X-Y cut, smearing, whitespace analysis, constrained text-line finding, Docstrum, and Voronoi-diagram-based. The evaluation is performed using a subset of the UW-III collection commonly used for evaluation, with a separate training set for parameter optimization. We compare the results using both default parameters and optimized parameters. In the course of the evaluation, the strengths and weaknesses of each algorithm are analyzed, and it is shown that no single algorithm outperforms all other algorithms. However, we observe that the three best-performing algorithms are those based on constrained text-line finding, Docstrum, and the Voronoi-diagram.


Layout Analysis of Urdu Document Images
Faisal Shafait, Adnan-ul-Hasan, Daniel Keysers, Thomas M. Breuel
10th IEEE International Multi-topic Conference (INMIC 2006), Islamabad, Pakistan.

BibTeX     PDF     Web    

Abstract  Layout analysis is a key component of an OCR system. In this paper, we present a layout analysis system for extracting text-lines in reading order from Urdu document images. For this purpose, we evaluate an existing system for Roman script text on Urdu documents and describe its methods and the main changes necessary to adapt it to Urdu script. The main changes are: 1) the text-line model for Roman script is modified to adapt to Urdu text, 2) reading order of an Urdu document is defined. The method is applied to a collection of scanned Urdu documents from various books, magazines, and newspapers. The results show high text-line detection accuracy on scanned images of Urdu prose and poetry books and magazines. The algorithm also works reasonably well on newspaper images. We also identify directions for future work which may further improve the accuracy of the system.


Pixel-Accurate Representation and Evaluation of Page Segmentation in Document Images
Faisal Shafait, Daniel Keysers, Thomas M. Breuel
ICPR 2006, International Conference on Pattern Recognition An extended version of this paper is published in the June 2008 issue of IEEE TPAMI, pages 872-875

BibTeX     PDF     Web    

Abstract  This paper presents a new representation and evaluation procedure of page segmentation algorithms and analyzes six widely-used layout analysis algorithms using the procedure. The method permits a detailed analysis of the behavior of page segmentation algorithms in terms of over- and undersegmentation at different layout levels, as well as determination of the geometric accuracy of the segmentation. The representation of document layouts relies on labeling each pixel according to its function in the overall segmentation, permitting pixel-accurate representation of layout information of arbitrary layouts and allowing background pixels to be classified as "don't care". Our representations can be encoded easily in standard color image formats like PNG, permitting easy interchange of segmentation results and ground truth.


Real Time Lip Motion Analysis for a Person Authentication System Using Near Infrared Illumination
Faisal Shafait, Ralph Kricke, Islam Shdaifat, Rolf-Rainer Grigat
13th Int. Conf. on Image Processing, ICIP’06, Atlanta, GA, USA, pages 1957-1960

BibTeX     PDF     Web    

Abstract  In this paper we present an approach for lip motion analysis that can be used in conjunction with a person authentication system based on face recognition, to avoid attacks on the system using passive photographs. This work focuses on robustly tracking lips in gray scale images, which may be captured in the visible light or near infrared spectrum. We present an approach for locating the two lip corners in a face image. Then we extract suitable features from the mouth region to classify mouth states (visemes). The system shows a classification accuracy of above 85%. The temporal changes in the detected viseme classes can be used for detecting the impostor.


Optimizing Similarity Assessment in Case-Based Reasoning
Armin Stahl, Thomas Gabel
Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06)

BibTeX     PDF     Web    

Abstract  The definition of accurate similarity measures is a key issue of every Case-Based Reasoning application. Although some approaches to optimize similarity measures automatically have already been applied, these approaches are not suited for all CBR application domains. On the one hand, they are restricted to classification tasks. On the other hand, they only allow optimization of feature weights. We propose a novel learning approach which addresses both problems, i.e. it is suited for most CBR application domains beyond simple classification and it enables learning of more sophisticated similarity measures.


Combining Case-Based and Similarity-Based Product Recommendation
Armin Stahl
Proceedings of the 8th European Conference on Case-Based Reasoning (ECCBR 2006)

BibTeX     PDF     Web    

Abstract  Product recommender systems have been a popular application and research field of CBR for several years now. However, almost all CBR-based recommender systems are not case-based in the original view of CBR, but just perform a similarity-based retrieval of product descriptions. Here, a predefined similarity measure is used as a heuristic for estimating the customers' product preferences. In this paper we propose an extension of these systems, which enables case-based learning of customer preferences and which also allows the incorporation of collaborative recommendation techniques. Further, we show how this approach can be combined with existing approaches for learning the similarity measure directly. The presented results of a first experimental evaluation demonstrate the feasibility of our novel approach in an exemplary test domain.


Spatiogram-Based Shot Distances for Video Retrieval
Adrian Ulges, Christoph H. Lampert, Daniel Keysers
Trecvid 2006 Workshop

BibTeX     PDF     Web    

Abstract  We propose a video retrieval framework based on a novel combination of spatiograms and the Jensen-Shannon divergence, and validate its performance in two quantitative experiments on TRECVID BBC Rushes data. In the first experiment, color-based methods are tested by grouping redundant shots in an unsupervised clustering. Results of the second experiment show that motion-based spatiograms make a promising fast, compressed-domain descriptor for the detection of interview scenes.
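The Jensen-Shannon divergence named in this abstract has a compact standard definition, sketched below for two plain histograms. Note the simplification: the paper works with spatiograms, which carry spatial statistics in addition to bin counts, whereas this example assumes ordinary histograms. The function names and the choice of base-2 logarithms are assumptions for the illustration.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def jensen_shannon_divergence(hist_a, hist_b):
    """Jensen-Shannon divergence between two non-negative histograms."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    sum_a, sum_b = float(sum(hist_a)), float(sum(hist_b))
    if sum_a <= 0.0 or sum_b <= 0.0:
        raise ValueError("histograms must have positive total mass")
    # Normalise both histograms to probability distributions.
    p = [a / sum_a for a in hist_a]
    q = [b / sum_b for b in hist_b]
    # Mixture distribution; it is positive wherever p or q is positive.
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

print(jensen_shannon_divergence([4, 4, 0], [0, 4, 4]))
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (disjoint support), and it is symmetric in its two arguments, which makes it convenient as a distance-like score between shots.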


Recognizing Objects in Still Images and Video Streams
Adrian Ulges
IUPR Research Group

BibTeX     PDF     Web    

Abstract  This paper addresses the problem of recognizing objects in visual media. Though the field has come a long way, this task is far from being solved for generic objects in arbitrary scenes. Nevertheless, recent developments have made object recognition more successful and flexible, with its most promising applications in multimedia indexing and retrieval. The main purpose of this paper is to give a survey of object recognition in both still images and video. Also, a self-built prototype is described for the recognition of items presented to a camera. In experiments, a global, histogram-based method and a local, patch-based approach were compared, with the latter showing a higher robustness to scene changes.


Application of Case-Based Reasoning to predict Sludge Settling process and Endogenous Denitrification
Jürgen Wiese, Heidrun Steinmetz, Armin Stahl
Proceedings of the 5th IWA World Water Congress

BibTeX     PDF     Web    

Abstract  In recent years, artificial intelligence (AI) approaches have become useful tools in environmental engineering. Here, one relevant application area is the optimization of waste-water treatment plants (WWTP). In this paper we present a tool for real-time control (RTC) and decision support, which has been tailored to sequencing batch reactor (SBR) plants. The tool, which is able to predict the sludge settling curves as well as the endogenous denitrification (ED) during settle and draw, is based on case-based reasoning (CBR), an AI method. The tool bases its decision on past events and situations captured in cases.


Automated Feature Selection for the Classification of Meningioma Cell Nuclei
Oliver Wirjadi, Thomas M. Breuel, Wolfgang Feiden, Yoo-Jin Kim
Bildverarbeitung für die Medizin, pages 76-80

BibTeX     PDF     Web    

Abstract  A supervised learning method for image classification is presented which is independent of the type of images that will be processed. This is realized by constructing a large base of grey-value and colour based image features. We then rely on a decision tree to choose the features that are most relevant for a given application. We apply and evaluate our system on the classification task of meningioma cells.


Reduced Complexity Techniques for Long-Term Memory Motion Compensated Prediction in Hybrid Video Coding
Waqar Zia, Faisal Shafait
25th Picture Coding Symposium, PCS’06

BibTeX     PDF     Web    

Abstract  Long-term memory motion compensated prediction and up to ¼-pel accurate motion compensation contribute a considerable portion of the compression gain provided by H.264/AVC over its predecessors. This paper investigates the factors contributing to the spectral distortions introduced in the digitized video signal. A quantitative analysis shows that fractional-pel interpolation is the main source of these spectral distortions. Using these results, two techniques are proposed for reducing computational complexity with negligible effects on the quality of the video. Simulations of the proposed techniques show up to 56% complexity reduction compared to the reference scheme without any significant decrease in signal-to-noise ratio.


2005

Approximate vs. Representative Nearest Neighbors
Thomas M. Breuel
Snowbird Learning Workshop

BibTeX     PDF     Web    

Abstract  Over the last several years, there has been renewed interest in efficient nearest neighbor search algorithms. Such algorithms have uses in areas like information retrieval, pattern recognition, data mining, compression, and databases. Known algorithms for the exact nearest neighbor problem have such complexities that, in practice, high dimensional nearest neighbor problems are usually solved with brute force search, that is, computing the distance of the query point with each data point in the database. In order to achieve better performance than brute force search, ...
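The brute force baseline described in this abstract, computing the distance from the query to every data point, can be sketched in a few lines. The function name `brute_force_nearest_neighbor`, the Euclidean metric, and the sample points are assumptions made for the example; the abstract does not specify a metric.

```python
import math

def brute_force_nearest_neighbor(query, points):
    """Return (index, distance) of the point closest to `query`.

    Scans every point once, so the cost is linear in the database size
    and in the dimension of the points.
    """
    if not points:
        raise ValueError("points must not be empty")
    best_index, best_distance = -1, math.inf
    for index, point in enumerate(points):
        distance = math.dist(query, point)  # Euclidean distance (Python 3.8+)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance

database = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(brute_force_nearest_neighbor((0.9, 1.2), database))
```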


The Future of Document Imaging in the Era of Electronic Documents
Thomas M. Breuel
International Workshop on Document Analysis

BibTeX     PDF     Web    

Abstract  Document imaging and document analysis are technologies for the interpretation and manipulation of document images. It is commonly assumed that the increased use of electronic documents and data communications will obviate the need for document imaging and document analysis, as more and more documents are exchanged in formats like HTML, XML, PDF, and other well-defined, structured formats. This paper examines the question of how likely it is that paper will be replaced by electronic documents in the near future, what possibilities exist for paper and electronic documents to co-exist, and what role document imaging and document analysis will play as electronic communications and computers become ever more widespread and portable.


Optimal Line and Arc Detection on Run-Length Representations
Daniel Keysers, Thomas M. Breuel
GREC 2005 - Sixth IAPR International Workshop on Graphics Recognition, pages 17-23

BibTeX     PDF     Web    

Abstract  The robust detection of lines and arcs in scanned documents or technical drawings is an important problem in document image understanding. We present a new solution to this problem that works directly on run-length encoded data. The method finds globally optimal solutions to parameterized thick line and arc models. Line thickness is part of the model and used during the matching process. Unlike previous approaches, it does not require any thinning or other preprocessing steps, no computation of the line adjacency graphs, and no heuristics. Furthermore, the only search-related parameter that needs to be specified is the desired numerical accuracy of the solution. The method is based on a branch-and-bound approach for the globally optimal detection of these geometric primitives using runs of black pixels in a bi-level image. We present qualitative results of the algorithm on images used in the 2003 GREC arc segmentation contest.
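
To make the input representation concrete, the sketch below shows one way to extract runs of black pixels from a single row of a bi-level image. It is an illustration only and does not reproduce the paper's branch-and-bound matcher. The function name `black_runs` and the (start, length) run format are assumptions for this example.

```python
def black_runs(row):
    """Return [(start, length), ...] for maximal runs of black pixels.

    `row` is a sequence of 0/1 values, with 1 meaning black.
    Hypothetical helper: the run format used in the paper may differ.
    """
    runs = []
    start = None
    for x, pixel in enumerate(row):
        if pixel and start is None:
            start = x  # a new run begins here
        elif not pixel and start is not None:
            runs.append((start, x - start))  # the run ended at x - 1
            start = None
    if start is not None:
        runs.append((start, len(row) - start))  # run reaches the row end
    return runs


if __name__ == "__main__":
    print(black_runs([0, 1, 1, 0, 1, 1, 1]))
```

A line or arc model can then be scored against these runs directly, without first expanding them back into individual pixels.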


Oblivious Document Capture and Real-Time Retrieval
Christoph H. Lampert, Tim Braun, Adrian Ulges, Daniel Keysers, Thomas M. Breuel
International Workshop on Camera Based Document Analysis and Recognition (CBDAR), pages 79-86

BibTeX     PDF     Web    

Abstract  Ever since text processors became popular, users have dreamt of handling documents printed on paper as comfortably as electronic ones, with full text search typically appearing very close to the top of the wish list. This paper presents the design of a prototype system that takes a step in this direction. The user's desktop is continuously monitored, and a high-resolution snapshot of each detected document is taken using a digital camera. The resulting image is processed using specially designed dewarping and OCR algorithms, making a digital and fully searchable version of the document available to the user in real-time. These steps are performed without any user interaction. This enables the system to run as a background task without disturbing the user in his or her work, while at the same time offering electronic access to all paper documents that have been present on the desktop during the uptime of the system.


Learning Similarity Measures: A Formal View Based on a Generalized CBR Model
Armin Stahl
Proceedings of the 6th International Conference on Case-Based Reasoning (ICCBR), pages 507-521

BibTeX     PDF     Web    

Abstract  Although similarity measures play a crucial role in CBR applications, clear methodologies for defining them have not been developed yet. One approach to simplify the definition of similarity measures involves the use of machine learning techniques. In this paper we investigate important aspects of these approaches in order to support a more goal-directed choice and application of existing approaches and to initiate the development of new techniques. This investigation is based on a novel formal generalization of the classic CBR cycle, which allows a more suitable analysis of the requirements, goals, assumptions and restrictions that are relevant for learning similarity measures.


Document Image Dewarping using Robust Estimation of Curled Text Lines
Adrian Ulges, Christoph H. Lampert, Thomas M. Breuel
International Conference on Document Analysis and Recognition (ICDAR), pages 1001-1005

BibTeX     PDF     Web    

Abstract  Digital cameras have become almost ubiquitous, and their use for fast and casual capturing of natural images is unchallenged. For making images of documents, however, they have not caught up to flatbed scanners yet, mainly because camera images tend to suffer from distortion due to the perspective and are therefore limited in their further use for archival or OCR. For images of non-planar paper surfaces like books, page curl causes additional distortion, which poses an even greater problem due to its nonlinearity. This paper presents a new algorithm for removing both perspective and page curl distortion. It requires only a single camera image as input and relies on a priori layout information instead of additional hardware. Therefore, it is much more user friendly than most previous approaches, and allows for flexible ad hoc document capture. Results are presented showing that the algorithm produces visually pleasing output and increases OCR accuracy, thus having the potential to become a general purpose preprocessing tool for camera based document capture.


Applying and optimizing case-based reasoning for wastewater treatment systems
Jürgen Wiese, Armin Stahl, Joachim Hansen
AI Communications. Special Issue: Binding Environmental Sciences and AI 18(4), pages 269-279

BibTeX     PDF     Web    

Abstract  In recent years, artificial intelligence (AI) approaches have become useful tools in environmental engineering. Here, one relevant application area is the optimization of wastewater treatment plants (WWTP). In this paper, we present several examples of real-time control (RTC) tasks and decision support systems (DSS) for wastewater treatment (WWT), specifically based on case-based reasoning (CBR). Moreover, we present an approach for optimizing the prediction accuracy of these systems. The idea of this approach is to employ knowledge-intensive similarity measures instead of simple distance metrics. To facilitate the modeling of these measures, and thereby lower the deployment costs of the CBR systems, we propose a novel machine learning technique.


Approximate Separable 3D Anisotropic Gauss Filter
Oliver Wirjadi, Thomas M. Breuel
Proc. IEEE International Conference on Image Processing (ICIP 2005), pages 149-152

BibTeX     PDF     Web    

Abstract  Anisotropic Gaussian filters are useful for adaptive smoothing and feature extraction. In our application, micro-tomographic images of fibers were smoothed by anisotropic Gaussians. In this case, this is more natural than using their isotropic counterparts. But filtering in large 3D data is very time consuming. We extend the work of Geusebroek et al. on fast Gauss filtering to three dimensions. We propose an approximate separable filtering scheme which consists of three 1D convolutions. Initial experiments suggest that this filter can outperform an FFT based implementation when the kernel size is small compared to the size of the 3D images.


2004

Exploiting Background Knowledge when Learning Similarity Measures
Thomas Gabel, Armin Stahl
Proceedings of the 6th International Conference on Case-Based Reasoning

BibTeX     PDF     Web    

Abstract  The definition of similarity measures—one core component of every CBR application—leads to a serious knowledge acquisition problem if domain and application specific requirements have to be considered. To reduce the knowledge acquisition effort, different machine learning techniques have been developed in the past. In this paper, enhancements of our framework for learning knowledge-intensive similarity measures are presented. The described techniques aim to restrict the search space to be considered by the learning algorithm by exploiting available background knowledge. This helps to avoid typical problems of machine learning, such as overfitting the training data.


Document Capture using Stereo Vision
Adrian Ulges, Christoph H. Lampert, Thomas M. Breuel
ACM Symposium on Document Engineering, pages 198-200

BibTeX     PDF     Web    

Abstract  Capturing images of documents using handheld digital cameras has a variety of applications in academia, research, knowledge management, retail, and office settings. The ultimate goal of such systems is to achieve image quality comparable to that currently achieved with flatbed scanners even for curved, warped, or curled pages. This can be achieved by high-accuracy 3D modeling of the page surface, followed by a flattening of the surface. A number of previous systems have either assumed only perspective distortions, or used techniques like structured lighting, shading, or side imaging for obtaining 3D shape. This paper describes a system for handheld camera-based document capture using general purpose stereo vision methods followed by a new document dewarping technique. Examples of shape modeling and dewarping of book images are shown.


2003

On the Use of Interval Arithmetic in Geometric Branch-and-Bound Algorithms
Thomas M. Breuel
Accepted for publication in Pattern Recognition Letters

BibTeX     PDF     Web    

Abstract  Branch and bound methods have become established methods for geometric matching over the last decade. This paper presents techniques that improve on previous branch and bound methods in two important ways: they guarantee reliable solutions even in the presence of numerical roundoff error, and they eliminate the need to derive bounding functions manually. These new techniques are compared experimentally with recognition-by-alignment and previous branch and bound techniques on geometric matching problems. Novel methods for non-linear baseline finding and globally optimal robust linear regression using these techniques are described.
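
The idea of obtaining bounds without hand-derived bounding functions can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the `Interval` class, the `point` helper, and the line-fit `residual` function are hypothetical names, and the sketch omits the outward (directed) rounding that a reliable interval library would need to account for roundoff error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi]; no outward rounding (illustration only)."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = (self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi)
        return Interval(min(products), max(products))


def point(value):
    """Wrap a single number as a degenerate interval."""
    return Interval(value, value)


def residual(a, b, x, y):
    """Residual y - (a*x + b) of a line model, evaluated on intervals."""
    return y - (a * x + b)


if __name__ == "__main__":
    # Parameter box: slope a in [0, 1], offset b in [-1, 1].
    box_a, box_b = Interval(0.0, 1.0), Interval(-1.0, 1.0)
    # Bounds on the residual of the data point (2, 3) over the whole box.
    print(residual(box_a, box_b, point(2.0), point(3.0)))
```

Evaluating the same expression on a parameter box yields lower and upper bounds on the residual over that box, which is the kind of quantity a branch-and-bound search uses to decide whether a box can be discarded.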