Skip to content

Latest commit

 

History

History
57 lines (57 loc) · 38 KB

Wed, 19 Sep 2018.md

File metadata and controls

57 lines (57 loc) · 38 KB

ArXiv Paper Abstract--Wed, 19 Sep 2018

1.MNIST Dataset Classification Utilizing k-NN Classifier with Modified Sliding Window Metric pdf

This paper evaluates the performance of the K-nearest neighbor classification algorithm on the MNIST dataset of the handwritten digits. The L2 Euclidean distance metric is compared to a modified distance metric which utilizes the sliding window technique in order to avoid performance degradations due to slight spatial misalignments. Accuracy and confusion matrix are used as the performance indicators to compare the performance of the baseline algorithm versus the enhanced sliding window method and results show significant improvement using this simple method.

2.Albumentations: fast and flexible image augmentations pdf

Data augmentation is a commonly used technique for increasing both the size and the diversity of labeled training sets by leveraging input transformations that preserve output labels. In computer vision domain, image augmentations have become a common implicit regularization technique to combat overfitting in deep convolutional neural networks and are ubiquitously used to improve performance. While most deep learning frameworks implement basic image transformations, the list is typically limited to some variations and combinations of flipping, rotating, scaling, and cropping. Moreover, the image processing speed varies in existing tools for image augmentation. We present Albumentations, a fast and flexible library for image augmentations with many various image transform operations available, that is also an easy-to-use wrapper around other augmentation libraries. We provide examples of image augmentations for different computer vision tasks and show that Albumentations is faster than other commonly used image augmentation tools on the most of commonly used image transformations. The source code for Albumentations is made publicly available online at this https URL

3.Generalized Content-Preserving Warps for Image Stitching pdf

Local misalignment caused by global homography is a common issue in image stitching task. Content-Preserving Warping (CPW) is a typical method to deal with this issue, in which geometric and photometric constraints are imposed to guide the warping process. One of its essential condition however, is colour consistency, and an elusive goal in real world applications. In this paper, we propose a Generalized Content-Preserving Warping (GCPW) method to alleviate this problem. GCPW extends the original CPW by applying a colour model that expresses the colour transformation between images locally, thus meeting the photometric constraint requirements for effective image stitching. We combine the photometric and geometric constraints and jointly estimate the colour transformation and the warped mesh vertexes, simultaneously. We align images locally with an optimal grid mesh generated by our GCPW method. Experiments on both synthetic and real images demonstrate that our new method is robust to colour variations, outperforming other state-of-the-art CPW-based image stitching methods.

4.3D segmentation of mandible from multisectional CT scans by convolutional neural networks pdf

Segmentation of mandibles in CT scans during virtual surgical planning is crucial for 3D surgical planning in order to obtain a detailed surface representation of the patients bone. Automatic segmentation of mandibles in CT scans is a challenging task due to large variation in their shape and size between individuals. In order to address this challenge we propose a convolutional neural network approach for mandible segmentation in CT scans by considering the continuum of anatomical structures through different planes. The proposed convolutional neural network adopts the architecture of the U-Net and then combines the resulting 2D segmentations from three different planes into a 3D segmentation. We implement such a segmentation approach on 11 neck CT scans and then evaluate the performance. We achieve an average dice coefficient of $ 0.89 $ on two testing mandible segmentation. Experimental results show that our proposed approach for mandible segmentation in CT scans exhibits high accuracy.

5.Multiple Combined Constraints for Image Stitching pdf

Several approaches to image stitching use different constraints to estimate the motion model between image pairs. These constraints can be roughly divided into two categories: geometric constraints and photometric constraints. In this paper, geometric and photometric constraints are combined to improve the alignment quality, which is based on the observation that these two kinds of constraints are complementary. On the one hand, geometric constraints (e.g., point and line correspondences) are usually spatially biased and are insufficient in some extreme scenes, while photometric constraints are always evenly and densely distributed. On the other hand, photometric constraints are sensitive to displacements and are not suitable for images with large parallaxes, while geometric constraints are usually imposed by feature matching and are more robust to handle parallaxes. The proposed method therefore combines them together in an efficient mesh-based image warping framework. It achieves better alignment quality than methods only with geometric constraints, and can handle larger parallax than photometric-constraint-based method. Experimental results on various images illustrate that the proposed method outperforms representative state-of-the-art image stitching methods reported in the literature.

6.Capsule Deep Neural Network for Recognition of Historical Graffiti Handwriting pdf

Automatic recognition of the historical letters (XI-XVIII centuries) carved on the stoned walls of St.Sophia cathedral in Kyiv (Ukraine) was demonstrated by means of capsule deep learning neural network. It was applied to the image dataset of the carved Glagolitic and Cyrillic letters (CGCL), which was assembled and pre-processed recently for recognition and prediction by machine learning methods (this https URL). CGCL dataset contains >4000 images for glyphs of 34 letters which are hardly recognized by experts even in contrast to notMNIST dataset with the better images of 10 letters taken from different fonts. Despite the much worse quality of CGCL dataset and extremely low number of samples (in comparison to notMNIST dataset) the capsule network model demonstrated much better results than the previously used convolutional neural network (CNN). The validation accuracy (and validation loss) was higher (lower) for capsule network model than for CNN without data augmentation even. The area under curve (AUC) values for receiver operating characteristic (ROC) were also higher for the capsule network model than for CNN model: 0.88-0.93 (capsule network) and 0.50 (CNN) without data augmentation, 0.91-0.95 (capsule network) and 0.51 (CNN) with lossless data augmentation, and similar results of 0.91-0.93 (capsule network) and 0.9 (CNN) in the regime of lossless data augmentation only. The confusion matrixes were much better for capsule network than for CNN model and gave the much lower type I (false positive) and type II (false negative) values in all three regimes of data augmentation. These results supports the previous claims that capsule-like networks allow to reduce error rates not only on MNIST digit dataset, but on the other notMNIST letter dataset and the more complex CGCL handwriting graffiti letter dataset also.

7.SECS: Efficient Deep Stream Processing via Class Skew Dichotomy pdf

Despite that accelerating convolutional neural network (CNN) receives an increasing research focus, the save on resource consumption always comes with a decrease in accuracy. To both increase accuracy and decrease resource consumption, we explore an environment information, called class skew, which is easily available and exists widely in daily life. Since the class skew may switch as time goes, we bring up probability layer to utilize class skew without any overhead during the runtime. Further, we observe class skew dichotomy that some class skew may appear frequently in the future, called hot class skew, and others will never appear again or appear seldom, called cold class skew. Inspired by techniques from source code optimization, two modes, i.e., interpretation and compilation, are proposed. The interpretation mode pursues efficient adaption during runtime for cold class skew and the compilation mode aggressively optimize on hot ones for more efficient deployment in the future. Aggressive optimization is processed by class-specific pruning and provides extra benefit. Finally, we design a systematic framework, SECS, to dynamically detect class skew, processing interpretation and compilation, as well as select the most accurate architectures under the runtime resource budget. Extensive evaluations show that SECS can realize end-to-end classification speedups by a factor of 3x to 11x relative to state-of-the-art convolutional neural networks, at a higher accuracy.

8.Compressed Sensing Parallel MRI with Adaptive Shrinkage TV Regularization pdf

Compressed sensing (CS) methods in magnetic resonance imaging (MRI) offer rapid acquisition and improved image quality but require iterative reconstruction schemes with regularization to enforce sparsity. Regardless of the difficulty in obtaining a fast numerical solution, the total variation (TV) regularization is a preferred choice due to its edge-preserving and structure recovery capabilities. While many approaches have been proposed to overcome the non-differentiability of the TV cost term, an iterative shrinkage based formulation allows recovering an image through recursive application of linear filtering and soft thresholding. However, providing an optimal setting for the regularization parameter is critical due to its direct impact on the rate of convergence as well as steady state error. In this paper, a regularizer adaptively varying in the derivative space is proposed, that follows the generalized discrepancy principle (GDP). The implementation proceeds by adaptively reducing the discrepancy level expressed as the absolute difference between TV norms of the consistency error and the sparse approximation error. A criterion based on the absolute difference between TV norms of consistency and sparse approximation errors is used to update the threshold. Application of the adaptive shrinkage TV regularizer to CS recovery of parallel MRI (pMRI) and temporal gradient adaptation in dynamic MRI are shown to result in improved image quality with accelerated convergence. In addition, the adaptive TV-based iterative shrinkage (ATVIS) provides a significant speed advantage over the fast iterative shrinkage-thresholding algorithm (FISTA).

9.A Simple Approach to Intrinsic Correspondence Learning on Unstructured 3D Meshes pdf

The question of representation of 3D geometry is of vital importance when it comes to leveraging the recent advances in the field of machine learning for geometry processing tasks. For common unstructured surface meshes state-of-the-art methods rely on patch-based or mapping-based techniques that introduce resampling operations in order to encode neighborhood information in a structured and regular manner. We investigate whether such resampling can be avoided, and propose a simple and direct encoding approach. It does not only increase processing efficiency due to its simplicity - its direct nature also avoids any loss in data fidelity. To evaluate the proposed method, we perform a number of experiments in the challenging domain of intrinsic, non-rigid shape correspondence estimation. In comparisons to current methods we observe that our approach is able to achieve highly competitive results.

10.Support Vector Machine (SVM) Recognition Approach adapted to Individual and Touching Moths Counting in Trap Images pdf

This paper aims at developing an automatic algorithm for moth recognition from trap images in real-world conditions. This method uses our previous work for detection [1] and introduces an adapted classification step. More precisely, SVM classifier is trained with a multi-scale descriptor, Histogram Of Curviness Saliency (HCS). This descriptor is robust to illumination changes and is able to detect and to describe the external and the internal contours of the target insect in multi-scale. The proposed classification method can be trained with a small set of images. Quantitative evaluations show that the proposed method is able to classify insects with higher accuracy (rate of 95.8%) than the state-of-the art approaches.

11.Attribute Enhanced Face Aging with Wavelet-based Generative Adversarial Networks pdf

Since it is difficult to collect face images of the same subject over a long range of age span, most existing face aging methods resort to unpaired datasets to learn age mappings for rendering aging effects on a given young face image. However, the matching ambiguity between young and aged face images inherent to unpaired training data may lead to inconsistent facial attributes in generated images, which could not be solved by only enforcing identity consistency. In this paper, we propose a novel attribute enhanced face aging model with wavelet-based Generative Adversarial Networks (GANs) to address the above issues.To be specific, we embed facial attribute vectors into both generator and discriminator of the model to encourage each synthesized elderly face image to be faithful to the attribute of its corresponding input.In addition, a wavelet packet transform (WPT) module is incorporated to improve the visual fidelity of generated images by capturing age-related texture details at multiple scales in the frequency space. Qualitative results demonstrate the ability of our model to synthesize visually plausible face images, and extensive quantitative evaluation results show that the proposed method achieves state-of-the-art performance on existing databases.

12.Enhanced 3DTV Regularization and Its Applications on Hyper-spectral Image Denoising and Compressed Sensing pdf

The 3-D total variation (3DTV) is a powerful regularization term, which encodes the local smoothness prior structure underlying a hyper-spectral image (HSI), for general HSI processing tasks. This term is calculated by assuming identical and independent sparsity structures on all bands of gradient maps calculated along spatial and spectral HSI modes. This, however, is always largely deviated from the real cases, where the gradient maps are generally with different while correlated sparsity structures across all their bands. Such deviation tends to hamper the performance of the related method by adopting such prior term. To this is- sue, this paper proposes an enhanced 3DTV (E-3DTV) regularization term beyond conventional 3DTV. Instead of imposing sparsity on gradient maps themselves, the new term calculated sparsity on the subspace bases on the gradient maps along their bands, which naturally encode the correlation and difference across these bands, and more faithfully reflect the insightful configurations of an HSI. The E-3DTV term can easily replace the previous 3DTV term and be em- bedded into an HSI processing model to ameliorate its performance. The superiority of the proposed methods is substantiated by extensive experiments on two typical related tasks: HSI denoising and compressed sensing, as compared with state-of-the-arts designed for both tasks.

13.Symbolic Tensor Neural Networks for Digital Media - from Tensor Processing via BNF Graph Rules to CREAMS Applications pdf

This tutorial material on Convolutional Neural Networks (CNN) and its applications in digital media research is based on the concept of Symbolic Tensor Neural Networks. The set of STNN expressions is specified in Backus-Naur Form (BNF) which is annotated by constraints typical for labeled acyclic directed graphs (DAG). The BNF induction begins from a collection of neural unit symbols with extra (up to five) decoration fields (including tensor depth and sharing fields). The inductive rules provide not only the general graph structure but also the specific shortcuts for residual blocks of units. A syntactic mechanism for network fragments modularization is introduced via user defined units and their instances. Moreover, the dual BNF rules are specified in order to generate the Dual Symbolic Tensor Neural Network (DSTNN). The joined interpretation of STNN and DSTNN provides the correct flow of gradient tensors, back propagated at the training stage. The proposed symbolic representation of CNNs is illustrated for six generic digital media applications (CREAMS): Compression, Recognition, Embedding, Annotation, 3D Modeling for human-computer interfacing, and data Security based on digital media objects. In order to make the CNN description and its gradient flow complete, for all presented applications, the symbolic representations of mathematically defined loss/gain functions and gradient flow equations for all used core units, are given. The tutorial is to convince the reader that STNN is not only a convenient symbolic notation for public presentations of CNN based solutions for CREAMS problems but also that it is a design blueprint with a potential for automatic generation of application source code.

14.U-Net for MAV-based Penstock Inspection: an Investigation of Focal Loss in Multi-class Segmentation for Corrosion Identification pdf

Periodical inspection and maintenance of critical infrastructure such as dams, penstocks, and locks are of significant importance to prevent catastrophic failures. Conventional manual inspection methods require inspectors to climb along a penstock to spot corrosion, rust and crack formation which is unsafe, labor-intensive, and requires intensive training. This work presents an alternative approach using a Micro Aerial Vehicle (MAV) that autonomously flies to collect imagery which is then fed into a pretrained deep-learning model to identify corrosion. Our simplified U-Net trained with less than 40 image samples can do inference at 12 fps on a single GPU. We analyze different loss functions to solve the class imbalance problem, followed by a discussion on choosing proper metrics and weights for object classes. Results obtained with the dataset collected from Center Hill Dam, TN show that focal loss function, combined with a proper set of class weights yield better segmentation results than the base loss, Softmax cross entropy. Our method can be used in combination with planning algorithm to offer a complete, safe and cost-efficient solution to autonomous infrastructure inspection.

15.Image Super-Resolution via Deterministic-Stochastic Synthesis and Local Statistical Rectification pdf

Single image superresolution has been a popular research topic in the last two decades and has recently received a new wave of interest due to deep neural networks. In this paper, we approach this problem from a different perspective. With respect to a downsampled low resolution image, we model a high resolution image as a combination of two components, a deterministic component and a stochastic component. The deterministic component can be recovered from the low-frequency signals in the downsampled image. The stochastic component, on the other hand, contains the signals that have little correlation with the low resolution image. We adopt two complementary methods for generating these two components. While generative adversarial networks are used for the stochastic component, deterministic component reconstruction is formulated as a regression problem solved using deep neural networks. Since the deterministic component exhibits clearer local orientations, we design novel loss functions tailored for such properties for training the deep regression network. These two methods are first applied to the entire input image to produce two distinct high-resolution images. Afterwards, these two images are fused together using another deep neural network that also performs local statistical rectification, which tries to make the local statistics of the fused image match the same local statistics of the groundtruth image. Quantitative results and a user study indicate that the proposed method outperforms existing state-of-the-art algorithms with a clear margin.

16.Deep Textured 3D Reconstruction of Human Bodies pdf

Recovering textured 3D models of non-rigid human body shapes is challenging due to self-occlusions caused by complex body poses and shapes, clothing obstructions, lack of surface texture, background clutter, sparse set of cameras with non-overlapping fields of view, etc. Further, a calibration-free environment adds additional complexity to both - reconstruction and texture recovery. In this paper, we propose a deep learning based solution for textured 3D reconstruction of human body shapes from a single view RGB image. This is achieved by first recovering the volumetric grid of the non-rigid human body given a single view RGB image followed by orthographic texture view synthesis using the respective depth projection of the reconstructed (volumetric) shape and input RGB image. We propose to co-learn the depth information readily available with affordable RGBD sensors (e.g., Kinect) while showing multiple views of the same object during the training phase. We show superior reconstruction performance in terms of quantitative and qualitative results, on both, publicly available datasets (by simulating the depth channel with virtual Kinect) as well as real RGBD data collected with our calibrated multi Kinect setup.

17.Scene Text Recognition from Two-Dimensional Perspective pdf

Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem. Though achieving excellent performance, these methods usually neglect an important fact that text in images are actually distributed in two-dimensional space. It is a nature quite different from that of speech, which is essentially a one-dimensional signal. In principle, directly compressing features of text into a one-dimensional form may lose useful information and introduce extra noise. In this paper, we approach scene text recognition from a two-dimensional perspective. A simple yet effective model, called Character Attention Fully Convolutional Network (CA-FCN), is devised for recognizing text of arbitrary shapes. Scene text recognition is realized with a semantic segmentation network, where an attention mechanism for characters is adopted. Combined with a word formation module, CA-FCN can simultaneously recognize the script and predict the position of each character. Experiments demonstrate that the proposed algorithm outperforms previous methods on both regular and irregular text datasets. Moreover, it is proven to be more robust to imprecise localizations in the text detection phase, which are very common in practice.

18.Mask Editor : an Image Annotation Tool for Image Segmentation Tasks pdf

Deep convolutional neural network (DCNN) is the state-of-the-art method for image segmentation, which is one of key challenging computer vision tasks. However, DCNN requires a lot of training images with corresponding image masks to get a good segmentation result. Image annotation software which is easy to use and allows fast image mask generation is in great demand. To the best of our knowledge, all existing image annotation software support only drawing bounding polygons, bounding boxes, or bounding ellipses to mark target objects. These existing software are inefficient when targeting objects that have irregular shapes (e.g., defects in fabric images or tire images). In this paper we design an easy-to-use image annotation software called Mask Editor for image mask generation. Mask Editor allows drawing any bounding curve to mark objects and improves efficiency to mark objects with irregular shapes. Mask Editor also supports drawing bounding polygons, drawing bounding boxes, drawing bounding ellipses, painting, erasing, super-pixel-marking, image cropping, multi-class masks, mask loading, and mask modifying.

19.LMap: Shape-Preserving Local Mappings for Biomedical Visualization pdf

Visualization of medical organs and biological structures is a challenging task because of their complex geometry and the resultant occlusions. Global spherical and planar mapping techniques simplify the complex geometry and resolve the occlusions to aid in visualization. However, while resolving the occlusions these techniques do not preserve the geometric context, making them less suitable for mission-critical biomedical visualization tasks. In this paper, we present a shape-preserving local mapping technique for resolving occlusions locally while preserving the overall geometric context. More specifically, we present a novel visualization algorithm, LMap, for conformally parameterizing and deforming a selected local region-of-interest (ROI) on an arbitrary surface. The resultant shape-preserving local mappings help to visualize complex surfaces while preserving the overall geometric context. The algorithm is based on the robust and efficient extrinsic Ricci flow technique, and uses the dynamic Ricci flow algorithm to guarantee the existence of a local map for a selected ROI on an arbitrary surface. We show the effectiveness and efficacy of our method in three challenging use cases: (1) multimodal brain visualization, (2) optimal coverage of virtual colonoscopy centerline flythrough, and (3) molecular surface visualization.

20.Bridging the Simulated-to-Real Gap: Benchmarking Super-Resolution on Real Data pdf

Capturing ground truth data to benchmark super-resolution (SR) is challenging. Therefore, current quantitative studies are mainly evaluated on simulated data artificially sampled from ground truth images. We argue that such evaluations overestimate the actual performance of SR methods compared to their behavior on real images. To bridge this simulated-to-real gap, we introduce the Super-Resolution Erlangen (SupER) database, the first comprehensive laboratory SR database of all-real acquisitions with pixel-wise ground truth. It consists of more than 80k images of 14 scenes combining different facets: CMOS sensor noise, real sampling at four resolution levels, nine scene motion types, two photometric conditions, and lossy video coding at five levels. As such, the database exceeds existing benchmarks by an order of magnitude in quality and quantity. This paper also benchmarks 19 popular single-image and multi-frame algorithms on our data. The benchmark comprises a quantitative study by exploiting ground truth data and qualitative evaluations in a large-scale observer study. We also rigorously investigate agreements between both evaluations from a statistical perspective. One interesting result is that top-performing methods on simulated data may be surpassed by others on real data. Our insights can spur further algorithm development, and the publicy available dataset can foster future evaluations.

21.Radiative Transport Based Flame Volume Reconstruction from Videos pdf

We introduce a novel approach for flame volume reconstruction from videos using inexpensive charge-coupled device (CCD) consumer cameras. The approach includes an economical data capture technique using inexpensive CCD cameras. Leveraging the smear feature of the CCD chip, we present a technique for synchronizing CCD cameras while capturing flame videos from different views. Our reconstruction is based on the radiative transport equation which enables complex phenomena such as emission, extinction, and scattering to be used in the rendering process. Both the color intensity and temperature reconstructions are implemented using the CUDA parallel computing framework, which provides real-time performance and allows visualization of reconstruction results after every iteration. We present the results of our approach using real captured data and physically-based simulated data. Finally, we also compare our approach against the other state-of-the-art flame volume reconstruction methods and demonstrate the efficacy and efficiency of our approach in four different applications: (1) rendering of reconstructed flames in virtual environments, (2) rendering of reconstructed flames in augmented reality, (3) flame stylization, and (4) reconstruction of other semitransparent phenomena.

22.Crowd-Assisted Polyp Annotation of Virtual Colonoscopy Videos pdf

Virtual colonoscopy (VC) allows a radiologist to navigate through a 3D colon model reconstructed from a computed tomography scan of the abdomen, looking for polyps, the precursors of colon cancer. Polyps are seen as protrusions on the colon wall and haustral folds, visible in the VC fly-through videos. A complete review of the colon surface requires full navigation from the rectum to the cecum in antegrade and retrograde directions, which is a tedious task that takes an average of 30 minutes. Crowdsourcing is a technique for non-expert users to perform certain tasks, such as image or video annotation. In this work, we use crowdsourcing for the examination of complete VC fly-through videos for polyp annotation by non-experts. The motivation for this is to potentially help the radiologist reach a diagnosis in a shorter period of time, and provide a stronger confirmation of the eventual diagnosis. The crowdsourcing interface includes an interactive tool for the crowd to annotate suspected polyps in the video with an enclosing box. Using our workflow, we achieve an overall polyps-per-patient sensitivity of 87.88% (95.65% for polyps $\geq$5mm and 70% for polyps $<$5mm). We also demonstrate the efficacy and effectiveness of a non-expert user in detecting and annotating polyps and discuss their possibility in aiding radiologists in VC examinations.

23.Crowdsourcing Lung Nodules Detection and Annotation pdf

We present crowdsourcing as an additional modality to aid radiologists in the diagnosis of lung cancer from clinical chest computed tomography (CT) scans. More specifically, a complete workflow is introduced which can help maximize the sensitivity of lung nodule detection by utilizing the collective intelligence of the crowd. We combine the concept of overlapping thin-slab maximum intensity projections (TS-MIPs) and cine viewing to render short videos that can be outsourced as an annotation task to the crowd. These videos are generated by linearly interpolating overlapping TS-MIPs of CT slices through the depth of each quadrant of a patient's lung. The resultant videos are outsourced to an online community of non-expert users who, after a brief tutorial, annotate suspected nodules in these video segments. Using our crowdsourcing workflow, we achieved a lung nodule detection sensitivity of over 90% for 20 patient CT datasets (containing 178 lung nodules with sizes between 1-30mm), and only 47 false positives from a total of 1021 annotations on nodules of all sizes (96% sensitivity for nodules$>$4mm). These results show that crowdsourcing can be a robust and scalable modality to aid radiologists in screening for lung cancer, directly or in combination with computer-aided detection (CAD) algorithms. For CAD algorithms, the presented workflow can provide highly accurate training data to overcome the high false-positive rate (per scan) problem. We also provide, for the first time, analysis on nodule size and position which can help improve CAD algorithms.

24.Segmenting root systems in X-ray computed tomography images using level sets pdf

The segmentation of plant roots from soil and other growing media in X-ray computed tomography images is needed to effectively study the root system architecture without excavation. However, segmentation is a challenging problem in this context because the root and non-root regions share similar features. In this paper, we describe a method based on level sets and specifically adapted for this segmentation problem. In particular, we deal with the issues of using a level sets approach on large image volumes for root segmentation, and track active regions of the front using an occupancy grid. This method allows for straightforward modifications to a narrow-band algorithm such that excessive forward and backward movements of the front can be avoided, distance map computations in a narrow band context can be done in linear time through modification of Meijster et al.'s distance transform algorithm, and regions of the image volume are iteratively used to estimate distributions for root versus non-root classes. Results are shown of three plant species of different maturity levels, grown in three different media. Our method compares favorably to a state-of-the-art method for root segmentation in X-ray CT image volumes.

25.Déjà Vu: an empirical evaluation of the memorization properties of ConvNets pdf

Convolutional neural networks memorize part of their training data, which is why strategies such as data augmentation and drop-out are employed to mitigate overfitting. This paper considers the related question of "membership inference", where the goal is to determine if an image was used during training. We consider it under three complementary angles. We show how to detect which dataset was used to train a model, and in particular whether some validation images were used at train time. We then analyze explicit memorization and extend classical random label experiments to the problem of learning a model that predicts if an image belongs to an arbitrary set. Finally, we propose a new approach to infer membership when a few of the top layers are not available or have been fine-tuned, and show that lower layers still carry information about the training samples. To support our findings, we conduct large-scale experiments on Imagenet and subsets of YFCC-100M with modern architectures such as VGG and Resnet.

26.Visual Diagnostics for Deep Reinforcement Learning Policy Development pdf

Modern vision-based reinforcement learning techniques often use convolutional neural networks (CNN) as universal function approximators to choose which action to take for a given visual input. Until recently, CNNs have been treated like black-box functions, but this mindset is especially dangerous when used for control in safety-critical settings. In this paper, we present our extensions of CNN visualization algorithms to the domain of vision-based reinforcement learning. We use a simulated drone environment as an example scenario. These visualization algorithms are an important tool for behavior introspection and provide insight into the qualities and flaws of trained policies when interacting with the physical world. A video may be seen at this https URL

27.Adding Cues to Binary Feature Descriptors for Visual Place Recognition pdf

In this paper we propose an approach to embed continuous and selector cues in binary feature descriptors used for visual place recognition. The embedding is achieved by extending each feature descriptor with a binary string that encodes a cue and supports the Hamming distance metric. Augmenting the descriptors in such a way has the advantage of being transparent to the procedure used to compare them. We present two concrete applications of our methodology, demonstrating the two considered types of cues. In addition to that, we conducted on these applications a broad quantitative and comparative evaluation covering five benchmark datasets and several state-of-the-art image retrieval approaches in combination with various binary descriptor types.

28.Scattering Networks for Hybrid Representation Learning pdf

Scattering networks are a class of designed Convolutional Neural Networks (CNNs) with fixed weights. We argue they can serve as generic representations for modelling images. In particular, by working in scattering space, we achieve competitive results both for supervised and unsupervised learning tasks, while making progress towards constructing more interpretable CNNs. For supervised learning, we demonstrate that the early layers of CNNs do not necessarily need to be learned, and can be replaced with a scattering network instead. Indeed, using hybrid architectures, we achieve the best results with predefined representations to-date, while being competitive with end-to-end learned CNNs. Specifically, even applying a shallow cascade of small-windowed scattering coefficients followed by 1$\times$1-convolutions results in AlexNet accuracy on the ILSVRC2012 classification task. Moreover, by combining scattering networks with deep residual networks, we achieve a single-crop top-5 error of 11.4% on ILSVRC2012. Also, we show they can yield excellent performance in the small sample regime on CIFAR-10 and STL-10 datasets, exceeding their end-to-end counterparts, through their ability to incorporate geometrical priors. For unsupervised learning, scattering coefficients can be a competitive representation that permits image recovery. We use this fact to train hybrid GANs to generate images. Finally, we empirically analyze several properties related to stability and reconstruction of images from scattering coefficients.