ArXiv cs.CV --Wed, 7 Nov 2018

1.Hide-and-Seek: A Data Augmentation Technique for Weakly-Supervised Localization and Beyond pdf

We propose 'Hide-and-Seek', a general-purpose data augmentation technique that is complementary to existing data augmentation techniques and is beneficial for various visual recognition tasks. The key idea is to randomly hide patches in a training image, in order to force the network to seek other relevant content when the most discriminative content is hidden. Our approach only needs to modify the input image and can work with any network to improve its performance. During testing, it does not need to hide any patches. The main advantage of Hide-and-Seek over existing data augmentation techniques is its ability to improve object localization accuracy in the weakly-supervised setting, and we therefore use this task to motivate the approach. However, Hide-and-Seek is not tied to the image localization task alone; it can generalize to other forms of visual input like videos, as well as to other recognition tasks like image classification, temporal action localization, semantic segmentation, emotion recognition, age/gender estimation, and person re-identification. We perform extensive experiments to showcase the advantage of Hide-and-Seek on these various visual recognition problems.
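
To make the augmentation concrete, here is a minimal sketch of the core patch-hiding idea in NumPy; the grid size, hiding probability, and fill value are illustrative choices, not necessarily the paper's exact settings:

```python
import numpy as np

def hide_and_seek(image, grid_size=4, hide_prob=0.5, fill_value=0.0):
    """Randomly hide grid patches of a training image (H, W, C).

    Sketch of the core augmentation: the image is split into a
    grid_size x grid_size grid and each patch is hidden independently
    with probability hide_prob. At test time the image is left untouched.
    """
    out = image.copy().astype(np.float32)
    h, w = image.shape[:2]
    ph, pw = h // grid_size, w // grid_size
    for i in range(grid_size):
        for j in range(grid_size):
            if np.random.rand() < hide_prob:
                out[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = fill_value
    return out

# Usage: apply only inside the training loop, e.g. augmented = hide_and_seek(img).
```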

2.Deep feature transfer between localization and segmentation tasks pdf

In this paper, we propose a new pre-training scheme for U-net based image segmentation. We first train the encoding arm as a localization network to predict the center of the target, before extending it into a U-net architecture for segmentation. We apply our proposed method to the problem of segmenting the optic disc from fundus photographs. Our work shows that the features learned by the encoding arm can be transferred to the segmentation network to reduce the annotation burden. We propose that this approach could have broad utility for medical image segmentation, alleviating the burden of delineating complex structures by pre-training on annotations that are much easier to acquire.
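
A hedged sketch of the two-stage scheme in PyTorch; the encoder architecture and layer sizes are hypothetical placeholders, not the paper's network:

```python
import torch.nn as nn

class Encoder(nn.Module):
    """Hypothetical minimal encoding arm shared between the two tasks."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)

# Stage 1: train the encoder as a localization network that regresses
# the (x, y) center of the target structure (e.g. the optic disc).
encoder = Encoder()
localizer = nn.Sequential(encoder, nn.AdaptiveAvgPool2d(1),
                          nn.Flatten(), nn.Linear(32, 2))
# ... train `localizer` with an L2 loss on center coordinates ...

# Stage 2: reuse the trained encoder weights inside the U-net, e.g.
#   unet.encoder.load_state_dict(encoder.state_dict())
# and fine-tune the full U-net on the (scarcer) segmentation masks.
```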

3.Evolvement Constrained Adversarial Learning for Video Style Transfer pdf

Video style transfer is a useful component for applications such as augmented reality, non-photorealistic rendering, and interactive games. Many existing methods use optical flow to preserve the temporal smoothness of the synthesized video. However, the estimation of optical flow is sensitive to occlusions and rapid motions. In this work, we therefore introduce a novel evolve-sync loss, computed from evolvements, to replace optical flow. Using this evolve-sync loss, we build an adversarial learning framework, termed Video Style Transfer Generative Adversarial Network (VST-GAN), which extends the MGAN method for image style transfer to more efficient video style transfer. We perform extensive experimental evaluations of our method and show quantitative and qualitative improvements over state-of-the-art methods.

4.Convolutional LSTMs for Cloud-Robust Segmentation of Remote Sensing Imagery pdf

Dynamic spatiotemporal processes on the Earth can be observed by an increasing number of optical Earth observation satellites that measure spectral reflectance at multiple spectral bands in regular intervals. Clouds partially covering the surface are an omnipresent challenge, as the majority of remote sensing approaches are not robust to cloud coverage. In these approaches, clouds are typically handled by cherry-picking cloud-free observations or by pre-classification of cloudy pixels and subsequent masking. In this work, we demonstrate the robustness of a straightforward convolutional long short-term memory network for vegetation classification using all available cloudy and non-cloudy satellite observations. We visualize the internal gate activations within the recurrent cells and find that, in some cells, modulation and input gates close on cloudy pixels. This indicates that the network has internalized a cloud-filtering mechanism without being specifically trained on cloud labels. The robustness regarding clouds is further demonstrated by experiments on sequences with varying degrees of cloud coverage, where our network achieved similar accuracies on cloudy and non-cloudy datasets. Overall, our results question the necessity of sophisticated pre-processing pipelines if robust classification methods are utilized.

5.Multi-Level Sensor Fusion with Deep Learning pdf

In the context of deep learning, this article presents an original deep network, namely CentralNet, for fusing information coming from different sensors. The approach is designed to efficiently and automatically balance the trade-off between early and late fusion (i.e., between the fusion of low-level and high-level information). More specifically, at each level of abstraction (the different layers of the deep networks), unimodal representations of the data are fed to a central neural network which combines them into a common embedding. In addition, a multi-objective regularization is introduced, helping to optimize both the central network and the unimodal networks. Experiments on four multimodal datasets not only show state-of-the-art performance, but also demonstrate that CentralNet can actually choose the best possible fusion strategy for a given problem.
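
One level of the fusion scheme can be sketched as follows; the use of scalar mixing weights and a linear transform is an assumption for illustration, and `dim` is a placeholder:

```python
import torch
import torch.nn as nn

class CentralFusionLayer(nn.Module):
    """One abstraction level of a CentralNet-style fusion: the unimodal
    hidden states and the previous central state are mixed with learned
    scalar weights, then passed through a shared transformation."""
    def __init__(self, dim):
        super().__init__()
        # One weight for the central state and one per modality.
        self.alpha = nn.Parameter(torch.ones(3))
        self.transform = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())

    def forward(self, h_central, h_mod_a, h_mod_b):
        mix = (self.alpha[0] * h_central +
               self.alpha[1] * h_mod_a +
               self.alpha[2] * h_mod_b)
        return self.transform(mix)
```

The multi-objective regularization described above would then sum the central network's loss with the losses of the unimodal branches during training.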

6.Low-Rank Tensor Modeling for Hyperspectral Unmixing Accounting for Spectral Variability pdf

Traditional hyperspectral unmixing methods neglect the underlying variability of spectral signatures often observed in typical hyperspectral images, propagating these mismodeling errors throughout the whole unmixing process. Attempts to model material spectra as members of sets or as random variables tend to lead to severely ill-posed unmixing problems. To overcome this drawback, endmember variability has been handled through generalizations of the mixing model, combined with spatial regularization over the abundance and endmember estimations. Recently, tensor-based strategies considered low-rank decompositions of hyperspectral images as an alternative to impose low-dimensional structures on the solutions of standard and multitemporal unmixing problems. These strategies, however, present two main drawbacks: 1) they confine the solutions to low-rank tensors, which often cannot represent the complexity of real-world scenarios; and 2) they lack guarantees that endmembers and abundances will be correctly factorized in their respective tensors. In this work, we propose a more flexible approach, called ULTRA-V, that imposes low-rank structures through regularizations whose strictness is controlled by scalar parameters. Simulations attest to the superior accuracy of the method when compared with state-of-the-art unmixing algorithms that account for spectral variability.

7.A 'Little Bit' Too Much? High Speed Imaging from Sparse Photon Counts pdf

Recent advances in photographic sensing technologies have made it possible to detect light at the level of a single photon. Photon counting sensors are being increasingly used in many diverse applications. We address the problem of jointly recovering spatial and temporal scene radiance from very few photon counts. Our ConvNet-based scheme effectively combines the spatial and temporal information present in the measurements to reduce noise. We demonstrate that, using our method, one can acquire videos at a high frame rate and still achieve a good signal-to-noise ratio. Experiments show that the proposed scheme performs quite well in challenging scenarios that existing denoising schemes are unable to handle.

8.Fine-grained Apparel Classification and Retrieval without rich annotations pdf

The ability to correctly classify and retrieve apparel images has a variety of applications important to e-commerce, online advertising and internet search. In this work, we propose a robust framework for fine-grained apparel classification and for in-shop and cross-domain retrieval, which eliminates the requirement for rich annotations like bounding boxes, human joints or clothing landmarks, and for training a bounding-box/key-landmark detector. Factors such as subtle appearance differences, variations in human poses, different shooting angles, apparel deformations, and self-occlusion add to the challenges of classification and retrieval of apparel items. Cross-domain retrieval is even harder due to the large variation between online shopping images, usually taken in ideal lighting, pose, and angle against a clean background, and street photos captured by users in complicated conditions with poor lighting and cluttered scenes. Our framework uses a compact bilinear CNN with the tensor sketch algorithm to generate embeddings that capture local pairwise feature interactions in a translationally invariant manner. For apparel classification, we pass the feature embeddings through a softmax classifier, while the in-shop and cross-domain retrieval pipelines use a triplet-loss based optimization approach, such that the squared Euclidean distance between embeddings measures the dissimilarity between the images. Unlike previous works that relied on bounding-box, key clothing landmark or human joint detectors to assist the final deep classifier, the proposed framework can be trained directly on the provided category labels or on generated triplets for triplet-loss optimization. Lastly, experimental results on the DeepFashion fine-grained categorization and in-shop and consumer-to-shop retrieval datasets provide a comparative analysis with previous work in the domain.
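
The retrieval objective described above admits a compact sketch; the margin value is illustrative:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss over embedding batches (N, D), with squared Euclidean
    distance measuring dissimilarity, as in the retrieval pipelines above."""
    d_pos = (anchor - positive).pow(2).sum(dim=1)  # same apparel item
    d_neg = (anchor - negative).pow(2).sum(dim=1)  # different apparel item
    return F.relu(d_pos - d_neg + margin).mean()
```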

9.Sets of autoencoders with shared latent spaces pdf

Autoencoders learn latent models of input data. Recent works have shown that they also estimate the probability density function of the input, which makes it possible to apply Bayesian decision theory. If we obtain latent models of the input data for each class, or for selected points in the space of parameters in a parameter estimation task, we can estimate likelihood functions for those classes or points in parameter space. We show how a set of autoencoders solves the recognition problem. Each autoencoder describes its own model or context; a latent vector that represents the input data in the latent space may be called a treatment in its context. Sharing the latent spaces of the autoencoders gives a very important property: the ability to separate treatment and context when the input information is processed through the set of autoencoders. There are two remarkable and most valuable results of this work: a mechanism that shows a possible way of forming abstract concepts, and a way of reducing a dataset's size during training. These results are confirmed by the tests presented in the article.

10.Automatic identification of graffiti tagging from urban images (Identificação automática de pichação a partir de imagens urbanas) pdf

Graffiti tagging is a common issue in large cities, and local authorities are moving to combat it. The tagging map of a city can be a useful tool, as it may help to clean up highly saturated regions and discourage future acts in the neighbourhood. Currently, there is no way of obtaining the tagging map of a region in an automatic fashion, and manual inspection or crowd participation is required. In this work, we describe work in progress on creating an automatic way to obtain the tagging map of a city or region. It is based on the use of street view images and on the detection of graffiti tags in those images.

11.Fast High-Dimensional Bilateral and Nonlocal Means Filtering pdf

Existing fast algorithms for bilateral and nonlocal means filtering mostly work with grayscale images. They cannot easily be extended to high-dimensional data such as color and hyperspectral images, patch-based data, flow fields, etc. In this paper, we propose a fast algorithm for high-dimensional bilateral and nonlocal means filtering. Unlike existing approaches, where the focus is on approximating the data (using quantization) or the filter kernel (via analytic expansions), we locally approximate the kernel using weighted and shifted copies of a Gaussian, where the weights and shifts are inferred from the data. The algorithm emerging from the proposed approximation essentially involves clustering and fast convolutions, and is easy to implement. Moreover, a variant of our algorithm comes with a guarantee (bound) on the approximation error, which is not enjoyed by existing algorithms. We present results for high-dimensional bilateral and nonlocal means filtering that demonstrate the speed and accuracy of our proposal, and we show that our algorithm can outperform state-of-the-art fast approximations in terms of both accuracy and timing.
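
A simplified sketch of the clustering-plus-fast-convolution idea (a piecewise approximation of the range kernel around cluster centers; this is not the paper's exact algorithm, which also infers weights and shifts of Gaussian copies):

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.cluster.vq import kmeans2

def approx_hd_bilateral(img, guide, n_clusters=8, sigma_s=4.0, sigma_r=0.1):
    """Approximate high-dimensional bilateral filtering of `img` (H, W)
    guided by `guide` (H, W, D), e.g. color vectors or patches (the
    nonlocal means case). For pixels in cluster k, the range kernel is
    approximated around the cluster center, reducing the filter to a few
    fast Gaussian convolutions."""
    h, w, d = guide.shape
    feats = guide.reshape(-1, d).astype(np.float64)
    centers, labels = kmeans2(feats, n_clusters, minit='++')
    out = np.zeros(h * w)
    for k, mu in enumerate(centers):
        wgt = np.exp(-np.sum((feats - mu) ** 2, axis=1)
                     / (2 * sigma_r ** 2)).reshape(h, w)
        num = gaussian_filter(wgt * img, sigma_s)   # fast spatial convolutions
        den = gaussian_filter(wgt, sigma_s)
        res = (num / np.maximum(den, 1e-12)).ravel()
        out[labels == k] = res[labels == k]
    return out.reshape(h, w)
```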

12.Micro-Attention for Micro-Expression recognition pdf

Micro-expression, for its high objectivity in emotion detection, has emerged as a promising modality in affective computing. Recently, deep learning methods have been successfully introduced into the micro-expression recognition area. While higher recognition accuracy has been achieved with deep learning methods, substantial challenges in micro-expression recognition remain: the fact that micro-expressions occur in small, local areas of the face, together with the limited size of available databases, still constrains the recognition accuracy of such facial behavior. In this work, to tackle these challenges, we propose a novel attention mechanism, called micro-attention, that cooperates with a residual network. Micro-attention enables the network to learn to focus on facial areas of interest. Moreover, to cope with small datasets, a simple yet efficient transfer learning approach is utilized to alleviate the overfitting risk. With an extensive experimental evaluation on two benchmarks (CASMEII, SAMM), we demonstrate the effectiveness of the proposed micro-attention and push the boundary of automatic recognition of micro-expressions.

13.Object 3D Reconstruction based on Photometric Stereo and Inverted Rendering pdf

Methods for 3D reconstruction such as photometric stereo recover the shape and reflectance properties of an object using multiple images taken under variable lighting conditions from a fixed viewpoint. Photometric stereo assumes that a scene is illuminated only directly by the illumination source. As a result, indirect illumination effects due to inter-reflections introduce strong biases in the recovered shape. Our suggested approach is to recover scene properties in the presence of indirect illumination. To this end, we propose an iterative PS method combined with a reverted Monte-Carlo ray tracing algorithm that overcomes the inter-reflection effects by separating the direct and indirect lighting. This approach iteratively reconstructs a surface considering both the environment around the object and its concavities. We demonstrate and evaluate our approach on three datasets, and the overall results illustrate improvement over classic PS approaches.
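
For reference, the direct-illumination baseline that the paper improves upon is classical Lambertian photometric stereo, which reduces to a per-pixel least-squares problem:

```python
import numpy as np

def photometric_stereo(images, lights):
    """Classic photometric stereo (direct illumination only).

    images: (K, H, W) intensities under K distant light directions.
    lights: (K, 3) unit lighting directions.
    Solves I = L @ (albedo * n) per pixel; inter-reflections violate this
    model, which is the bias the paper's iterative method corrects.
    """
    k, h, w = images.shape
    i = images.reshape(k, -1)                        # (K, H*W)
    g, *_ = np.linalg.lstsq(lights, i, rcond=None)   # g = albedo * n, (3, H*W)
    g = g.T.reshape(h, w, 3)
    albedo = np.linalg.norm(g, axis=2)
    normals = g / np.maximum(albedo[..., None], 1e-12)
    return albedo, normals
```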

14.Super-Identity Convolutional Neural Network for Face Hallucination pdf

Face hallucination is a generative task to super-resolve a facial image of low resolution, while human perception of faces relies heavily on identity information. However, previous face hallucination approaches largely ignore facial identity recovery. This paper proposes the Super-Identity Convolutional Neural Network (SICNN) to recover identity information and generate faces close to the real identity. Specifically, we define a super-identity loss to measure the identity difference between a hallucinated face and its corresponding high-resolution face within the hypersphere identity metric space. However, directly using this loss leads to a Dynamic Domain Divergence problem, caused by the large margin between the high-resolution domain and the hallucination domain. To overcome this challenge, we present a domain-integrated training approach that constructs a robust identity metric for faces from the two domains. Extensive experimental evaluations demonstrate that the proposed SICNN achieves superior visual quality over state-of-the-art methods on the challenging task of super-resolving 12$\times$14 faces with an 8$\times$ upscaling factor. In addition, SICNN significantly improves the recognizability of ultra-low-resolution faces.
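
A hedged sketch of an identity loss on the unit hypersphere, in the spirit of the super-identity loss (the paper's exact formulation may differ; the face-embedding network is assumed given):

```python
import torch.nn.functional as F

def super_identity_loss(emb_sr, emb_hr):
    """Squared Euclidean distance between L2-normalized embeddings of the
    hallucinated face (emb_sr) and its high-resolution counterpart (emb_hr),
    i.e. a distance measured on the unit hypersphere."""
    e_sr = F.normalize(emb_sr, dim=1)  # project onto the hypersphere
    e_hr = F.normalize(emb_hr, dim=1)
    return (e_sr - e_hr).pow(2).sum(dim=1).mean()
```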

15.Fast Adaptive Bilateral Filtering pdf

In the classical bilateral filter, a fixed Gaussian range kernel is used along with a spatial kernel for edge-preserving smoothing. We consider a generalization of this filter, the so-called adaptive bilateral filter, where the center and width of the Gaussian range kernel are allowed to change from pixel to pixel. Though this variant was originally proposed for sharpening and noise removal, it can also be used for other applications such as artifact removal and texture filtering. Similar to the bilateral filter, the brute-force implementation of its adaptive counterpart requires intense computations. While several fast algorithms have been proposed in the literature for bilateral filtering, most of them work only with a fixed range kernel. In this paper, we propose a fast algorithm for adaptive bilateral filtering, whose complexity does not scale with the spatial filter width. This is based on the observation that the concerned filtering can be performed purely in range space using an appropriately defined local histogram. We show that by replacing the histogram with a polynomial and the finite range-space sum with an integral, we can approximate the filter using analytic functions. In particular, an efficient algorithm is derived using the following innovations: the polynomial is fitted by matching its moments to those of the target histogram (this is done using fast convolutions), and the analytic functions are recursively computed using integration-by-parts. Our algorithm can accelerate the brute-force implementation by at least $20 \times$, without perceptible distortions in the visual quality. We demonstrate the effectiveness of our algorithm for sharpening, JPEG deblocking, and texture filtering.
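
For clarity, the brute-force filter being accelerated can be written directly from the definition (per-pixel center `theta` and width `sigma_r` of the range kernel); this is the slow baseline, not the paper's fast algorithm:

```python
import numpy as np

def adaptive_bilateral(img, theta, sigma_r, sigma_s=2.0, radius=5):
    """Brute-force adaptive bilateral filter. `theta` and `sigma_r` are
    per-pixel (H, W) maps of the Gaussian range kernel's center and width."""
    h, w = img.shape
    padded = np.pad(img, radius, mode='reflect')
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_s ** 2))
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            window = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            rng = np.exp(-(window - theta[i, j]) ** 2
                         / (2 * sigma_r[i, j] ** 2))
            wgt = spatial * rng
            out[i, j] = np.sum(wgt * window) / np.sum(wgt)
    return out
```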

16.Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning pdf

Driving scene understanding is a key ingredient for intelligent transportation systems. To operate in a complex physical and social environment, such systems need to understand and learn how humans drive and interact with traffic scenes. We present the Honda Research Institute Driving Dataset (HDD), a challenging dataset to enable research on learning driver behavior in real-life environments. The dataset includes 104 hours of real human driving in the San Francisco Bay Area, collected using an instrumented vehicle equipped with different sensors. We provide a detailed analysis of HDD with a comparison to other driving datasets. A novel annotation methodology is introduced to enable research on driver behavior understanding from untrimmed data sequences. As a first step, baseline algorithms for driver behavior detection are trained and tested to demonstrate the feasibility of the proposed task.

17.Infrared and visible image fusion using a novel deep decomposition method pdf

Infrared and visible image fusion is an important problem in image fusion tasks and has been applied widely in many fields. To better preserve the useful information from the source images, in this paper we propose an effective image fusion framework using a novel deep decomposition method based on Latent Low-Rank Representation (LatLRR), which we name DDLatLRR. First, LatLRR is used to learn a projection matrix for extracting salient features. Then, the base part and multi-level detail parts are obtained by DDLatLRR. With adaptive fusion strategies, the fused base part and the fused detail parts are reconstructed. Finally, the fused image is obtained by combining the fused base part and the detail parts. In experimental comparisons with other fusion methods, the proposed algorithm achieves better fusion performance than state-of-the-art methods in both subjective and objective evaluation. The code of our fusion method is available at this https URL

18.SparseFool: a few pixels make a big difference pdf

Deep Neural Networks have achieved extraordinary results on image classification tasks, but have been shown to be vulnerable to attacks with carefully crafted perturbations of the input data. Although most attacks usually change the values of many of an image's pixels, it has been shown that deep networks are also vulnerable to sparse alterations of the input. However, no \textit{efficient} method has been proposed to compute sparse perturbations. In this paper, we exploit the low mean curvature of the decision boundary, and propose SparseFool, a geometry-inspired sparse attack that controls the sparsity of the perturbations. Extensive evaluations show that our approach outperforms related methods, and scales to high-dimensional data. We further analyze the transferability and the visual effects of the perturbations, and show the existence of shared semantic information across the images and the networks. Finally, we show that adversarial training using $\ell_\infty$ perturbations can slightly improve the robustness against sparse additive perturbations.

19.Semantic bottleneck for computer vision tasks pdf

This paper introduces a novel method for the representation of images that is semantic by nature, addressing the question of computational intelligibility in computer vision tasks. More specifically, our proposition is to introduce what we call a semantic bottleneck in the processing pipeline: a crossing point at which the representation of the image is entirely expressed in natural language, while retaining the efficiency of numerical representations. We show that our approach is able to generate semantic representations that give state-of-the-art results on semantic content-based image retrieval and also perform very well on image classification tasks. Intelligibility is evaluated through user-centered experiments for failure detection.

20.Weakly Supervised Scene Parsing with Point-based Distance Metric Learning pdf

Semantic scene parsing suffers from the fact that pixel-level annotations are hard to collect. To tackle this issue, we propose Point-based Distance Metric Learning (PDML) in this paper. PDML does not require densely annotated masks and only leverages several labeled points, which are much easier to obtain, to guide the training process. Concretely, we leverage the semantic relationship among the annotated points by encouraging the feature representations of intra- and inter-category points to be consistent, i.e., points within the same category should have more similar feature representations than those from different categories. We formulate this characteristic into a simple distance metric loss, which collaborates with the point-wise cross-entropy loss to optimize the deep neural networks. Furthermore, to fully exploit the limited annotations, distance metric learning is conducted across different training images instead of simply adopting an image-dependent manner. We conduct extensive experiments on two challenging scene parsing benchmarks, PASCAL-Context and ADE20K, to validate the effectiveness of our PDML, and competitive mIoU scores are achieved.
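
A hedged sketch of such a point-based metric loss; the margin and the exact pull/push form are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def point_metric_loss(point_feats, point_labels, margin=1.0):
    """Distance metric loss over annotated points (possibly pooled across
    several training images). point_feats: (P, D); point_labels: (P,).
    Assumes each category contributes at least two points."""
    d = torch.cdist(point_feats, point_feats)              # (P, P) distances
    same = point_labels[:, None] == point_labels[None, :]
    eye = torch.eye(len(point_labels), dtype=torch.bool,
                    device=point_feats.device)
    pull = d[same & ~eye].mean()                 # intra-category: contract
    push = F.relu(margin - d[~same]).mean()      # inter-category: separate
    return pull + push
```

This term would be added to the point-wise cross-entropy loss during training.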

21.DSNet: Deep and Shallow Feature Learning for Efficient Visual Tracking pdf

In recent years, Discriminative Correlation Filter (DCF) based tracking methods have achieved great success in visual tracking. However, multi-resolution convolutional feature maps trained on other tasks, such as image classification, cannot be used naturally in the conventional DCF formulation. Furthermore, these high-dimensional feature maps significantly increase the tracking complexity and thus limit the tracking speed. In this paper, we present a deep and shallow feature learning network, namely DSNet, to learn the multi-level same-resolution compressed (MSC) features for efficient online tracking, in an end-to-end offline manner. Specifically, the proposed DSNet compresses multi-level convolutional features to uniform spatial resolution features. The learned MSC features effectively encode both appearance and semantic information of objects in the same-resolution feature maps, thus enabling an elegant combination of the MSC features with any DCF-based methods. Additionally, a channel reliability measurement (CRM) method is presented to further refine the learned MSC features. We demonstrate the effectiveness of the MSC features learned from the proposed DSNet on two DCF tracking frameworks: the basic DCF framework and the continuous convolution operator framework. Extensive experiments show that the learned MSC features have the appealing advantage of allowing the equipped DCF-based tracking methods to perform favorably against the state-of-the-art methods while running at high frame rates.

22.In-the-wild Facial Expression Recognition in Extreme Poses pdf

Facial expression recognition is a hot research problem in computer vision. In recent years, research has moved from the lab environment to in-the-wild circumstances. This is challenging, especially under extreme poses, yet current expression detection systems try to avoid pose effects in order to remain generally applicable. In this work, we take the opposite approach: we consider the head pose and detect expressions within particular head poses. Our work includes two parts: detecting the head pose and grouping it into one of several pre-defined head-pose classes, and performing facial expression recognition within each pose class. Our experiments show that the recognition results with pose-class grouping are much better than those of direct recognition without considering poses. We combine hand-crafted features (SIFT, LBP, and geometric features) with deep learning features as the representation of the expressions; the hand-crafted features are added into the deep learning framework along with the high-level deep learning features. As a comparison, we implement SVM and random forest as the prediction models. To train and test our methodology, we labeled the face dataset with 6 basic expressions.

23.3DCapsule: Extending the Capsule Architecture to Classify 3D Point Clouds pdf

This paper introduces the 3DCapsule, which is a 3D extension of the recently introduced Capsule concept that makes it applicable to unordered point sets. The original Capsule relies on the existence of a spatial relationship between the elements in the feature map it is presented with, whereas in point permutation invariant formulations of 3D point set classification methods, such relationships are typically lost. Here, a new layer called ComposeCaps is introduced that, in lieu of a spatially relevant feature mapping, learns a new mapping that can be exploited by the 3DCapsule. Previous works in the 3D point set classification domain have focused on other parts of the architecture, whereas instead, the 3DCapsule is a drop-in replacement of the commonly used fully connected classifier. It is demonstrated via an ablation study, that when the 3DCapsule is applied to recent 3D point set classification architectures, it consistently shows an improvement, in particular when subjected to noisy data. Similarly, the ComposeCaps layer is evaluated and demonstrates an improvement over the baseline. In an apples-to-apples comparison against state-of-the-art methods, again, better performance is demonstrated by the 3DCapsule.

24.BLP - Boundary Likelihood Pinpointing Networks for Accurate Temporal Action Localization pdf

Despite tremendous progress in temporal action detection, state-of-the-art methods still suffer from sharp performance deterioration when localizing the starting and ending temporal boundaries of actions. Although most methods apply a boundary regression paradigm to tackle this problem, we argue that direct regression lacks sufficiently detailed information to yield accurate temporal boundaries. In this paper, we propose a novel Boundary Likelihood Pinpointing (BLP) network to alleviate this deficiency of boundary regression and improve localization accuracy. Given a loosely localized search interval that contains an action instance, BLP casts the problem of localizing temporal boundaries as that of assigning probabilities to each equally divided unit of this interval. These generated probabilities provide useful information regarding the boundary location of the action inside the search interval. Based on these probabilities, we introduce a boundary pinpointing paradigm that pinpoints the accurate boundaries under a simple probabilistic framework. Compared with other C3D feature based detectors, extensive experiments demonstrate that BLP significantly improves the localization performance of recent state-of-the-art detectors and achieves competitive detection mAP on both the THUMOS'14 and ActivityNet datasets, particularly when the evaluation tIoU is high.
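
One plausible reading of the pinpointing step, sketched under the assumption that the boundary is taken as the probability-weighted unit center (the paper's probabilistic framework may differ in detail):

```python
import numpy as np

def pinpoint_boundary(unit_probs, interval_start, interval_end):
    """Locate a temporal boundary inside a search interval from the
    probabilities assigned to its equally divided units."""
    n = len(unit_probs)
    unit = (interval_end - interval_start) / n
    centers = interval_start + unit * (np.arange(n) + 0.5)
    p = unit_probs / unit_probs.sum()        # normalize to a distribution
    return float(np.dot(p, centers))         # expected boundary location
```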

25.Classification of 12-Lead ECG Signals with Bi-directional LSTM Network pdf

We propose a recurrent neural network classifier to detect pathologies in 12-lead ECG signals and train and validate the classifier with the Chinese physiological signal challenge dataset (this http URL). The recurrent neural network consists of two bi-directional LSTM layers and can train on arbitrary-length ECG signals. Our best trained model achieved an average F1 score of 74.15% on the validation set.
Keywords: ECG classification, Deep learning, RNN, Bi-directional LSTM, QRS detection.
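
A minimal PyTorch sketch matching this description; the hidden size, pooling strategy, and the 9-class output are illustrative assumptions:

```python
import torch.nn as nn

class ECGBiLSTM(nn.Module):
    """Two stacked bi-directional LSTM layers over an arbitrary-length
    12-lead signal, followed by a linear classifier."""
    def __init__(self, n_leads=12, hidden=128, n_classes=9):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_leads, hidden_size=hidden,
                            num_layers=2, bidirectional=True,
                            batch_first=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                         # x: (batch, time, 12)
        out, _ = self.lstm(x)                     # (batch, time, 2*hidden)
        return self.classifier(out.mean(dim=1))   # average over time
```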

26.Leveraging Virtual and Real Person for Unsupervised Person Re-identification pdf

Person re-identification (re-ID) is a challenging problem, especially when no labels are available for training. Although recent deep re-ID methods have achieved great improvement, it is still difficult to optimize a deep re-ID model without annotations in the training data. To address this problem, this study introduces a novel approach for unsupervised person re-ID by leveraging virtual and real data. Our approach includes two components: virtual person generation and training of the deep re-ID model. For virtual person generation, we learn a person generation model and a camera style transfer model using unlabeled real data, to generate virtual persons with different poses and camera styles. The virtual data then serves as labeled training data, enabling subsequent supervised training of the deep re-ID model. For training of the deep re-ID model, we divide it into three steps: 1) pre-training a coarse re-ID model using the virtual data; 2) collaborative-filtering based positive pair mining from the real data; and 3) fine-tuning of the coarse re-ID model by leveraging the mined positive pairs and the virtual data. The final re-ID model is obtained by iterating between steps 2 and 3 until convergence. Experimental results on two large-scale datasets, Market-1501 and DukeMTMC-reID, demonstrate the effectiveness of our approach and show that it achieves the state of the art in unsupervised person re-ID.

27.Non-Local Compressive Sensing Based SAR Tomography pdf

Tomographic SAR (TomoSAR) inversion of urban areas is an inherently sparse reconstruction problem and, hence, can be solved using compressive sensing (CS) algorithms. This paper proposes solutions for two notorious problems in this field: 1) TomoSAR requires a high number of data sets, which makes the technique expensive. However, it can be shown that the number of acquisitions and the signal-to-noise ratio (SNR) can be traded off against each other, because it is asymptotically only the product of the number of acquisitions and SNR that determines the reconstruction quality. We propose to increase SNR by integrating non-local estimation into the inversion and show that a reasonable reconstruction of buildings from only seven interferograms is feasible. 2) CS-based inversion is computationally expensive and therefore barely suitable for large-scale applications. We introduce a new fast and accurate algorithm for solving the non-local L1-L2-minimization problem, central to CS-based reconstruction algorithms. The applicability of the algorithm is demonstrated using simulated data and TerraSAR-X high-resolution spotlight images over an area in Munich, Germany.
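
For context, the textbook baseline for the L1-L2 minimization at the core of CS-based TomoSAR is iterative soft-thresholding (ISTA); the paper contributes a faster non-local solver, which this sketch does not reproduce (real-valued here for simplicity; TomoSAR data is complex):

```python
import numpy as np

def ista(A, y, lam, n_iter=200):
    """Solve min_x 0.5 * ||A x - y||_2^2 + lam * ||x||_1 by ISTA."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2       # 1 / Lipschitz constant
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))       # gradient step on the L2 term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return x
```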

28.A Differential Volumetric Approach to Multi-View Photometric Stereo pdf

Highly accurate 3D volumetric reconstruction is still an open research topic, where the main difficulties usually relate to merging rough estimations with high-frequency details. One of the most promising directions is the fusion of multi-view stereo and photometric imaging 3D shape reconstruction techniques. However, besides the intrinsic difficulties of making multi-view stereo and photometric stereo work reliably on their own, supplementary problems arise when they are considered together. Most importantly, projecting the fine details usually retrievable with photometric stereo onto the rough multi-view stereo reconstruction is difficult to handle.
In this work, we present a volumetric approach to the multi-view photometric stereo problem defined by a unified differential model. The key to our method is the signed distance field parameterisation, which avoids the complex step of re-projecting high frequency details, as the parameterisation of the whole volume allows photometric modeling on the volume itself, efficiently dealing with occlusions, discontinuities, etc. The relation between the surface normals and the gradient of the signed distance field leads to a homogeneous linear partial differential equation. A variational optimisation is adopted in order to combine multiple images from multiple points of view in a single system, avoiding the need to merge depth maps. Our approach is evaluated on synthetic and real datasets and achieves state-of-the-art results.
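
The normal/gradient relation invoked above can be stated via the standard signed-distance identities (shown here in textbook form, not necessarily the paper's exact PDE):

```latex
% Eikonal property of a signed distance field d and its link to the
% outward surface normal n on the zero level set:
\[
  \|\nabla d(\mathbf{x})\| = 1, \qquad
  \mathbf{n}(\mathbf{x}) = \nabla d(\mathbf{x})
  \quad \text{on } \{\mathbf{x} : d(\mathbf{x}) = 0\}.
\]
```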

29.Towards continual learning in medical imaging pdf

This work investigates continual learning of two segmentation tasks in brain MRI with neural networks. To explore in this context the capabilities of current methods for countering catastrophic forgetting of the first task when a new one is learned, we investigate elastic weight consolidation, a recently proposed method based on Fisher information, originally evaluated on reinforcement learning of Atari games. We use it to sequentially learn segmentation of normal brain structures and then segmentation of white matter lesions. Our findings show this recent method reduces catastrophic forgetting, while large room for improvement exists in these challenging settings for continual learning.
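
For reference, the elastic weight consolidation penalty used here has the standard form below; `fisher` and `old_params` are per-parameter dictionaries computed after training the first segmentation task:

```python
def ewc_loss(task_loss, model, fisher, old_params, lam=1.0):
    """L = L_new + (lam / 2) * sum_i F_i * (theta_i - theta_i*)^2, where
    F_i is the Fisher information of parameter i under the first task and
    theta_i* its value after that task."""
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return task_loss + 0.5 * lam * penalty
```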

30.Synaptic Strength For Convolutional Neural Network pdf

Convolutional Neural Networks (CNNs) are both computation- and memory-intensive, which hinders their deployment on mobile devices. Inspired by the relevant concept in the neuroscience literature, we propose Synaptic Pruning: a data-driven method to prune connections between input and output feature maps, using a newly proposed class of parameters called Synaptic Strength. Synaptic Strength is designed to capture the importance of a connection based on the amount of information it transports. Experimental results show the effectiveness of our approach. On CIFAR-10, we prune up to 96% of the connections for various CNN models, which results in significant size reduction and computation saving. Further evaluation on ImageNet demonstrates that synaptic pruning is able to discover efficient models competitive with state-of-the-art compact CNNs such as MobileNet-V2 and NasNet-Mobile. Our contribution is summarized as follows: (1) We introduce Synaptic Strength, a new class of parameters for CNNs that indicates the importance of each connection. (2) Our approach can prune various CNNs with high compression without compromising accuracy. (3) Further investigation shows that the proposed Synaptic Strength is a better indicator for kernel pruning than the previous approach, in both empirical results and theoretical analysis.
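
A hedged sketch of connection-level pruning with a learned per-connection importance parameter, in the spirit of Synaptic Strength (the paper's exact parameterization and pruning criterion may differ):

```python
import torch
import torch.nn as nn

class SynapticGate(nn.Module):
    """One importance scalar per input-output feature-map connection of a
    convolution; weak connections are zeroed out after training."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.strength = nn.Parameter(torch.ones(out_ch, in_ch))

    def scale(self, conv_weight):
        # conv_weight: (out_ch, in_ch, kH, kW); scale each connection.
        return conv_weight * self.strength[:, :, None, None]

    def prune(self, keep_ratio=0.04):   # e.g. pruning ~96% of connections
        thresh = torch.quantile(self.strength.abs().flatten(), 1 - keep_ratio)
        with torch.no_grad():
            self.strength[self.strength.abs() < thresh] = 0.0
```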

31.A General Theory of Equivariant CNNs on Homogeneous Spaces pdf

Group equivariant convolutional neural networks (G-CNNs) have recently emerged as a very effective model class for learning from signals in the context of known symmetries. A wide variety of equivariant layers has been proposed for signals on 2D and 3D Euclidean space, graphs, and the sphere, and it has become difficult to see how all of these methods are related, and how they may be generalized.
In this paper, we present a fairly general theory of equivariant convolutional networks. Convolutional feature spaces are described as fields over a homogeneous base space, such as the plane $\mathbb{R}^2$, sphere $S^2$ or a graph $\mathcal{G}$. The theory enables a systematic classification of all existing G-CNNs in terms of their group of symmetry, base space, and field type (e.g. scalar, vector, or tensor field, etc.).
In addition to this classification, we use Mackey theory to show that convolutions with equivariant kernels are the most general class of equivariant maps between such fields, thus establishing G-CNNs as a universal class of equivariant networks. The theory also explains how the space of equivariant kernels can be parameterized for learning, thereby simplifying the development of G-CNNs for new spaces and symmetries. Finally, the theory introduces a rich geometric semantics to learned feature spaces, thus improving interpretability of deep networks, and establishing a connection to central ideas in mathematics and physics.
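
The central object of this theory, for the scalar-field case, is the group convolution and its equivariance property (general field types additionally constrain the kernel):

```latex
\[
  [\kappa \star f](g) = \int_{G} \kappa\!\left(g^{-1}h\right) f(h)\, \mathrm{d}h,
  \qquad
  \kappa \star (L_u f) = L_u (\kappa \star f) \quad \forall\, u \in G,
\]
% where L_u denotes left translation of feature maps by the group element u;
% the second identity is the equivariance property.
```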