ArXiv cs.CV --Fri, 31 Jan 2020

1.Estimating and abstracting the 3D structure of bones using neural networks on X-ray (2D) images ⬇️

In this paper, we present a deep-learning based method for estimating the 3D structure of a bone from a pair of 2D X-ray images. Our triplet loss-trained neural network selects the most closely matching 3D bone shape from a predefined set of shapes. Our predictions have an average root mean square (RMS) distance of 1.08 mm between the predicted and true shapes, making the method more accurate than the average error achieved by eight other examined 3D bone reconstruction approaches. The prediction process is fully automated and, unlike many competing approaches, does not rely on any prior knowledge about bone geometry. Additionally, our neural network can determine the identity of a bone based only on its X-ray image. It computes a low-dimensional representation ("embedding") of each 2D X-ray image and thereafter compares different X-ray images based only on their embeddings. An embedding holds enough information to uniquely identify the bone CT belonging to the input X-ray image with 100% accuracy and can therefore serve as a kind of fingerprint for that bone. Possible applications include faster, image content-based bone database searches for forensic purposes.
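
A minimal sketch of the retrieval idea described above, assuming a PyTorch setup: a placeholder encoder is trained with a triplet loss so that X-rays of the same bone embed close together, and a query X-ray is then matched to the nearest stored shape embedding. The architecture, dimensions, and function names are illustrative, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): train an embedding with a triplet loss,
# then retrieve the closest 3D shape for a new X-ray by nearest-neighbour search.
import torch
import torch.nn as nn

embed_net = nn.Sequential(            # placeholder encoder for 2D X-ray images
    nn.Conv2d(1, 16, 3, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 64),                # 64-D embedding ("fingerprint")
)
triplet_loss = nn.TripletMarginLoss(margin=1.0)

def training_step(anchor, positive, negative, optimizer):
    """anchor/positive are X-rays of the same bone, negative of a different bone."""
    optimizer.zero_grad()
    loss = triplet_loss(embed_net(anchor), embed_net(positive), embed_net(negative))
    loss.backward()
    optimizer.step()
    return loss.item()

def retrieve(query_xray, shape_embeddings):
    """Pick the predefined 3D shape whose stored embedding is closest to the query."""
    q = embed_net(query_xray)                     # (1, 64)
    dists = torch.cdist(q, shape_embeddings)      # (1, num_shapes)
    return dists.argmin(dim=1).item()
```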

2.ERA: A Dataset and Deep Learning Benchmark for Event Recognition in Aerial Videos ⬇️

Along with the increasing use of unmanned aerial vehicles (UAVs), large volumes of aerial videos have been produced. It is unrealistic for humans to screen such big data and understand their contents. Hence, methodological research on the automatic understanding of UAV videos is of paramount importance. In this paper, we introduce the novel problem of event recognition in unconstrained aerial videos to the remote sensing community and present a large-scale, human-annotated dataset, named ERA (Event Recognition in Aerial videos), consisting of 2,866 videos, each with a label from 25 different classes corresponding to an event unfolding over 5 seconds. The ERA dataset is designed to have significant intra-class variation and inter-class similarity and captures dynamic events in various circumstances and at dramatically different scales. Moreover, to offer a benchmark for this task, we extensively validate existing deep networks. We expect that the ERA dataset will facilitate further progress in automatic aerial video comprehension. The website is this https URL

3.A Deeper Look into Hybrid Images ⬇️

Hybrid images were first introduced by Oliva et al., who produced static images with two interpretations, such that the interpretation changes as a function of viewing distance. Hybrid images are built by studying human processing of multiscale images and are motivated by masking studies in visual perception. The original work showed that two images can be blended together with a high-pass filter and a low-pass filter in such a way that, when the blended image is viewed from a distance, the high-pass-filtered component fades away and the low-pass-filtered component becomes prominent. Our main aim here is to study and review the original paper by changing and tweaking certain parameters to see how they affect the quality of the blended image. We have used an extensive set of different images and filters to see how they behave and whether this approach can be used in a real-time system.
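
A small sketch of the basic hybrid-image construction the abstract refers to, assuming grayscale float images and Gaussian filters; the sigma values are illustrative knobs rather than values from the paper.

```python
# Minimal sketch of a hybrid image: low-pass one image, high-pass another, add them.
import numpy as np
from scipy.ndimage import gaussian_filter

def hybrid_image(img_far, img_near, sigma_low=8.0, sigma_high=4.0):
    """img_far dominates at a distance (low frequencies), img_near up close (high frequencies).
    Both inputs are float arrays of the same shape with values in [0, 1]."""
    low = gaussian_filter(img_far, sigma=sigma_low)                 # low-pass component
    high = img_near - gaussian_filter(img_near, sigma=sigma_high)   # high-pass component
    return np.clip(low + high, 0.0, 1.0)
```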

4.The Ladder Algorithm: Finding Repetitive Structures in Medical Images by Induction ⬇️

In this paper we introduce the Ladder Algorithm; a novel recurrent algorithm to detect repetitive structures in natural images with high accuracy using little training data.
We then demonstrate the algorithm on the task of extracting vertebrae from whole-spine magnetic resonance scans, using only lumbar MR scans as training data. It achieves high performance, with 99.8% precision and recall, exceeding current state-of-the-art approaches for lumbar vertebrae detection in T1- and T2-weighted scans. It also generalises without retraining to whole-spine images with a minimal drop in accuracy, achieving a 99.4% detection rate.

5.Weakly Supervised Segmentation of Cracks on Solar Cells using Normalized Lp Norm ⬇️

Photovoltaics is one of the most important renewable energy sources for dealing with the steadily increasing worldwide energy consumption. This raises the demand for fast and scalable automatic quality management during production and operation. However, the detection and segmentation of cracks on electroluminescence (EL) images of mono- or polycrystalline solar modules is a challenging task. In this work, we propose a weakly supervised learning strategy that only uses image-level annotations to obtain a method capable of segmenting cracks on EL images of solar cells. We use a modified ResNet-50 to derive a segmentation from network activation maps, with defect classification as a surrogate task to train the network. To this end, we apply a normalized Lp norm to aggregate the activation maps into single scores for classification. In addition, we provide a study of how different parameterizations of the normalized Lp layer affect the segmentation performance. The approach shows promising results for the given task, and we think the method has the potential to solve other weakly supervised segmentation problems as well.
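
A hedged sketch of the normalized Lp aggregation step, assuming the activation map of one class is pooled into a single classification score; the exact layer in the paper may differ in its normalization details.

```python
# Minimal sketch of normalized Lp pooling: collapse a class activation map into a
# single classification score; p is the parameter studied in the paper's ablation.
import torch

def normalized_lp_pool(activation_map, p=3.0, eps=1e-8):
    """activation_map: (batch, H, W) network activations for one class."""
    flat = activation_map.flatten(start_dim=1).abs().clamp_min(eps)
    return flat.pow(p).mean(dim=1).pow(1.0 / p)   # (batch,) scores

# p -> 1 approaches average pooling; large p approaches max pooling.
```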

6.Fast Video Object Segmentation using the Global Context Module ⬇️

We developed a real-time, high-quality video object segmentation algorithm for semi-supervised video segmentation. Its performance is on par with the most accurate but time-consuming online-learning model, while its speed is similar to the fastest template-matching method, which has sub-optimal accuracy. The core of this result is a novel global context module that reliably summarizes and propagates information through the entire video. Compared to previous approaches that only use the first, the last, or a select few frames to guide the segmentation of the current frame, the global context module allows us to use all past frames to guide the processing. Unlike the state-of-the-art space-time memory network that caches a memory at each spatiotemporal position, our global context module is a fixed-size representation that does not use more memory as more frames are processed. It is straightforward to implement and has lower memory and computational costs than the space-time memory module. Equipped with the global context module, our method achieves top accuracy on DAVIS 2016 and near-state-of-the-art results on DAVIS 2017 at real-time speed.
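
One plausible reading of a fixed-size global context, sketched below: key/value statistics are accumulated over all processed frames, so memory stays constant regardless of video length. The read-out rule and shapes are assumptions for illustration, not the paper's exact module.

```python
# Hedged sketch of a fixed-size global context: accumulate key/value statistics over
# all past frames so memory does not grow with video length.
import torch

class GlobalContext:
    def __init__(self, key_dim, val_dim):
        self.context = torch.zeros(key_dim, val_dim)   # fixed-size summary
        self.count = 0

    def update(self, keys, values):
        """keys: (N, key_dim), values: (N, val_dim) features from one processed frame."""
        self.context += keys.t() @ values
        self.count += keys.shape[0]

    def read(self, query_keys):
        """query_keys: (M, key_dim) features of the current frame -> (M, val_dim) guidance."""
        return query_keys @ (self.context / max(self.count, 1))
```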

7.Weakly Supervised Instance Segmentation by Deep Multi-Task Community Learning ⬇️

We present an object segmentation algorithm based on community learning for multiple tasks under the supervision of image-level class labels only, where individual instances of the same class are identified and segmented separately. This problem is formulated as a combination of weakly supervised object detection and semantic segmentation and is addressed by designing a unified deep neural network architecture, which has a positive feedback loop of object detection with bounding box regression, instance mask generation, instance segmentation, and feature extraction. Each component of the network makes active interactions with others to improve accuracy, and the end-to-end trainability of our model makes our results more reproducible. The proposed algorithm achieves competitive accuracy in the weakly supervised setting without any external components such as Fast R-CNN and Mask R-CNN on the standard benchmark dataset.

8.Image Embedded Segmentation: Combining Supervised and Unsupervised Objectives through Generative Adversarial Networks ⬇️

This paper presents a new regularization method to train a fully convolutional network for semantic tissue segmentation in histopathological images. The method relies on benefiting from unsupervised learning, in the form of image reconstruction, during network training. To this end, it puts forward the idea of defining a new embedding that unites the main supervised task of semantic segmentation and an auxiliary unsupervised task of image reconstruction into a single task, and proposes to learn this united task with a single generative model. The embedding generates a multi-channel output image by superimposing an original input image on its segmentation map. The method then learns to translate the input image to this embedded output image using a conditional generative adversarial network, which is known to be quite effective for image-to-image translation. This proposal differs from the existing approach that uses image reconstruction for the same regularization purpose. The existing approach considers segmentation and image reconstruction as two separate tasks in a multi-task network, defines their losses independently, and then combines these losses in a joint loss function. However, the definition of such a function requires externally determining the right contribution of the supervised and unsupervised losses to yield balanced learning between the segmentation and image reconstruction tasks. The proposed approach eliminates this difficulty by uniting the two tasks into a single one, which intrinsically combines their losses. Using histopathological image segmentation as a showcase application, our experiments demonstrate that the proposed approach leads to better segmentation results.
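
A minimal sketch of one way to form the multi-channel embedded target described above: the input image and a one-hot segmentation map are stacked so a single conditional GAN can be trained to produce both at once. The channel layout is an assumption, not the paper's exact definition.

```python
# Minimal sketch of the embedded target image: stack the original input channels with
# its segmentation map so a single image-to-image GAN can learn both tasks jointly.
import numpy as np

def embed_target(image, seg_map, num_classes):
    """image: (H, W, 3) float in [0, 1]; seg_map: (H, W) integer class labels.
    Returns an (H, W, 3 + num_classes) target for the conditional GAN."""
    one_hot = np.eye(num_classes, dtype=np.float32)[seg_map]   # (H, W, num_classes)
    return np.concatenate([image.astype(np.float32), one_hot], axis=-1)
```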

9.A CNN With Multi-scale Convolution for Hyperspectral Image Classification using Target-Pixel-Orientation scheme ⬇️

Recently, CNNs have become a popular choice for handling hyperspectral image classification challenges. Although hyperspectral images (HSI) carry rich spectral information, this richness creates a curse of dimensionality, and the large spatial variability of spectral signatures adds further difficulty to the classification problem. Additionally, training a CNN in an end-to-end fashion with scarce training examples is another challenging and interesting problem. In this paper, a novel target-patch-orientation method is proposed to train a CNN-based network. We also introduce a hybrid 3D-CNN and 2D-CNN network architecture to implement band reduction and feature extraction, respectively. Experimental results show that our method outperforms the accuracies reported by existing state-of-the-art methods.

10.The Direction-Aware, Learnable, Additive Kernels and the Adversarial Network for Deep Floor Plan Recognition ⬇️

This paper presents a new approach for the recognition of elements in floor plan layouts. Besides elements with common shapes, we aim to recognize elements with irregular shapes such as circular rooms and inclined walls. Furthermore, reducing noise in the semantic segmentation of the floor plan is in demand. To this end, we propose direction-aware, learnable, additive kernels applied in both the context module and common convolutional blocks, and use them to achieve high performance on elements with both common and irregular shapes. In addition, an adversarial network with two discriminators is proposed to further improve the accuracy of the elements and to reduce the noise of the semantic segmentation. Experimental results demonstrate the superiority and effectiveness of the proposed network over state-of-the-art methods.

11.Automatic marker-free registration of tree point-cloud data based on rotating projection ⬇️

Point-cloud data acquired using a terrestrial laser scanner (TLS) play an important role in digital forestry research. Multiple scans are generally used to overcome occlusion effects and obtain complete tree structural information. However, it is time-consuming and difficult to place artificial reflectors in a forest with complex terrain for marker-based registration, a process that reduces registration automation and efficiency. In this study, we propose an automatic coarse-to-fine method for the registration of point-cloud data from multiple scans of a single tree. In coarse registration, the point clouds produced by each scan are projected onto a spherical surface to generate a series of two-dimensional (2D) images, which are used to estimate the initial positions of the scans; corresponding feature-point pairs are then extracted from these 2D images. In fine registration, point-cloud data slicing and fitting methods are used to extract corresponding stem and branch centers for use as tie points to calculate the fine transformation parameters. To evaluate the accuracy of the registration results, we propose an error-evaluation model that calculates the distances between center points of corresponding branches in adjacent scans. We conducted experiments on two simulated trees and one real-world tree: the average registration errors of the proposed method were around 0.26 m on the simulated tree point clouds and around 0.05 m on the real-world tree point cloud.
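
A small sketch of the coarse-registration projection step, assuming points are expressed relative to the scanner position: each point is mapped to azimuth/elevation bins of a 2D image. The resolution and the stored pixel value (range) are illustrative choices.

```python
# Minimal sketch: project a scan's points onto a sphere around the scanner and bin
# them into a 2D (azimuth x elevation) image.
import numpy as np

def spherical_projection(points, width=720, height=360):
    """points: (N, 3) xyz coordinates relative to the scanner position."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(y, x)                       # [-pi, pi]
    elevation = np.arcsin(z / np.maximum(r, 1e-9))   # [-pi/2, pi/2]
    col = ((azimuth + np.pi) / (2 * np.pi) * (width - 1)).astype(int)
    row = ((elevation + np.pi / 2) / np.pi * (height - 1)).astype(int)
    image = np.zeros((height, width), dtype=np.float32)
    image[row, col] = r                              # keep the range as the pixel value
    return image
```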

12.2018 Robotic Scene Segmentation Challenge ⬇️

In 2015 we began a sub-challenge at the EndoVis workshop at MICCAI in Munich using endoscope images of ex-vivo tissue with automatically generated annotations from robot forward kinematics and instrument CAD models. However, the limited background variation and simple motion rendered the dataset uninformative in learning about which techniques would be suitable for segmentation in real surgery. In 2017, at the same workshop in Quebec we introduced the robotic instrument segmentation dataset with 10 teams participating in the challenge to perform binary, articulating parts and type segmentation of da Vinci instruments. This challenge included realistic instrument motion and more complex porcine tissue as background and was widely addressed with modifications on U-Nets and other popular CNN architectures. In 2018 we added to the complexity by introducing a set of anatomical objects and medical devices to the segmented classes. To avoid over-complicating the challenge, we continued with porcine data which is dramatically simpler than human tissue due to the lack of fatty tissue occluding many organs.

13.Multiple Object Tracking by Flowing and Fusing ⬇️

Most Multiple Object Tracking (MOT) approaches compute individual target features for two subtasks: estimating target-wise motions and conducting pair-wise Re-Identification (Re-ID). Because of the indefinite number of targets among video frames, both subtasks are very difficult to scale up efficiently in end-to-end Deep Neural Networks (DNNs). In this paper, we design an end-to-end DNN tracking approach, Flow-Fuse-Tracker (FFT), that addresses the above issues with two efficient techniques: target flowing and target fusing. Specifically, in target flowing, a FlowTracker DNN module learns the indefinite number of target-wise motions jointly from pixel-level optical flows. In target fusing, a FuseTracker DNN module refines and fuses targets proposed by FlowTracker and frame-wise object detection, instead of trusting either of the two inaccurate sources of target proposals. Because FlowTracker can explore complex target-wise motion patterns and FuseTracker can refine and fuse targets from FlowTracker and detectors, our approach achieves state-of-the-art results on several MOT benchmarks. As an online MOT approach, FFT produced the top MOTA of 46.3 on 2DMOT15, 56.5 on MOT16, and 56.5 on MOT17, surpassing all online and offline methods in existing publications.

14.Unsupervised Pixel-level Road Defect Detection via Adversarial Image-to-Frequency Transform ⬇️

In the past few years, the performance of road defect detection has improved remarkably thanks to advances in computer vision and deep learning. Although large-scale and well-annotated datasets enhance the performance of detecting road pavement defects to some extent, it is still challenging to derive a model that performs reliably for various road conditions in practice, because it is intractable to construct a dataset covering diverse road conditions and defect patterns. To this end, we propose an unsupervised approach to detecting road defects using an Adversarial Image-to-Frequency Transform (AIFT). AIFT adopts an unsupervised manner and adversarial learning in deriving the defect detection model, so it does not need annotations for road pavement defects. We evaluate the efficiency of AIFT using the GAPs384, Cracktree200, CRACK500, and CFD datasets. The experimental results demonstrate that the proposed approach detects various road defects and outperforms existing state-of-the-art approaches.

15.Adversarial Incremental Learning ⬇️

Although deep learning performs very well on a wide variety of tasks, it still suffers from catastrophic forgetting -- the tendency of neural networks to forget previously learned information upon learning new tasks when previous data is not available. Earlier methods of incremental learning tackle this problem by using a part of the old dataset, by generating exemplars, or by using memory networks. Although these methods have shown good results, using or generating exemplars increases memory and computation requirements. To address these problems, we propose an adversarial-discriminator-based method that does not use old data at all while training on new tasks. We particularly tackle the class-incremental learning problem in image classification, where data is provided in a class-based sequential manner. For this problem, the network is trained using an adversarial loss along with the traditional cross-entropy loss. The cross-entropy loss helps the network progressively learn new classes, while the adversarial loss helps preserve information about the existing classes. Using this approach, we are able to outperform other state-of-the-art methods on the CIFAR-100, SVHN, and MNIST datasets.
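
A hedged sketch of how the two losses might be combined for the feature extractor, assuming a discriminator trained to separate old-task from new-task features; the weighting and the discriminator target are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch of the training objective: cross-entropy on the new task plus an
# adversarial term from a discriminator that tries to tell old-task features from new ones.
import torch
import torch.nn.functional as F

def incremental_step(features, logits, labels, discriminator, lam=0.1):
    """features: extractor output on new-task images; logits/labels: new-class head."""
    ce_loss = F.cross_entropy(logits, labels)
    # Encourage the (frozen) discriminator to judge new-task features as "old-like",
    # which discourages the extractor from drifting away from previously learned classes.
    old_like = torch.ones(features.shape[0], 1, device=features.device)
    adv_loss = F.binary_cross_entropy_with_logits(discriminator(features), old_like)
    return ce_loss + lam * adv_loss
```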

16.Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences ⬇️

Understanding the structure of complex activities in videos is one of the many challenges faced by action recognition methods. To overcome this challenge, not only do methods need a solid knowledge of the visual structure of underlying features but also a good interpretation of how they could change over time. Consequently, action segmentation tasks must take into account not only the visual cues from individual frames, but their characteristics as a temporal sequence of features.
This work presents our findings on the impact of incorporating both visual and temporal learning on an unsupervised action segmentation pipeline. We introduce a novel approach to extract relevant visual and temporal features from untrimmed sequences for the temporal localization of sub-activities within complex actions without any labeling information. Through extensive experimentation on two benchmark datasets -- Breakfast Actions, and YouTube Instructions -- we show that the proposed approach is able to provide a meaningful visual and temporal embedding from the visual cues from contiguous video frames and that it indeed helps in temporal segmentation.

17.Gun Source and Muzzle Head Detection ⬇️

There is a surging need across the world for protection against gun violence. We have identified three main challenging areas in research that tries to curb gun violence: temporal localization of gunshots, gun type prediction, and gun source (shooter) detection. Our task is gun source detection and muzzle head detection, where the muzzle head is the round opening of the firing end of the gun. We would like to locate the muzzle head of the gun visually in the video and identify who fired the shot. In our formulation, we turn the problem of muzzle head detection into two sub-problems: human object detection and gun smoke detection. Our assumption is that the muzzle head typically lies between the gun smoke caused by the shot and the shooter. We obtain interesting results both in bounding the shooter and in detecting the gun smoke, and in our experiments we successfully detect the muzzle head by detecting the gun smoke and the shooter.

18.FIS-Nets: Full-image Supervised Networks for Monocular Depth Estimation ⬇️

This paper addresses the importance of full-image supervision for monocular depth estimation. We propose a semi-supervised architecture that combines an unsupervised framework based on image consistency with a supervised framework of dense depth completion; the latter provides full-image depth as supervision for the former. Ego-motion from the navigation system is also embedded into the unsupervised framework as output supervision for an inner temporal-transform network, improving monocular depth estimation. In the evaluation, we show that our proposed model outperforms other approaches on depth estimation.

19.The benefits of synthetic data for action categorization ⬇️

In this paper, we study the value of using synthetically produced videos as training data for neural networks used for action categorization. Motivated by the fact that texture and background of a video play little to no significant roles in optical flow, we generated simplified texture-less and background-less videos and utilized the synthetic data to train a Temporal Segment Network (TSN). The results demonstrated that augmenting TSN with simplified synthetic data improved the original network accuracy (68.5%), achieving 71.8% on HMDB-51 when adding 4,000 videos and 72.4% when adding 8,000 videos. Also, training using simplified synthetic videos alone on 25 classes of UCF-101 achieved 30.71% when trained on 2500 videos and 52.7% when trained on 5000 videos. Finally, results showed that when reducing the number of real videos of UCF-25 to 10% and combining them with synthetic videos, the accuracy drops to only 85.41%, compared to a drop to 77.4% when no synthetic data is added.

20.3D Aggregated Faster R-CNN for General Lesion Detection ⬇️

Lesions are damages and abnormalities in the tissues of the human body, many of which can later turn into fatal diseases such as cancers. Detecting lesions is of great importance for early diagnosis and timely treatment. To this end, Computed Tomography (CT) scans often serve as the screening tool, allowing us to leverage modern object detection techniques to detect lesions. However, lesions in CT scans are often small and sparse, and the local area around a lesion can be very confusing, easily causing the region-based classifier branch of Faster R-CNN to fail. Therefore, most existing state-of-the-art solutions train two types of heterogeneous networks (multi-phase) separately for candidate generation and False Positive Reduction (FPR). In this paper, we propose an end-to-end 3D Aggregated Faster R-CNN solution by stacking an "aggregated classifier branch" on the backbone of the RPN. This branch is equipped with Feature Aggregation and Local Magnification Layers to enhance the classifier. We demonstrate that our model achieves state-of-the-art performance on both the LUNA16 and DeepLesion datasets. In particular, we achieve the best single-model FROC performance on LUNA16, with an inference time of 4.2 s per processed scan.

21.Semantic Adversarial Perturbations using Learnt Representations ⬇️

Adversarial examples for image classifiers are typically created by searching for a suitable norm-constrained perturbation to the pixels of an image. However, such perturbations represent only a small and rather contrived subset of possible adversarial inputs; robustness to norm-constrained pixel perturbations alone is insufficient. We introduce a novel method for the construction of a rich new class of semantic adversarial examples. Leveraging the hierarchical feature representations learnt by generative models, our procedure makes adversarial but realistic changes at different levels of semantic granularity. Unlike prior work, this is not an ad-hoc algorithm targeting a fixed category of semantic property. For instance, our approach perturbs the pose, location, size, shape, colour and texture of the objects in an image without manual encoding of these concepts. We demonstrate this new attack by creating semantic adversarial examples that fool state-of-the-art classifiers on the MNIST and ImageNet datasets.

22.Semi-Automatic Generation of Tight Binary Masks and Non-Convex Isosurfaces for Quantitative Analysis of 3D Biological Samples ⬇️

Current in vivo microscopy allows detailed spatiotemporal imaging (3D+t) of complete organisms and offers insights into their development at the cellular level. Even though imaging speed and quality are steadily improving, fully automated segmentation and analysis methods are often not accurate enough. This is particularly true when imaging large samples (100 um - 1 mm) and deep inside the specimen. Drosophila embryogenesis, widely used as a developmental paradigm, presents an example of such a challenge, especially where cell outlines need to be imaged - a general challenge in other systems as well. To deal with the current bottleneck in quantitatively analyzing 3D+t light-sheet microscopy images of Drosophila embryos, we developed a collection of semi-automatic open-source tools. The presented methods include a semi-automatic masking procedure, automatic projection of non-convex 3D isosurfaces to 2D representations, as well as cell segmentation and tracking.

23.Black-Box Saliency Map Generation Using Bayesian Optimisation ⬇️

Saliency maps are often used in computer vision to provide intuitive interpretations of what input regions a model has used to produce a specific prediction. A number of approaches to saliency map generation are available, but most require access to model parameters. This work proposes an approach for saliency map generation for black-box models, where no access to model parameters is available, using a Bayesian optimisation sampling method. The approach aims to find the global salient image region responsible for a particular (black-box) model's prediction. This is achieved by a sampling-based approach to model perturbations that seeks to localise salient regions of an image to the black-box model. Results show that the proposed approach to saliency map generation outperforms grid-based perturbation approaches, and performs similarly to gradient-based approaches which require access to model parameters.
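
A hedged sketch of the black-box principle: candidate regions are scored by how much occluding them drops the model's confidence, using only model outputs. A plain random search stands in here for the Bayesian optimisation the paper actually uses; `predict_fn`, the patch size, and the grey-out fill are assumptions.

```python
# Hedged sketch: score candidate occlusion regions by the confidence drop they cause,
# querying only the black-box model's outputs (no gradients or weights needed).
import numpy as np

def saliency_by_occlusion(image, predict_fn, target_class, n_samples=200, patch=32):
    """predict_fn(image) -> class-probability vector."""
    h, w = image.shape[:2]
    base = predict_fn(image)[target_class]
    heat = np.zeros((h, w), dtype=np.float32)
    counts = np.zeros((h, w), dtype=np.float32)
    rng = np.random.default_rng(0)
    for _ in range(n_samples):
        top, left = rng.integers(0, h - patch), rng.integers(0, w - patch)
        occluded = image.copy()
        occluded[top:top + patch, left:left + patch] = 0        # blank candidate region
        drop = base - predict_fn(occluded)[target_class]        # confidence drop = saliency
        heat[top:top + patch, left:left + patch] += drop
        counts[top:top + patch, left:left + patch] += 1
    return heat / np.maximum(counts, 1)
```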

24.An Implicit Attention Mechanism for Deep Learning Pedestrian Re-identification Frameworks ⬇️

Attention is defined as the preparedness for the mental selection of certain aspects of a physical environment. In the computer vision domain, this mechanism is of great interest, as it helps define the segments of an image/video that are critical for obtaining a specific decision. This paper introduces an 'implicit' attentional mechanism for deep learning frameworks that simultaneously provides 1) mask-free and 2) foreground-focused samples for the inference phase. The main idea is to generate synthetic data composed of interleaved segments from the original learning set, while using class information only from specific segments. During the learning phase, the newly generated samples feed the network, keeping their label exclusively consistent with the identity from which the region of interest was cropped. Hence, as the model receives images of each identity with inconsistent unwanted areas, it naturally pays most attention to the label-consistent regions, which we observed to be equivalent to learning an effective receptive field. During the test phase, samples are provided without any mask, and the network naturally disregards the detrimental information, which is the insight behind the observed improvements in performance. As a proof of concept, we consider the challenging problem of pedestrian re-identification and compare the effectiveness of our solution to state-of-the-art techniques on the well-known Richly Annotated Pedestrian (RAP) dataset. The code is available at this https URL.

25.Adversarial Attacks on Convolutional Neural Networks in Facial Recognition Domain ⬇️

Numerous recent studies have demonstrated how Deep Neural Network (DNN) classifiers can be fooled by adversarial examples, in which an attacker adds perturbations to an original sample, causing the classifier to misclassify it. Adversarial attacks that render DNNs vulnerable in real life represent a serious threat, given the consequences of improperly functioning autonomous vehicles, malware filters, or biometric authentication systems. In this paper, we apply the Fast Gradient Sign Method to introduce perturbations to a facial image dataset and then test the output on a different classifier that we trained ourselves, to analyze the transferability of this method. Next, we craft a variety of different attack algorithms on a facial image dataset, with the intention of developing untargeted black-box approaches assuming minimal adversarial knowledge, to further assess the robustness of DNNs in the facial recognition realm. We explore modifying single optimal pixels by a large amount, modifying all pixels by a smaller amount, or combining these two attack approaches. While our single-pixel attacks achieved about a 15% average decrease in classifier confidence for the actual class, the all-pixel attacks were more successful and achieved up to an 84% average decrease in confidence, along with an 81.6% misclassification rate, for the attack we tested with the highest level of perturbation. Even with these high levels of perturbation, the face images remained fairly clearly identifiable to a human. We hope our research may help advance the study of adversarial attacks on DNNs and of defensive mechanisms to counteract them, particularly in the facial recognition domain.
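
A minimal FGSM sketch for the transferability experiment described above, assuming a PyTorch source model and images scaled to [0, 1]; epsilon is an illustrative value.

```python
# Minimal FGSM sketch: perturb a face image along the sign of the loss gradient of a
# source model, then feed the result to a *different* classifier to test transferability.
import torch
import torch.nn.functional as F

def fgsm(source_model, image, label, epsilon=0.03):
    """image: (1, C, H, W) tensor in [0, 1]; label: (1,) true class index."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(source_model(image), label)
    loss.backward()
    adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()
    return adversarial
```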

26.Urban2Vec: Incorporating Street View Imagery and POIs for Multi-Modal Urban Neighborhood Embedding ⬇️

Understanding intrinsic patterns and predicting spatiotemporal characteristics of cities requires a comprehensive representation of urban neighborhoods. Existing works relied on either inter- or intra-region connectivities to generate neighborhood representations but failed to fully utilize the informative yet heterogeneous data within neighborhoods. In this work, we propose Urban2Vec, an unsupervised multi-modal framework that incorporates both street view imagery and point-of-interest (POI) data to learn neighborhood embeddings. Specifically, we use a convolutional neural network to extract visual features from street view images while preserving geospatial similarity. Furthermore, we model each POI as a bag of words containing its category, rating, and review information. Analogous to document embedding in natural language processing, we establish semantic similarity between a neighborhood (the "document") and the words from its surrounding POIs in the vector space. By jointly encoding visual, textual, and geospatial information into the neighborhood representation, Urban2Vec achieves performance better than baseline models and comparable to fully supervised methods on downstream prediction tasks. Extensive experiments on three U.S. metropolitan areas also demonstrate the model's interpretability, generalization capability, and value in neighborhood similarity analysis.
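
A hedged sketch of the neighborhood-as-document idea: a neighborhood embedding is pulled toward embeddings of the "words" (category, rating, and review tokens) of its POIs and pushed away from random words, in the spirit of a skip-gram objective. Sizes, sampling, and names are illustrative assumptions, not the Urban2Vec implementation.

```python
# Hedged sketch of a skip-gram-style objective between neighborhoods and POI words.
import torch
import torch.nn.functional as F

n_neighborhoods, vocab_size, dim = 1000, 5000, 64
hood_emb = torch.nn.Embedding(n_neighborhoods, dim)   # one vector per neighborhood
word_emb = torch.nn.Embedding(vocab_size, dim)         # one vector per POI "word"

def urban2vec_loss(hood_idx, pos_word_idx, neg_word_idx):
    """hood_idx: (B,); pos_word_idx: words from the hood's POIs; neg_word_idx: random words."""
    h = hood_emb(hood_idx)                                    # (B, dim)
    pos = (h * word_emb(pos_word_idx)).sum(-1)                # similarity to true POI words
    neg = (h * word_emb(neg_word_idx)).sum(-1)                # similarity to negatives
    return -(F.logsigmoid(pos) + F.logsigmoid(-neg)).mean()
```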

27.stream-learn -- open-source Python library for difficult data stream batch analysis ⬇️

stream-learn is a Python package, compatible with scikit-learn, developed for drifting and imbalanced data stream analysis. Its main component is a stream generator, which allows producing a synthetic data stream that may incorporate each of the three main concept drift types (i.e. sudden, gradual, and incremental drift) in their recurring or non-recurring versions. The package allows conducting experiments following established evaluation methodologies (i.e. Test-Then-Train and Prequential). In addition, estimators adapted for data stream classification have been implemented, including both simple classifiers and state-of-the-art chunk-based and online classifier ensembles. To improve computational efficiency, the package utilises its own implementations of prediction metrics for imbalanced binary classification tasks.
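
A generic Test-Then-Train loop sketched with plain scikit-learn to illustrate the evaluation methodology mentioned above; it deliberately avoids guessing the stream-learn API, so the names and the choice of classifier are assumptions.

```python
# Generic Test-Then-Train sketch (not the stream-learn API): each incoming chunk is
# first used for evaluation, then for incremental training.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import balanced_accuracy_score

def test_then_train(chunks, classes):
    """chunks: iterable of (X, y) batches from a (possibly drifting) stream."""
    clf = SGDClassifier()
    scores = []
    for i, (X, y) in enumerate(chunks):
        if i > 0:                                   # test on the chunk before training on it
            scores.append(balanced_accuracy_score(y, clf.predict(X)))
        clf.partial_fit(X, y, classes=classes)
    return np.array(scores)
```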

28.Just Noticeable Difference for Machines to Generate Adversarial Images ⬇️

One way of designing a robust machine learning algorithm is to generate authentic adversarial images that can trick the algorithm as much as possible. In this study, we propose a new method to generate adversarial images that are very similar to the true images, yet are discriminated from the originals and assigned to another category by the model. The proposed method is based on a popular concept from experimental psychology called Just Noticeable Difference. We define a Just Noticeable Difference for a machine learning model and generate a least perceptible difference for adversarial images that can trick the model. The suggested method iteratively distorts a true image by gradient descent until the machine learning algorithm outputs a false label. Deep Neural Networks are trained for object detection and classification tasks, and the cost function includes regularization terms to generate just noticeably different adversarial images. The adversarial images generated in this study look more natural compared to the output of state-of-the-art adversarial image generators.
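
A hedged sketch of the iterative procedure described above: the input is distorted by gradient descent until the predicted label flips, with a simple L2 penalty keeping the change small. The step size, iteration cap, and plain L2 regularizer are illustrative stand-ins for the paper's cost function.

```python
# Hedged sketch of the JND idea: take small gradient steps on the input until the
# model's label flips, keeping the accumulated change as imperceptible as possible.
import torch
import torch.nn.functional as F

def jnd_adversarial(model, image, true_label, step=0.01, reg=0.1, max_iters=200):
    """image: (1, C, H, W) tensor; true_label: (1,) class index of the original image."""
    adv = image.clone().requires_grad_(True)
    for _ in range(max_iters):
        logits = model(adv)
        if logits.argmax(dim=1).item() != true_label.item():
            break                                    # label flipped: just-noticeable change found
        loss = -F.cross_entropy(logits, true_label) + reg * (adv - image).pow(2).sum()
        loss.backward()
        with torch.no_grad():
            adv -= step * adv.grad                   # raise the loss of the true class, keep distortion small
        adv.grad.zero_()
    return adv.detach()
```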

29.The Tensor Brain: Semantic Decoding for Perception and Memory ⬇️

We analyse perception and memory using mathematical models for knowledge graphs and tensors to gain insights into the corresponding functionalities of the human mind. Our discussion is based on the concept of propositional sentences consisting of *subject-predicate-object* (SPO) triples for expressing elementary facts. SPO sentences are the basis of most natural languages but might also be important for explicit perception and declarative memories, as well as intra-brain communication and the ability to argue and reason. A set of SPO sentences can be described as a knowledge graph, which can be transformed into an adjacency tensor. We introduce tensor models in which concepts have dual representations as indices and associated embeddings, two constructs we believe are essential for understanding implicit and explicit perception and memory in the brain. We argue that a biological realization of perception and memory imposes constraints on information processing. In particular, we propose that explicit perception and declarative memories require a semantic decoder, which, in a simple realization, is based on four layers: first, a sensory memory layer as a buffer for sensory input; second, an index layer representing concepts; third, a memoryless representation layer for the broadcasting of information; and fourth, a working memory layer as a processing center and data buffer. In a Bayesian brain interpretation, semantic memory defines the prior for triple statements. We propose that, in evolution and during development, semantic memory, episodic memory, and natural language evolved as emergent properties in agents' efforts to gain a deeper understanding of sensory information. We present a concrete model realization and validate some aspects of the proposed model on benchmark data, where we demonstrate state-of-the-art performance.