Source: https://www.nature.com/articles/nature14539
These notes summarize how deep learning methods actually work, following the Nature review above.
Introduction
- Deep learning methods are essentially representation learning methods: they discover the representations needed for detection or classification.
- A deep learning network is composed of multiple non-linear modules.
- Each module represents the given image at its own level.
- The higher the level, the more abstract the representation.
- For example, given an image as input, the first layer simply represents the presence or absence of edges.
- The second layer represents certain motifs by detecting particular arrangements of the edges found above, regardless of small variations in edge positions.
- The third layer may represent certain objects by combining the motifs found above, again regardless of small variations.
Supervised Learning
- A multi-layer neural network, shown below, can distort the input space to make the classes linearly separable.
- The regular grid on the left can be transformed by the hidden units shown in the middle.
- The input features and the weight values are used to compute the output vector. This is called the forward pass.
- Each weight is updated via back-propagation, which computes the error derivative with respect to the output of each unit.
- The error derivative with respect to the output of a unit is a weighted sum of the error derivatives with respect to the total inputs to the units in the layer above.
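The forward and backward passes above can be sketched in NumPy. This is a minimal illustration, not code from the paper: the layer sizes, the tanh non-linearity, and the squared-error loss are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer network: input (3) -> hidden (4, tanh) -> output (2, linear).
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))

x = rng.normal(size=3)        # input features
t = np.array([1.0, 0.0])      # target vector

# Forward pass: each unit computes a weighted sum of its inputs,
# then applies a non-linearity.
z1 = W1 @ x                   # total input to the hidden units
h = np.tanh(z1)               # hidden activations
y = W2 @ h                    # output vector

# Backward pass: error derivatives flow from the output back down.
dy = y - t                    # dE/dy for squared error E = 0.5 * ||y - t||^2
dW2 = np.outer(dy, h)         # gradient for the output weights
dh = W2.T @ dy                # dE/dh: weighted sum of derivatives from the layer above
dz1 = dh * (1 - h ** 2)       # back through the tanh non-linearity
dW1 = np.outer(dz1, x)        # gradient for the hidden weights
```

Note how `dh` is exactly the "weighted sum of the error derivatives with respect to the total inputs to the units in the layer above" described in the bullet point.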
- Below is an example of a deep neural network.
- Each rectangular image represents a feature map.
- To be invariant to the orientation, position, or illumination of the Samoyed dog, a good feature extractor is required.
- The feature extractor must solve the selectivity-invariance dilemma: produce representations that are selective for the aspects important for discrimination, yet invariant to irrelevant aspects.
- Rather than designing good feature extractors by hand, we can use deep learning to learn good features automatically.
- Since a deep learning architecture consists of a multilayer stack of modules, each module transforms its input to increase both the selectivity and the invariance of the representation.
- With multiple non-linear layers, a system can implement functions that are sensitive to minute details yet insensitive to large irrelevant variations.
Backpropagation to train multilayer architectures
- The derivative of the objective with respect to the input of a module can be computed by working backwards from the gradient with respect to the output of that module.
- Backpropagation propagates gradients through all modules, starting from the output at the top all the way to the bottom.
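The "working backwards" step is just the chain rule applied module by module. As a sketch with two assumed scalar modules (f and g are illustrative choices, not from the paper), the gradient computed backwards matches a direct numerical derivative:

```python
import numpy as np

# Two stacked modules: y = g(f(x)), with f(x) = x**2 and g(u) = sin(u).
x = 0.7
u = x ** 2            # output of the bottom module f
y = np.sin(u)         # output of the top module g

# Backpropagation: start from dE/dy at the output and work down.
dy = 1.0              # dE/dy = 1 for the objective E = y
du = dy * np.cos(u)   # backwards through g: dE/du = dE/dy * g'(u)
dx = du * 2 * x       # backwards through f: dE/dx = dE/du * f'(x)

# Finite-difference estimate of the same derivative, for comparison.
eps = 1e-6
dx_num = (np.sin((x + eps) ** 2) - np.sin((x - eps) ** 2)) / (2 * eps)
```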
Convolutional Neural Network
- The convolutional neural network (ConvNet) is a type of deep network that has achieved practical success when the input comes in the form of multiple arrays.
- For example, 1D for signals and sequences (including language), 2D for images or audio spectrograms, and 3D for video or volumetric images.
- The architecture of ConvNet is structured as a series of stages.
- The first few stages are composed of convolutional layers and pooling layers.
- Convolutional layers detect local conjunctions of features from the previous layer.
- They form distinctive local motifs, which suits array data such as images, where local groups of values are often highly correlated.
- They also exploit the fact that the local statistics of images are invariant to location.
- Units at different locations share the same weights and thus detect the same pattern in different parts of the array.
- Pooling layers merge semantically similar features into one.