Deep CV 101
Burness.Duan
UCloud
1.Deep CV Classical Models
2.Deep CV Applications
3.Distributed Deep Learning In UCloud
Outline
Deep CV Classical Models
1.LeNet
2.AlexNet
3.GoogLeNet
4.VGG
5.Deep Residual Network
LeNet
LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document
recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
LeNet
https://github.com/spark-mler/WorkWithTensorflow/blob/master/CV_model/lenet/lenet.py
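A minimal LeNet-style sketch in TFlearn, following the layer sizes described in the speaker notes (C1 through F6); the linked lenet.py may differ in details such as activations and the optimizer.

import tflearn
from tflearn.layers.core import input_data, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression

network = input_data(shape=[None, 32, 32, 1])                          # 32*32 grayscale input
network = conv_2d(network, 6, 5, padding='valid', activation='tanh')   # C1: 6 maps, 5*5 kernels -> 28*28*6
network = max_pool_2d(network, 2)                                      # S2: 2*2 subsampling -> 14*14*6
network = conv_2d(network, 16, 5, padding='valid', activation='tanh')  # C3: 16 maps, 5*5 kernels -> 10*10*16
network = max_pool_2d(network, 2)                                      # S4: 2*2 subsampling -> 5*5*16
network = conv_2d(network, 120, 5, padding='valid', activation='tanh') # C5: 120 maps -> 1*1*120
network = fully_connected(network, 84, activation='tanh')              # F6
network = fully_connected(network, 10, activation='softmax')           # 10-way output
network = regression(network, optimizer='sgd',
                     loss='categorical_crossentropy', learning_rate=0.01)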
AlexNet
Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural
networks[C]//Advances in neural information processing systems. 2012: 1097-1105.
AlexNet Tricks
ReLU on CIFAR-10
Local Response Normalization
Dropout
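A hedged sketch of where these three tricks sit in a TFlearn convolution stack (only a fragment of AlexNet; the filter sizes follow the first two convolutional layers, and the two-GPU split of the original is ignored).

import tflearn
from tflearn.layers.core import input_data, fully_connected, dropout
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization

network = input_data(shape=[None, 227, 227, 3])
network = conv_2d(network, 96, 11, strides=4, activation='relu')   # ReLU non-linearity
network = max_pool_2d(network, 3, strides=2)
network = local_response_normalization(network)                    # LRN after pooling
network = conv_2d(network, 256, 5, activation='relu')
network = max_pool_2d(network, 3, strides=2)
network = local_response_normalization(network)
network = fully_connected(network, 4096, activation='relu')
network = dropout(network, 0.5)                                     # Dropout on the FC layers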
AlexNet
GoogLeNet
Shortcomings
1. A bigger network means a larger number of parameters, which makes the
enlarged network more prone to over-fitting.
2. Uniformly increasing the network size also dramatically increases the
use of computational resources.
3. Computing infrastructures are very inefficient at numerical
calculation on non-uniform sparse data structures.
Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition. 2015: 1-9.
GoogLeNet
Inception Module
GoogLeNet
Inception Module With Dimension Reduction
Lin M, Chen Q, Yan S. Network in network[J]. arXiv preprint arXiv:1312.4400, 2013.
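A hedged sketch of one Inception module with 1x1 dimension reduction, written against TFlearn's conv_2d/merge API in the style of the googlenet.py example referenced on the next slide; the filter counts roughly follow the "inception (3a)" block and are illustrative.

from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.merge_ops import merge

def inception_block(incoming):
    branch_1x1 = conv_2d(incoming, 64, 1, activation='relu')
    branch_3x3_reduce = conv_2d(incoming, 96, 1, activation='relu')   # 1x1 reduction before 3x3
    branch_3x3 = conv_2d(branch_3x3_reduce, 128, 3, activation='relu')
    branch_5x5_reduce = conv_2d(incoming, 16, 1, activation='relu')   # 1x1 reduction before 5x5
    branch_5x5 = conv_2d(branch_5x5_reduce, 32, 5, activation='relu')
    branch_pool = max_pool_2d(incoming, 3, strides=1)                 # 3x3 pooling, same spatial size
    branch_pool_1x1 = conv_2d(branch_pool, 32, 1, activation='relu')  # reduce after pooling
    # Concatenate all branches along the channel axis
    return merge([branch_1x1, branch_3x3, branch_5x5, branch_pool_1x1],
                 mode='concat', axis=3)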
GoogLeNet
Overall architecture of GoogLeNet
GoogLeNet
GoogLeNet With TFlearn
https://github.com/tflearn/tflearn/blob/master/examples/images/googlenet.py
VGG
VGG With TFlearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression

network = input_data(shape=[None, 224, 224, 3])
network = conv_2d(network, 64, 3, activation='relu')
network = conv_2d(network, 64, 3, activation='relu')
network = max_pool_2d(network, 2, strides=2)
network = conv_2d(network, 128, 3, activation='relu')
network = conv_2d(network, 128, 3, activation='relu')
network = max_pool_2d(network, 2, strides=2)
network = conv_2d(network, 256, 3, activation='relu')
network = conv_2d(network, 256, 3, activation='relu')
network = conv_2d(network, 256, 3, activation='relu')
network = max_pool_2d(network, 2, strides=2)
network = conv_2d(network, 512, 3, activation='relu')
network = conv_2d(network, 512, 3, activation='relu')
network = conv_2d(network, 512, 3, activation='relu')
network = max_pool_2d(network, 2, strides=2)
network = conv_2d(network, 512, 3, activation='relu')
network = conv_2d(network, 512, 3, activation='relu')
network = conv_2d(network, 512, 3, activation='relu')
network = max_pool_2d(network, 2, strides=2)
network = fully_connected(network, 4096, activation='relu')
network = dropout(network, 0.5)
network = fully_connected(network, 4096, activation='relu')
network = dropout(network, 0.5)
network = fully_connected(network, 17, activation='softmax')
network = regression(network, optimizer='rmsprop',
                     loss='categorical_crossentropy', learning_rate=0.001)
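Training this network with TFlearn then takes only a few more lines; the sketch below follows TFlearn's VGG example and assumes the Oxford Flowers-17 dataset (17 classes, matching the softmax above). The checkpoint path and run_id are placeholders.

import tflearn
import tflearn.datasets.oxflower17 as oxflower17

# Oxford Flowers-17: 17 classes, images resized to 224x224 by the loader
X, Y = oxflower17.load_data(one_hot=True)

model = tflearn.DNN(network, checkpoint_path='model_vgg',
                    max_checkpoints=1, tensorboard_verbose=0)
model.fit(X, Y, n_epoch=500, shuffle=True, show_metric=True,
          batch_size=32, snapshot_epoch=False, run_id='vgg_oxflowers17')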
Deep Residual Network
With the network depth increasing, accuracy gets saturated
(which might be unsurprising) and then degrades rapidly;
such degradation is not caused by overfitting.
Deep Residual Network
F(x):=H(x)-x
import tensorflow as tf
import tflearn
from tflearn.layers.conv import conv_2d

def residual_block(incoming, nb_blocks, out_channels, downsample=False,
                   downsample_strides=2, activation='relu', batch_norm=True,
                   bias=True, weights_init='variance_scaling',
                   bias_init='zeros', regularizer='L2', weight_decay=0.0001,
                   trainable=True, restore=True, reuse=False, scope=None,
                   name="ResidualBlock"):
    resnet = incoming
    in_channels = incoming.get_shape().as_list()[-1]
    with tf.variable_op_scope([incoming], scope, name, reuse=reuse) as scope:
        name = scope.name
        for i in range(nb_blocks):
            identity = resnet
            if not downsample:
                downsample_strides = 1
            if batch_norm:
                resnet = tflearn.batch_normalization(resnet)
            resnet = tflearn.activation(resnet, activation)
            resnet = conv_2d(resnet, out_channels, 3,
                             downsample_strides, 'same', 'linear', bias, weights_init,
                             bias_init, regularizer, weight_decay, trainable, restore)
            if batch_norm:
                resnet = tflearn.batch_normalization(resnet)
            resnet = tflearn.activation(resnet, activation)
            resnet = conv_2d(resnet, out_channels, 3, 1, 'same',
                             'linear', bias, weights_init, bias_init, regularizer,
                             weight_decay, trainable, restore)
            # Downsampling
            if downsample_strides > 1:
                identity = tflearn.avg_pool_2d(identity, 1, downsample_strides)
            # Projection to new dimension (zero-padding the channel axis)
            if in_channels != out_channels:
                ch = (out_channels - in_channels) // 2
                identity = tf.pad(identity, [[0, 0], [0, 0], [0, 0], [ch, ch]])
                in_channels = out_channels
            resnet = resnet + identity
    return resnet
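A usage sketch: stacking residual_block into a small CIFAR-10-style residual network, following TFlearn's residual network example; the depth n and the hyper-parameters are illustrative.

import tflearn
from tflearn.layers.core import input_data, fully_connected
from tflearn.layers.conv import conv_2d
from tflearn.layers.estimator import regression

n = 3  # 6n+2 weighted layers in total
net = input_data(shape=[None, 32, 32, 3])
net = conv_2d(net, 16, 3, regularizer='L2', weight_decay=0.0001)
net = residual_block(net, n, 16)
net = residual_block(net, 1, 32, downsample=True)   # halve spatial size, double channels
net = residual_block(net, n - 1, 32)
net = residual_block(net, 1, 64, downsample=True)
net = residual_block(net, n - 1, 64)
net = tflearn.batch_normalization(net)
net = tflearn.activation(net, 'relu')
net = tflearn.global_avg_pool(net)
net = fully_connected(net, 10, activation='softmax')
net = regression(net, optimizer='momentum',
                 loss='categorical_crossentropy', learning_rate=0.1)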
Deep Residual Network
Tools For TensorFlow
Keras, TFlearn, TF-slim, Learn, TensorLayer
Next: Applications
Deep CV Applications
1.Image Classification
2.Neural Style
3.Txt2Img, img2txt
Image Classification
1.Training from scratch
2.Retrain from pre-trained model
3.Load a pre-trained model, freeze some layers’
weights, and retrain the other layers
Training from scratch
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# FLAGS.data_dir: path to the MNIST data (defined elsewhere, e.g. via argparse)
mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.matmul(x, W) + b
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.initialize_all_variables().run()
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy,
               feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
[Diagram: TRAIN DATA → MODEL (randomly initialized params); evaluated on TEST DATA / TEST LABELS]
Retrain from pre-trained model
[Diagram: TRAIN DATA → MODEL (pre-trained params); evaluated on TEST DATA / TEST LABELS]
1. Load pre-trained parameters from a model file (.pb)
2. Change your model to the new task (new class number)
3. Fit the train data to update all the weights
4. Fit your test data for inference
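A hedged sketch of this workflow in plain TensorFlow. It restores a checkpoint rather than the .pb file mentioned above, because a checkpoint keeps the restored weights as trainable variables; build_backbone, the 'backbone' variable scope, the 2048-dimensional feature size and NUM_NEW_CLASSES are placeholders, not APIs from the slides.

import tensorflow as tf

NUM_NEW_CLASSES = 5   # hypothetical class count for the new task

images = tf.placeholder(tf.float32, [None, 299, 299, 3])
labels = tf.placeholder(tf.float32, [None, NUM_NEW_CLASSES])

# build_backbone is a placeholder that re-creates the pre-trained network's graph
# under the 'backbone' variable scope and returns a [batch, 2048] feature tensor.
features = build_backbone(images)

W = tf.Variable(tf.truncated_normal([2048, NUM_NEW_CLASSES], stddev=0.01))
b = tf.Variable(tf.zeros([NUM_NEW_CLASSES]))
logits = tf.matmul(features, W) + b   # new head sized for the new class number

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)  # updates all weights

# Restore only the backbone variables from the pre-trained checkpoint,
# then initialize the new head and train on the new data.
backbone_saver = tf.train.Saver(var_list=tf.get_collection(
    tf.GraphKeys.TRAINABLE_VARIABLES, scope='backbone'))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    backbone_saver.restore(sess, 'pretrained_model.ckpt')
    # ... feed batches of (images, labels) into train_step, then run logits on test data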
Retrain from pre-trained model (frozen)
[Diagram: TRAIN DATA → MODEL (pre-trained params, lower layers FROZEN); evaluated on TEST DATA / TEST LABELS]
1. Load pre-trained parameters from a model file (.pb)
2. Freeze some layers’ weights
3. Change the network’s class number
4. Fit your train data to update the unfrozen layers’ weights
5. Fit your test data for inference
https://github.com/spark-mler/WorkWithTensorflow/tree/master/cv_bot/models/pretrain_inference
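A minimal sketch of the frozen variant in plain TensorFlow: import the serialized GraphDef (.pb), use it as a fixed feature extractor, and train only a new softmax head. The file name, the tensor name 'pool_3/_reshape:0' and the 2048-dimensional bottleneck follow the common Inception classify_image graph and are assumptions; the repository linked above may organize this differently.

import tensorflow as tf

NUM_NEW_CLASSES = 5   # hypothetical class count for the new task

# Load the serialized GraphDef; its weights become constants, i.e. frozen.
with tf.gfile.FastGFile('classify_image_graph_def.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
bottleneck, = tf.import_graph_def(graph_def,
                                  return_elements=['pool_3/_reshape:0'])

y_ = tf.placeholder(tf.float32, [None, NUM_NEW_CLASSES])
W = tf.Variable(tf.truncated_normal([2048, NUM_NEW_CLASSES], stddev=0.001))
b = tf.Variable(tf.zeros([NUM_NEW_CLASSES]))
logits = tf.matmul(bottleneck, W) + b   # new head on top of the frozen features

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_))
# The imported graph contributes no trainable variables, so minimize()
# only updates W and b, the unfrozen part of the network.
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)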
Neural Style
1.MRF-Based
2.CNN-Based
3.MRF and CNN-Based
4.Fast Neural Style
MRF-Based
Freeman W T, Liu C. Markov random fields for super-resolution and texture synthesis[J].
Advances in Markov Random Fields for Vision and Image Processing, 2011, 1: 155-165.
Efros A A, Leung T K. Texture synthesis by non-parametric sampling[C]//Computer Vision, 1999.
The Proceedings of the Seventh IEEE International Conference on. IEEE, 1999, 2: 1033-1038.
CNN-Based
Gatys L A, Ecker A S, Bethge M. A neural algorithm of artistic style[J]. arXiv preprint
arXiv:1508.06576, 2015.
https://github.com/anishathalye/neural-style
https://github.com/jcjohnson/neural-style
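The core of the Gatys et al. method is a content loss on deeper VGG activations plus a style loss built from Gram matrices of feature maps. A minimal sketch of the style part, assuming the feature tensors come from a fixed pre-trained VGG and have static shapes; equal layer weights are a simplification.

import tensorflow as tf

def gram_matrix(features):
    # features: [1, height, width, channels] activations from one VGG layer
    shape = features.get_shape().as_list()
    h, w, c = shape[1], shape[2], shape[3]
    flat = tf.reshape(features, [h * w, c])          # flatten spatial dimensions
    gram = tf.matmul(flat, flat, transpose_a=True)   # (c, c) feature correlations
    return gram / (h * w * c)

def style_loss(style_features, generated_features):
    # Both arguments are lists of activations from the same VGG layers
    # (e.g. conv1_1 ... conv5_1); equal layer weights are assumed here.
    losses = [tf.reduce_mean(tf.square(gram_matrix(s) - gram_matrix(g)))
              for s, g in zip(style_features, generated_features)]
    return tf.add_n(losses) / len(losses)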
MRF And CNN-Based
Li C, Wand M. Combining Markov Random Fields and Convolutional Neural Networks for
Image Synthesis[J]. arXiv preprint arXiv:1601.04589, 2016
https://github.com/chuanli11/CNNMRF
MRF And CNN-Based
Fast Neural Style
Johnson J, Alahi A, Fei-Fei L. Perceptual losses for real-time style transfer and
super-resolution[J]. arXiv preprint arXiv:1603.08155, 2016.
https://github.com/burness/neural_style_tensorflow/tree/master/fast_neural_style
Fast Neural Style
TextImage
1.Text-to-Image
2.Image-to-Text
Text-to-Image
Reed S, Akata Z, Yan X, et al. Generative adversarial text to image synthesis[J].
arXiv preprint arXiv:1605.05396, 2016.
https://github.com/paarthneekhara/text-to-image
Generative Adversarial Network
A generator G and a discriminator D compete in a two-player minimax game:
G: fool the discriminator
D: distinguish real training data from synthetic images
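Formally, this is the standard GAN minimax objective; in the text-to-image setting of Reed et al., both G and D are additionally conditioned on a text embedding φ(t):

min_G max_D V(D, G) = E_{x∼p_data}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]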
Text-to-Image
Generative Adversarial Network
G: ℝ^Z × ℝ^T → ℝ^D
D: ℝ^T × ℝ^D → {0, 1}
(Z: noise dimension, T: text embedding dimension, D: image dimension)
Text-to-Image
Results
the flower has yellow petals and the center of it is brown
the flower shown has yellow anther red pistil and bright red petals
this flower has petals that are yellow, white and purple and has dark lines
the petals on this flower are white with a yellow center
this flower has a lot of small round pink petals
this flower is orange in color, and has petals that are ruffled and rounded
Image-to-Text
Vinyals O, Toshev A, Bengio S, et al. Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning
Challenge[J]. 2016.
https://github.com/tensorflow/models/tree/master/im2txt
https://github.com/tensorflow/models/issues/480
https://github.com/tensorflow/models/pull/485/commits/c6a4f783080c5310ce0e3244daa31af57df12def
Image-to-Text
Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing
internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.
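For reference, batch normalization standardizes each activation with mini-batch statistics and then applies a learned scale and shift:

x̂_i = (x_i − μ_B) / √(σ_B² + ε),  y_i = γ·x̂_i + β

where μ_B and σ_B² are the mean and variance over the mini-batch, ε is a small constant for numerical stability, and γ, β are learned parameters.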
Distributed Deep Learning In UCloud
UCloud Multi Node Weight Update
[Architecture and workflow diagrams for distributed training on UCloud; see the speaker notes for slides #38 to #41]
Editor's Notes

  • #4: 1.LeNet 2.AlexNet 3.GoogLeNet 4.VGG 5.Deep Residual Network
  • #5: Input: 32*32. C1: 5*5*6 (5*5 kernels, 6 channels) → 28*28*6. S2: subsampling, max-pool 2*2 → 14*14*6. C3: 5*5*16 (5*5 kernels, 16 channels) → 10*10*16. S4: subsampling, max-pool 2*2 → 5*5*16. C5: 5*5*120 (5*5 kernels, 120 channels) → 1*1*120. F6: 120*84. Output: 84*10.
  • #6: (Same note as #5.)
  • #7: First convolutional layer: the input image is 227*227*3 (the paper seems slightly off here, giving 224*224*3); 96 kernels (96, 11, 11, 3) slide with a stride of 4 pixels, producing 55*55 response maps; after response normalization (actually Local Response Normalization, discussed later) and pooling (the pooling in Caffe's AlexNet seems to differ a bit from the paper; AlexNet used two GPUs, which is why the first convolutional layer appears split into two parts in the figure), with pool_size=(3,3) and a stride of 2 pixels, we get 96 feature maps of 27*27. The second convolutional layer uses 256 kernels (again split across two GPUs, 128 kernels of 5*5*48 each), with pad_size=(2,2) and a stride of 1 pixel (thanks to a reader for pointing this out), producing 27*27 responses; after LRN and pooling with a 3*3 window and a stride of 2 pixels, we get 256 feature maps of 13*13. The third and fourth layers have neither LRN nor pooling, and the fifth layer has only pooling: the third layer uses 384 kernels (3*3*256, pad_size=(1,1) giving 256*15*15; kernel size (3,3), stride 1 pixel, giving 384*13*13); the fourth layer uses 384 kernels (pad_size=(1,1) giving 256*15*15, kernel size (3,3), stride 1 pixel, giving 384*13*13); the fifth layer uses 256 kernels (pad_size=(1,1) giving 384*15*15, kernel_size (3,3) giving 256*13*13, pool_size=(3,3) with a stride of 2 pixels giving 256*6*6). Fully connected layers: the first two have 4096 neurons each, and the final softmax outputs 1000 classes (ImageNet); note that in the Caffe graph the fully connected layers include relu, dropout and innerProduct.
  • #8: Number of iterations needed by a four-layer convolutional network on CIFAR-10 to reach 25% training error with tanh vs. ReLU. With ReLU f(x)=max(0,x), the values after the activation no longer lie in a bounded range as with tanh or sigmoid, so a normalization step is usually applied after ReLU; LRN is the method proposed in the paper (I am not certain it was first proposed there), inspired by the neuroscience concept of "lateral inhibition", i.e. the effect an active neuron has on its neighbours. Response normalization reduces the top-1 and top-5 error rates by 1.4% and 1.2%.
  • #9: (Same note as #8.)
  • #11: The Inception module was proposed mainly because convolution kernels of different sizes can capture the information of different clusters in the image; for computational convenience the paper uses 1*1, 3*3 and 5*5 kernels, plus a 3*3 max-pooling branch. There is, however, a big computational pitfall: the number of output filters of each Inception module is the sum of the filters of all its branches, so after several layers the model becomes huge, and the naive Inception depends heavily on computational resources.
  • #12: Same as #11, with the addition: as mentioned earlier for the Network-in-Network model, 1*1 convolutions can reduce dimensionality effectively (expressing as much information as possible with fewer filters), so the paper proposes the "Inception module with dimension reduction", which reduces the number of filters, and hence the model complexity, without losing the model's representational power.
  • #14: Bug: downloading the graph PNG with Chrome produces a blank image; switching to Firefox works.
  • #15: VGGNet is the work of Oxford's Visual Geometry Group for ILSVRC 2014. Its main contribution is showing that increasing network depth affects the final performance to some extent; the paper improves performance by gradually increasing depth. It may look a bit brute-force, without many clever tricks, but it really works, and many pre-trained pipelines use VGG models (mainly VGG-16 and VGG-19). Compared with other methods VGG has a very large parameter space: the final model is over 500 MB, while AlexNet is about 200 MB and GoogLeNet even smaller, so training a VGG model usually takes much longer; fortunately public pre-trained models make it very convenient to use.
  • #16: , and adding more layers to a suitably deep model leads to higher training error, as reported in [11, 42] and thoroughly verified by our experiments. Fig. 1 shows a typical example. The existence of this constructed solution indicates that a deeper model should produce no higher training error than its shallower counterpart. But experiments show that our current solvers on hand are unable to find solutions that
  • #17: Let the underlying mapping be H(x); the stacked nonlinear layers are made to fit F(x) := H(x) − x, and optimizing this residual is easier than optimizing H(x) directly. F(x) + x is easily implemented with "shortcut connections".
  • #18: Example network architectures for ImageNet. Left: the VGG-19 model [41] (19.6 billion FLOPs) as a reference. Middle: a plain network with 34 parameter layers (3.6 billion FLOPs). Right: a residual network with 34 parameter layers (3.6 billion FLOPs). The dotted shortcuts increase dimensions. Table 1 shows more details and other variants. About the left figure: TensorFlow has a bug here; when downsampling, the CPU version cannot run, but the GPU version is fine.
  • #19: TFlearn and TensorFlow. Keras supports both Theano and TensorFlow as backends, which makes reading its underlying source code troublesome. TensorLayer is written by Chinese developers; its code architecture looks a bit messy and it has relatively few followers. Advantages of TFlearn: 1. it lets researchers and developers configure deep network models quickly; 2. it helps beginners learn TensorFlow code, since the TF API is large and complex, and reading the TFlearn source is a good way to learn the everyday TF APIs. TF-slim and Learn are in the official contrib source, but I do not find them pleasant to use.
  • #21: There are usually three ways to do image classification: 1. train some network architecture from scratch for image recognition; 2. retrain a pre-trained model: change the classes, then fit the data to train the model; 3. also using a pre-trained model, load the parameters of all layers, freeze some of them, and fit the data to train the model.
  • #22: (Same note as #21.)
  • #23: (Same note as #21.)
  • #24: Same as #21, with the addition: update the unfrozen layers, e.g. the final softmax layer.
  • #25: Neural style is a classic topic in image synthesis. The earliest image synthesis was mainly based on MRF methods, then CNN-based methods, also MRF+CNN, and then fast neural style.
  • #26: MRF is a classic method for image segmentation and image synthesis. The intuitive meaning of a Markov random field is that x_s is only influenced by its surrounding points x_r and is independent of all other points. In MRF-based texture synthesis, for the neighborhood of the pixel currently being synthesized (or the boundary of the current texture patch), all pixels (or patches) in the sample image are searched to find the one with the best matching neighborhood (or boundary), which is then copied into the result as the best approximation for the current pixel (or patch). The MRF model assumes textures have local statistical properties, i.e. any part of a texture is fully determined by its surroundings (its neighborhood), which is a fairly objective view of texture. Markov models capture the higher-order statistics of texture well; the problem is they only consider local influences and cannot do much at the global, semantic level of the image.
  • #27: Using a pre-trained model, the key is how to define the content loss and the style loss. Content Reconstruction: the lower part of the figure shows content reconstruction at CNN layers a, b, c, d, e; note that the part labelled Content Representations at the start is not the original image (think of it as the image as seen by the computer, e.g. by a classifier, so visualizing it may not reveal the content at all) but the image data after passing through the pre-trained VGG network model, which is mainly an object recognition model and is used here to generate the image's content representations. Once this is understood, the rest is easier: content is reconstructed through the five convolutional layers; the authors found that content reconstruction works well in the first three layers, while layers d and e lose some detail and keep more high-level information. Style Reconstruction: reconstructing style is more complicated, since style is hard to model. The style representation is generated like the content representation, also by the VGG network, but a, b, c, d, e are handled differently: style reconstruction is computed on different subsets of the CNN layers, namely conv1_1 (a), [conv1_1, conv2_1] (b), [conv1_1, conv2_1, conv3_1], [conv1_1, conv2_1, conv3_1, conv4_1], [conv1_1, conv2_1, conv3_1, conv4_1, conv5_1]. Reconstructing style this way matches the image's own style at multiple scales while ignoring the global arrangement of the scene.
  • #28: Problem with the MRF model: local statistics cannot model the global information of complex images. Combined MRF and CNN: We transfer the style of x_s into the layout of x_c by making the high-level neural encoding of x similar to x_c, but using local patches similar to those of x_s. The latter is the MRF prior that maintains the encoding of the style. 1. The content encoding is still based on the neural network, as in the original method; 2. the style is encoded with MRF-prior local patches, unlike the neural style method.
  • #29: (Same note as #28.)
  • #30: Fast neural style uses an image transformation network and a loss network. The image transformation network is a deep residual conv network that directly transforms the input (content) image into a stylized image; the loss network's parameters are fixed. The loss network has the same structure as in A Neural Algorithm of Artistic Style, but its parameters are not updated; it is only used to compute the content loss and style loss, the so-called perceptual loss. The authors' explanation is that a convolutional model pre-trained for image classification has already learned perceptual and semantic information (scene and semantics), so the whole loss network is only there to compute the content and style losses; unlike A Neural Algorithm of Artistic Style, what gets updated here are the parameters of the transformation network in front. So, over the whole architecture, the input image passes through the transform network to get the transformed image, the corresponding loss is computed, and the whole system minimizes this loss to update the transform network.
  • #31: Chicago\hoovertowernight\sh\sh2 * [composition_vii, la_muse, starry_night, the_wave]
  • #32: (Same note as #25.)
  • #33: A GAN trains a generator network and a discriminator network simultaneously. The generator takes a noise variable z and outputs a fake image G(z; θg); the discriminator takes an image x (real image or fake image) and outputs a binary confidence D(x; θd) indicating whether the input is a natural image or a forged one. Ideally, the discriminator D should judge as accurately as possible whether the input is a real image or some kind of fake, while the generator G should try its best to fool D into judging all of its generated fakes as real images.
  • #34: The generator network uses a noisy text-encoder vector (hybrid of a character-level convnet with a recurrent neural network) to generate the image, trying to fool the discriminator network; the discriminator takes real/fake inputs and learns to score the fakes.
  • #35: Trained on UCloud GPUs, 300+ epochs.
  • #36: NIC: Neural Image Caption. Bug: building with the source from GitHub, there is a tf.gfile error when preprocessing the images into TFRecord, fixed by using open(). Training takes too long overall; the results have not finished running yet.
  • #37: p(S_t | I, S_0, ..., S_{t-1}) can easily be modeled by a recurrent neural network, where θ are the parameters of our model, I is an image, and S its correct transcription. S represents any sentence and its length is unbounded, so an RNN (LSTM) is used; the image is represented with a CNN (with batch normalization), changing the model from the CVPR'15 version. Other details may require reading the source code, which is interesting; Li Jian's team at Ctrip previously shared work on this topic. The image CNN (batch normalization) model output is then fed into the LSTM for the NLP-side training.
  • #38: The command line calls the relevant APIs to drive the manage module, which starts the PS/worker nodes; they fetch the code and data from UFile, train the model, and upload the result to UFile when training is done.
  • #39: How to do deep learning on UCloud? 1. Collect the training data and put it in your bucket on UFile; 2. Write the code: A. mainly define build_model in the code; B. define how to run the model, run_model; 3. Test locally that it runs; 4. Upload the code and launch the online multi-node parallel job.
  • #40: (Same note as #39.)
  • #41: Define how to run the model.