http://blog.csdn.net/u012162613/article/details/43225445
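This article is a detailed walkthrough of the code for a CNN implementation. If you have not studied CNNs yet, I recommend the blog post by my senior labmate Zhou Xiaoyi, "Deep Learning(深度学习)学习笔记整理系列之(七)", along with the UFLDL tutorial pages on feature extraction using convolution and on pooling.
The defining features of a CNN are sparse connectivity (local receptive fields) and weight sharing, illustrated in the two figures below: sparse connectivity on the left, weight sharing on the right. Both reduce the number of parameters to be trained and hence the computational cost.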
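To make the saving concrete, here is a minimal sketch that counts the weights of a single layer under three connection schemes. The 28x28 input, the 5x5 receptive field and the function names are illustrative assumptions, not values read off the figures.

```python
def dense_params(in_h, in_w, out_h, out_w):
    """Fully connected: every output unit is wired to every input pixel."""
    return (in_h * in_w) * (out_h * out_w)


def sparse_params(out_h, out_w, k):
    """Sparse connectivity only: each output unit has its own k x k patch of weights."""
    return (out_h * out_w) * (k * k)


def shared_params(k):
    """Sparse connectivity + weight sharing: one k x k kernel reused by all units of a feature map."""
    return k * k


in_h = in_w = 28                 # assumed input size
k = 5                            # assumed receptive field size
out_h = out_w = in_h - k + 1     # 24 outputs per side for an unpadded convolution

print(dense_params(in_h, in_w, out_h, out_w))   # 451584
print(sparse_params(out_h, out_w, k))           # 14400
print(shared_params(k))                         # 25
```

Biases are left out of the count to keep the comparison simple.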
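As for the structure of a CNN, the classic LeNet5 serves as the example:
This figure is everywhere: any discussion of CNNs mentions LeNet5. It comes from the paper "Gradient-Based Learning Applied to Document Recognition". The paper is long; the description of the LeNet5 architecture starts around page 7, and that part is worth reading.
Briefly, reading the LeNet5 figure from left to right: first comes input, the input layer, i.e. the input image. The step from the input layer to C1 is a convolutional layer (a convolution operation), and C1 to S2 is a subsampling layer (a pooling operation). The figure below shows how convolution and subsampling work in detail:
Next, S2 to C3 is again a convolution and C3 to S4 is again a subsampling step. Convolution and subsampling come in pairs: a convolution is usually followed by subsampling. S4 to C5 is fully connected, which makes it equivalent to a hidden layer of an MLP (if MLPs are unfamiliar, see "DeepLearning tutorial(3)MLP多层感知机原理简介+代码详解"). C5 to F6 is also fully connected and is likewise equivalent to an MLP hidden layer. Finally, F6 to output is simply a classifier; this layer is called the classification layer.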
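The convolution and subsampling steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 5x5 image and 2x2 kernel are made up, the kernel is applied without flipping (cross-correlation form), and max pooling stands in for the subsampling step, as it does in the code discussed later.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = 0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out


def max_pool(feature_map, size=2):
    """Non-overlapping size x size max pooling; a leftover border is dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [[max(feature_map[i * size + di][j * size + dj]
                 for di in range(size) for dj in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


image = [[1, 2, 0, 1, 3],
         [0, 1, 2, 3, 1],
         [1, 0, 1, 2, 2],
         [2, 1, 0, 1, 0],
         [0, 3, 1, 0, 1]]
kernel = [[1, 0],
          [0, 1]]

c = conv2d_valid(image, kernel)   # 5x5 input, 2x2 kernel -> 4x4 feature map
s = max_pool(c)                   # 4x4 feature map -> 2x2 after 2x2 pooling
print(s)                          # [[4, 5], [5, 2]]
```

The output side length follows the same rule used later for MNIST: (input - kernel + 1) after convolution, then divided by the pooling size.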
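OK, that is roughly the basic structure of a CNN: it is assembled from an input, convolutional layers, subsampling layers, fully connected layers, a classification layer and an output. How many convolutional and subsampling layers to use, and which classifier, depends on the application. Once the structure is fixed, how are the connection parameters between layers found? The usual approach is to train with forward propagation (FP) plus backpropagation (BP). See the links given above for details.
II. A detailed walkthrough of the CNN code (Python + Theano)
The code comes from the Deep Learning Tutorials chapter "Convolutional Neural Networks (LeNet)". It implements a simplified LeNet5, which differs from the original as follows:
∙It does not implement location-specific gain and bias parameters
∙It uses max pooling rather than average pooling
∙The classifier is softmax, whereas LeNet5 uses RBF
∙The second layer of LeNet5 is not fully connected (each output feature map sees only a subset of the input feature maps); this program uses full connectivity
In addition, the code merges the convolutional layer and the subsampling layer into a single class, "LeNetConvPoolLayer" (convolution + pooling layer). That makes sense because the two always appear as a pair. One point needs attention: the code feeds the convolution output directly into the subsampling step, without first adding the bias b and applying a sigmoid. In other words, the bx and the sigmoid mapping that follow fx in the figure below are absent, and Cx is obtained directly from fx. The bias and a tanh are applied only once, after pooling.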
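One observation that may help here: tanh is monotonically increasing and the bias is constant within a feature map, so for max pooling it should make no difference whether the bias and tanh are applied before or after pooling; doing it afterwards simply touches fewer values. The sketch below checks this on one made-up 2x2 pooling window (the numbers are assumptions). The same argument does not carry over to average pooling.

```python
import math

window = [0.3, -1.2, 0.8, 0.1]   # made-up convolution outputs inside one 2x2 pooling window
b = 0.25                         # made-up bias of this feature map

# Order used in the tutorial code: pool first, then add the bias and squash.
pool_then_act = math.tanh(max(window) + b)

# Alternative order: add the bias and squash every element, then pool.
act_then_pool = max(math.tanh(v + b) for v in window)

same = abs(pool_then_act - act_then_pool) < 1e-12
print(same)   # True

# Average pooling does not commute with tanh in the same way.
avg_then_act = math.tanh(sum(window) / len(window) + b)
act_then_avg = sum(math.tanh(v + b) for v in window) / len(window)
print(abs(avg_then_act - act_then_avg) > 1e-3)   # True
```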
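Finally, the first convolutional layer in the code uses 20 kernels and the second uses 50, rather than the 6 and 16 shown in the LeNet5 figure above.
With that in mind, here is the code:
(1) Import the required modules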
import cPickle
import gzip
import os
import sys
import time

import numpy

import theano
import theano.tensor as T
from theano.tensor.signal import downsample
from theano.tensor.nnet import conv
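(2) Define the basic building blocks of the CNN
The building blocks are the convolution + pooling layer, the hidden layer and the classifier, as follows.
∙Define LeNetConvPoolLayer (convolution + pooling layer)
See the comments in the code: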
"""
LeNetConvPoolLayer combines convolution and downsampling into one layer.
rng: random number generator used to initialize W
input: 4D tensor, theano.tensor.dtensor4
filter_shape: (number of filters, num input feature maps, filter height, filter width)
image_shape: (batch size, num input feature maps, image height, image width)
poolsize: (#rows, #cols)
"""
class LeNetConvPoolLayer(object):
    def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):

        # assert: execution continues if the condition is True and aborts otherwise.
        # image_shape[1] and filter_shape[1] are both "num input feature maps",
        # so they must be equal.
        assert image_shape[1] == filter_shape[1]
        self.input = input

        # Each hidden unit (i.e. each output pixel) has
        # "num input feature maps * filter height * filter width"
        # incoming connections, which is numpy.prod(filter_shape[1:]).
        fan_in = numpy.prod(filter_shape[1:])

        # Each unit in the lower layer receives a gradient from
        # "num output feature maps * filter height * filter width" / pooling size.
        fan_out = (filter_shape[0] * numpy.prod(filter_shape[2:]) //
                   numpy.prod(poolsize))

        # Plug fan_in and fan_out into the formula below to randomly
        # initialize W, the linear convolution kernels.
        W_bound = numpy.sqrt(6. / (fan_in + fan_out))
        self.W = theano.shared(
            numpy.asarray(
                rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
                dtype=theano.config.floatX
            ),
            borrow=True
        )

        # The bias is a 1D tensor: one bias per output feature map.
        # The number of output feature maps equals the number of filters,
        # so b is initialized with filter_shape[0] (number of filters) zeros.
        b_values = numpy.zeros((filter_shape[0],), dtype=theano.config.floatX)
        self.b = theano.shared(value=b_values, borrow=True)

        # Convolve the input images with the filters using conv.conv2d.
        # No bias or sigmoid is applied right after the convolution;
        # this is one of the simplifications.
        conv_out = conv.conv2d(
            input=input,
            filters=self.W,
            filter_shape=filter_shape,
            image_shape=image_shape
        )

        # Max pooling (the subsampling step)
        pooled_out = downsample.max_pool_2d(
            input=conv_out,
            ds=poolsize,
            ignore_border=True
        )

        # Add the bias and apply tanh to get the final output of the
        # convolution + pooling layer.
        # b is a 1D vector, so dimshuffle is used to reshape it for broadcasting.
        # For example, if b has shape (10,), then
        # b.dimshuffle('x', 0, 'x', 'x') has shape (1, 10, 1, 1).
        self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))

        # Parameters of the convolution + pooling layer
        self.params = [self.W, self.b]
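As a worked example of the fan_in / fan_out arithmetic in the listing above, the following sketch recomputes the initialization bound sqrt(6 / (fan_in + fan_out)) for the two filter shapes used later in this article, (20, 1, 5, 5) and (50, 20, 5, 5). It uses only the standard library; the helper name `conv_weight_bound` is hypothetical.

```python
import math


def conv_weight_bound(filter_shape, poolsize=(2, 2)):
    """Return (fan_in, fan_out, W_bound) for a conv + pool layer.

    filter_shape is (number of filters, num input feature maps,
    filter height, filter width), as in LeNetConvPoolLayer.
    """
    n_filters, n_in_maps, fh, fw = filter_shape
    fan_in = n_in_maps * fh * fw
    fan_out = n_filters * fh * fw // (poolsize[0] * poolsize[1])
    w_bound = math.sqrt(6.0 / (fan_in + fan_out))
    return fan_in, fan_out, w_bound


layer0 = conv_weight_bound((20, 1, 5, 5))    # first conv + pool layer
layer1 = conv_weight_bound((50, 20, 5, 5))   # second conv + pool layer
print(layer0)   # fan_in = 25, fan_out = 125, bound = 0.2
print(layer1)   # fan_in = 500, fan_out = 312
```

The weights of each layer are then drawn uniformly from [-W_bound, W_bound], so layers with more incoming and outgoing connections start with smaller weights.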
∙Define the hidden layer HiddenLayer
This is identical to the HiddenLayer in the previous article, "DeepLearning tutorial(3)MLP多层感知机原理简介+代码详解", so it is reused as is:
"""
Notes:
This class defines a hidden layer. Its input is `input` and its output has one value per hidden neuron. The input layer and the hidden layer are fully connected.
Suppose the input is an n_in-dimensional vector (equivalently, n_in neurons) and the hidden layer has n_out neurons. Full connectivity gives
n_in*n_out weights, so W has shape (n_in, n_out), with n_in rows and n_out columns; each column holds the incoming weights of one hidden neuron.
b is the bias; the hidden layer has n_out neurons, so b is an n_out-dimensional vector.
rng is the random number generator, numpy.random.RandomState, used to initialize W.
input holds all inputs used to train the model; it is not the MLP input layer itself. The input layer has n_in neurons, whereas the argument input has shape (n_example, n_in), one sample per row, and each row is fed to the input layer.
activation: the activation function, tanh by default.
"""
class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, W=None, b=None,
                 activation=T.tanh):
        self.input = input  # the input of HiddenLayer is the input passed in

        # Notes:
        # To be GPU compatible the code must use dtype=theano.config.floatX
        # and store the values in theano.shared variables.
        # W is initialized by the following rule: with tanh, draw values
        # uniformly from -sqrt(6./(n_in+n_out)) to sqrt(6./(n_in+n_out));
        # with sigmoid, multiply that range by 4.

        # If W is not given, initialize it as described above.
        # The check exists because W can sometimes be initialized from
        # already trained parameters; see my previous article.
        if W is None:
            W_values = numpy.asarray(
                rng.uniform(
                    low=-numpy.sqrt(6. / (n_in + n_out)),
                    high=numpy.sqrt(6. / (n_in + n_out)),
                    size=(n_in, n_out)
                ),
                dtype=theano.config.floatX
            )
            if activation == theano.tensor.nnet.sigmoid:
                W_values *= 4
            W = theano.shared(value=W_values, name='W', borrow=True)

        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)

        # Use the W and b defined above as the W and b of HiddenLayer
        self.W = W
        self.b = b

        # Output of the hidden layer
        lin_output = T.dot(input, self.W) + self.b
        self.output = (
            lin_output if activation is None
            else activation(lin_output)
        )

        # Parameters of the hidden layer
        self.params = [self.W, self.b]
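Without Theano, the forward pass of such a hidden layer, tanh(x·W + b) with W of shape (n_in, n_out), can be sketched as follows. This is a toy illustration, not the tutorial's code: the sizes (4 inputs, 3 hidden units), the sample values and the helper names `init_weights` and `hidden_forward` are assumptions.

```python
import math
import random


def init_weights(rng, n_in, n_out):
    """Draw an n_in x n_out matrix uniformly from [-bound, bound], bound = sqrt(6 / (n_in + n_out))."""
    bound = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-bound, bound) for _ in range(n_out)]
            for _ in range(n_in)]


def hidden_forward(x, W, b):
    """x has one sample per row (n_example x n_in); the result is n_example x n_out."""
    n_out = len(b)
    return [[math.tanh(sum(xi * W[i][j] for i, xi in enumerate(row)) + b[j])
             for j in range(n_out)]
            for row in x]


rng = random.Random(0)
W = init_weights(rng, n_in=4, n_out=3)
b = [0.0] * 3                       # biases start at zero, as in HiddenLayer
x = [[0.5, -1.0, 0.0, 2.0],         # two made-up samples with 4 features each
     [1.0, 1.0, 1.0, 1.0]]

h = hidden_forward(x, W, b)
print(len(h), len(h[0]))            # 2 3
```

Each column of W holds the incoming weights of one hidden unit, which is why the inner sum runs over the rows of W for a fixed column j.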
∙Define the classifier (softmax regression)
The classifier is softmax, the same LogisticRegression class as in "DeepLearning tutorial(1)Softmax回归原理简介+代码详解", reused as is:
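"""
Define the classification layer LogisticRegression, i.e. softmax regression.
The deeplearning tutorial treats LogisticRegression as softmax directly;
the familiar two-class logistic regression is the special case n_out=2.
"""
# Parameters:
# input: shape (n_example, n_in), where n_example is the size of one batch;
#        it is defined this way because training uses minibatch SGD
# n_in: the size of the output of the previous (hidden) layer
# n_out: the number of output classes
class LogisticRegression(object):
    def __init__(self, input, n_in, n_out):

        # W has n_in rows and n_out columns, b is an n_out-dimensional vector:
        # each output corresponds to one column of W and one element of b.
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name='W',
            borrow=True
        )

        self.b = theano.shared(
            value=numpy.zeros(
                (n_out,),
                dtype=theano.config.floatX
            ),
            name='b',
            borrow=True
        )

        # input is (n_example, n_in) and W is (n_in, n_out); their matrix
        # product is (n_example, n_out). Adding the bias b and passing the
        # result to T.nnet.softmax gives p_y_given_x, so each row of
        # p_y_given_x holds the estimated class probabilities of one sample.
        # PS: b is an n_out-dimensional vector added to an (n_example, n_out)
        # matrix; it is broadcast, i.e. b is added to every row.
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

        # argmax returns the index of the largest value; for MNIST the index
        # is exactly the class label. axis=1 means the operation runs per row.
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)

        # params: the parameters of LogisticRegression
        self.params = [self.W, self.b]

    # The two methods below are not shown in the original listing, but
    # evaluate_lenet5 calls them. They follow the LogisticRegression class
    # of the Theano tutorial.
    def negative_log_likelihood(self, y):
        # Mean negative log-probability of the correct class over the batch
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

    def errors(self, y):
        # Fraction of samples in the batch that are misclassified
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        if y.dtype.startswith('int'):
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()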
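To see what the classification layer computes on actual numbers, here is a small plain-Python sketch of softmax, the argmax prediction, the mean negative log-likelihood and the error rate. The scores and labels are made up, and the function names are hypothetical stand-ins for the symbolic Theano expressions.

```python
import math


def softmax(row):
    """Turn one row of scores into probabilities (shifted by the max for numerical stability)."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def predict(p_y_given_x):
    """Index of the most probable class for each sample (argmax along each row)."""
    return [max(range(len(p)), key=p.__getitem__) for p in p_y_given_x]


def negative_log_likelihood(p_y_given_x, y):
    """Mean of -log p(correct class) over the batch."""
    return -sum(math.log(p[t]) for p, t in zip(p_y_given_x, y)) / len(y)


def error_rate(y_pred, y):
    """Fraction of misclassified samples (zero-one loss)."""
    return sum(1 for a, t in zip(y_pred, y) if a != t) / float(len(y))


scores = [[2.0, 1.0, 0.1],    # made-up values of dot(input, W) + b for two samples
          [0.5, 2.5, 0.0]]
y = [0, 2]                    # made-up true labels

probs = [softmax(r) for r in scores]
y_pred = predict(probs)
print(y_pred)                             # [0, 1]
print(error_rate(y_pred, y))              # 0.5
print(negative_log_likelihood(probs, y))
```

The second sample is misclassified, so it contributes both to the error rate and, through a small probability for its true class, a large term to the negative log-likelihood.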
At this point all the basic building blocks of the CNN are in place. Next they are assembled into LeNet5 (a simplified one, as noted above): LeNet5 = input + LeNetConvPoolLayer_1 + LeNetConvPoolLayer_2 + HiddenLayer + LogisticRegression + output.
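Before wiring the layers together it helps to trace the tensor shapes through the pipeline. The sketch below does this for the settings used later in this article (batch size 500, 20 and 50 kernels, 5x5 filters, 2x2 pooling on 28x28 MNIST images); the helper `conv_pool_shape` is a hypothetical name.

```python
def conv_pool_shape(image_shape, filter_shape, poolsize=(2, 2)):
    """Output shape of an unpadded convolution followed by non-overlapping max pooling."""
    batch, n_in_maps, h, w = image_shape
    n_filters, n_in_maps_f, fh, fw = filter_shape
    assert n_in_maps == n_in_maps_f   # same check as in LeNetConvPoolLayer
    conv_h, conv_w = h - fh + 1, w - fw + 1
    return (batch, n_filters, conv_h // poolsize[0], conv_w // poolsize[1])


batch_size, nkerns = 500, [20, 50]

s0 = conv_pool_shape((batch_size, 1, 28, 28), (nkerns[0], 1, 5, 5))
s1 = conv_pool_shape(s0, (nkerns[1], nkerns[0], 5, 5))
flat = (s1[0], s1[1] * s1[2] * s1[3])   # what flatten(2) produces for the hidden layer

print(s0)     # (500, 20, 12, 12)
print(s1)     # (500, 50, 4, 4)
print(flat)   # (500, 800)
```

The last tuple explains why the hidden layer is created with n_in = nkerns[1] * 4 * 4.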
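The model is then applied to the MNIST dataset and trained with the BP algorithm (mini-batch SGD in the code below) to fit its parameters.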
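The update rule applied during training is param - learning_rate * grad for every parameter. A minimal sketch of that rule on a toy problem follows; the quadratic objective and its hand-written gradient are assumptions chosen so the result is easy to check, whereas in the real model the gradients come from backpropagation via T.grad.

```python
def sgd_step(params, grads, learning_rate):
    """One update: each parameter moves against its gradient."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


def toy_grad(params):
    """Gradient of the toy cost f(w) = (w0 - 3)^2 + (w1 + 1)^2."""
    return [2 * (params[0] - 3), 2 * (params[1] + 1)]


params = [0.0, 0.0]
for _ in range(100):
    params = sgd_step(params, toy_grad(params), learning_rate=0.1)

print(params)   # close to [3, -1], the minimizer of the toy cost
```

With mini-batches, the only change is that the gradient is estimated from one batch of samples per step instead of being computed exactly.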
(3) Load the MNIST dataset (mnist.pkl.gz)
"""
Load the MNIST dataset: load_data()
"""
def load_data(dataset):
    # dataset is the path to the dataset. The function first checks whether
    # the MNIST file exists at that path and downloads it if it does not.
    # This part is not explained further; it is unrelated to the CNN itself.
    data_dir, data_file = os.path.split(dataset)
    if data_dir == "" and not os.path.isfile(dataset):
        # Check if dataset is in the data directory.
        new_path = os.path.join(
            os.path.split(__file__)[0],
            "..",
            "data",
            dataset
        )
        if os.path.isfile(new_path) or data_file == 'mnist.pkl.gz':
            dataset = new_path

    if (not os.path.isfile(dataset)) and data_file == 'mnist.pkl.gz':
        import urllib
        origin = (
            'http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz'
        )
        print 'Downloading data from %s' % origin
        urllib.urlretrieve(origin, dataset)

    print '... loading data'
    # Everything above checks for and downloads mnist.pkl.gz and is not the
    # focus of this article. The actual loading starts here.

    # Load train_set, valid_set and test_set from "mnist.pkl.gz"; each of
    # them includes the labels.
    # This mainly uses gzip.open() and cPickle.load().
    # 'rb' opens the file for reading in binary mode.
    f = gzip.open(dataset, 'rb')
    train_set, valid_set, test_set = cPickle.load(f)
    f.close()

    # Store the data in shared variables, mainly for GPU acceleration: only
    # shared variables can be kept in GPU memory.
    # Data on the GPU must be float, but data_y holds class labels, so it is
    # stored as floatX and cast back to int when returned.
    def shared_dataset(data_xy, borrow=True):
        data_x, data_y = data_xy
        shared_x = theano.shared(numpy.asarray(data_x,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
        shared_y = theano.shared(numpy.asarray(data_y,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
        return shared_x, T.cast(shared_y, 'int32')

    test_set_x, test_set_y = shared_dataset(test_set)
    valid_set_x, valid_set_y = shared_dataset(valid_set)
    train_set_x, train_set_y = shared_dataset(train_set)

    rval = [(train_set_x, train_set_y), (valid_set_x, valid_set_y),
            (test_set_x, test_set_y)]
    return rval
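The file format that load_data expects, a gzip-compressed pickle holding three (inputs, labels) pairs, can be tried out on a tiny made-up dataset. The sketch below is written for Python 3, where `pickle` replaces `cPickle`; the file name, the data and the helper names are assumptions. Loading the real mnist.pkl.gz under Python 3 typically also needs `encoding='latin1'` in `pickle.load`, because the file was pickled with Python 2.

```python
import gzip
import os
import pickle
import tempfile


def save_dataset(path, splits):
    """Write (train_set, valid_set, test_set) as a gzip-compressed pickle."""
    with gzip.open(path, 'wb') as f:
        pickle.dump(splits, f)


def load_dataset(path):
    """Read the three splits back; each split is an (inputs, labels) pair."""
    with gzip.open(path, 'rb') as f:
        train_set, valid_set, test_set = pickle.load(f)
    return train_set, valid_set, test_set


# A made-up dataset with 2-dimensional inputs, mirroring the mnist.pkl.gz layout.
tiny = (
    ([[0.0, 1.0], [1.0, 0.0]], [1, 0]),   # train: inputs, labels
    ([[0.5, 0.5]], [1]),                  # validation
    ([[0.2, 0.8]], [1]),                  # test
)

path = os.path.join(tempfile.mkdtemp(), 'tiny.pkl.gz')
save_dataset(path, tiny)

train_set, valid_set, test_set = load_dataset(path)
train_x, train_y = train_set
print(len(train_x), train_y)   # 2 [1, 0]
```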
(4) Implement LeNet5 and test it
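The listing in this part ends with a patience-based early-stopping loop. Stripped of the Theano calls, its control flow can be sketched as follows. This is a simplified illustration, not the tutorial's code: `validation_error` is a hypothetical stand-in for evaluating validate_model on all validation batches, the training step itself is omitted, and the toy error curve and small patience value are assumptions.

```python
def train_with_patience(validation_error, n_train_batches, n_epochs,
                        patience=10000, patience_increase=2,
                        improvement_threshold=0.995):
    """Return (best_loss, best_iter, last_epoch) for a patience-controlled loop."""
    validation_frequency = min(n_train_batches, patience // 2)
    best_loss = float('inf')
    best_iter = 0
    epoch = 0
    done_looping = False
    while epoch < n_epochs and not done_looping:
        epoch += 1
        for minibatch_index in range(n_train_batches):
            it = (epoch - 1) * n_train_batches + minibatch_index
            # A real loop would call train_model(minibatch_index) here.
            if (it + 1) % validation_frequency == 0:
                loss = validation_error(it)
                if loss < best_loss:
                    # Extend patience only if the improvement is significant.
                    if loss < best_loss * improvement_threshold:
                        patience = max(patience, it * patience_increase)
                    best_loss, best_iter = loss, it
            if patience <= it:
                done_looping = True
                break
    return best_loss, best_iter, epoch


def toy_validation_error(it):
    """Made-up error curve: improves linearly, then plateaus at 0.02."""
    return max(0.02, 0.1 - 0.001 * it)


result = train_with_patience(toy_validation_error, n_train_batches=10,
                             n_epochs=100, patience=20)
print(result)   # (0.02, 89, 18)
```

On this toy curve the last significant improvement happens at iteration 89, patience is extended to 178, and training stops in epoch 18 instead of running all 100 epochs.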
"""
Implement LeNet5.
This LeNet5 has two convolutional layers: the first has 20 kernels, the second 50.
"""
def evaluate_lenet5(learning_rate=0.1, n_epochs=200,
                    dataset='mnist.pkl.gz',
                    nkerns=[20, 50], batch_size=500):
    """
    learning_rate: the learning rate, i.e. the coefficient applied to the stochastic gradient.
    n_epochs: number of training epochs; each epoch iterates over all batches, i.e. all samples.
    batch_size: set to 500 here, i.e. the gradient is computed and the parameters are updated once per 500 samples.
    nkerns=[20, 50]: number of kernels in each LeNetConvPoolLayer; the first LeNetConvPoolLayer has
    20 kernels, the second has 50.
    """

    rng = numpy.random.RandomState(23455)

    # Load the data
    datasets = load_data(dataset)
    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # Compute the number of batches
    n_train_batches = train_set_x.get_value(borrow=True).shape[0]
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
    n_test_batches = test_set_x.get_value(borrow=True).shape[0]
    n_train_batches //= batch_size
    n_valid_batches //= batch_size
    n_test_batches //= batch_size

    # Define a few variables: index is the batch index, x is the input
    # training data and y holds the corresponding labels
    index = T.lscalar()
    x = T.matrix('x')
    y = T.ivector('y')

    ######################
    # BUILD ACTUAL MODEL #
    ######################
    print '... building the model'

    # A loaded batch has shape (batch_size, 28 * 28), but LeNetConvPoolLayer
    # expects a 4D input, so it has to be reshaped.
    layer0_input = x.reshape((batch_size, 1, 28, 28))

    # layer0 is the first LeNetConvPoolLayer.
    # A single (28, 28) input image becomes (28-5+1, 28-5+1) = (24, 24) after conv
    # and (24/2, 24/2) = (12, 12) after max pooling.
    # Each batch has batch_size images and the first LeNetConvPoolLayer has
    # nkerns[0] kernels, so the output of layer0 is (batch_size, nkerns[0], 12, 12).
    layer0 = LeNetConvPoolLayer(
        rng,
        input=layer0_input,
        image_shape=(batch_size, 1, 28, 28),
        filter_shape=(nkerns[0], 1, 5, 5),
        poolsize=(2, 2)
    )

    # layer1 is the second LeNetConvPoolLayer.
    # Its input is the output of layer0. Each (12, 12) feature map becomes
    # (12-5+1, 12-5+1) = (8, 8) after conv and (8/2, 8/2) = (4, 4) after max pooling.
    # Each batch has batch_size images (feature maps) and the second
    # LeNetConvPoolLayer has nkerns[1] kernels, so the output of layer1 is
    # (batch_size, nkerns[1], 4, 4).
    layer1 = LeNetConvPoolLayer(
        rng,
        input=layer0.output,
        # the input has nkerns[0] feature maps, i.e. the nkerns[0] feature maps output by layer0
        image_shape=(batch_size, nkerns[0], 12, 12),
        filter_shape=(nkerns[1], nkerns[0], 5, 5),
        poolsize=(2, 2)
    )

    # With the two LeNetConvPoolLayers (layer0 and layer1) defined, layer1 is
    # followed by layer2, a fully connected layer equivalent to the hidden
    # layer of an MLP, so the HiddenLayer class from the MLP article is used.
    # The input of layer2 is 2D, (batch_size, num_pixels), so the feature maps
    # produced from the same image by the different kernels must be merged
    # into one vector: the output of layer1, (batch_size, nkerns[1], 4, 4),
    # is flattened to (batch_size, nkerns[1]*4*4) = (500, 800).
    # (500, 800) means 500 samples, one per row. The output of layer2 has
    # shape (batch_size, n_out) = (500, 500).
    layer2_input = layer1.output.flatten(2)
    layer2 = HiddenLayer(
        rng,
        input=layer2_input,
        n_in=nkerns[1] * 4 * 4,
        n_out=500,
        activation=T.tanh
    )

    # The last layer, layer3, is the classification layer; it uses the
    # LogisticRegression class defined for logistic regression.
    # Its input is the output of layer2, (500, 500), and its output is
    # (batch_size, n_out) = (500, 10).
    layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)

    # Cost function: negative log-likelihood (NLL)
    cost = layer3.negative_log_likelihood(y)

    # test_model computes the test error. x and y are substituted according to
    # the given index, then layer3 is evaluated; layer3 in turn depends on
    # layer2, layer1 and layer0, so test_model covers the whole CNN.
    # The inputs of test_model are x and y; its output is layer3.errors(y),
    # i.e. the error rate.
    test_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
    # validate_model evaluates on the validation set; same analysis as above.
    validate_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: valid_set_x[index * batch_size: (index + 1) * batch_size],
            y: valid_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    # Next comes train_model, which involves the optimization algorithm (SGD):
    # gradients have to be computed and parameters updated.
    # The set of parameters
    params = layer3.params + layer2.params + layer1.params + layer0.params

    # Gradients with respect to each parameter
    grads = T.grad(cost, params)

    # There are too many parameters to write each update rule by hand, so a
    # for..in.. loop generates the pairs
    # (param_i, param_i - learning_rate * grad_i) automatically.
    updates = [
        (param_i, param_i - learning_rate * grad_i)
        for param_i, grad_i in zip(params, grads)
    ]

    # train_model: same analysis as test_model. Compared with test_model and
    # validate_model it additionally carries the updates rules.
    train_model = theano.function(
        [index],
        cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    ##################
    # START TRAINING #
    ##################
    print '... training'
    patience = 10000
    patience_increase = 2
    improvement_threshold = 0.995

    validation_frequency = min(n_train_batches, patience // 2)
    # With this setting the model is evaluated on the validation set once per epoch.

    best_validation_loss = numpy.inf  # best loss on the validation set; best means smallest
    # best_iter is the iteration with the best validation loss, counted in
    # batches. For example, best_iter=10000 means best_validation_loss was
    # reached after training on the 10000th batch.
    best_iter = 0
    test_score = 0.
    start_time = time.clock()

    epoch = 0
    done_looping = False

    # The training process follows. The while loop controls the number of
    # epochs; one epoch goes through all batches, i.e. all images.
    # The for loop goes through the batches one at a time and calls
    # train_model(minibatch_index) to train the model; the updates inside
    # train_model modify the parameters.
    # The for loop also tracks iter, the number of batches trained so far.
    # Whenever iter + 1 is a multiple of validation_frequency, the model is
    # evaluated on the validation set.
    # If the validation loss this_validation_loss is lower than the best loss
    # so far, best_validation_loss, then best_validation_loss and best_iter
    # are updated and the model is evaluated on the test set.
    # If this_validation_loss is lower than
    # best_validation_loss * improvement_threshold, patience is updated.
    # Training ends when the maximum number of epochs n_epochs is reached or
    # when patience <= iter.
    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in xrange(n_train_batches):

            iter = (epoch - 1) * n_train_batches + minibatch_index

            if iter % 100 == 0:
                print 'training @ iter = ', iter
            cost_ij = train_model(minibatch_index)
            # cost_ij is not used afterwards. train_model is called for its
            # side effect (the parameter updates) and happens to return the cost.
            if (iter + 1) % validation_frequency == 0:

                # compute zero-one loss on validation set
                validation_losses = [validate_model(i) for i
                                     in xrange(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)
                print('epoch %i, minibatch %i/%i, validation error %f %%' %
                      (epoch, minibatch_index + 1, n_train_batches,
                       this_validation_loss * 100.))

                if this_validation_loss < best_validation_loss:

                    if this_validation_loss < best_validation_loss * \
                       improvement_threshold:
                        patience = max(patience, iter * patience_increase)

                    best_validation_loss = this_validation_loss
                    best_iter = iter

                    test_losses = [
                        test_model(i)
                        for i in xrange(n_test_batches)
                    ]
                    test_score = numpy.mean(test_losses)
                    print(('     epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))

            if patience <= iter:
                done_looping = True
                break

    end_time = time.clock()
    print('Optimization complete.')
    print('Best validation score of %f %% obtained at iteration %i, '
          'with test performance %f %%' %
          (best_validation_loss * 100., best_iter + 1, test_score * 100.))
    print >> sys.stderr, ('The code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.2fm' % ((end_time - start_time) / 60.))
Convolutional Neural Networks (LeNet)
http://deeplearning.net/tutorial/contents.html
The Convolution Operator
ConvOp is the main workhorse for implementing a convolutional layer in Theano.
ConvOp is used by theano.tensor.signal.conv2d, which takes two symbolic inputs:
∙a 4D tensor corresponding to a mini-batch of input images. The shape of the tensor is as follows: [mini-batch size, number of input feature maps, image height, image width].
∙a 4D tensor corresponding to the weight matrix W. The shape of the tensor is: [number of feature maps at layer m, number of feature maps at layer m-1, filter height, filter width]
Below is the Theano code for implementing a convolutional layer similar to the one of Figure 1. The input consists of 3 feature maps (an RGB color image) of size 120x160. We use two convolutional filters with 9x9 receptive fields.
import theano
from theano import tensor as T
from theano.tensor.nnet import conv2d

import numpy

rng = numpy.random.RandomState(23455)

# instantiate 4D tensor for input
input = T.tensor4(name='input')

# initialize shared variable for weights.
w_shp = (2, 3, 9, 9)
w_bound = numpy.sqrt(3 * 9 * 9)
W = theano.shared(
    numpy.asarray(
        rng.uniform(
            low=-1.0 / w_bound,
            high=1.0 / w_bound,
            size=w_shp),
        dtype=input.dtype),
    name='W')

# initialize shared variable for bias (1D tensor) with random values
# IMPORTANT: biases are usually initialized to zero. However in this
# particular application, we simply apply the convolutional layer to
# an image without learning the parameters. We therefore initialize
# them to random values to "simulate" learning.
b_shp = (2,)
b = theano.shared(numpy.asarray(
        rng.uniform(low=-.5, high=.5, size=b_shp),
        dtype=input.dtype),
    name='b')

# build symbolic expression that computes the convolution of input with filters in w
conv_out = conv2d(input, W)

# build symbolic expression to add bias and apply activation function,
# i.e. produce neural net layer output
# A few words on ``dimshuffle`` :
#   ``dimshuffle`` is a powerful tool in reshaping a tensor;
#   what it allows you to do is to shuffle dimension around
#   but also to insert new ones along which the tensor will be
#   broadcastable;
#   dimshuffle('x', 2, 'x', 0, 1)
#   This will work on 3d tensors with no broadcastable
#   dimensions. The first dimension will be broadcastable,
#   then we will have the third dimension of the input tensor as
#   the second of the resulting tensor, etc. If the tensor has
#   shape (20, 30, 40), the resulting tensor will have dimensions
#   (1, 40, 1, 20, 30). (AxBxC tensor is mapped to 1xCx1xAxB tensor)
#   More examples:
#    dimshuffle('x') -> make a 0d (scalar) into a 1d vector
#    dimshuffle(0, 1) -> identity
#    dimshuffle(1, 0) -> inverts the first and second dimensions
#    dimshuffle('x', 0) -> make a row out of a 1d vector (N to 1xN)
#    dimshuffle(0, 'x') -> make a column out of a 1d vector (N to Nx1)
#    dimshuffle(2, 0, 1) -> AxBxC to CxAxB
#    dimshuffle(0, 'x', 1) -> AxB to Ax1xB
#    dimshuffle(1, 'x', 0) -> AxB to Bx1xA
output = T.nnet.sigmoid(conv_out + b.dimshuffle('x', 0, 'x', 'x'))

# create theano function to compute filtered images
f = theano.function([input], output)
MaxPooling
from theano.tensor.signal import pool

input = T.dtensor4('input')
maxpool_shape = (2, 2)
pool_out = pool.pool_2d(input, maxpool_shape, ignore_border=True)
f = theano.function([input], pool_out)

invals = numpy.random.RandomState(1).rand(3, 2, 5, 5)
print 'With ignore_border set to True:'
print 'invals[0, 0, :, :] =\n', invals[0, 0, :, :]
print 'output[0, 0, :, :] =\n', f(invals)[0, 0, :, :]

pool_out = pool.pool_2d(input, maxpool_shape, ignore_border=False)
f = theano.function([input], pool_out)
print 'With ignore_border set to False:'
print 'invals[1, 0, :, :] =\n ', invals[1, 0, :, :]
print 'output[1, 0, :, :] =\n ', f(invals)[1, 0, :, :]
The Full Model: LeNet
Note that the term "convolution" can correspond to different mathematical operations:
1. theano.tensor.nnet.conv2d, which is the most common one in almost all recently published convolutional models. In this operation, each output feature map is connected to each input feature map by a different 2D filter, and its value is the sum of the individual convolutions of all inputs through the corresponding filters.
2. The convolution used in the original LeNet model: there, each output feature map is connected only to a subset of the input feature maps.
3. The convolution used in signal processing: theano.tensor.signal.conv.conv2d, which works only on single-channel inputs.
Here we use the first operation, so this model differs slightly from the original LeNet paper. One reason to use 2. would be to reduce the amount of computation needed, but modern hardware makes the full connection pattern just as fast. Another reason would be to slightly reduce the number of free parameters, but other regularization techniques are available for that.
class LeNetConvPoolLayer(object):
    """Pool Layer of a convolutional network """

    def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):
        """
        Allocate a LeNetConvPoolLayer with shared variable internal parameters.

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.dtensor4
        :param input: symbolic image tensor, of shape image_shape

        :type filter_shape: tuple or list of length 4
        :param filter_shape: (number of filters, num input feature maps,
                              filter height, filter width)

        :type image_shape: tuple or list of length 4
        :param image_shape: (batch size, num input feature maps,
                             image height, image width)

        :type poolsize: tuple or list of length 2
        :param poolsize: the downsampling (pooling) factor (#rows, #cols)
        """

        assert image_shape[1] == filter_shape[1]
        self.input = input

        # there are "num input feature maps * filter height * filter width"
        # inputs to each hidden unit
        fan_in = numpy.prod(filter_shape[1:])
        # each unit in the lower layer receives a gradient from:
        # "num output feature maps * filter height * filter width" /
        #   pooling size
        fan_out = (filter_shape[0] * numpy.prod(filter_shape[2:]) //
                   numpy.prod(poolsize))
        # initialize weights with random weights
        W_bound = numpy.sqrt(6. / (fan_in + fan_out))
        self.W = theano.shared(
            numpy.asarray(
                rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
                dtype=theano.config.floatX
            ),
            borrow=True
        )

        # the bias is a 1D tensor -- one bias per output feature map
        b_values = numpy.zeros((filter_shape[0],), dtype=theano.config.floatX)
        self.b = theano.shared(value=b_values, borrow=True)

        # convolve input feature maps with filters
        conv_out = conv2d(
            input=input,
            filters=self.W,
            filter_shape=filter_shape,
            input_shape=image_shape
        )

        # pool each feature map individually, using maxpooling
        pooled_out = pool.pool_2d(
            input=conv_out,
            ds=poolsize,
            ignore_border=True
        )

        # add the bias term. Since the bias is a vector (1D array), we first
        # reshape it to a tensor of shape (1, n_filters, 1, 1). Each bias will
        # thus be broadcasted across mini-batches and feature map
        # width & height
        self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))

        # store parameters of this layer
        self.params = [self.W, self.b]

        # keep track of model input
        self.input = input
Note that when initializing the weight values, the fan-in is determined by the size of the receptive fields and the number of input feature maps.
Finally, using the LogisticRegression class defined in "Classifying MNIST digits using Logistic Regression" and the HiddenLayer class defined in "Multilayer Perceptron", we can instantiate the network as follows.
x = T.matrix('x')   # the data is presented as rasterized images
y = T.ivector('y')  # the labels are presented as 1D vector of
                    # [int] labels

######################
# BUILD ACTUAL MODEL #
######################
print('... building the model')

# Reshape matrix of rasterized images of shape (batch_size, 28 * 28)
# to a 4D tensor, compatible with our LeNetConvPoolLayer
# (28, 28) is the size of MNIST images.
layer0_input = x.reshape((batch_size, 1, 28, 28))

# Construct the first convolutional pooling layer:
# filtering reduces the image size to (28-5+1 , 28-5+1) = (24, 24)
# maxpooling reduces this further to (24/2, 24/2) = (12, 12)
# 4D output tensor is thus of shape (batch_size, nkerns[0], 12, 12)
layer0 = LeNetConvPoolLayer(
    rng,
    input=layer0_input,
    image_shape=(batch_size, 1, 28, 28),
    filter_shape=(nkerns[0], 1, 5, 5),
    poolsize=(2, 2)
)

# Construct the second convolutional pooling layer
# filtering reduces the image size to (12-5+1, 12-5+1) = (8, 8)
# maxpooling reduces this further to (8/2, 8/2) = (4, 4)
# 4D output tensor is thus of shape (batch_size, nkerns[1], 4, 4)
layer1 = LeNetConvPoolLayer(
    rng,
    input=layer0.output,
    image_shape=(batch_size, nkerns[0], 12, 12),
    filter_shape=(nkerns[1], nkerns[0], 5, 5),
    poolsize=(2, 2)
)

# the HiddenLayer being fully-connected, it operates on 2D matrices of
# shape (batch_size, num_pixels) (i.e matrix of rasterized images).
# This will generate a matrix of shape (batch_size, nkerns[1] * 4 * 4),
# or (500, 50 * 4 * 4) = (500, 800) with the default values.
layer2_input = layer1.output.flatten(2)

# construct a fully-connected sigmoidal layer
layer2 = HiddenLayer(
    rng,
    input=layer2_input,
    n_in=nkerns[1] * 4 * 4,
    n_out=500,
    activation=T.tanh
)

# classify the values of the fully-connected sigmoidal layer
layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)

# the cost we minimize during training is the NLL of the model
cost = layer3.negative_log_likelihood(y)

# create a function to compute the mistakes that are made by the model
test_model = theano.function(
    [index],
    layer3.errors(y),
    givens={
        x: test_set_x[index * batch_size: (index + 1) * batch_size],
        y: test_set_y[index * batch_size: (index + 1) * batch_size]
    }
)

validate_model = theano.function(
    [index],
    layer3.errors(y),
    givens={
        x: valid_set_x[index * batch_size: (index + 1) * batch_size],
        y: valid_set_y[index * batch_size: (index + 1) * batch_size]
    }
)

# create a list of all model parameters to be fit by gradient descent
params = layer3.params + layer2.params + layer1.params + layer0.params

# create a list of gradients for all model parameters
grads = T.grad(cost, params)

# train_model is a function that updates the model parameters by
# SGD. Since this model has many parameters, it would be tedious to
# manually create an update rule for each model parameter. We thus
# create the updates list by automatically looping over all
# (params[i], grads[i]) pairs.
updates = [
    (param_i, param_i - learning_rate * grad_i)
    for param_i, grad_i in zip(params, grads)
]

train_model = theano.function(
    [index],
    cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)
Tips and Tricks