A Simple 2D CNN for MNIST Digit Recognition

Author: AI研習社-譯站

2018-07-23 10:04

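Summary: For image classification tasks, the current state-of-the-art architecture is the convolutional neural network (CNN). CNNs are widely used in face recognition, autonomous driving, and object detection alike.

雷鋒網 AI 研習社 note: This article is a technical blog post compiled by the 雷鋒網 subtitle team. The original title is "A simple 2D CNN for MNIST digit recognition", by Sambit Mahapatra.

Translation | 王祎   Proofreading | 霍雷剛   Editing | 孔令雙

For image classification tasks, the current state-of-the-art architecture is the convolutional neural network (CNN). Whether for face recognition, autonomous driving, or object detection, CNNs are widely used. In this article, we design a simple 2D CNN model for the well-known MNIST digit recognition task, built with keras on a tensorflow backend. The overall workflow is as follows: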

1. Prepare the data

2. Build and compile the model

3. Train and evaluate the model

4. Save the model to disk for later use


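The dataset is the MNIST dataset mentioned above. MNIST (Modified National Institute of Standards and Technology dataset) is a large dataset of handwritten digits (0 to 9). It contains 70,000 images of size 28x28, split into 60,000 training images and 10,000 test images. The first step is to load the dataset, which is easy to do through the keras API.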

import keras
from keras.datasets import mnist
#load mnist dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data() #everytime loading data won't be so easy :)

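Here, X_train contains 60,000 training images of size 28x28, and y_train contains the corresponding labels. Similarly, X_test contains 10,000 test images of size 28x28, and y_test holds their labels. Let's visualize a few training samples to get a feel for what the deep learning model has to do.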

import matplotlib.pyplot as plt
#show the first 9 training images in a 3x3 grid, titled with their labels
fig = plt.figure()
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.imshow(X_train[i], cmap='gray', interpolation='none')
    plt.title("Digit: {}".format(y_train[i]))
    plt.xticks([])
    plt.yticks([])
plt.tight_layout()
plt.show()

[Figure: the first nine training images, each titled with its digit label]

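As shown above, the image of the digit "5" in the top-left corner is stored in X_train[0], and y_train[0] holds its label, 5. Our deep learning model should be able to predict the written digit from the handwritten image alone. To prepare the data, we now need to process the images: reshape them and normalize the pixel values.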

from keras import backend as K
#image dimensions (MNIST images are 28x28)
img_rows, img_cols = 28, 28
#reshaping
#the layout depends on the backend's image data format:
#for 2D images, "channels_last" means (rows, cols, channels) while
#"channels_first" means (channels, rows, cols)
if K.image_data_format() == 'channels_first':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
#convert to float and scale pixel values from [0, 255] to [0, 1]
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape) #X_train shape: (60000, 28, 28, 1) with channels_last
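
As a quick sanity check, the same reshape and scaling can be tried on a small synthetic batch instead of the real dataset. This is a minimal sketch that assumes the channels_last layout; the array `fake_images` and the helper `preprocess` are made up for illustration and are not part of the original post.

```python
import numpy as np

def preprocess(images):
    """Add a trailing channel axis and scale uint8 pixels to [0, 1]."""
    n, rows, cols = images.shape
    # channels_last layout: (samples, rows, cols, channels)
    images = images.reshape(n, rows, cols, 1)
    return images.astype('float32') / 255

# hypothetical stand-in for X_train: 4 random 28x28 grayscale images
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
prepared = preprocess(fake_images)
print(prepared.shape, prepared.dtype)
```

Scaling to [0, 1] keeps the inputs in a small, consistent range, which generally makes gradient-based training better behaved than feeding raw 0 to 255 values.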

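After processing the image data, the labels in y_train and y_test need to be converted to categorical (one-hot) format. For example, the label 3 should be converted to the vector [0, 0, 0, 1, 0, 0, 0, 0, 0, 0] before building the model.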

import keras
#set number of categories
num_category = 10
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_category)
y_test = keras.utils.to_categorical(y_test, num_category)
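
To make the conversion concrete, here is a rough NumPy equivalent of one-hot encoding. It is only a sketch of the idea, not the keras implementation, and the helper name `one_hot` is hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) matrix with a single 1 per row."""
    labels = np.asarray(labels)
    # row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype='float32')[labels]

encoded = one_hot([3, 0, 9], 10)
print(encoded[0])  # the row for label 3
```

The first row has its only 1 at index 3, matching the vector given in the text.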

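Build and compile the model

Once the data is ready to be fed to the model, we need to define the model architecture and compile it with an optimizer, a loss function, and a performance metric.

The architecture used here is two convolutional layers followed by a pooling layer, a fully connected layer, and a softmax layer. Each convolutional layer uses multiple filters to extract different kinds of features. Intuitively, one filter might help detect straight lines in the image, another might help detect circles, and so on. The technical details of each layer will be covered in later posts. For a better understanding of what each layer does, see http://cs231n.github.io/convolutional-networks/

After the max pooling layer and after the fully connected layer, we add dropout to the model as regularization, to reduce overfitting.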
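
What dropout does during training can be sketched in a few lines of NumPy. This is an illustrative sketch of "inverted" dropout, where the kept values are scaled up by 1 / (1 - rate), which is also how the keras Dropout layer is documented to behave; the function name `dropout` and the sample input are assumptions made for this example.

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout: zero a random fraction `rate` of x, rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob   # True where the unit is kept
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(42)
activations = np.ones((2, 8))
dropped = dropout(activations, rate=0.25, rng=rng)
print(dropped)
```

Dropout is only active during training; at evaluation time the layer passes its input through unchanged.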

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
##model building
model = Sequential()
#convolutional layer with rectified linear unit activation
#32 convolution filters, each of size 3x3
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
#again, this time with 64 convolution filters, each of size 3x3
model.add(Conv2D(64, (3, 3), activation='relu'))
#downsample the feature maps, keeping the strongest response in each 2x2 window
model.add(MaxPooling2D(pool_size=(2, 2)))
#randomly drop 25% of the activations during training to reduce overfitting
model.add(Dropout(0.25))
#flatten the feature maps into a vector for the fully connected layers
model.add(Flatten())
#fully connected layer
model.add(Dense(128, activation='relu'))
#one more dropout, again for regularization
model.add(Dropout(0.5))
#softmax turns the 10 outputs into class probabilities
model.add(Dense(num_category, activation='softmax'))
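
The layer sizes of this architecture can be worked out by hand. The following sketch assumes the keras defaults used above (no padding, stride 1 for the convolutions, and a pooling stride equal to the pool size); the helper functions are hypothetical and exist only for this calculation.

```python
def conv2d_out(size, kernel):
    """Output width/height of an unpadded convolution with stride 1."""
    return size - kernel + 1

def conv2d_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weight matrix plus one bias per unit."""
    return inputs * units + units

size = 28
size = conv2d_out(size, 3)            # 26 after the first 3x3 convolution
size = conv2d_out(size, 3)            # 24 after the second 3x3 convolution
size = size // 2                      # 12 after 2x2 max pooling
flat = size * size * 64               # 9216 values after Flatten

total_params = (
    conv2d_params(3, 1, 32)           # first Conv2D
    + conv2d_params(3, 32, 64)        # second Conv2D
    + dense_params(flat, 128)         # hidden Dense layer
    + dense_params(128, 10)           # softmax output layer
)
print(flat, total_params)
```

The result can be compared against the output of model.summary(). Most of the parameters sit in the first Dense layer, which is why the pooling step before Flatten matters for model size.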

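Once the architecture is defined, the model needs to be compiled. Because this is a multi-class classification problem, we use categorical_crossentropy as the loss function. Since all labels carry similar weight, we use accuracy as the performance metric. AdaDelta is a popular gradient descent method with an adaptive learning rate, and we use it to optimize the model parameters.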

#Adadelta is a gradient descent variant with an adaptive learning rate
#note: Adadelta's default learning rate differs across Keras versions,
#which can change how fast training converges
#categorical crossentropy since we have multiple classes (10)
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
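
To see what the loss measures, here is a plain-Python sketch of softmax followed by categorical cross-entropy for a single example. It illustrates the formulas only and is not how keras computes them internally; the function names and sample scores are made up for this example.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]   # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]          # one-hot label for digit 3
confident = softmax([0, 0, 0, 5, 0, 0, 0, 0, 0, 0])
unsure = softmax([0] * 10)                        # uniform guess over 10 digits
loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

A uniform guess over ten classes gives a loss of ln(10), roughly 2.30, so a loss well below that value indicates the model is doing better than chance.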

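Train and evaluate the model

With the architecture defined and the model compiled, we train the model on the training set so that it learns to recognize handwritten digits. Here we fit the model using X_train and y_train.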

batch_size = 128
num_epoch = 10
#model training
model_log = model.fit(X_train, y_train,
                      batch_size=batch_size,
                      epochs=num_epoch,
                      verbose=1,
                      validation_data=(X_test, y_test))

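Here, one epoch is one forward and backward pass over all the training examples, and batch_size is the number of training examples used in a single forward/backward pass. The training output looks like this: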
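
The relation between epochs, batch size, and the number of weight updates can be checked with a little arithmetic. This sketch uses the values from the training call above and assumes the final, smaller batch of each epoch is still processed.

```python
import math

num_train = 60000     # MNIST training images
batch_size = 128
num_epoch = 10

# the last batch of an epoch is smaller when the sizes do not divide evenly
steps_per_epoch = math.ceil(num_train / batch_size)
last_batch_size = num_train - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * num_epoch
print(steps_per_epoch, last_batch_size, total_updates)
```

So each epoch consists of 469 batches, and ten epochs amount to 4690 weight updates.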

[Figure: console output of model.fit over the 10 training epochs]

Now let's evaluate the performance of the trained model.

score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0]) #Test loss: 0.0296396646054
print('Test accuracy:', score[1]) #Test accuracy: 0.9904
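
The accuracy metric can be understood as the fraction of images whose most probable class matches the true label. The sketch below uses a tiny hand-written probability table rather than real model output; all values in it are hypothetical.

```python
import numpy as np

# hypothetical softmax outputs for 4 images over 3 classes (not real model output)
probs = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
])
# one-hot ground truth, in the same format produced by to_categorical
y_true = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
])

predicted = np.argmax(probs, axis=1)     # most probable class per image
actual = np.argmax(y_true, axis=1)       # recover integer labels
accuracy = np.mean(predicted == actual)
print(predicted, accuracy)
```

With a trained model, `probs` would come from model.predict(X_test), and the same argmax step turns each row of probabilities into a predicted digit.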

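The test accuracy exceeds 99%, which means the model has been trained well for prediction. Looking at the full training log, the loss and accuracy on both the training and test data converge as the number of epochs grows and eventually stabilize.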

# plotting the metrics
# the accuracy key is 'acc' in older Keras releases and 'accuracy' in newer ones
acc_key = 'acc' if 'acc' in model_log.history else 'accuracy'
fig = plt.figure()
plt.subplot(2,1,1)
plt.plot(model_log.history[acc_key])
plt.plot(model_log.history['val_' + acc_key])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='lower right')
plt.subplot(2,1,2)
plt.plot(model_log.history['loss'])
plt.plot(model_log.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper right')
plt.tight_layout()
plt.show()

[Figure: model accuracy (top) and model loss (bottom) per epoch, for the training and test data]

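Save the model to disk for later use

Now the trained model needs to be serialized. The model architecture is saved to a JSON file, and the weights are saved to an HDF5 file.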

#Save the model
# serialize model to JSON
model_digit_json = model.to_json()
with open("model_digit.json", "w") as json_file:
    json_file.write(model_digit_json)
# serialize weights to HDF5
model.save_weights("model_digit.h5")
print("Saved model to disk")

Once saved, the model can be reused and easily moved to other environments. In a later post, we will show how to deploy this model in production.
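
Reusing the saved files could look roughly like the sketch below. It assumes the two file names written above and a keras installation that provides model_from_json; the helper name `load_digit_model` is hypothetical, and the compile settings are simply repeated from earlier because the JSON file stores only the architecture.

```python
def load_digit_model(json_path="model_digit.json", weights_path="model_digit.h5"):
    """Rebuild the architecture from JSON, then load the HDF5 weights."""
    # imported here so the sketch can be read without keras installed
    from keras.models import model_from_json

    with open(json_path) as json_file:
        model = model_from_json(json_file.read())
    model.load_weights(weights_path)
    # compile again before calling evaluate(); compile settings are not
    # part of the JSON architecture file
    model.compile(loss="categorical_crossentropy",
                  optimizer="adadelta",
                  metrics=["accuracy"])
    return model

# usage (assumes the two files exist in the working directory):
# restored = load_digit_model()
# print(restored.evaluate(X_test, y_test, verbose=0))
```

Keeping architecture and weights in separate files makes it possible to inspect or version the architecture as text while the larger weight file is stored elsewhere.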

Enjoy deep learning!