
A Deep Learning Case Study Based on GPU Computing

Reposted 2016-09-02 20:31:38


1. Environment Setup

1.1 Installing and configuring Theano

This experiment uses the Theano symbolic computation library as the deep learning backend; TensorFlow could serve the same role.

(1) Download the Theano source code from GitHub: https://github.com/Theano/Theano

(2) Install it: python setup.py install

(3) Once the GPU and CUDA are installed, configure the CUDA environment variables by editing .bashrc:

export PATH=$PATH:/usr/local/cuda-6.5/bin

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64

(4) Configure Theano: create a .theanorc file in your home directory with the following content:

[global]
floatX = float32
device = gpu
optimizer = fast_run

[cuda]
root = /usr/local/cuda-6.5/

[lib]
cnmem = 0.9

[nvcc]
fastmath = True

[blas]
ldflags = -llapack -lblas

(5) Install Keras. Download the Keras deep learning framework: https://github.com/fchollet/keras

Installation reference: https://keras.io/#installation

Alternatively, install from source: python setup.py install

2. Experiment Code

Keras ships with example scripts for testing deep learning models. This experiment uses the Kaggle Otto dataset. Download the data from https://www.kaggle.com/c/otto-group-product-classification-challenge/data

The data has 93 features and poses a multi-class problem (9 classes). The code below is adapted from the Keras examples:

from __future__ import print_function

import numpy as np
import pandas as pd

np.random.seed(1337)  # for reproducibility

from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.normalization import BatchNormalization
from keras.layers.advanced_activations import PReLU
from keras.utils import np_utils, generic_utils

from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
import timeit

# load data
def load_data(path, train=True):
    df = pd.read_csv(path)
    X = df.values.copy()
    if train:
        np.random.shuffle(X)  # https://youtu.be/uyUXoap67N8
        X, labels = X[:, 1:-1].astype(np.float32), X[:, -1]
        return X, labels
    else:
        X, ids = X[:, 1:].astype(np.float32), X[:, 0].astype(str)
        return X, ids

# standardize the features
def preprocess_data(X, scaler=None):
    if not scaler:
        scaler = StandardScaler()
        scaler.fit(X)
    X = scaler.transform(X)
    return X, scaler

# encode the target labels
def preprocess_labels(labels, encoder=None, categorical=True):
    if not encoder:
        encoder = LabelEncoder()
        encoder.fit(labels)
    y = encoder.transform(labels).astype(np.int32)
    if categorical:
        y = np_utils.to_categorical(y)
    return y, encoder

def make_submission(y_prob, ids, encoder, fname):
    with open(fname, 'w') as f:
        f.write('id,')
        f.write(','.join([str(i) for i in encoder.classes_]))
        f.write('\n')
        for i, probs in zip(ids, y_prob):
            probas = ','.join([i] + [str(p) for p in probs.tolist()])
            f.write(probas)
            f.write('\n')
    print('Wrote submission to file {}.'.format(fname))

print('Loading data...')
X, labels = load_data('train.csv', train=True)
X, scaler = preprocess_data(X)
y, encoder = preprocess_labels(labels)

X_test, ids = load_data('test.csv', train=False)
X_test, _ = preprocess_data(X_test, scaler)

nb_classes = y.shape[1]
print(nb_classes, 'classes')

dims = X.shape[1]
print(dims, 'dims')

tic = timeit.default_timer()
print('Building model...')
# build the deep neural network (DNN)

model = Sequential()
model.add(Dense(512, input_shape=(dims,)))
model.add(PReLU())
model.add(BatchNormalization())
model.add(Dropout(0.5))

model.add(Dense(512))
model.add(PReLU())
model.add(BatchNormalization())
model.add(Dropout(0.5))

model.add(Dense(512))
model.add(PReLU())
model.add(BatchNormalization())
model.add(Dropout(0.5))

model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam')

print('Training model...')
model.fit(X, y, nb_epoch=20, batch_size=128, validation_split=0.15)
toc = timeit.default_timer()
print("training time", toc - tic)

print('Generating submission...')
proba = model.predict_proba(X_test)
make_submission(proba, ids, encoder, fname='keras-otto.csv')
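The preprocess_labels function above turns the string targets (Class_1 through Class_9) into a one-hot matrix via LabelEncoder and np_utils.to_categorical. A minimal NumPy-only sketch of the same transformation, using a small made-up sample of Otto-style labels:

```python
import numpy as np

# hypothetical sample of Otto-style labels, for illustration only
labels = np.array(['Class_2', 'Class_1', 'Class_2', 'Class_9'])

# np.unique(return_inverse=True) plays the role of LabelEncoder:
# classes are the sorted unique labels, y the integer code of each row
classes, y = np.unique(labels, return_inverse=True)

# indexing an identity matrix plays the role of np_utils.to_categorical
one_hot = np.eye(len(classes), dtype=np.float32)[y]

print(classes.tolist())              # ['Class_1', 'Class_2', 'Class_9']
print(one_hot.astype(int).tolist())  # [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

This is why the submission header written by make_submission uses encoder.classes_: the column order of the probability matrix matches the sorted class names.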

To monitor GPU usage while training: watch -n 10 nvidia-smi
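As a rough sanity check on model size: the Dense layers dominate the parameter count of the network above (the PReLU slopes and BatchNormalization statistics add only a few thousand more). A back-of-the-envelope count, assuming dims = 93 and nb_classes = 9 as in the Otto data:

```python
# weights + biases of the four Dense layers: 93 -> 512 -> 512 -> 512 -> 9
dims, hidden, classes = 93, 512, 9
dense_params = (
    dims * hidden + hidden                # input layer:       48,128
    + 2 * (hidden * hidden + hidden)      # two hidden layers: 262,656 each
    + hidden * classes + classes          # softmax output:    4,617
)
print(dense_params)  # 578057
```

At roughly 580k dense parameters, the model is small by GPU standards, which is consistent with the ~1 s epochs in the GPU run below.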

3. Results

3.1 Training time comparison

CPU run: training time 433.890647888 seconds

GPU run: training time 44.1580460072 seconds

With all other conditions equal, the GPU run is roughly 10x faster than the CPU run.
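The "roughly 10x" figure follows directly from the two timings:

```python
cpu_time = 433.890647888  # seconds, from the CPU run above
gpu_time = 44.1580460072  # seconds, from the GPU run above
print(round(cpu_time / gpu_time, 1))  # 9.8
```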

Using Theano backend.
Using gpu device 0: Tesla K20m (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
/home/anaconda2/lib/python2.7/site-packages/Theano-0.9.0.dev2-py2.7.egg/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
Loading data...
9 classes
93 dims
Building model...
Training model...
Train on 52596 samples, validate on 9282 samples
Epoch 1/20
52596/52596 [==============================] - 1s - loss: 0.8993 - val_loss: 0.6474
Epoch 2/20
52596/52596 [==============================] - 1s - loss: 0.6390 - val_loss: 0.5932
Epoch 3/20
52596/52596 [==============================] - 1s - loss: 0.6027 - val_loss: 0.5646
Epoch 4/20
52596/52596 [==============================] - 1s - loss: 0.5777 - val_loss: 0.5466
Epoch 5/20
52596/52596 [==============================] - 1s - loss: 0.5625 - val_loss: 0.5400
Epoch 6/20
52596/52596 [==============================] - 1s - loss: 0.5484 - val_loss: 0.5319
Epoch 7/20
52596/52596 [==============================] - 1s - loss: 0.5401 - val_loss: 0.5173
Epoch 8/20
52596/52596 [==============================] - 1s - loss: 0.5298 - val_loss: 0.5137
Epoch 9/20
52596/52596 [==============================] - 1s - loss: 0.5209 - val_loss: 0.5050
Epoch 10/20
52596/52596 [==============================] - 1s - loss: 0.5168 - val_loss: 0.5048
Epoch 11/20
52596/52596 [==============================] - 1s - loss: 0.5064 - val_loss: 0.5078
Epoch 12/20
52596/52596 [==============================] - 1s - loss: 0.5014 - val_loss: 0.5012
Epoch 13/20
52596/52596 [==============================] - 1s - loss: 0.4966 - val_loss: 0.4962
Epoch 14/20
52596/52596 [==============================] - 1s - loss: 0.4869 - val_loss: 0.4963
Epoch 15/20
52596/52596 [==============================] - 1s - loss: 0.4841 - val_loss: 0.5001
Epoch 16/20
52596/52596 [==============================] - 1s - loss: 0.4819 - val_loss: 0.4995
Epoch 17/20
52596/52596 [==============================] - 1s - loss: 0.4740 - val_loss: 0.4892
Epoch 18/20
52596/52596 [==============================] - 1s - loss: 0.4728 - val_loss: 0.5007
Epoch 19/20
52596/52596 [==============================] - 1s - loss: 0.4630 - val_loss: 0.4974
Epoch 20/20
52596/52596 [==============================] - 1s - loss: 0.4656 - val_loss: 0.4943
training time 44.1580460072
Generating submission...
144368/144368 [==============================] - 0s
Wrote submission to file keras-otto.csv.

4. Summary

(1) Configuring multiple GPUs in Theano is fairly complex; TensorFlow makes multi-GPU setup much easier.

(2) GPU computation is substantially faster.

(3) Keras makes it convenient to build deep learning networks.

