
inna2.0 for Deep Learning

Introduction

inna2.0 is developed by the AI & HPC Application R&D team of Inspur. It is a parallel version of Caffe for multi-node GPU clusters, designed on top of NVIDIA/Caffe, which is forked from BVLC/caffe (https://github.com/NVIDIA/caffe; for more details please visit http://caffe.berkeleyvision.org).

Features

(1) The design basics

inna2.0 is designed for high-density GPU clusters. The new version supports InfiniBand (IB) high-speed network connections and shared storage systems backed by distributed file systems such as NFS and GlusterFS. The training dataset is read in parallel by each process, and hierarchical communication mechanisms minimize the bandwidth required between computing nodes.

Updates in inna2.0

Support NCCL 2.0

Both inter-node and intra-node GPU communication are managed by NCCL with GPUDirect RDMA.

(2) High performance and high scalability

The AlexNet, GoogLeNet and ResNet models have been tested with inna2.0 on a GPU cluster of 4 nodes, each equipped with 4 P40 GPUs, using the ImageNet dataset. On 4 nodes with 16 GPUs, the speedup is 14.65x for AlexNet (batchsize=1024), 14.25x for GoogLeNet (batchsize=128) and 15.34x for ResNet (batchsize=32).
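As a quick sanity check on these numbers, scaling efficiency can be read off as speedup divided by GPU count. This small illustrative calculation is not part of the original benchmark report:

```shell
# Illustrative check (not from the original benchmark): the scaling
# efficiency implied by the reported 16-GPU speedups, computed as
# speedup / number of GPUs.
num_gpus=16
report=$(
    for entry in "AlexNet:14.65" "GoogLeNet:14.25" "ResNet:15.34"; do
        model=${entry%%:*}
        speedup=${entry#*:}
        awk -v m="$model" -v s="$speedup" -v n="$num_gpus" \
            'BEGIN { printf "%s: %.1f%% scaling efficiency on %d GPUs\n", m, s / n * 100, n }'
    done
)
printf '%s\n' "$report"
```

All three models stay above roughly 89% efficiency at 16 GPUs, consistent with the bandwidth-minimizing hierarchical communication described above.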

(3) Good inheritance and ease of use

inna2.0 retains all the features of the original Caffe architecture: a pure C++/CUDA implementation, command-line and Python interfaces, and support for various programming methods. As a result, the cluster version of the Caffe framework is user-friendly, fast, modular and open, and gives users an optimal application experience.

Try your first inna2.0 run

This program can run with as few as one process.

cifar10

  1. Run data/cifar10/get_cifar10.sh to get the cifar10 data.
  2. Run examples/cifar10/create_cifar10.sh to convert the raw data to leveldb format.
  3. Run train_quick.sh to train the net.
  4. Example launch command used in the train_quick.sh script: mpirun -host node1,node2 -mca btl_openib_want_cuda_gdr 1 --mca io ompio -np 2 -npernode 1 ./build/tools/caffe train --solver=examples/cifar10/cifar10_quick_solver.prototxt --gpu=0,1,2,3
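The steps above can be collected into a single launch script. This is a sketch, not the repository's actual train_quick.sh: the hostnames (node1, node2), GPU ids and process counts are the assumptions from the example command and must be adapted to your cluster. By default the script only prints the mpirun command, since executing it requires a built inna2.0 tree and a real MPI/GPU cluster:

```shell
# Sketch of the cifar10 workflow; node1/node2, GPU ids and process
# counts are placeholders taken from the example command above.

# Steps 1-2: fetch CIFAR-10 and convert it to leveldb. Commented out
# here because they need the repository checkout to run.
# ./data/cifar10/get_cifar10.sh
# ./examples/cifar10/create_cifar10.sh

# Steps 3-4: the distributed training launch -- 2 MPI processes, one
# per node, each driving GPUs 0-3, with CUDA-aware GDR transport.
train_cmd="mpirun -host node1,node2 \
-mca btl_openib_want_cuda_gdr 1 --mca io ompio \
-np 2 -npernode 1 \
./build/tools/caffe train \
--solver=examples/cifar10/cifar10_quick_solver.prototxt \
--gpu=0,1,2,3"

# Print the command by default; set RUN_TRAIN=1 to actually execute
# it on a cluster where inna2.0 is built.
if [ "${RUN_TRAIN:-0}" = "1" ]; then
    $train_cmd
else
    echo "$train_cmd"
fi
```

Note that -npernode 1 with --gpu=0,1,2,3 means each MPI rank manages all four GPUs on its node, matching the one-process-per-node design described in the Features section.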

Reference

  • More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server

  • Deep Image: Scaling up Image Recognition

Ask Questions

  • For reporting bugs, please use the inna2.0/issues page or send us an email.

  • Email address: wush@inspur.com

Author

  • Shaohua Wu.
