inna2.0 is developed by the AI & HPC Application R&D team of Inspur. It is a parallel version of Caffe for multi-node GPU clusters, based on NVIDIA/caffe, which is itself a fork of BVLC/caffe (https://github.com/NVIDIA/caffe; for more details, please visit http://caffe.berkeleyvision.org).
inna2.0 is designed for high-density GPU clusters. The new version supports InfiniBand (IB) high-speed networking and shared storage backed by a distributed file system such as NFS or GlusterFS. Each process reads the training dataset in parallel. Hierarchical communication mechanisms were developed to minimize the bandwidth required between compute nodes.
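The idea behind hierarchical communication can be illustrated with a small, pure-Python simulation. This is only a sketch under assumptions: the function name `hierarchical_allreduce` and the nested-list data layout are hypothetical and are not taken from the inna2.0 source, and no real GPU or network transfer takes place. It shows why reducing inside each node first means that only one gradient vector per node has to cross the inter-node network.

```python
def hierarchical_allreduce(grads_by_node):
    """Simulate a two-level sum-allreduce (hypothetical helper, not inna2.0 code).

    grads_by_node[n][g] is the gradient vector (list of numbers)
    held by GPU g on node n.
    """
    # Step 1: intra-node reduce. Sum the gradients of all GPUs within each node.
    node_sums = [[sum(vals) for vals in zip(*gpus)] for gpus in grads_by_node]

    # Step 2: inter-node allreduce. Only one vector per node crosses the network.
    total = [sum(vals) for vals in zip(*node_sums)]

    # Step 3: intra-node broadcast. Every GPU receives a copy of the global sum.
    return [[list(total) for _ in gpus] for gpus in grads_by_node]


# Example: 2 nodes with 2 GPUs each, each GPU holding a 2-element gradient.
grads = [
    [[1, 2], [3, 4]],  # node 0
    [[5, 6], [7, 8]],  # node 1
]
result = hierarchical_allreduce(grads)
print(result[0][0])  # [16, 20]
```

In this toy layout, the inter-node step exchanges 2 vectors (one per node) instead of 4 (one per GPU), which is the bandwidth saving the hierarchical scheme is after.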
Supports NCCL 2.0
Both inter-node and intra-node GPU communication are managed by NCCL with GPUDirect RDMA.
The AlexNet, GoogLeNet, and ResNet models have been tested with inna2.0 on a GPU cluster of 4 nodes, each with 4 P40 GPUs, using the ImageNet dataset. On 4 nodes with 16 GPUs, the speedup is 14.65x for AlexNet (batch size 1024), 14.25x for GoogLeNet (batch size 128), and 15.34x for ResNet (batch size 32).
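The speedup figures above can be turned into a scaling efficiency (speedup divided by the number of GPUs). The short calculation below is a sketch that assumes the speedups are measured relative to a single GPU, which the text does not state explicitly; the helper name `scaling_efficiency` is introduced here only for illustration.

```python
# Speedups reported above for 16 GPUs (4 nodes x 4 P40 GPUs).
reported_speedup = {"AlexNet": 14.65, "GoogLeNet": 14.25, "ResNet": 15.34}
num_gpus = 16


def scaling_efficiency(speedup, workers):
    """Fraction of ideal linear speedup achieved (assumes a 1-GPU baseline)."""
    return speedup / workers


efficiency = {
    model: scaling_efficiency(speedup, num_gpus)
    for model, speedup in reported_speedup.items()
}

for model, eff in efficiency.items():
    print(f"{model}: {eff:.1%}")
# AlexNet: 91.6%
# GoogLeNet: 89.1%
# ResNet: 95.9%
```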
inna2.0 retains all the features of the original Caffe architecture: the pure C++/CUDA architecture, the command-line and Python interfaces, and support for various programming methods. As a result, the cluster version of the Caffe framework remains user-friendly, fast, modular, and open, and aims to give users the best possible application experience.
The program requires at least 1 process to run.
More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server
Deep Image: Scaling up Image Recognition
To report bugs, please use the inna2.0/issues page or send us an email.
Email address: wush@inspur.com
Shaohua Wu.