🚧 LLM Speed Benchmark (LLMSB) is currently in beta (v0). Please do not use this in production, or use it at your own risk. We're still ironing out some kinks and improving functionality. If you encounter any bugs or have suggestions, kindly report them under ISSUES. Your feedback is invaluable!
LLM Speed Benchmark (LLMSB) is a benchmarking tool for assessing LLM models' performance across different hardware platforms. Its ultimate goal is to compile a comprehensive dataset detailing LLM models' performance on various systems, enabling users to more effectively choose the right LLM model(s) for their projects.
LLMSB is at v0, so it has limitations:
LLMSB was run/tested on L40 and H100 GPUs through RunPod. In those benchmarks, the models llama-2-7b-hf, codellama-13b-oasst-sft-v10, and mpt-7b were tested.
Check out the results HERE. If you notice any errors/issues, please report them to ISSUES.
Create and activate python environment:
python3 -m venv env
source env/bin/activate
Install package dependencies (using APT):
apt -y update
apt install -y vim
apt install -y neofetch
Install python dependencies:
pip3 install transformers
pip3 install psutil
pip3 install gputil
pip3 install tabulate
pip3 install sentencepiece
pip3 install protobuf
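For context, psutil, gputil, and tabulate cover the benchmark's system monitoring and output formatting. Here is a minimal sketch of the kind of metrics they enable (an illustration of what these dependencies are for, not LLMSB's internal code):

# sketch: sampling system metrics with psutil/GPUtil, formatted with tabulate
# (illustration of what these dependencies are for, not LLMSB internals)
import psutil
import GPUtil
from tabulate import tabulate

rows = [
    ["CPU usage (%)", psutil.cpu_percent(interval=1)],
    ["RAM usage (%)", psutil.virtual_memory().percent],
]
for gpu in GPUtil.getGPUs():
    rows.append([f"GPU {gpu.id} load (%)", round(gpu.load * 100, 1)])
    rows.append([f"GPU {gpu.id} memory used (MB)", gpu.memoryUsed])

print(tabulate(rows, headers=["metric", "value"]))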
Install PyTorch (to determine how to install PyTorch for your system, check out the tool at https://pytorch.org/):
# install pytorch stable build, for linux, using CUDA 12.1:
pip3 install torch torchvision torchaudio
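After installing, you can verify that PyTorch sees your GPU with a quick check like this (a minimal sketch; device index 0 assumes a single-GPU machine):

# quick sanity check that PyTorch was installed with working CUDA support
import torch

print(torch.__version__)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # index 0 assumes a single-GPU machine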
Install LLM-VM:
pip install llm-vm
(optional) If you are using models like LLaMA, you will need a HuggingFace access token. Set up your access token HERE, then save the token to your console by running the following command:
huggingface-cli login
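If you prefer a non-interactive setup (e.g., inside a script), the huggingface_hub Python API can log in with the token directly; the token string below is a placeholder:

# non-interactive alternative to `huggingface-cli login`
from huggingface_hub import login

login(token="hf_xxxxxxxxxxxxxxxx")  # placeholder: replace with your real token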
Complete the steps listed in the Setup section.
To configure a benchmark, you need to create a JSON file with the following parameters (here is an example; note that the # comments are for illustration only and are not valid JSON):
{
"model": "bigscience/bloom-560m", # the model's path/repo on HuggingFace (https://huggingface.co/models)
"prompt": "Hello World!", # the prompt you want to input into the LLM model
"device": "cuda:0", # the device you want to run the LLM model on (GPU/CPU)
"max_length": 50, # the maximun length of the generated tokens
"temperature": 0.9, # temperatue value for the LLM model
"top_k": 50, # top-k value for the LLM model
"top_p": 0.9, # top-p value for the LLM model
"num_return_sequences": 1, # the number of independently ran instances of the model
"time_delay": 0, # the time delay (seconds) the metrics-collecter will wait per interation
"model_start_pause": 1, # the time (seconds) the test will wait BEFORE running the LLM model
"model_end_pause": 1 # the time (seconds) the test will wait AFTER the LLM model is done running,
"framework": "llm-vm" # the name of the framework/library you want to use to run the model
}
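For reference, here is a rough sketch of how these parameters map onto a Hugging Face transformers generation call. It assumes the values are forwarded as-is and uses the transformers library directly rather than the llm-vm framework named in the config; it is not LLMSB's internal code:

# sketch: how the config fields above map onto a transformers generation call
# (assumes values are forwarded as-is; not LLMSB internals)
import json

from transformers import AutoModelForCausalLM, AutoTokenizer

with open("./configs/llmvm_test.json") as f:  # path taken from the run example below
    cfg = json.load(f)

tokenizer = AutoTokenizer.from_pretrained(cfg["model"])
model = AutoModelForCausalLM.from_pretrained(cfg["model"]).to(cfg["device"])

inputs = tokenizer(cfg["prompt"], return_tensors="pt").to(cfg["device"])
outputs = model.generate(
    **inputs,
    max_length=cfg["max_length"],
    do_sample=True,  # temperature/top_k/top_p only take effect when sampling
    temperature=cfg["temperature"],
    top_k=cfg["top_k"],
    top_p=cfg["top_p"],
    num_return_sequences=cfg["num_return_sequences"],
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))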
Using the path to the config file you created in the previous step, run the following to start the benchmark (pick one option):
# run one benchmark
python3 run.py --config ./configs/llmvm_test.json
# run more than one benchmark (in this case, 3)
python3 run.py --config ./configs/llmvm_test.json --loops 3
After the benchmark is done running, the final results are saved to a file with a name that should look something like this:
report_2023-11-25_05:55:04.207515_utc_1ffc4fa7-3aa9-4878-b874-1ff445e1ff8a.json
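Assuming the report is plain JSON, you can load and pretty-print it with a few lines (the filename below is the example from above):

# load and pretty-print a finished benchmark report (assumes plain JSON)
import json

report_path = "report_2023-11-25_05:55:04.207515_utc_1ffc4fa7-3aa9-4878-b874-1ff445e1ff8a.json"
with open(report_path) as f:
    report = json.load(f)

print(json.dumps(report, indent=2))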
Set up RunPod, set up your SSH cert/key, and get a pod running. You can access your pod(s) here: https://www.runpod.io/console/pods
Click the "Connect" button to get the ssh connection info. This info should look something like this:
ssh root@12.345.678.90 -p 12345 -i ~/.ssh/id_example
ssh <user>@<ip-address> -p <port> -i <local-path-to-ssh-cert>
Using the command from step #2, you should be able to SSH into the pod and use the GPU you selected for that RunPod pod.
If you want to copy a file from the pod to your local machine, run a command in this format (referring to the variables shown in step #2):
scp -P <port> -i <local-path-to-ssh-cert> <user>@<ip-address>:<path-to-file-in-pod> <path-to-local-directory>
scp -P 12345 -i ~/.ssh/id_example <user>@<ip-address>:/root/test.txt /home/user1/Downloads/
After you are done with the pod, shut it down or pause it. Be warned: if you pause it, you will still be charged, just much less.
Great datasets of prompts (if you can't come up with your own):
Learn more about LLM generation parameters (see the GenerationConfig sketch after this list): https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig
A great tool for benchmarking cloud-based LLM models: https://github.com/ray-project/llmperf
Cool LLM intelligence leaderboards:
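Most of the config fields in this README (max_length, temperature, top_k, top_p, num_return_sequences) correspond directly to transformers generation parameters; as a sketch, they can be bundled into a GenerationConfig object like this:

# bundling the README's config fields into a transformers GenerationConfig
from transformers import GenerationConfig

gen_cfg = GenerationConfig(
    max_length=50,
    temperature=0.9,
    top_k=50,
    top_p=0.9,
    num_return_sequences=1,
    do_sample=True,  # required for temperature/top_k/top_p to have an effect
)
print(gen_cfg)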