[Bug] After using H200*8 to deploy DeepSeekR1, the large stress test model crashes #4020

wangguo1230 · 2025-03-03T09:20:44Z

Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
5. Please use English, otherwise it will be closed.

Describe the bug

If you use a stress test tool to stress test an API, an error is reported in the 500 request and the following symptoms occur:

The utilization rate of 7 graphics cards is 100%, and 1 card is idle
The model will get stuck and will no longer answer
All requests return 200 status codes in an instant

Reproduction

H200*8

docker run -d --name sglang --gpus all --shm-size 512g -p 20011:20011 -v /data1/models:/workspace --ipc=host --network=host --privileged lmsysorg/sglang:latest python3 -m sglang.launch_server --model-path /workspace/DeepSeek-R1 --host 0.0.0.0 --served-model-name DeepSeek-R1 --context-length 32000 --tp 8 --trust-remote-code --enable-dp-attention --mem-fraction-static 0.9 --enable-flashinfer-mla --port 20011

Environment

docker image lmsysorg/sglang:latest

Fridge003 · 2025-03-04T00:54:35Z

Hi @wangguo1230, please try removing --enable-dp-attention or tuning the value of --mem-fraction-static and see whether this will help.

wangguo1230 · 2025-03-04T01:34:44Z

It's okay to remove -enable-dp-attention, but then there is no high throughput

Fridge003 · 2025-03-04T01:37:09Z

You can refer to #3956 for other optimization options. --enable-dp-attention is not a stable one.

minleminzui self-assigned this Mar 3, 2025

minleminzui added deepseek help wanted Extra attention is needed labels Mar 3, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] After using H200*8 to deploy DeepSeekR1, the large stress test model crashes #4020

[Bug] After using H200*8 to deploy DeepSeekR1, the large stress test model crashes #4020

wangguo1230 commented Mar 3, 2025

Fridge003 commented Mar 4, 2025

wangguo1230 commented Mar 4, 2025

Fridge003 commented Mar 4, 2025

[Bug] After using H200*8 to deploy DeepSeekR1, the large stress test model crashes #4020

[Bug] After using H200*8 to deploy DeepSeekR1, the large stress test model crashes #4020

Comments

wangguo1230 commented Mar 3, 2025

Checklist

Describe the bug

Reproduction

Environment

Fridge003 commented Mar 4, 2025

wangguo1230 commented Mar 4, 2025

Fridge003 commented Mar 4, 2025