feat(remote_model): support variable remote backend for model loader #3964

DellCurry · 2025-02-28T13:45:47Z

Motivation

Similar to what I did in vLLM: support variable remote backends for the model loader.

Modifications

Background

Currently, one of the most common ways to load a model is from local disk, which means the user must first download the model files from HF or cloud storage. This wastes a lot of time, especially for huge models.

There are ways to load directly from a remote source, such as a remote filesystem like NFS, but these methods have their own drawbacks in network speed and flexibility.

In addition, some organizations want to use a KV database such as Redis to accelerate model loading. Our team has implemented an RDMA-based KV database, which is much faster, as follows:

What this PR does

To provide more flexibility, I add a new ModelLoader class named RemoteModelLoader and introduce a new module named Connector. RemoteModelLoader creates a Connector as a member. It loads the model first and then fetches the weight tensors one by one from the Connector.
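The loading flow described above could look roughly like the following. This is a minimal sketch, not the code in this PR: `ToyConnector`, `ToyModel`, and `RemoteModelLoaderSketch` are hypothetical names, tensors are replaced by plain lists of floats so the example runs without torch, and "load the model first" is read here as "allocate the parameters before filling them".

```python
from typing import Dict, Iterator, List, Tuple

Tensor = List[float]  # stand-in for torch.Tensor (assumption for this sketch)


class ToyConnector:
    """Hypothetical connector that serves weights from an in-process dict."""

    def __init__(self, weights: Dict[str, Tensor]) -> None:
        self._weights = weights

    def weight_iterator(self) -> Iterator[Tuple[str, Tensor]]:
        # Yield (parameter_name, tensor) pairs one at a time.
        yield from self._weights.items()


class ToyModel:
    """Stand-in for a model whose parameters are allocated but not yet filled."""

    def __init__(self, sizes: Dict[str, int]) -> None:
        self.params: Dict[str, Tensor] = {
            name: [0.0] * size for name, size in sizes.items()
        }


class RemoteModelLoaderSketch:
    def __init__(self, connector: ToyConnector) -> None:
        # The loader keeps the connector as a member.
        self.connector = connector

    def load_model(self, sizes: Dict[str, int]) -> ToyModel:
        model = ToyModel(sizes)  # step 1: build the model
        loaded = set()
        # Step 2: stream weights from the connector, one tensor at a time.
        for name, tensor in self.connector.weight_iterator():
            if name not in model.params:
                raise KeyError(f"unexpected weight: {name}")
            if len(tensor) != len(model.params[name]):
                raise ValueError(f"size mismatch for {name}")
            model.params[name][:] = tensor  # copy into the existing buffer
            loaded.add(name)
        missing = set(model.params) - loaded
        if missing:
            raise KeyError(f"missing weights: {sorted(missing)}")
        return model


loader = RemoteModelLoaderSketch(ToyConnector({"w": [1.0, 2.0], "b": [0.5]}))
model = loader.load_model({"w": 2, "b": 1})
```

Streaming one tensor at a time means the loader never needs the full checkpoint in host memory at once, which is the main point of fetching from the Connector incrementally.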

Connector has two types: KV for KV databases and FS for remote file storage. Both types must implement weight_iterator() to yield weight tensors and pull_files() to download model config files. I have implemented RedisConnector as an example of a KV-Connector (most of the serde part is copied from LMCache).
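As an illustration of that contract, here is a minimal sketch of a connector interface with a toy KV implementation. Only the method names weight_iterator() and pull_files() come from the description above. The class names, the `ConnectorType` enum, the key layout (`<model>/weights/...`, `<model>/files/...`), and the JSON serialization are all assumptions made for this example, and a dict stands in for Redis.

```python
import abc
import enum
import json
import os
import tempfile
from typing import Dict, Iterator, List, Tuple

Tensor = List[float]  # stand-in for torch.Tensor (assumption for this sketch)


class ConnectorType(enum.Enum):
    KV = "kv"  # KV database, e.g. Redis
    FS = "fs"  # remote file storage


class BaseConnector(abc.ABC):
    """Hypothetical base class; both connector types implement these methods."""

    connector_type: ConnectorType

    @abc.abstractmethod
    def weight_iterator(self) -> Iterator[Tuple[str, Tensor]]:
        """Yield (parameter_name, tensor) pairs."""

    @abc.abstractmethod
    def pull_files(self, local_dir: str) -> List[str]:
        """Download model config files into local_dir and return their paths."""


class InMemoryKVConnector(BaseConnector):
    """Toy KV connector backed by a dict instead of a real KV database."""

    connector_type = ConnectorType.KV

    def __init__(self, store: Dict[str, bytes], model_name: str) -> None:
        self.store = store
        self.model_name = model_name

    def weight_iterator(self) -> Iterator[Tuple[str, Tensor]]:
        prefix = f"{self.model_name}/weights/"
        for key in sorted(self.store):
            if key.startswith(prefix):
                # Deserialize on demand so only one tensor is decoded at a time.
                yield key[len(prefix):], json.loads(self.store[key])

    def pull_files(self, local_dir: str) -> List[str]:
        prefix = f"{self.model_name}/files/"
        paths = []
        for key in sorted(self.store):
            if key.startswith(prefix):
                path = os.path.join(local_dir, key[len(prefix):])
                with open(path, "wb") as f:
                    f.write(self.store[key])
                paths.append(path)
        return paths


store = {
    "toy/files/config.json": b'{"hidden_size": 2}',
    "toy/weights/w": json.dumps([1.0, 2.0]).encode(),
}
connector = InMemoryKVConnector(store, "toy")
weights = dict(connector.weight_iterator())
with tempfile.TemporaryDirectory() as tmp:
    pulled = [os.path.basename(p) for p in connector.pull_files(tmp)]
```

An FS-type connector would subclass the same base and implement the two methods against remote files rather than keys, so the loader does not need to know which backend it is talking to.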

The KV-Connector could also be used for a remote prefix cache in the future, as LMCache does.

TBD

If this PR proves helpful, I will add the following soon:

an S3Connector for S3-compatible remote backends, as an example of an FS-Connector
a script to save model weight tensors to a remote KV database (ShardedStateLoader is also missing such a script; the two scripts are very similar, so maybe one commit for both)
unit tests and code style fixes where applicable

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

Signed-off-by: wangyu <[email protected]>

feat(remote_model): support variable remote backend for model loader

83957dc

Signed-off-by: wangyu <[email protected]>

DellCurry requested review from merrymercy, Ying1123, hnyls2002, zhyncs, ispobock and ByronHsu as code owners February 28, 2025 13:45
