feat(remote_model): support variable remote backend for model loader #3964
+595
−2
Motivation
Similar to what I did in vLLM: support a variable remote backend.
Modifications
Background
Currently, one of the most common ways to load a model is from local disk, which means users must first download the model files from HF or cloud storage. This wastes a lot of time, especially for huge models.
There are ways to load directly from a remote source, such as a remote filesystem like NFS, but these methods have their own drawbacks in network speed and flexibility.
Besides, some organizations hope to use a KV database such as Redis to accelerate model loading. Our team has implemented an RDMA-based KV database which is much faster, as follows:
What this PR does
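As a rough illustration of the connector design this section describes, the sketch below uses a hypothetical BaseConnector base class and a dict-backed InMemoryKVConnector standing in for a real Redis backend. The class names, the key layout, and the load_remote_model helper are assumptions made for illustration, not the PR's actual code; only the weight_iterator() and pull_files() method names come from the description.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple


class BaseConnector(ABC):
    """Hypothetical base class; the PR's real signatures may differ."""

    @abstractmethod
    def weight_iterator(self) -> Iterator[Tuple[str, List[float]]]:
        """Yield (parameter name, weight) pairs one at a time."""

    @abstractmethod
    def pull_files(self) -> Dict[str, bytes]:
        """Return the model config files as {filename: content}."""


class InMemoryKVConnector(BaseConnector):
    """Dict-backed stand-in for a KV connector such as RedisConnector."""

    def __init__(self, store: Dict[str, object], model_name: str) -> None:
        self.store = store
        # Assumed key layout: "<model>/weights/<param>" and "<model>/files/<name>".
        self.weight_prefix = f"{model_name}/weights/"
        self.file_prefix = f"{model_name}/files/"

    def weight_iterator(self):
        for key in sorted(self.store):
            if key.startswith(self.weight_prefix):
                yield key[len(self.weight_prefix):], self.store[key]

    def pull_files(self):
        return {
            key[len(self.file_prefix):]: value
            for key, value in self.store.items()
            if key.startswith(self.file_prefix)
        }


def load_remote_model(connector: BaseConnector):
    """Fetch config files, then collect weights one by one from the connector."""
    config_files = connector.pull_files()
    state = {}
    for name, weight in connector.weight_iterator():
        state[name] = weight  # a real loader would copy into model parameters
    return config_files, state


# Plain lists stand in for weight tensors to keep the sketch dependency-free.
store = {
    "demo/files/config.json": b'{"hidden_size": 2}',
    "demo/weights/layer0.weight": [0.1, 0.2],
    "demo/weights/layer0.bias": [0.0],
}
config_files, state = load_remote_model(InMemoryKVConnector(store, "demo"))
```

Keeping the loader dependent only on the two abstract methods is what would let a KV backend and a file-storage backend be swapped without touching the loading loop.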
In order to provide more flexibility, I add a new ModelLoader class named RemoteModelLoader, and introduce a new module named Connector. RemoteModelLoader creates a Connector as its member. RemoteModelLoader loads the model first and then fetches weight tensors one by one from the Connector.
Connector has two types: KV for KV databases and FS for remote file storage. Both types must implement weight_iterator() to yield weight tensors and pull_files() to download model config files. I have implemented RedisConnector as an example of a KV-Connector (most of the serde part is copied from LMCache). KV-Connector could also be used for remote prefix cache in the future, as LMCache does.
TBD
If this PR proves helpful, I will address the following soon:
S3Connector for S3-compatible remote backends, as an example of an FS-Connector
(ShardedStateLoader is also missing this script; the two scripts are very similar, so maybe one commit for both)
Checklist