feat(remote_model): support variable remote backend for model loader #3964

Open
wants to merge 1 commit into
base: main
@DellCurry DellCurry commented Feb 28, 2025

Motivation

Similar to what I did in vLLM: support a variable remote backend.

Modifications

Background

Currently, one of the most common ways to load a model is from local disk, which means the user must first download the model files from HF or cloud storage to the local machine. This wastes a lot of time, especially for huge models.

Of course, there are ways to load directly from remote storage, such as a remote filesystem like NFS. Those methods have their own drawbacks in network speed and flexibility.

In addition, some organizations want to use a KV database such as Redis to accelerate model loading. Our team has implemented an RDMA-based KV database that is much faster, as shown below:
image

What this PR does

To provide more flexibility, I add a new ModelLoader class named RemoteModelLoader and introduce a new module named Connector. RemoteModelLoader creates a Connector as a member. It loads the model first and then fetches the weight tensors one by one from the Connector.

There are two types of Connector: KV for KV databases and FS for remote file storage. Both types must implement weight_iterator() to yield weight tensors and pull_files() to download model config files. I have implemented RedisConnector as an example of a KV-Connector (most of the serde part is copied from LMCache).
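The relationship between the loader and a Connector can be sketched roughly as below. This is an illustration only, not the code in this PR: the method signatures, the `InMemoryKVConnector` class, the `load_weights` helper, and the use of plain float lists in place of real tensors are all assumptions made to keep the example dependency-free.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple

# Assumption: a list of floats stands in for a real tensor type.
Tensor = List[float]


class Connector(ABC):
    """Hypothetical base class exposing the two methods named above."""

    @abstractmethod
    def weight_iterator(self) -> Iterator[Tuple[str, Tensor]]:
        """Yield (parameter name, weight tensor) pairs one at a time."""

    @abstractmethod
    def pull_files(self, dst_dir: str) -> List[str]:
        """Write model config files into dst_dir and return their paths."""


class InMemoryKVConnector(Connector):
    """Toy KV-style connector backed by dicts instead of a remote database."""

    def __init__(self, weights: Dict[str, Tensor], files: Dict[str, bytes]):
        self._weights = weights
        self._files = files

    def weight_iterator(self) -> Iterator[Tuple[str, Tensor]]:
        for name, tensor in self._weights.items():
            yield name, tensor

    def pull_files(self, dst_dir: str) -> List[str]:
        os.makedirs(dst_dir, exist_ok=True)
        paths = []
        for filename, payload in self._files.items():
            path = os.path.join(dst_dir, filename)
            with open(path, "wb") as f:
                f.write(payload)
            paths.append(path)
        return paths


def load_weights(params: Dict[str, Tensor], connector: Connector) -> None:
    """Copy each fetched tensor into the already-initialized parameter."""
    for name, tensor in connector.weight_iterator():
        if name not in params:
            raise KeyError(f"unexpected weight: {name}")
        if len(params[name]) != len(tensor):
            raise ValueError(f"shape mismatch for {name}")
        params[name][:] = tensor  # in-place copy, one tensor at a time
```

An FS-style connector would implement the same two methods, so the loader loop would not need to know which backend it is talking to.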

A KV-Connector could also be used for a remote prefix cache in the future, as LMCache does.

TBD

If this PR proves helpful, I will add the following soon:

  • an S3Connector for S3-compatible remote backends, as an example of an FS-Connector
  • a script to save model weight tensors to a remote KV database (ShardedStateLoader is also missing such a script; the two scripts are very similar, so maybe one commit for both)
  • unit tests and code style fixes
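As a rough idea of what the save script could look like, here is a minimal sketch. It is not the planned implementation: the key layout (`<model_name>/weights/<tensor name>`), the byte format, the function names, and the use of a dict-like object in place of a real KV client are all assumptions, and plain float lists stand in for real tensors.

```python
import struct
from typing import Dict, List, MutableMapping


def serialize_tensor(values: List[float]) -> bytes:
    """Pack as little-endian: uint32 element count, then float32 values."""
    return struct.pack(f"<I{len(values)}f", len(values), *values)


def deserialize_tensor(blob: bytes) -> List[float]:
    """Inverse of serialize_tensor."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return list(struct.unpack_from(f"<{count}f", blob, 4))


def save_weights(
    kv_store: MutableMapping[str, bytes],
    model_name: str,
    state_dict: Dict[str, List[float]],
) -> List[str]:
    """Write one KV entry per weight tensor and return the keys written."""
    keys = []
    for name, values in state_dict.items():
        key = f"{model_name}/weights/{name}"  # hypothetical key layout
        kv_store[key] = serialize_tensor(values)
        keys.append(key)
    return keys
```

Storing one entry per tensor matches the loader fetching weights one by one; a real script would swap the dict for the database client's own write call.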

Checklist
