Implement GPTQ quantization #467
Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                   11          102          101            0            1
 Python                 45         1993         1695           62          236
 TOML                   19          574          506           11           57
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          196          169            1           26
 (Total)                            273          201           32           40
-------------------------------------------------------------------------------
 Markdown               25         1842            0         1385          457
 |- BASH                 5          101           98            0            3
 |- JSON                 1           12           12            0            0
 |- Python               5           92           82            0           10
 |- Rust                 6          408          365           19           24
 |- TOML                 2           75           63            0           12
 (Total)                           2530          620         1404          506
-------------------------------------------------------------------------------
 Rust                  173        55983        50835         1000         4148
 |- Markdown            92          864           13          801           50
 (Total)                          56847        50848         1801         4198
===============================================================================
 Total                 282        61005        53559         2458         4988
===============================================================================
cargo run --features cuda -- -i plain -m kaitchup/Phi-3-mini-4k-instruct-gptq-4bit -a phi3
It broke my heart when I went to try Mistral_Large 2-bit EQAT (AutoGPTQ) on my M1 and only then saw that there is no Mac support 😭. Any idea when that might come around? If adding MPS/Metal support doesn't require too much expert-level knowledge, I wouldn't mind taking a swing at it 😂
@BuildBackBuehler If you could add this, it would be amazing! I haven't seen GPTQ kernels for Mac, though. If you can find any, it shouldn't be too hard to add them, and I would appreciate it if you took a shot!
This PR adds GPTQ quantization (paper here) support.
Refs: #418, #448.
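
For readers new to 4-bit quantized checkpoints, here is a minimal sketch of what dequantizing packed 4-bit weights generally involves. It is not the code in this PR and not the CUDA kernels it uses. The function names `unpack_nibbles` and `dequantize_group` are hypothetical, and the packing order (eight values per 32-bit word, lowest nibble first) and the single scale and zero point per group are assumptions made for illustration.

```rust
/// Unpack eight 4-bit values from one 32-bit word.
/// Assumption for this sketch: the lowest nibble holds the first value.
fn unpack_nibbles(word: u32) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (i, slot) in out.iter_mut().enumerate() {
        // Shift the i-th nibble down to the low bits and mask it off.
        *slot = ((word >> (4 * i)) & 0xF) as u8;
    }
    out
}

/// Dequantize one group of packed 4-bit weights that share a scale and
/// a zero point: w = scale * (q - zero).
fn dequantize_group(packed: &[u32], scale: f32, zero: u8) -> Vec<f32> {
    packed
        .iter()
        .flat_map(|&word| unpack_nibbles(word))
        .map(|q| scale * (q as f32 - zero as f32))
        .collect()
}
```

Calling `dequantize_group(&[0x7654_3210], 0.5, 4)` unpacks the values 0 through 7 and maps them to floats centered on the zero point. Real checkpoint layouts may differ in nibble order, zero-point offset, and group size, so treat the sketch as a conceptual aid only.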