Implement GPTQ quantization #467

EricLBuehler · 2024-06-23T14:29:47Z

This PR adds support for GPTQ quantization (paper here).
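For readers unfamiliar with the format, here is a minimal sketch of the kind of 4-bit unpack-and-dequantize step that GPTQ checkpoints typically require. It is not taken from this PR. It assumes an AutoGPTQ-style layout in which eight 4-bit values are packed lowest-nibble-first into each 32-bit word, and it simplifies by using a single scale and zero point per word, whereas real checkpoints share them per group and per output column. The names `unpack_int4` and `dequantize_word` are hypothetical.

```rust
/// Unpack eight 4-bit values from one 32-bit word, lowest nibble first.
/// (Assumed packing order; hypothetical helper, not from this PR.)
fn unpack_int4(word: u32) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = ((word >> (4 * i)) & 0xF) as u8;
    }
    out
}

/// Dequantize one packed word using w = scale * (q - zero).
/// A single scale/zero per word is a simplification for illustration.
fn dequantize_word(word: u32, scale: f32, zero: u8) -> [f32; 8] {
    let q = unpack_int4(word);
    let mut out = [0f32; 8];
    for (o, &qi) in out.iter_mut().zip(q.iter()) {
        *o = scale * (qi as f32 - zero as f32);
    }
    out
}

fn main() {
    let word = 0x7654_3210u32;
    println!("{:?}", unpack_int4(word));
    println!("{:?}", dequantize_word(word, 0.5, 8));
}
```

In practice this step is fused into the matmul kernel rather than materializing the full-precision weights, which is why the PR ships dedicated CUDA kernels.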

github-actions · 2024-06-23T14:30:45Z

Code Metrics Report

===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                   11          102          101            0            1
 Python                 45         1993         1695           62          236
 TOML                   19          574          506           11           57
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          196          169            1           26
 (Total)                            273          201           32           40
-------------------------------------------------------------------------------
 Markdown               25         1842            0         1385          457
 |- BASH                 5          101           98            0            3
 |- JSON                 1           12           12            0            0
 |- Python               5           92           82            0           10
 |- Rust                 6          408          365           19           24
 |- TOML                 2           75           63            0           12
 (Total)                           2530          620         1404          506
-------------------------------------------------------------------------------
 Rust                  173        55983        50835         1000         4148
 |- Markdown            92          864           13          801           50
 (Total)                          56847        50848         1801         4198
===============================================================================
 Total                 282        61005        53559         2458         4988
===============================================================================

EricLBuehler · 2024-08-09T09:53:28Z

cargo run --features cuda -- -i plain -m kaitchup/Phi-3-mini-4k-instruct-gptq-4bit -a phi3

BuildBackBuehler · 2024-08-19T05:35:18Z

Broke my heart when I went to try Mistral_Large 2-bit EQAT (AutoGPTQ) on my M1 and only then saw that there is no Mac support 😭. Any idea when that might come around? If adding MPS/Metal support doesn't require too much expert-level knowledge, I wouldn't mind taking a swing at it 😂

EricLBuehler · 2024-08-31T01:38:55Z

@BuildBackBuehler If you could add this, it would be amazing!

I haven't seen GPTQ kernels for Mac, though. If you can find any, it shouldn't be too hard to add them, and I would appreciate it if you took a shot!

EricLBuehler added 4 commits June 19, 2024 13:21

Add the kernels

3c313e2

Remove include of aten or torch

a908822

Merge branch 'master' into gptq

c0e0492

Merge branch 'master' into gptq

24b32ae

EricLBuehler added 3 commits June 24, 2024 04:45

Add the ffi bindings

a0674bc

Sketch the forward method

c988419

Merge branch 'master' into gptq

041c00f

EricLBuehler added the "new feature" label Jun 24, 2024

EricLBuehler added 18 commits June 24, 2024 23:20

Handle input and output reshapes

420d497

Add some features

fd33c32

Improve compat of build.rs

93bd2dc

Fix workspace dep

0f718a5

Merge branch 'master' into gptq

821ca1e

Merge branch 'master' into gptq

adfbf4b

Merge branch 'master' into gptq

636523d

Merge branch 'master' into gptq

6316412

Merge branch 'master' into gptq

7fcee44

Merge branch 'master' into gptq

d5b1b28

Merge branch 'master' into gptq

c4cd676

Finish merge

865dfe1

Fixes

942d971

Finish gptq gemm and add trait

a1ae220

Merge branch 'master' into gptq

625f186

Add the cuda gptq matmul stub

95a4468

Remove default feature

4d2d743

Correct conditional comp

f300f96

sammcj mentioned this pull request Jul 4, 2024

Feature request: Exllamav2 (exl2) backend #546

Open

EricLBuehler added 2 commits July 3, 2024 20:24

Add gguf qmatmul quantized support

ab6fb3c

Implement matmul with qmethod in qllama

04dbbd6

EricLBuehler mentioned this pull request Jul 20, 2024

Any plan about KV compression algorithm like SnapKV and PyramidKV? #598

Open

EricLBuehler added 8 commits July 22, 2024 15:27

Merge branch 'master' into gptq

dcc5b1c

Complete merge

88a080e

Merge branch 'master' into gptq

ea6a6ee

Update cargo lock

37521e5

Merge branch 'master' into gptq

21015f6

Merge branch 'master' into gptq

c152035

Merge from master

1f21ef8

Integrate with new i32 type

02fdd53

EricLBuehler added the "backend" label Aug 8, 2024

EricLBuehler added 3 commits August 8, 2024 20:36

Fixes

72c5752

More progress

14552b0

It doesnt crash

8298a5d

EricLBuehler added 12 commits August 9, 2024 05:54

Oops

32e579f

It works!

08485db

Remove some todos

5d56f47

Testing isq support

ca7d31a

Add to all non adapter models

4efa533

Clippy

49618dd

Avoid reallocating

0ace364

Add support for gptq to adapter models

079cd05

Add docs and logging

f72f19b

Update docs

e65774c

Clippy

f5206a0

Remove a todo

61a8530

EricLBuehler merged commit 1269bd8 into master Aug 9, 2024
14 of 15 checks passed

EricLBuehler deleted the gptq branch August 9, 2024 17:49
