KVarN: Native vLLM backend for KV-cache quantization by Huawei (github.com)
throwa356262 6 days ago [-]
Better performance than TQ and better quality than FP16?

Am I reading this right??

qeternity 6 days ago [-]
It's not better quality: 59.3% vs 59.4% fp16 on AIME 25
sheepscreek 6 days ago [-]
0.1% is within margin of error. Depending on the performance boost, it might be worthwhile taking a minuscule quality hit.
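A rough sketch of why a 0.1-point gap can sit inside the noise. It treats each benchmark score as a binomial proportion and compares the gap with the standard error of the difference. The 59.4% and 59.3% figures come from the thread. The question count and samples per question are assumptions, not numbers from the repo or the paper.

```python
import math


def binomial_standard_error(accuracy: float, n_trials: int) -> float:
    """Standard error of an accuracy estimated from n independent pass/fail trials."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_trials)


# Assumptions (hypothetical): 30 questions, each sampled 64 times,
# with every sample treated as an independent trial.
n_questions = 30
samples_per_question = 64
n_trials = n_questions * samples_per_question

fp16_accuracy = 0.594   # FP16 score quoted in the thread
quant_accuracy = 0.593  # quantized score quoted in the thread

se_fp16 = binomial_standard_error(fp16_accuracy, n_trials)
se_quant = binomial_standard_error(quant_accuracy, n_trials)

# Standard error of the difference between two independent estimates.
se_diff = math.sqrt(se_fp16 ** 2 + se_quant ** 2)
gap = fp16_accuracy - quant_accuracy

print(f"gap = {gap:.4f}, standard error of the difference = {se_diff:.4f}")
print("gap is within one standard error:", gap < se_diff)
```

Samples drawn for the same question are correlated, so the independence assumption probably understates the real uncertainty. A benchmark this small cannot resolve a 0.1-point difference in either direction.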
qeternity 4 days ago [-]
I think it very much is worth it!

But the point was that quality didn't magically increase.

electroglyph 6 days ago [-]
any divergence (even if the benchmark is better) from full precision is error
7e 6 days ago [-]
Just pretend that it is the next step update when training. You didn’t train your model to step=inf, I hope?
thefox96 6 days ago [-]
Faster than FP16, not better quality, I guess.
v3ss0n 6 days ago [-]
Why is this not a PR for vLLM?
esafak 6 days ago [-]
It's the output of a research paper; the authors are not trying to build up vLLM, and they probably have no incentive to do so. You can submit a PR, though! It's easier now while the divergence is low, so don't wait. Since there are six authors, I bet you could get help with the inevitable review chores if you just take the step of creating the PR.

edit: It might not be clear that it is based on vLLM 0.22, which is the current version: https://github.com/huawei-csl/KVarN/commit/d6290e99098d7426d.... All you have to do is create a diff off it; it's fairly straightforward.

jmalicki 6 days ago [-]
And with the help of AI, pointing an AI at this paper and saying "make a vLLM PR from this paper" tends to work surprisingly well, even if you need to nudge it a little along the way.
woadwarrior01 6 days ago [-]
Last I heard, vLLM was backed by a company that has raised $150m in seed funding. I'm sure they've got the resources to port it.
electronsoup 6 days ago [-]
Why is this not a PR for llama.cpp?
thefox96 6 days ago [-]
it should be easy to do btw
lukasc-ch 5 days ago [-]
... and it's on llama.cpp thanks to this guy! https://www.reddit.com/r/LocalLLaMA/comments/1txlhxu/i_imple...
lukasc-ch 5 days ago [-]
This is awesome! Let's give them some stars:
- https://github.com/huawei-csl/KVarN (original repo, vLLM implementation)
- https://github.com/Anbeeld/beellama.cpp (llama.cpp implementation + awesome evals)
0xjeffro 6 days ago [-]
Far ahead (yao yao ling xian)