a method to speed up LLM decoding.
Idea: “draft-and-verify”: a smaller draft model generates several tokens ahead, and the larger target model then verifies them.
A few techniques, such as n-gram matching and EAGLE, are supported in vLLM.
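
The draft-and-verify idea can be illustrated with a minimal sketch. It assumes greedy decoding and models represented as plain next-token functions; the names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical and are not vLLM's API. A real system would score all drafted positions in one batched forward pass of the target model, which the loop below only imitates.

```python
from typing import Callable, List

# Hypothetical toy interface: a "model" maps a token context to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(target: Model, draft: Model, tokens: List[int], k: int) -> List[int]:
    """Run one draft-and-verify step and return the newly accepted tokens."""
    # Draft phase: the small model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(tokens + proposal))

    # Verify phase: compare each drafted token with the target's greedy choice.
    # (Written as a loop for clarity; a real engine batches these checks.)
    accepted: List[int] = []
    for tok in proposal:
        expected = target(tokens + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every drafted token matched, so the target contributes one extra token.
    accepted.append(target(tokens + accepted))
    return accepted


def generate(target: Model, draft: Model, prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Generate max_new tokens by repeating draft-and-verify steps."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        tokens.extend(speculative_step(target, draft, tokens, k))
    return tokens[: len(prompt) + max_new]


def greedy(target: Model, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: plain one-token-at-a-time greedy decoding with the target."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target(tokens))
    return tokens


def toy_target(ctx: List[int]) -> int:
    # Toy target: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    # Toy draft: agrees with the target except after token 5.
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10
```

With greedy acceptance, every emitted token is one the target itself would have chosen, so `generate(toy_target, toy_draft, [0], 12)` matches `greedy(toy_target, [0], 12)`. The speedup comes from accepting several tokens per target step whenever the draft guesses correctly.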