Predicted Outputs == Speculative Decoding
This technique can greatly improve inference speed when you are making only small updates to documents or code.
Cursor calls it “Fast Apply”.
OpenAI calls it “Predicted Outputs”.
Broadly, these are all forms of “speculative decoding”, and you can also implement them yourself with vLLM.
These techniques give you a speed-up at a small cost increase (tokens from wrong guesses have to be recomputed).
Below, I show how to use the OpenAI API and how to implement the same idea with open-source models using vLLM.
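
Here’s a minimal sketch of an OpenAI “Predicted Outputs” call (the model choice, prompt, and file contents are placeholders). You pass your current version of the file as the `prediction`; wherever the model’s output matches your guess, those tokens are accepted at much lower latency.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# The file you want lightly edited; most of it should survive unchanged.
code = """
def greet(name):
    print(f"Hello, {name}")
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Rename the function greet to welcome. Reply with the full file only."},
        {"role": "user", "content": code},
    ],
    # The prediction is our guess at the output; matching spans are fast-pathed.
    prediction={"type": "content", "content": code},
)

print(response.choices[0].message.content)

# Worth checking how good the guess was: rejected prediction tokens are
# still billed at completion rates - that's the "small cost increase".
details = response.usage.completion_tokens_details
print(details.accepted_prediction_tokens, details.rejected_prediction_tokens)
```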
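
And here’s a sketch of the vLLM equivalent using prompt-lookup (“ngram”) speculation, where draft tokens are copied straight from the prompt. That’s the closest open-source match to Predicted Outputs, since the output mostly repeats the input. The model and filename are placeholders, and the kwargs below match older vLLM releases; newer releases moved these settings into a single `speculative_config` dict.

```python
from vllm import LLM, SamplingParams

# Prompt-lookup ("ngram") speculation: draft tokens are pulled from n-gram
# matches in the prompt, which works well when the output largely repeats it.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # example model, swap in your own
    speculative_model="[ngram]",   # draft from the prompt itself, no draft model needed
    num_speculative_tokens=8,      # how many tokens to guess per step
    ngram_prompt_lookup_max=4,     # longest n-gram to match against the prompt
)

document = open("report.txt").read()  # hypothetical input file
prompt = f"Fix the typos in this document and return it in full:\n\n{document}"

outputs = llm.generate([prompt], SamplingParams(temperature=0, max_tokens=2048))
print(outputs[0].outputs[0].text)
```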
Cheers, Ronan