Predicted Outputs == Speculative Decoding
This technique can greatly improve inference speed when you are making only small updates to documents or code.
Cursor calls it “Fast Apply”.
OpenAI calls it “Predicted Outputs”.
Broadly, these are all forms of “speculative decoding”, and you can also implement them yourself with vLLM.
These techniques give you a speed-up at a small cost increase (tokens from wrong guesses have to be recomputed).
Below, I show how to use the OpenAI API and how to implement the same idea with open-source models using vLLM.
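
Here’s a minimal sketch of an OpenAI “Predicted Outputs” call (the model choice, prompt, and file contents are placeholders). You pass your current version of the file as the `prediction`; wherever the model’s output matches your guess, those tokens are accepted at much lower latency.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# The file you want lightly edited; most of it should survive unchanged.
code = """
def greet(name):
    print(f"Hello, {name}")
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Rename the function greet to welcome. Reply with the full file only."},
        {"role": "user", "content": code},
    ],
    # The prediction is our guess at the output; matching spans are fast-pathed.
    prediction={"type": "content", "content": code},
)

print(response.choices[0].message.content)

# Worth checking how good the guess was: rejected prediction tokens are
# still billed at completion rates - that's the "small cost increase".
details = response.usage.completion_tokens_details
print(details.accepted_prediction_tokens, details.rejected_prediction_tokens)
```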
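
And here’s a sketch of the vLLM equivalent using prompt-lookup (“ngram”) speculation, where draft tokens are copied straight from the prompt. That’s the closest open-source match to Predicted Outputs, since the output mostly repeats the input. The model and filename are placeholders, and the kwargs below match older vLLM releases; newer releases moved these settings into a single `speculative_config` dict.

```python
from vllm import LLM, SamplingParams

# Prompt-lookup ("ngram") speculation: draft tokens are pulled from n-gram
# matches in the prompt, which works well when the output largely repeats it.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # example model, swap in your own
    speculative_model="[ngram]",   # draft from the prompt itself, no draft model needed
    num_speculative_tokens=8,      # how many tokens to guess per step
    ngram_prompt_lookup_max=4,     # longest n-gram to match against the prompt
)

document = open("report.txt").read()  # hypothetical input file
prompt = f"Fix the typos in this document and return it in full:\n\n{document}"

outputs = llm.generate([prompt], SamplingParams(temperature=0, max_tokens=2048))
print(outputs[0].outputs[0].text)
```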
Cheers, Ronan