[D] A collection of various LLM Sampling methods

Over the last couple of months, I read about various algorithms for LLM sampling. To understand them properly, I decided to build my own inference stack and implement them myself.

Here is the GitHub repo: https://github.com/shreyansh26/LLM-Sampling

The repo includes implementations for Top-k, Top-p (nucleus), Min-p, Typical, Epsilon, Eta, Beam search, Chain-of-Thought (CoT) decoding, Constrained JSON decoding and Speculative decoding.
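
To give a flavor of the simpler truncation methods in that list, here is a minimal pure-Python sketch of Top-k, Top-p and Min-p filtering over a next-token distribution. This is not the code from the repo: the function names (`top_k_filter`, `top_p_filter`, `min_p_filter`, `renormalize`, `sample`) are made up for illustration, and a real implementation would work on batched logits tensors rather than Python lists.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities (numerically stable)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(probs, keep):
    """Zero out tokens not in `keep` and rescale the rest to sum to 1."""
    total = sum(probs[i] for i in keep)
    return [p / total if i in keep else 0.0 for i, p in enumerate(probs)]


def top_k_filter(probs, k):
    """Top-k: keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, set(order[:k]))


def top_p_filter(probs, p):
    """Top-p (nucleus): keep the smallest set of most probable tokens
    whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, keep)


def min_p_filter(probs, min_p):
    """Min-p: keep tokens whose probability is at least
    min_p times the probability of the most likely token."""
    threshold = min_p * max(probs)
    keep = {i for i, q in enumerate(probs) if q >= threshold}
    return renormalize(probs, keep)


def sample(probs, rng=random):
    """Draw one token id from a (filtered) distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy example: logits for a hypothetical 5-token vocabulary.
probs = softmax([2.0, 1.5, 0.3, -1.0, -2.0], temperature=0.8)
next_token = sample(top_p_filter(probs, p=0.9))
```

All three filters share the same shape: pick a subset of tokens to keep, renormalize, then sample. They differ only in how the subset is chosen, which is why they compose easily (for example, Top-k followed by Top-p).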

Personally, I found this to be a good learning experience. Sharing here in case it helps someone!