Discussion about this post

Neural Foundry

Impressive work on the recursive nanochat implementation. The 50% parameter reduction while matching performance is huge for edge deployments. I tried something similar with looped transformers a few months back but couldn't get the KV cache strategy right. The idea of discarding earlier recursion states and only keeping the latest one is clever, and way simpler than what I was attempting.
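Since the comment hinges on the cache strategy (keep only the latest recursion's keys/values), here is a minimal single-head PyTorch sketch of that idea. It is not the post's actual implementation: `TinyBlock`, `recursive_decode_step`, and the tensor shapes are illustrative assumptions, and a real model would add multi-head attention, normalization, and positional encoding.

```python
import torch
import torch.nn as nn


class TinyBlock(nn.Module):
    """A single weight-tied attention + MLP block (illustrative only)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.o_proj = nn.Linear(d_model, d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )


def recursive_decode_step(block, x_new, k_cache, v_cache, n_recursions):
    """Decode one token by applying the same block n_recursions times.

    x_new:   (batch, 1, d) hidden state of the newest token
    k_cache: (batch, past_len, d) cached keys from previous tokens
    v_cache: (batch, past_len, d) cached values from previous tokens
    """
    h = x_new
    k_last, v_last = None, None
    for _ in range(n_recursions):
        # Each recursion re-projects the current hidden state to Q/K/V.
        q_new = block.q_proj(h)
        k_new = block.k_proj(h)
        v_new = block.v_proj(h)

        # Attend over past tokens' cached K/V plus this recursion's own K/V.
        k = torch.cat([k_cache, k_new], dim=1)
        v = torch.cat([v_cache, v_new], dim=1)
        attn = torch.softmax(q_new @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        h = h + block.o_proj(attn @ v)
        h = h + block.mlp(h)

        # Earlier recursion states are discarded; only the latest survives.
        k_last, v_last = k_new, v_new

    # Append just the final recursion's K/V, so the cache holds one slot per
    # token rather than one slot per token per recursion.
    k_cache = torch.cat([k_cache, k_last], dim=1)
    v_cache = torch.cat([v_cache, v_last], dim=1)
    return h, k_cache, v_cache


# Usage: the cache grows by one entry per generated token, not n_recursions entries.
d = 64
block = TinyBlock(d)
k_cache = torch.zeros(1, 0, d)
v_cache = torch.zeros(1, 0, d)
x = torch.randn(1, 1, d)
h, k_cache, v_cache = recursive_decode_step(block, x, k_cache, v_cache, n_recursions=3)
print(k_cache.shape)  # torch.Size([1, 1, 64])
```

Under these assumptions, the cache footprint matches that of a non-recursive model, which is what makes the "keep only the latest recursion" choice attractive for edge deployments.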
