https://research.tue.nl/en/publications/language-models-speed-up-local-search-for-finding-programmatic-po
TL;DR
- We can find good policies (in code space) using Stochastic Hill-Climbing (SHC), which in turn uses an LLM to propose each step.
- LLMs benefit from feedback, i.e., the roll-out of the policy they generated in the previous iteration.
- (Note that, unlike our method and like most of Levi’s work, they use a DSL and an AST to make edits to the code.)
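
To make the loop in the bullets above concrete, here is a minimal sketch of stochastic hill-climbing where the proposal step is delegated to a proposer that sees the previous roll-out feedback. Everything named here is an assumption for illustration, not the paper's code: `mock_llm_propose` is a random-mutation stand-in for an actual LLM call, programs are plain strings rather than DSL ASTs, and `toy_evaluate` with its `TARGET` is a made-up task.

```python
import random
from typing import Callable, Tuple

Program = str   # assumption: a program is a plain string, not a DSL AST
Feedback = str  # assumption: roll-out feedback is a short text summary


def stochastic_hill_climb(
    initial: Program,
    propose: Callable[[Program, Feedback, random.Random], Program],
    evaluate: Callable[[Program], Tuple[float, Feedback]],
    iterations: int = 50,
    seed: int = 0,
) -> Tuple[Program, float]:
    """Keep one incumbent program; accept a proposal only if it scores higher."""
    rng = random.Random(seed)
    best = initial
    best_score, feedback = evaluate(best)
    for _ in range(iterations):
        # The proposer (an LLM in the notes above) sees the incumbent
        # and the feedback from its roll-out.
        candidate = propose(best, feedback, rng)
        score, candidate_feedback = evaluate(candidate)
        if score > best_score:
            best, best_score, feedback = candidate, score, candidate_feedback
    return best, best_score


# Hypothetical toy task: the "policy" is a string of L/R actions.
ACTIONS = "LR"
TARGET = "RRRRRR"


def toy_evaluate(program: Program) -> Tuple[float, Feedback]:
    """Score = number of actions matching TARGET; feedback describes the roll-out."""
    score = sum(a == b for a, b in zip(program, TARGET))
    return float(score), f"roll-out matched {score}/{len(TARGET)} steps"


def mock_llm_propose(program: Program, feedback: Feedback, rng: random.Random) -> Program:
    """Stand-in for an LLM call: ignores the feedback and mutates one random action."""
    i = rng.randrange(len(program))
    return program[:i] + rng.choice(ACTIONS) + program[i + 1:]


if __name__ == "__main__":
    best, score = stochastic_hill_climb(
        "LLLLLL", mock_llm_propose, toy_evaluate, iterations=200
    )
    print(best, score)
```

In a real setup, `propose` would build a prompt from the incumbent program and the roll-out feedback and parse the model's reply into a new program. The acceptance rule stays the same, so the incumbent's score never decreases.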
Methodology

