

Kind of knew that after Claude Plays Pokemon went semi-viral, it was going to get Goodharted immediately. I also saw the usual doomers be like BY END OF YEAR AGENTS WILL BEAT POKEMON, which I thought was a severe underestimate at the time. They were undoubtedly basing their projection on the little chart the Anthropic people posted showing how far each version of Claude made it, waiting for Pokemon-playing skill to emerge from larger and larger models, instead of thinking, hmm, they are iteratively refining the customized tools whenever it gets stuck. Then, after Gemini ‘beat’ the game, I read a disappointed response from an RL guy who said that, after trying to replicate the results, they concluded Google’s setup was basically 90% harness and 10% model, despite the Google team basically implying it was raw pixels-to-action.
Ackshually, my metric gives 0 measure to ASI minds and 1 measure to meat sac minds, therefore mu({bio bois}) >> mu({ASI})