Last month, for the first time, I managed to get a Qwen model running locally as a solid replacement for my scout subagent, which I use for initial codebase exploration to keep that noise out of the main agent's context.

This is amazing. I usually use GPT mini models for this kind of work, so switching to a local model should save credits at a time when AI costs keep climbing.

If you have 20 GB of VRAM, you can try it with Ollama or LM Studio. Ask your agent how to set it up. pi already has native Ollama integration, which makes it easy to test in your workflow.
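If you want a concrete starting point with Ollama, a minimal setup might look like the sketch below. The model tag `qwen3:30b` is an assumption on my part (pick whatever Qwen build actually fits your VRAM); the rest uses standard Ollama commands and its OpenAI-compatible API.

```shell
# Pull a Qwen model. The tag is an assumption -- browse the Ollama
# model library for a quantization that fits in ~20 GB of VRAM.
ollama pull qwen3:30b

# Quick smoke test from the terminal.
ollama run qwen3:30b "Summarize the layout of a typical Go project."

# Ollama also serves an OpenAI-compatible API on localhost:11434,
# which is what most agent integrations talk to:
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3:30b", "messages": [{"role": "user", "content": "hello"}]}'
```

Once the daemon answers on that endpoint, pointing an agent at the local model is usually just a matter of swapping the base URL and model name.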