If you have an AMD video card and want to try local inference, check out Lemonade from AMD. Like many local-inference tools, it is a wrapper around llama.cpp, but the nice part is that it is tailored for AMD hardware.
It makes it easy to get a local inference endpoint without complex terminal commands. I recently tested the new Gemma model from Google with it and got an OpenAI-compatible endpoint working quickly.
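Because the endpoint is OpenAI-compatible, any standard client code works against it. Here is a minimal sketch using only the Python standard library; the base URL, port, and model id (`gemma-3-4b-it`) are assumptions for illustration, so check what your local server actually reports before using them:

```python
import json
import urllib.request

# Assumed local endpoint and model id; adjust to match your Lemonade setup.
BASE_URL = "http://localhost:8000/api/v1"
MODEL = "gemma-3-4b-it"

def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completion payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(prompt: str) -> str:
    """POST the payload to the local server and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Say hello in one word."))
```

Since the request and response shapes follow the OpenAI spec, you can also point the official `openai` Python client at the same base URL instead of hand-rolling requests.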
Local AI keeps getting better every day. I really believe we will not need state-of-the-art models for most tasks in the near future.