How I Use LLMs For Coding
========================
I'm an intermediate user of LLMs. I don't do anything too fancy (fleets of agents managing other agents), but probably have a bit more tooling set up than the average developer.
## My Opinions on Utility
I find these tools extremely useful for:
- Bashing out a one-off script I need for a random task.
- Learning about/getting started in a new domain.
  - I'm the type that can get easily nerd-sniped by the million choices you might be confronted with when trying something new. LLMs tend to help me avoid this by making some choices for me so I can just get moving, which is often a better way to learn and experiment.
- Doing a first-pass scan on a bug.
- Getting over the inertia of *starting* on a project. I personally find that I'm able to think about a problem more creatively when I have a few concrete things to play with, and I can "sculpt" or "cut". I find myself having tools generate a lot of code, spend some time cutting and molding, figuring out what I really want, and then throwing it all away and starting over.
  - On the second pass, I often hand-write the core concepts/scaffold that I've determined I want, and then may have the LLM fill in the rest.
Where I still find limitations:
- I find the code generated to be much more "literal" than my aesthetic. Lots of if/else statements, lots of try/except statements, lots of code that looks like best practices (e.g. creating lots of data types), but that is way more verbose than it needs to be because it hasn't synthesized down to a smaller number of key concepts that interact together productively.
- As such, I find it to be better at filling things in when your existing codebase has strong conventions/patterns to follow. When having it start "greenfield", I invariably find myself rewriting almost everything because I don't like how things fit together.
- It's very possible this is a skill issue.
## What do I use?
- 95% of the time, I use Claude Code with Opus 4.5.
- The remainder of the time, I use Codex with 5.2.
### Global Guidance
I store a global `AGENTS.md` file in my dotfiles repo, which I then symlink to `~/.claude/CLAUDE.md`. It's pretty short and contains two main things:
- I structure almost every single git repository I work in in one of these ways: [[CODESTRUCT]], which I summarize and explain so the LLM broadly understands how to navigate my code.
- Every single project I write has a [[CMDRUNNER-JUST|Justfile]] that provides the following lifecycle recipes: [[PROJECT#ODLC]].
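I won't reproduce the real recipe list here (it lives in [[PROJECT#ODLC]]), but as a sketch of the shape, where the recipe names and commands are illustrative rather than my actual lifecycle:

```just
# Illustrative recipe names only; commands assume a Python project managed with uv.

# install dependencies
setup:
    uv sync

# run linters and formatters
lint:
    uv run ruff check .

# run the test suite
test:
    uv run pytest
```

Because every repo exposes the same recipe names, the agent never has to guess how to build or test a given project.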
### Skills
There are a variety of CLI tools that I use, including some org-internal CLIs that help provide a convenient command line interface to internal services.
I generally have Claude generate an initial SKILL.md scaffold for me based off the help text; I then clean it up with a bit of curation and some examples, and manage it in my dotfiles repo, which gets symlinked to `~/.claude/skills`.
Most of these tools can be discovered by the agent on its own, but I find that the explicitness of a skill helps it avoid rabbit holes of trying to use the wrong tools.
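For reference, the scaffold that comes out of this is a markdown file with YAML frontmatter. A minimal example, where `ticketctl` and its subcommands are entirely made up for illustration:

```markdown
---
name: ticketctl
description: Query and update tickets in our internal tracker. Use when the
  user mentions tickets, incidents, or on-call issues.
---

# ticketctl

Read operations (safe to run freely):

    ticketctl list --assignee me --output=json
    ticketctl show TICKET-123

Write operations (confirm with the user first):

    ticketctl update TICKET-123 --status done
```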
- TODO: for coding agents, this feels far lighter-weight and more natural than MCP. There are some interesting questions on CLI design to consider:
  - It may be important to have `--output=json` available as a flag on every command.
  - For a sufficiently complex CLI tool, you may want a skill per subcommand.
  - How to segregate read and write operations so that your agentic harness can prompt for approval on writes.
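As a sketch of that last point (the CLI and its verb list are hypothetical), a harness-side wrapper could whitelist read-only subcommands and gate everything else behind approval:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: classify subcommands of an internal CLI so a harness
# can auto-approve read operations and require approval for writes.
is_read_only() {
  case "$1" in
    get|list|show|describe|status) return 0 ;;  # assumed read-only verbs
    *) return 1 ;;
  esac
}

for cmd in list update; do
  if is_read_only "$cmd"; then
    echo "auto-approved: $cmd"
  else
    echo "requires approval: $cmd"
  fi
done
```

The point is that an allowlist over a small, known set of verbs is much easier to reason about than a policy over arbitrary HTTP requests.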
## What do I not use?
### Hooks
I used to use hooks that ran an auto-formatter, linting, etc. after every file change, but this was too granular - small changes often don't pass these checks on their own, and auto-formatting changes the file out from under the coding agent. The agent also generally runs this verification itself when it's ready for testing.
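For reference, the kind of hook I had (and removed) looked roughly like this in `~/.claude/settings.json`; treat the matcher and the stdin JSON field as approximate and check the current hooks documentation before copying:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | xargs ruff format"
          }
        ]
      }
    ]
  }
}
```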
### Sub-Agents
I honestly haven't gotten much mileage out of going "full YOLO", and so I haven't really figured out what I would use a sub-agent for.
## How do I use it?
I (with Claude's help) have implemented a set of shell functions packaged together in an entrypoint called `llmbox`. Usage looks something like this:
```
Usage:
  llmbox <tool> [options]
    -w, --worktree [branch]            # Run in a worktree (select branch via fzf if omitted)
  llmbox clean                         # Interactively select and remove a worktree
  llmbox list                          # List all worktrees with their branches

Examples:
  llmbox claude                        # Run in current directory
  llmbox claude -w                     # Select branch via fzf, run in worktree
  llmbox codex --worktree feature/foo  # Run in worktree for branch
  llmbox clean                         # Select worktree to remove
  llmbox list                          # Show all worktrees
```
The rough idea is that this creates a lightweight sandbox for the LLM to execute within.
- On Linux, this uses bubblewrap (inspired by [this post](https://blog.gpkb.org/posts/ai-agent-sandbox/)).
- On macOS, this uses Docker to launch a container that has the LLM tools pre-installed.
Most of the implementation detail involves mounting a bunch of things from the host into the sandbox:
- Auth credentials from Claude/Codex sessions so that you don't have to re-auth in every sandbox.
- Your working directory
- bin directories for tools
- config directories
- env vars
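The Linux path can be sketched roughly like this. This is illustrative only, not llmbox's actual implementation: the bound paths (especially the credentials directory) are assumptions, and the flag set is a minimal subset of what a real sandbox needs.

```shell
#!/usr/bin/env bash
# Illustrative sketch: build a bubblewrap invocation that read-only-binds the
# toolchain and credentials, and gives the agent a writable bind of only the
# working directory.
workdir="$PWD"
bwrap_args=(
  --ro-bind /usr /usr --symlink usr/bin /bin --symlink usr/lib /lib
  --proc /proc --dev /dev --tmpfs /tmp
  --ro-bind "$HOME/.claude" "$HOME/.claude"       # session credentials (assumed path)
  --bind "$workdir" "$workdir" --chdir "$workdir" # writable working directory
  --setenv PATH /usr/bin --unshare-all --share-net
)
# To actually launch the agent inside the sandbox:
#   bwrap "${bwrap_args[@]}" claude
echo "prepared ${#bwrap_args[@]} bwrap args"
```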
Outside of that, you can either run it in your current directory (in which case it runs in default permission mode), or in a git worktree (either creating a new branch or selecting an existing one), which it places in a standard directory and runs under `--yolo` mode. When you're all done, you just clean up your sandbox and call it a day.
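The worktree half of that flow can be sketched roughly like this; the function name and directory layout are my illustration here, not llmbox's actual code:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: create or reuse a git worktree under a standard
# directory, one per branch, and print its path.
agent_worktree() {
  local branch=$1 root wt
  root=$(git rev-parse --show-toplevel) || return 1
  wt="${WORKTREE_DIR:-$HOME/.worktrees}/$(basename "$root")/$branch"
  if [ ! -d "$wt" ]; then
    # -b creates a new branch at HEAD; fall back to checking out an existing one
    git worktree add -b "$branch" "$wt" >/dev/null 2>&1 ||
      git worktree add "$wt" "$branch" >/dev/null 2>&1 || return 1
  fi
  printf '%s\n' "$wt"
}

# Usage: wt=$(agent_worktree feature/foo) && cd "$wt"
```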
TODO: I still supervise running under `--yolo` mode pretty carefully. The official guidance is that you shouldn't give network access, but then the agents are substantially less useful. Something I'd like to figure out is how to create a pre-approved set of "read operations" that are always allowed, while reserving certain over-the-network "write" operations to require approval. With internal CLI-based skills, I think this should be a lot easier to do than trying to figure out policies for arbitrary HTTP requests.