06 / testing

Testing is part of setup.

Creating a behavior doesn't finish the job. Test it with small tasks, correct what goes wrong, then turn those corrections into permanent rules.

Many people write SOUL.md and immediately expect the agent to run perfectly. In reality, newly written behavior almost always needs adjustment. Testing isn't a sign of failure — it's part of the process.

Three testing steps

Small tasks first

Start with read-only or low-risk actions: check channels, read repos, summarize emails, check balances. Don't hand over tasks involving money or public actions right away. Build trust through safe actions first.

Evaluate the results

Watch whether the agent is too passive (afraid to act), too aggressive (executes without confirmation), too verbose (overexplains), or unclear about its limits. All of these are signals for correction.

Permanent corrections

Every correction you give must be saved: in SOUL.md for general rules, or in memory for specific preferences. If a correction isn't saved, the same mistake will come back in the next session.
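
The split between general rules and specific preferences can be sketched in a few lines of Python. This is only an illustration: the Correction class, the "general" flag, and both function names are hypothetical, and the sketch does not write to a real SOUL.md or to any real memory store.

```python
from dataclasses import dataclass


@dataclass
class Correction:
    text: str
    general: bool  # True: a rule for every task; False: a specific preference


def route_correction(correction: Correction) -> str:
    """Pick a destination for a correction (hypothetical labels, not real paths)."""
    return "SOUL.md" if correction.general else "memory"


def save_corrections(corrections: list[Correction]) -> dict[str, list[str]]:
    """Group corrections by destination so that none is left unsaved."""
    saved: dict[str, list[str]] = {"SOUL.md": [], "memory": []}
    for correction in corrections:
        saved[route_correction(correction)].append(correction.text)
    return saved
```

The point of the sketch is that every correction ends up in exactly one of the two places, so nothing depends on the agent remembering it on its own.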

What should be tested?

Language and tone

Does the agent speak the way you want? Is the register right, neither too formal nor too casual? Does it avoid unwanted emoji?

Autonomy level

Try giving a task that should be autonomous — does the agent execute immediately or ask for permission? Then try a task that needs permission — does the agent stop or proceed anyway? Both scenarios should match expectations.
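
One way to keep track of these two scenarios is to record, for each test task, what you expected and what the agent actually did. The sketch below assumes you note the outcome by hand; AutonomyCase and diagnose are hypothetical names, not part of any agent framework.

```python
from dataclasses import dataclass


@dataclass
class AutonomyCase:
    task: str
    needs_permission: bool  # what you expect for this task
    agent_asked: bool       # what you observed the agent do


def diagnose(case: AutonomyCase) -> str:
    """Compare the expected and observed behavior for one test task."""
    if case.needs_permission and not case.agent_asked:
        return "too aggressive"  # proceeded where it should have stopped
    if not case.needs_permission and case.agent_asked:
        return "too passive"     # asked where it should have just acted
    return "ok"
```

A case that comes back as anything other than "ok" points to a rule that needs to be added or loosened.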

Tool selection

Does the agent pick the right tool? For example, when asked to "check balance," does it use the wallet tool or fall back to web search? When asked to "send email," does it use the correct email account?

Verification

Does the agent verify its work? After a swap, does it check the tx hash? After a deploy, does it test the endpoint? After pushing code, does it check the build status? Good verification keeps the agent from claiming success when the action actually failed.
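
The pattern behind all three examples is the same: run the action, then run a separate check before reporting success. A minimal sketch, assuming the action and the check are supplied as plain functions (run_with_verification and both parameters are hypothetical names):

```python
from typing import Callable


def run_with_verification(
    action: Callable[[], str],
    verify: Callable[[str], bool],
) -> dict[str, object]:
    """Run an action, then report success only if a separate check confirms it."""
    result = action()          # e.g. returns a tx hash or an endpoint URL
    confirmed = verify(result)  # e.g. looks up the hash or calls the endpoint
    return {
        "result": result,
        "status": "success" if confirmed else "unconfirmed",
    }
```

The design choice worth noting is that the status comes from the check, not from the action's own return value.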

Error handling

What happens when something fails? Does the agent retry with a different approach, or give up immediately? Does it report errors informatively, or just say "failed"?
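
The behavior you are hoping to see can be sketched as: try an alternative when the first approach fails, and if everything fails, say what was tried and why each attempt failed. The function name and the strategy list below are assumptions for illustration only.

```python
from typing import Callable


def try_strategies(strategies: list[tuple[str, Callable[[], str]]]) -> str:
    """Try each named approach in order; report every failure if none succeeds."""
    errors: list[str] = []
    for name, attempt in strategies:
        try:
            return f"ok via {name}: {attempt()}"
        except Exception as exc:
            # Keep the approach name, error type, and message for the report.
            errors.append(f"{name}: {type(exc).__name__}: {exc}")
    return "failed after trying " + "; ".join(errors)
```

A report built this way names each attempt and its error, which is far more useful for correction than a bare "failed."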

Signs behavior needs fixing

Too passive

The agent always asks permission even for actions that should be autonomous. This usually happens because SOUL.md says "must ask permission" too often without explaining when it can proceed directly.

Too aggressive

The agent executes risky actions without confirmation. This usually happens because SOUL.md doesn't state the limits clearly, or because the agent misjudges the risk level.

Too verbose

The agent gives long explanations for simple questions. Add rules about response length in SOUL.md: "answer concisely and directly, don't ramble."

Wrong context

The agent carries context from another project or doesn't understand the current domain. This usually happens because SOUL.md is too general — add domain-specific rules.

An agent like Hermes is shaped by many small corrections: language, permission limits, tool choice, verification methods, and the habit of saving preferences. Every saved correction makes the agent more stable over time. Don't get frustrated if the first results aren't perfect — that's normal.

Hermes SOUL Guide — building a smart agent is a process, not an instant prompt.