Baseline and Iterations
The baseline and iteration flows turn a project directory into a measured loop with durable artifacts. Baseline measures the current project. Iterations try one focused candidate change and keep, revert, or skip it.
High-level flow
- Create a project scaffold.
- Add or import project files.
- Run `goalseek setup` to inspect the project and scope.
- Run `goalseek baseline` to measure the current project without changing code.
- Run `goalseek run --iterations N` to plan, implement, verify, and judge candidate changes.
The loop is stateful. It can pause and resume using the state saved in `logs/state.json`.
Core files
| File | Role |
|---|---|
| `manifest.yaml` | Defines file scope, verification commands, and metric extraction. |
| `experiment.py` | Default writable implementation file. Usually run (for example, to train a model) by the verification commands. |
| `program.md` | Read-only project instructions visible to providers during planning and implementation. |
| `validate_results.py` | Hidden verifier harness. It runs during VERIFY and is never exposed to providers as context. |
| `runs/` | Per-baseline and per-iteration artifacts. |
| `logs/results.jsonl` | Append-only result history. |
| `logs/state.json` | Resumable loop state. |
Baseline flow
Baseline establishes the first retained metric without asking a provider to modify code.
Run it with:
```shell
uv run goalseek baseline ./demo
```
Baseline steps
- The CLI calls `goalseek.api.run_baseline()`.
- `LoopEngine.run_baseline()` discovers the project root by finding `manifest.yaml`.
- `ManifestService.validate()` loads and validates scope, verification commands, and metric config.
- Effective config is loaded from defaults, user config, project config, and overrides.
- Runtime logging is configured.
- `ArtifactStore` and `Repo` helpers are created.
- The project is checked for a git repository.
- `runs/0000_baseline/` is created.
- `env.json` captures OS, Python, provider, model, effective config, and command versions.
- Verification commands from the manifest run in order.
- Metric extraction runs if verification succeeds.
- `result.json` and `logs/results.jsonl` are written.
- `logs/state.json` is initialized after a successful baseline.
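The root-discovery step in `LoopEngine.run_baseline()` can be pictured as a search for `manifest.yaml`. The sketch below is illustrative only: `find_project_root` is a hypothetical helper, and it assumes the search walks upward from the given path, which the steps above do not confirm.

```python
from __future__ import annotations

from pathlib import Path


def find_project_root(start: str | Path) -> Path:
    """Hypothetical helper: return the nearest directory at or above `start`
    that contains manifest.yaml (assumes an upward search)."""
    current = Path(start).resolve()
    if current.is_file():
        current = current.parent  # start from the file's containing directory
    for candidate in (current, *current.parents):
        if (candidate / "manifest.yaml").is_file():
            return candidate
    raise FileNotFoundError(f"no manifest.yaml found at or above {start}")
```

Walking upward would let the command be run from a subdirectory of the project; if goalseek only checks the given directory, the loop reduces to its first iteration.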
What baseline checks
- the project root resolves to a directory containing `manifest.yaml`
- the manifest is structurally valid
- the project is inside a git repository
- verification commands complete successfully
- metric extraction succeeds if verification passes
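The git-repository check can be approximated with `git rev-parse --is-inside-work-tree`, which prints `true` inside a work tree and exits non-zero outside one. This is a minimal sketch under that assumption; `is_inside_git_repo` is a hypothetical name, not part of goalseek's API.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def is_inside_git_repo(path: str | Path) -> bool:
    """Hypothetical helper: True if `path` lies inside a git work tree."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--is-inside-work-tree"],
            cwd=path,
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # git is not installed, or `path` does not exist or is not a directory
        return False
    return result.returncode == 0 and result.stdout.strip() == "true"
```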
What baseline writes
- `runs/0000_baseline/env.json`
- `runs/0000_baseline/verifier.log`
- `runs/0000_baseline/metrics.json`
- `runs/0000_baseline/result.json`
- `logs/results.jsonl`
After a successful baseline, `logs/state.json` starts with:

```text
current_iteration = 1
current_phase = READ_CONTEXT
last_outcome = baseline
```
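Because `logs/state.json` is plain JSON, a script can confirm that a baseline left the loop ready for iteration 1. This sketch assumes the three values above are stored under top-level keys with the same names; the real file may hold additional keys, so it checks only the documented ones. `load_state` and `is_fresh_after_baseline` are hypothetical helpers.

```python
from __future__ import annotations

import json
from pathlib import Path

# Documented post-baseline values; the real state file may contain more keys.
EXPECTED_AFTER_BASELINE = {
    "current_iteration": 1,
    "current_phase": "READ_CONTEXT",
    "last_outcome": "baseline",
}


def load_state(project_dir: str | Path) -> dict:
    """Read logs/state.json from a project directory."""
    state_path = Path(project_dir) / "logs" / "state.json"
    return json.loads(state_path.read_text(encoding="utf-8"))


def is_fresh_after_baseline(state: dict) -> bool:
    """True if every documented post-baseline field has its expected value."""
    return all(state.get(key) == value for key, value in EXPECTED_AFTER_BASELINE.items())
```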
Iteration flow
Each iteration passes through the same ordered phases:
READ_CONTEXT -> PLAN -> APPLY_CHANGE -> COMMIT -> VERIFY -> DECIDE -> LOG
Run full iterations with:
```shell
uv run goalseek run ./demo --iterations 3
```
An iteration counts as complete only after `LOG` resets the state to `READ_CONTEXT` with an empty iteration payload.
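The phase order and the two documented shortcuts to `LOG` can be written as a small transition function. This is a sketch, not goalseek's engine: `next_phase` and `EARLY_EXITS` are hypothetical names, and other failure paths (such as provider failures) are left out because the text does not say which phase they jump from.

```python
from __future__ import annotations

PHASES = ["READ_CONTEXT", "PLAN", "APPLY_CHANGE", "COMMIT", "VERIFY", "DECIDE", "LOG"]

# Early exits documented for APPLY_CHANGE and VERIFY: both jump straight to LOG.
EARLY_EXITS = {
    ("APPLY_CHANGE", "skipped_no_change"): "LOG",
    ("VERIFY", "skipped_verification_crash"): "LOG",
}


def next_phase(phase: str, outcome: str | None = None) -> str:
    """Return the phase that follows `phase`, honoring documented early exits."""
    if (phase, outcome) in EARLY_EXITS:
        return EARLY_EXITS[(phase, outcome)]
    if phase == "LOG":
        return "READ_CONTEXT"  # LOG closes the iteration and starts the next one
    return PHASES[PHASES.index(phase) + 1]
```

Modelling the loop as explicit transitions is what makes resuming from a saved `current_phase` straightforward.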
READ_CONTEXT
- Reads git history and diff summaries.
- Enumerates visible read-only and writable files from the manifest.
- Includes `program.md` when it is read-only.
- Excludes hidden files such as `validate_results.py`.
- Loads recent results and active directions.
- Updates `logs/state.json`.
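The visibility rules in READ_CONTEXT amount to filtering project paths by scope. The manifest schema is not shown here, so this sketch assumes a hypothetical `scope` mapping of `read_only`, `writable`, and `hidden` glob patterns; hidden matches are dropped before anything else. `visible_files` is a hypothetical helper.

```python
from __future__ import annotations

from fnmatch import fnmatch


def visible_files(paths: list[str], scope: dict[str, list[str]]) -> dict[str, list[str]]:
    """Split project paths into the read-only and writable files a provider may see.

    `scope` is a hypothetical manifest excerpt mapping "read_only", "writable",
    and "hidden" to lists of glob patterns."""

    def matches(path: str, key: str) -> bool:
        return any(fnmatch(path, pattern) for pattern in scope.get(key, []))

    visible: dict[str, list[str]] = {"read_only": [], "writable": []}
    for path in sorted(paths):
        if matches(path, "hidden"):
            continue  # hidden files such as validate_results.py never reach the provider
        if matches(path, "writable"):
            visible["writable"].append(path)
        elif matches(path, "read_only"):
            visible["read_only"].append(path)
    return visible
```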
PLAN
- Builds a planning prompt from context, project scope, recent outcomes, and directions.
- Calls the provider plan interface.
- Writes `prompt.md`, `plan.md`, and `provider_output.md`.
- Gives the provider visible context such as `program.md`.
- Lists hidden paths as off-limits.
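The split between visible context and off-limits paths can be sketched as prompt assembly in which hidden paths appear only by name, never by content. `build_planning_prompt` and its section headings are hypothetical; goalseek's actual prompt layout is not documented here.

```python
from __future__ import annotations


def build_planning_prompt(
    visible_files: dict[str, str],
    recent_outcomes: list[str],
    directions: list[str],
    hidden_paths: list[str],
) -> str:
    """Assemble a hypothetical planning prompt from visible context and history."""
    lines = ["# Visible context"]
    for name, text in visible_files.items():
        lines += [f"## {name}", text]
    lines.append("# Recent outcomes")
    lines += [f"- {outcome}" for outcome in recent_outcomes]
    lines.append("# Directions")
    lines += [f"- {direction}" for direction in directions]
    # Hidden paths are named so the provider avoids them; their contents are never included.
    lines.append("# Off-limits paths")
    lines += [f"- {path}" for path in hidden_paths]
    return "\n".join(lines)
```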
APPLY_CHANGE
- Confirms the git tree is clean.
- Calls the provider implementation interface.
- Checks changed files against manifest scope.
- Treats out-of-scope edits as a failure condition (`reverted_scope_violation`).
- Allows changes only in writable or generated scope.
- Jumps to `LOG` with `skipped_no_change` if no files changed.
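The APPLY_CHANGE checks reduce to two questions: did anything change, and did every change stay inside writable or generated scope? This sketch assumes the changed paths are already known (for example, from `git status --porcelain`) and that scope is expressed as glob patterns. `check_changes` is a hypothetical helper that returns an early outcome, or `None` to continue to COMMIT.

```python
from __future__ import annotations

from fnmatch import fnmatch


def check_changes(changed_files: list[str], writable: list[str], generated: list[str]) -> str | None:
    """Return an early outcome, or None if the candidate may proceed to COMMIT."""
    if not changed_files:
        return "skipped_no_change"
    allowed = [*writable, *generated]
    out_of_scope = [
        path for path in changed_files
        if not any(fnmatch(path, pattern) for pattern in allowed)
    ]
    if out_of_scope:
        return "reverted_scope_violation"  # any single out-of-scope edit fails the candidate
    return None
```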
COMMIT
- Stages changed files.
- Creates a candidate commit with the plan title.
- Records parent commit and changed line count.
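One plausible way to compute the changed line count is to sum the added and deleted columns of `git diff --numstat <parent> HEAD`. Whether goalseek counts lines this way is an assumption; the hypothetical `changed_line_count` below only parses that output and skips binary files, for which git prints `-` instead of counts.

```python
def changed_line_count(numstat_output: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output.

    Each line holds three tab-separated fields: added, deleted, path."""
    total = 0
    for line in numstat_output.splitlines():
        if not line.strip():
            continue
        added, deleted, _path = line.split("\t", 2)
        if added == "-" or deleted == "-":
            continue  # binary change: git reports no line counts
        total += int(added) + int(deleted)
    return total
```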
VERIFY
- Runs verification commands from the manifest.
- Runs `validate_results.py` here when the manifest command references it.
- Captures combined output in `verifier.log`.
- Extracts the scalar metric if verification succeeds.
- Jumps to `LOG` with `skipped_verification_crash` if a verification command fails.
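The VERIFY contract (run commands in order, capture combined output, stop on the first failure, then extract a scalar) can be sketched with `subprocess`. Assumptions: manifest commands are shell strings, and the metric is printed as `metric: <number>`; the regex in the hypothetical `extract_metric` stands in for whatever extraction the manifest actually configures.

```python
import re
import subprocess


def run_verification(commands, cwd, log_path):
    """Run shell commands in order, writing combined stdout/stderr to log_path.

    Returns False as soon as one command exits non-zero."""
    with open(log_path, "w", encoding="utf-8") as log:
        for command in commands:
            result = subprocess.run(
                command,
                shell=True,
                cwd=cwd,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # merge stderr into stdout for one combined log
                text=True,
            )
            log.write(f"$ {command}\n{result.stdout}")
            if result.returncode != 0:
                return False  # caller would record skipped_verification_crash
    return True


def extract_metric(log_text, pattern=r"metric:\s*([-+0-9.eE]+)"):
    """Return the last value matched by a hypothetical `metric: <number>` pattern."""
    matches = re.findall(pattern, log_text)
    if not matches:
        raise ValueError("no metric found in verifier output")
    return float(matches[-1])
```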
DECIDE
- Compares the candidate metric against the retained metric.
- Prefers better outcomes according to the metric direction.
- Uses `git revert` for rejected changes instead of rewriting history.
- Applies `min_pass` and `max_pass` thresholds before comparing to the retained best.
- Uses changed LOC as the tie-breaker when metrics are equal within epsilon.
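The DECIDE rules can be condensed into a single function. This is a hedged sketch: it assumes `min_pass` is a lower bound and `max_pass` an upper bound on the candidate metric, that the direction is spelled `maximize` or `minimize`, and that the LOC tie-breaker compares against the changed lines recorded for the retained result. `Result` and `decide` are hypothetical names, and the `git revert` step is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Result:
    metric: float
    changed_loc: int  # lines changed by the commit that produced this result


def decide(
    candidate: Result,
    retained: Result,
    direction: str = "maximize",
    min_pass: float | None = None,
    max_pass: float | None = None,
    epsilon: float = 1e-9,
) -> str:
    """Return the DECIDE outcome for a verified candidate (hypothetical semantics)."""
    # 1. Thresholds are checked before any comparison with the retained best.
    if min_pass is not None and candidate.metric < min_pass:
        return "reverted_threshold_failure"
    if max_pass is not None and candidate.metric > max_pass:
        return "reverted_threshold_failure"
    # 2. Metrics equal within epsilon: the smaller change wins.
    if abs(candidate.metric - retained.metric) <= epsilon:
        return "kept" if candidate.changed_loc < retained.changed_loc else "reverted_worse_metric"
    # 3. Otherwise the metric direction decides.
    if direction == "maximize":
        better = candidate.metric > retained.metric
    else:
        better = candidate.metric < retained.metric
    return "kept" if better else "reverted_worse_metric"
```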
LOG
- Writes final iteration artifacts and a result record.
- Appends to `logs/results.jsonl`.
- Advances the resumable state to the next iteration.
- Rewrites `runs/latest/history.json`.
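The append-only log and the state reset can be sketched with the standard `json` module. `append_result` and `advance_state` are hypothetical helpers, and `iteration` is an assumed name for the per-iteration payload key.

```python
from __future__ import annotations

import json
from pathlib import Path


def append_result(project_dir: str | Path, record: dict) -> None:
    """Append one JSON object per line to logs/results.jsonl; earlier lines are never rewritten."""
    results_path = Path(project_dir) / "logs" / "results.jsonl"
    results_path.parent.mkdir(parents=True, exist_ok=True)
    with results_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def advance_state(state: dict, outcome: str) -> dict:
    """Return the state for the next iteration: back to READ_CONTEXT with an empty payload."""
    return {
        **state,
        "current_iteration": state["current_iteration"] + 1,
        "current_phase": "READ_CONTEXT",
        "last_outcome": outcome,
        "iteration": {},  # hypothetical key for the per-iteration payload
    }
```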
The manifest is more than documentation: it decides which files are visible, writable, generated, or hidden, and changes outside that scope are rolled back as `reverted_scope_violation`.
Common outcomes
| Outcome | Meaning |
|---|---|
| `kept` | Candidate met thresholds and beat the retained result, or tied with fewer changed lines. |
| `reverted_worse_metric` | Candidate verified but did not beat the retained result. |
| `reverted_threshold_failure` | Candidate failed configured metric thresholds. |
| `reverted_scope_violation` | Provider changed out-of-scope files. |
| `skipped_no_change` | Provider made no file changes. |
| `skipped_provider_failure` | Planning or implementation provider failed. |
| `skipped_verification_crash` | Verification failed before a usable metric was produced. |
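Since each line of `logs/results.jsonl` is one JSON record, outcomes are easy to tally. This sketch assumes each record stores its outcome under an `outcome` key, which the text does not confirm; `tally_outcomes` and `outcome_family` are hypothetical helpers.

```python
import json
from collections import Counter
from typing import Iterable


def outcome_family(outcome: str) -> str:
    """Map e.g. 'reverted_scope_violation' to its family: 'kept', 'reverted', or 'skipped'."""
    return outcome.split("_", 1)[0]


def tally_outcomes(lines: Iterable[str]) -> Counter:
    """Count outcomes in results.jsonl lines, assuming each record has an 'outcome' key."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            counts[json.loads(line).get("outcome", "unknown")] += 1
    return counts
```

For the demo project, `tally_outcomes(open("./demo/logs/results.jsonl", encoding="utf-8"))` would give a quick summary, provided the field-name assumption holds.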
Useful inspection commands
```shell
cat ./demo/logs/state.json
tail -n 10 ./demo/logs/results.jsonl
ls ./demo/runs/0001
cat ./demo/runs/0001/result.json
```
When things go wrong
- If verification fails, inspect `runs/<iteration>/verifier.log`.
- If the working tree is dirty, run `goalseek gittreeclean`.
- If the loop stalls, check the recent plans and provider output before widening the project scope or changing directions.