Quickstart
goalseek works best when each research project is its own audit boundary. The manifest, baseline files, logs, run artifacts, and git history all live inside the project directory.
Prerequisites
- Python 3.11 or newer
- git
- one provider CLI available on PATH, such as codex, claude, gemini, or opencode
The loop creates candidate commits and may revert them. Start from a clean working tree before you run research iterations.
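The clean-tree requirement can be checked before each run. The following is a minimal sketch, not part of goalseek: the helper names dirty_paths and working_tree_is_clean are hypothetical, and it only assumes that git is on PATH and that the project directory is a git repository.

```python
import subprocess
from pathlib import Path


def dirty_paths(porcelain_output: str) -> list[str]:
    """Return the paths listed in `git status --porcelain` output."""
    paths = []
    for line in porcelain_output.splitlines():
        if line.strip():
            # Porcelain v1 lines look like "XY <path>":
            # two status characters, one space, then the path.
            paths.append(line[3:])
    return paths


def working_tree_is_clean(project_dir: Path) -> bool:
    """Run git inside project_dir and report whether nothing is pending."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=project_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git itself fails (e.g. not a repository)
    )
    return not dirty_paths(result.stdout)
```

Calling working_tree_is_clean(Path("demo")) before baseline or run gives an early warning instead of a failure partway through an iteration.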
Install the package
Create a pyproject.toml for the workspace where you want to run goalseek, and point uv at the latest wheel published in the repository's dist/ folder (the example pins goalseek-0.1.1):
[project]
name = "goalseek-runner"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "goalseek @ https://github.com/shambhu112/goalseek/raw/main/dist/goalseek-0.1.1-py3-none-any.whl",
]
Then create the environment and sync it:
uv venv .venv
uv sync
Verify the CLI:
uv run goalseek --help
Create a project
Choose a provider and model, then scaffold a new project.
Codex:
uv run goalseek project init demo --provider codex --model gpt-5.4-mini
Claude Code:
uv run goalseek project init demo --provider claude_code --model claude-haiku-4-5-20251001
Gemini:
uv run goalseek project init demo --provider gemini --model gemini-2.5-pro
OpenCode:
uv run goalseek project init demo --provider opencode --model gpt-5-codex
The scaffold includes:
demo/
  .git/
  manifest.yaml
  program.md
  setup.py
  validate_results.py
  experiment.py
  config/project.yaml
  context/
  data/
  hidden/
  logs/
  runs/
Check the manifest
Open demo/manifest.yaml and confirm the core files have the right modes:
files:
  - path: manifest.yaml
    mode: read_only
  - path: program.md
    mode: read_only
  - path: setup.py
    mode: read_only
  - path: validate_results.py
    mode: hidden
  - path: experiment.py
    mode: writable
  - path: hidden/**
    mode: hidden
  - path: config/**
    mode: read_only
  - path: runs/**
    mode: generated
  - path: logs/**
    mode: generated
Validate the manifest:
uv run goalseek manifest validate ./demo
Then inspect demo/config/project.yaml and confirm the hypothesis provider, implementation provider, model names, timeouts, and logging settings match the run you want.
Prepare baseline files
Before you run the baseline, make sure these three files represent a runnable first version of the project:
| File | Purpose |
|---|---|
| experiment.py | First implementation the verifier can train or run. It is writable and may be changed by later iterations. |
| program.md | Reusable read-only instructions for the planning and implementation providers. It is visible during READ_CONTEXT, PLAN, and APPLY_CHANGE. |
| validate_results.py | Hidden verification harness. It is not provider context. It runs during VERIFY when referenced by manifest verification commands. |
The default manifest usually runs:
verification:
  commands:
    - name: train
      run: python3 experiment.py
      cwd: .
      timeout_sec: 1200
    - name: evaluate
      run: python3 validate_results.py --evaluate --output runs/latest/results.json
      cwd: .
      timeout_sec: 800
Make sure the metric extractor points at the verifier output:
metric:
  name: score
  direction: maximize
  extractor:
    type: json_file
    path: runs/latest/results.json
    json_pointer: /metric
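The json_pointer value follows JSON Pointer syntax (RFC 6901), so /metric selects the top-level metric key of the results file. The sketch below shows one way such a pointer can be resolved. The function names resolve_pointer and extract_metric are hypothetical, and goalseek's extractor may handle errors differently.

```python
import json
from pathlib import Path


def resolve_pointer(document, pointer: str):
    """Resolve an RFC 6901 JSON Pointer such as "/metric" against parsed JSON."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"JSON pointer must start with '/': {pointer!r}")
    current = document
    for raw in pointer[1:].split("/"):
        # Unescape per RFC 6901: "~1" is "/", then "~0" is "~".
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


def extract_metric(results_path: Path, pointer: str) -> float:
    """Load the verifier output and return the pointed-to value as a float."""
    with results_path.open() as fh:
        return float(resolve_pointer(json.load(fh), pointer))
```

With the configuration above, extract_metric(Path("runs/latest/results.json"), "/metric") would succeed only if validate_results.py writes a JSON object with a numeric metric key, which is worth confirming before the first baseline.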
Optional: import sample research assets
This repo includes a small Kaggle-style demo package.
./move-testpackage.sh --overwrite ./demo
After importing, re-check manifest.yaml, program.md, experiment.py, and validate_results.py.
Run the lifecycle
Prepare the project:
uv run goalseek setup ./demo
Commit local scaffold and setup changes before agent-driven edits:
uv run goalseek gittreeclean --message "clean repo" ./demo
Capture the baseline metric. Baseline runs verification on the current project without asking a provider to edit code.
uv run goalseek baseline ./demo
Run a few full iterations:
uv run goalseek run ./demo --iterations 3
Inspect status and summary:
uv run goalseek status ./demo
uv run goalseek summary ./demo
What to expect on disk
- runs/0000_baseline/ stores baseline artifacts.
- runs/0001/, runs/0002/, and later directories store iteration-specific prompts, plans, logs, and result records.
- logs/state.json stores resumable loop state.
- logs/results.jsonl stores append-only result summaries.
Open runs/0001/prompt.md, runs/0001/provider_output.md, and runs/0001/result.json after your first iteration. Together they show what the provider was asked, what it returned, and how the iteration was recorded, which makes the system much easier to reason about.
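Because logs/results.jsonl is append-only with one JSON record per line, it can be loaded with a few lines of Python for ad hoc analysis. This is a sketch with hypothetical helper names (parse_jsonl, load_results). The record fields are not documented here, so inspect a real line before relying on any key.

```python
import json
from pathlib import Path


def parse_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines text: one JSON value per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_results(project_dir: Path) -> list[dict]:
    """Read every record from <project_dir>/logs/results.jsonl."""
    return parse_jsonl((project_dir / "logs" / "results.jsonl").read_text())
```

For example, load_results(Path("demo")) returns the records in the order they were appended.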
What happens during a run
Baseline:
- discovers the project root
- validates manifest.yaml
- loads effective config
- runs verification commands
- extracts the metric
- writes runs/0000_baseline/ and initializes logs/state.json
Each later iteration:
READ_CONTEXT -> PLAN -> APPLY_CHANGE -> COMMIT -> VERIFY -> DECIDE -> LOG
The loop resumes from logs/state.json if a previous run stopped between phases.
Common failure modes
- Missing provider CLI executable: install the matching provider tool and make sure it is available on PATH.
- Dirty working tree: commit or restore local changes before run.
- Manifest issues: re-run goalseek manifest validate and check path scopes plus metric extraction rules.
- Verification failures: inspect runs/<iteration>/verifier.log.
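The first failure mode is easy to rule out up front. The sketch below checks PATH for git and for at least one of the provider executables named in the prerequisites. The helper names are hypothetical, and the check says nothing about whether a provider CLI is authenticated or configured.

```python
import shutil

# Executable names taken from the prerequisites list.
PROVIDER_CLIS = ("codex", "claude", "gemini", "opencode")


def available_tools(names, which=shutil.which) -> list[str]:
    """Return the names that resolve to an executable on PATH."""
    return [name for name in names if which(name) is not None]


def missing_prerequisites(which=shutil.which) -> list[str]:
    """Return human-readable problems; an empty list means PATH looks fine."""
    problems = []
    if which("git") is None:
        problems.append("git is not on PATH")
    if not available_tools(PROVIDER_CLIS, which):
        problems.append("no provider CLI found on PATH: " + ", ".join(PROVIDER_CLIS))
    return problems
```

The which parameter exists only so the lookup can be replaced in tests. In normal use, call missing_prerequisites() with no arguments and print whatever it returns.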