Quickstart

goalseek works best when each research project is its own audit boundary. The manifest, baseline files, logs, run artifacts, and git history all live inside the project directory.

Prerequisites

Python 3.11 or newer
git
uv (the install and run commands in this guide use it)
one provider CLI available on PATH, such as codex, claude, gemini, or opencode
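
A quick preflight check for the items above can be sketched in Python. This is a minimal sketch and not part of goalseek: the function names are hypothetical, and the executable names are simply the ones listed in the prerequisites.

```python
import shutil
import sys

# Provider executable names taken from the prerequisites list.
PROVIDER_CLIS = ("codex", "claude", "gemini", "opencode")


def python_ok(version_info=sys.version_info, minimum=(3, 11)):
    """Return True when the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= minimum


def available_tools(names, which=shutil.which):
    """Return the subset of executable names that resolve on PATH."""
    return [name for name in names if which(name) is not None]


def check_prerequisites(which=shutil.which):
    """Collect human-readable problems; an empty list means ready."""
    problems = []
    if not python_ok():
        problems.append("Python 3.11 or newer is required")
    if which("git") is None:
        problems.append("git was not found on PATH")
    if not available_tools(PROVIDER_CLIS, which):
        problems.append("no provider CLI was found on PATH")
    return problems
```

Calling check_prerequisites() with no arguments inspects the real PATH; the which parameter exists only so the lookup can be replaced in tests.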

Clean git state matters

The loop creates candidate commits and may revert them. Start from a clean working tree before you run research iterations.
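
One way to confirm a clean working tree before starting is to look at the output of git status --porcelain, which prints one line per changed or untracked file and nothing when the tree is clean. The sketch below assumes that untracked files also count as dirty; whether goalseek applies the same rule is not stated here, and the function names are hypothetical.

```python
import subprocess


def is_clean(porcelain_output):
    """Empty `git status --porcelain` output means nothing is modified or untracked."""
    return porcelain_output.strip() == ""


def working_tree_is_clean(project_dir):
    """Run git inside project_dir and report whether the working tree is clean."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=project_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git itself fails, e.g. outside a repository
    )
    return is_clean(result.stdout)
```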

Install the package

Create a pyproject.toml for the workspace where you want to run goalseek, and point uv at the wheel published in the repository's dist/ folder. The example below pins goalseek 0.1.1; adjust the URL if a newer wheel has been published:

[project]
name = "goalseek-runner"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
  "goalseek @ https://github.com/shambhu112/goalseek/raw/main/dist/goalseek-0.1.1-py3-none-any.whl",
]

Then create the environment and sync it:

uv venv .venv
uv sync

Verify the CLI:

uv run goalseek --help
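
Besides the help command, the installed version can be checked from Python with the standard importlib.metadata module. This is a small sketch; the distribution name goalseek is inferred from the wheel filename above, and installed_version is a hypothetical helper.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None when the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Run it inside the synced environment (for example through uv run python) and call installed_version("goalseek"); a None result suggests the sync did not install the wheel.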

Create a project

Choose a provider and model, then scaffold a new project.

Codex:

uv run goalseek project init demo --provider codex --model gpt-5.4-mini

Claude Code:

uv run goalseek project init demo --provider claude_code --model claude-haiku-4-5-20251001

Gemini:

uv run goalseek project init demo --provider gemini --model gemini-2.5-pro

OpenCode:

uv run goalseek project init demo --provider opencode --model gpt-5-codex

The scaffold includes:

demo/
  .git/
  manifest.yaml
  program.md
  setup.py
  validate_results.py
  experiment.py
  config/project.yaml
  context/
  data/
  hidden/
  logs/
  runs/

Check the manifest

Open demo/manifest.yaml and confirm the core files have the right modes:

files:
  - path: manifest.yaml
    mode: read_only
  - path: program.md
    mode: read_only
  - path: setup.py
    mode: read_only
  - path: validate_results.py
    mode: hidden
  - path: experiment.py
    mode: writable
  - path: hidden/**
    mode: hidden
  - path: config/**
    mode: read_only
  - path: runs/**
    mode: generated
  - path: logs/**
    mode: generated
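
To reason about which mode applies to a given file, the rules above can be mirrored in a few lines of Python. This is only an illustration: it assumes that the first matching rule wins and that unlisted paths have no mode, and it uses fnmatch-style matching, which may differ from how goalseek resolves patterns.

```python
from fnmatch import fnmatchcase

# Rules copied from the manifest excerpt, in the same order.
RULES = [
    ("manifest.yaml", "read_only"),
    ("program.md", "read_only"),
    ("setup.py", "read_only"),
    ("validate_results.py", "hidden"),
    ("experiment.py", "writable"),
    ("hidden/**", "hidden"),
    ("config/**", "read_only"),
    ("runs/**", "generated"),
    ("logs/**", "generated"),
]


def mode_for(path, rules=RULES):
    """Return the mode of the first rule whose pattern matches path.

    Assumption: first match wins and unlisted paths return None.
    In fnmatch, `*` also matches `/`, so `runs/**` covers nested files.
    """
    for pattern, mode in rules:
        if fnmatchcase(path, pattern):
            return mode
    return None
```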

Validate the manifest:

uv run goalseek manifest validate ./demo

Then inspect demo/config/project.yaml and confirm the hypothesis provider, implementation provider, model names, timeouts, and logging settings match the run you want.

Prepare baseline files

Before running baseline, make sure these three files represent a runnable first version of the project:

File	Purpose
`experiment.py`	First implementation the verifier can train or run. It is writable and may be changed by later iterations.
`program.md`	Reusable read-only instructions for the planning and implementation providers. It is visible during `READ_CONTEXT`, `PLAN`, and `APPLY_CHANGE`.
`validate_results.py`	Hidden verification harness. It is not provider context. It runs during `VERIFY` when referenced by manifest verification commands.

The default manifest usually defines these verification commands:

verification:
  commands:
    - name: train
      run: python3 experiment.py
      cwd: .
      timeout_sec: 1200
    - name: evaluate
      run: python3 validate_results.py --evaluate --output runs/latest/results.json
      cwd: .
      timeout_sec: 800
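
A verifier compatible with the evaluate command above could be shaped roughly as follows. This is a hedged sketch, not the harness shipped with goalseek: the --evaluate and --output flags come from the command, while the accuracy score, the --predictions and --labels options, and their default paths are hypothetical placeholders.

```python
import argparse
import json
from pathlib import Path


def accuracy(predictions, labels):
    """Fraction of positions where prediction equals label (hypothetical score)."""
    if not labels or len(predictions) != len(labels):
        return 0.0
    hits = sum(1 for p, y in zip(predictions, labels) if p == y)
    return hits / len(labels)


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--evaluate", action="store_true")
    parser.add_argument("--output", required=True)
    # Hypothetical inputs: neither path is defined by goalseek.
    parser.add_argument("--predictions", default="data/predictions.json")
    parser.add_argument("--labels", default="hidden/labels.json")
    args = parser.parse_args(argv)
    if not args.evaluate:
        parser.error("nothing to do: pass --evaluate")

    predictions = json.loads(Path(args.predictions).read_text())
    labels = json.loads(Path(args.labels).read_text())

    out = Path(args.output)
    out.parent.mkdir(parents=True, exist_ok=True)
    # Write the score under a "metric" key so a /metric JSON pointer can find it.
    out.write_text(json.dumps({"metric": accuracy(predictions, labels)}))
    return 0
```

In a real script, main() would be called under the usual __main__ guard so that a non-zero exit code marks the verification step as failed.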

Make sure the metric extractor points at the verifier output:

metric:
  name: score
  direction: maximize
  extractor:
    type: json_file
    path: runs/latest/results.json
    json_pointer: /metric
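
The extractor settings above can be read as "parse the JSON file, follow the pointer, compare in the configured direction". The sketch below illustrates that reading with a minimal RFC 6901 pointer resolver. It is not goalseek's extractor: the helper names are hypothetical, minimize as the opposite direction value is an assumption, and so is the rule that only a strict improvement counts.

```python
import json


def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON pointer such as "/metric" against parsed JSON."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON pointer must start with '/'")
    node = document
    for raw in pointer[1:].split("/"):
        # Unescape in the order the RFC requires: ~1 first, then ~0.
        token = raw.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def extract_metric(json_text, pointer="/metric"):
    """Parse JSON text and return the pointed-to value as a float."""
    return float(resolve_pointer(json.loads(json_text), pointer))


def is_improvement(candidate, best, direction="maximize"):
    """Assumption: ties are not improvements; "minimize" is an assumed value."""
    if direction == "maximize":
        return candidate > best
    if direction == "minimize":
        return candidate < best
    raise ValueError(f"unknown direction: {direction!r}")
```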

Optional: import sample research assets

This repo includes a small Kaggle-style demo package.

./move-testpackage.sh --overwrite ./demo

After importing, re-check manifest.yaml, program.md, experiment.py, and validate_results.py.

Run the lifecycle

Prepare the project:

uv run goalseek setup ./demo

Commit local scaffold and setup changes before agent-driven edits:

uv run goalseek gittreeclean --message "clean repo" ./demo

Capture the baseline metric. Baseline runs verification on the current project without asking a provider to edit code.

uv run goalseek baseline ./demo

Run a few full iterations:

uv run goalseek run ./demo --iterations 3

Inspect status and summary:

uv run goalseek status ./demo
uv run goalseek summary ./demo
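
If you prefer to script the lifecycle, the commands above can be chained from Python with subprocess. This is a convenience sketch rather than a goalseek feature; the function names are hypothetical and the argument lists simply mirror the commands shown in this section.

```python
import subprocess


def lifecycle_commands(project_dir, iterations=3, message="clean repo"):
    """Build the argv lists for the lifecycle, in the order shown above."""
    base = ["uv", "run", "goalseek"]
    return [
        base + ["setup", project_dir],
        base + ["gittreeclean", "--message", message, project_dir],
        base + ["baseline", project_dir],
        base + ["run", project_dir, "--iterations", str(iterations)],
        base + ["status", project_dir],
        base + ["summary", project_dir],
    ]


def run_lifecycle(project_dir, iterations=3):
    """Run each step in order; check=True stops at the first failing command."""
    for argv in lifecycle_commands(project_dir, iterations):
        subprocess.run(argv, check=True)
```

Passing argument lists instead of a shell string avoids quoting problems with the commit message.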

What to expect on disk

runs/0000_baseline/ stores baseline artifacts.
runs/0001/, runs/0002/, and later directories store iteration-specific prompts, plans, logs, and result records.
logs/state.json stores resumable loop state.
logs/results.jsonl stores append-only result summaries.
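
Because logs/results.jsonl is a JSON Lines file, it can be loaded with the standard library alone. The sketch below makes no assumption about which fields each record contains; the helper names are hypothetical.

```python
import json


def parse_jsonl(text):
    """Parse JSON Lines text into a list of records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_results(path="logs/results.jsonl"):
    """Read the append-only results file from a project directory."""
    with open(path, encoding="utf-8") as handle:
        return parse_jsonl(handle.read())
```

Print the first record returned by load_results() to see which fields your goalseek version actually writes.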

Good first inspection points

Open runs/0001/prompt.md, runs/0001/provider_output.md, and runs/0001/result.json after your first iteration. Those files make the system much easier to reason about.

What happens during a run

Baseline:

discovers the project root
validates manifest.yaml
loads effective config
runs verification commands
extracts the metric
writes runs/0000_baseline/ and initializes logs/state.json

Each later iteration:

READ_CONTEXT -> PLAN -> APPLY_CHANGE -> COMMIT -> VERIFY -> DECIDE -> LOG

The loop resumes from logs/state.json if a previous run stopped between phases.
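
The resume behavior can be pictured as "skip the phases that already completed". The sketch below models only that idea: the phase names come from the sequence above, while last_completed is a hypothetical field, since the actual schema of logs/state.json is not described here.

```python
# Phase order as listed for each iteration.
PHASES = [
    "READ_CONTEXT",
    "PLAN",
    "APPLY_CHANGE",
    "COMMIT",
    "VERIFY",
    "DECIDE",
    "LOG",
]


def remaining_phases(last_completed=None):
    """Return the phases still to run in the current iteration.

    last_completed is a hypothetical value; the real state file may
    record progress differently.
    """
    if last_completed is None:
        return list(PHASES)
    return PHASES[PHASES.index(last_completed) + 1:]
```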

Common failure modes

Missing provider CLI executable: install the matching provider tool and make sure it is available on PATH.
Dirty working tree: commit or restore local changes before you run iterations.
Manifest issues: re-run goalseek manifest validate and check path scopes plus metric extraction rules.
Verification failures: inspect runs/<iteration>/verifier.log.
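
When a verification step fails, the end of the verifier log is usually the most useful part. A small helper for printing it might look like this; the log location follows the runs/<iteration>/verifier.log pattern above, and the function names are hypothetical.

```python
from pathlib import Path


def tail(text, count=20):
    """Return the last `count` lines of text (an empty list when count is 0)."""
    lines = text.splitlines()
    return lines[-count:] if count > 0 else []


def verifier_log_tail(project_dir, iteration, count=20):
    """Read runs/<iteration>/verifier.log under project_dir and return its tail."""
    log_path = Path(project_dir) / "runs" / iteration / "verifier.log"
    return tail(log_path.read_text(encoding="utf-8", errors="replace"), count)
```

For example, verifier_log_tail("./demo", "0001") returns up to the last 20 lines of the first iteration's log.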
