Reproducibility in the Age of Agents
GCC 2026
Designing for human reproducibility, accelerating it for everyone else.
AI is an ethical & environmental disaster that requires brilliant leadership across every level of society 😔 This slide is just John, not my co-authors. I am here because I need to eat
Why, when it is easier than ever to write well-designed code, are we instead getting inundated with slop¹?
The answer is incentives.
¹ see @jmchilton’s PRs
Bioinformatics is following suit: the field is about to automate its own reproducibility crisis.
Galaxy MUST incentivize reproducibility for agents.
An Optimistic Thesis
Designing for human reproducibility supercharges agents.
Marius made the point Monday: UDTs were in development before agentic use of Galaxy.
Built for humans, supercharging agents.
3 WIP Papers Support this Implicit Thesis
Designing for human reproducibility supercharges agents.
Example 1: Mobile Resistome
Backward extraction from a four-isolate Staphylococcus aureus notebook recovered a 14-step, sample-agnostic workflow.
8 collection map-over steps
9 workflow outputs
0 dangling inputs
byte-identical re-run across all four isolates
BioProject PRJDB8599 · analysis through heatmaps
History → Notebook → Workflow
History
computational record
datasets, tools, parameters, provenance
→
Notebook
communicative record
why it mattered, which outputs count, what to report
→
Workflow
reusable graph
backward closure from referenced artifacts
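The backward closure above can be sketched as a graph walk: start from the datasets the notebook references and follow each one back to the job that produced it, until only uploaded inputs remain. This is a minimal illustration, not Galaxy's extraction code; the `PRODUCED_BY` and `JOB_INPUTS` tables and every dataset and job name in them are hypothetical.

```python
from collections import deque

# Hypothetical provenance tables (illustration only).
# Dataset -> job that produced it; uploaded datasets have no producer.
PRODUCED_BY = {
    "trimmed": "trim",
    "aligned": "align",
    "heatmap": "plot",
    "scratch_qc": "qc",  # scratch output that the notebook never references
}
# Job -> datasets it consumed.
JOB_INPUTS = {
    "trim": ["raw_reads"],
    "align": ["trimmed", "reference"],
    "plot": ["aligned"],
    "qc": ["raw_reads"],
}


def backward_closure(referenced):
    """Return (jobs, workflow_inputs) needed to regenerate the referenced datasets."""
    jobs, workflow_inputs, seen = set(), set(), set()
    queue = deque(referenced)
    while queue:
        dataset = queue.popleft()
        if dataset in seen:
            continue
        seen.add(dataset)
        job = PRODUCED_BY.get(dataset)
        if job is None:
            # No producing job: this dataset becomes a workflow input.
            workflow_inputs.add(dataset)
        elif job not in jobs:
            jobs.add(job)
            queue.extend(JOB_INPUTS[job])
    return jobs, workflow_inputs


jobs, workflow_inputs = backward_closure(["heatmap"])
print(sorted(jobs))             # ['align', 'plot', 'trim']
print(sorted(workflow_inputs))  # ['raw_reads', 'reference']
```

The scratch `qc` job never enters the result because nothing the notebook references depends on it, which is the sense in which the notebook decides which outputs count.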
Example 2: Differential ATAC-seq
Rendered notebook bits: PDF outputs and tabular outputs stay on-graph.
Count matrix + sample sheet → DESeq2 → NA-filter → volcano ∥ significance filter → sort → top gained / lost peaks.
9 extracted steps from a 590,650-peak universe
6 workflow outputs, 0 dangling inputs, 0 report repairs
45,620 significant peaks reproduced
34,873 B-cell-gained / 10,747 erythroblast-gained
Corces 2016 ATAC-seq atlas · erythroblast vs B-cell · hg19
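The tabular tail of that chain (NA-filter → significance filter → sort → top gained / lost) can be sketched on a toy table. Everything here is an assumption for illustration: the rows are invented, the 0.05 adjusted p-value cutoff and the top-2 size are placeholders, and treating positive log2FC as "gained" is an arbitrary convention. In the workflow itself each step is a Galaxy tool, not Python.

```python
# Invented DESeq2-style rows; not data from the analysis.
rows = [
    {"peak": "peak_1", "log2FC": 2.5, "padj": 0.001},
    {"peak": "peak_2", "log2FC": -3.1, "padj": 0.0004},
    {"peak": "peak_3", "log2FC": 0.2, "padj": 0.8},
    {"peak": "peak_4", "log2FC": 1.1, "padj": None},  # NA adjusted p-value
    {"peak": "peak_5", "log2FC": -0.9, "padj": 0.03},
    {"peak": "peak_6", "log2FC": 4.0, "padj": 0.01},
]
PADJ_CUTOFF = 0.05  # assumed threshold
TOP_N = 2           # assumed list size

# NA-filter: drop rows with missing statistics.
clean = [r for r in rows if r["padj"] is not None and r["log2FC"] is not None]
# Significance filter.
significant = [r for r in clean if r["padj"] < PADJ_CUTOFF]
# Sort by log2FC, largest first.
ranked = sorted(significant, key=lambda r: r["log2FC"], reverse=True)
# Top gained: head of the ranking; top lost: tail, most negative first.
top_gained = [r["peak"] for r in ranked[:TOP_N] if r["log2FC"] > 0]
top_lost = [r["peak"] for r in reversed(ranked[-TOP_N:]) if r["log2FC"] < 0]

print(top_gained)  # ['peak_6', 'peak_1']
print(top_lost)    # ['peak_2', 'peak_5']
```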
Extraction Is Easier
Before
A human reverse-engineers a busy history: jobs, outputs, branches, and connections.
Now
The notebook already names the outputs that matter; Galaxy walks the graph backward.
After
The extracted workflow starts with the story attached: the report is seeded from the notebook.
Don’t just make the analysis reproducible, make the communication of the analysis reproducible.
Skill: reproduciblify
Don’t write the notebook and then extract the workflow by hand. Let an agent rebuild the history so extraction works.
start
messy real history
Manual uploads, pasted figures, scratch steps, and one-sample-at-a-time structure.
agent work
rebuild inside Galaxy
Find existing tools first, create tools only as fallback, and restructure with collections.
finish
notebook extracts cleanly
Only real Galaxy outputs become notebook anchors, so extraction can recover the reusable workflow.
galaxyproject/galaxy-skills#29
Skill: workflow reports
After a workflow runs, turn the invocation into a report with the same reproducibility guarantees.
input
workflow definition
Read a local .ga file or a Galaxy workflow download with labels and marked outputs.
agent work
draft report markdown
Use Galaxy directives for inputs, outputs, images, tables, workflow diagrams, and history links.
result
run-specific report
The same template resolves against each invocation, keeping analysis communication reproducible.
galaxyproject/galaxy-skills#14
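One way to picture "the same template resolves against each invocation": the report refers to outputs by workflow label, and each run swaps the label for that run's dataset. This is a rough sketch only. The directive names are modeled on Galaxy markdown as I understand it and should be treated as assumptions, the dataset IDs are invented, and the real resolution happens inside Galaxy, not in a regex.

```python
import re

# Report template written once, against workflow output labels.
TEMPLATE = """\
# Differential accessibility report

history_dataset_as_image(output="volcano_plot")
history_dataset_as_table(output="top_gained_peaks")
"""

# Hypothetical invocations: output label -> dataset ID produced by that run.
INVOCATIONS = {
    "run_a": {"volcano_plot": "ds_101", "top_gained_peaks": "ds_102"},
    "run_b": {"volcano_plot": "ds_201", "top_gained_peaks": "ds_202"},
}

DIRECTIVE = re.compile(r'(\w+)\(output="([^"]+)"\)')


def resolve(template, outputs):
    """Rewrite label-based directives into run-specific dataset references."""
    def swap(match):
        name, label = match.group(1), match.group(2)
        if label not in outputs:
            # Fail loudly instead of emitting a report with a dangling reference.
            raise KeyError(f"report references unknown output {label!r}")
        return f'{name}(history_dataset_id="{outputs[label]}")'
    return DIRECTIVE.sub(swap, template)


for run, outputs in INVOCATIONS.items():
    print(f"--- {run} ---")
    print(resolve(TEMPLATE, outputs))
```

The template never changes between runs; only the label-to-dataset mapping does, which is what keeps the communication of the analysis reproducible.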
nf-core/rnaseq
Format 2 + gxwf
Every workflow system validates something. Galaxy can validate the scientific tool invocation itself.
native .ga
machine serialization
"tool_state": "{
  \"reference_source\": {
    \"reference_source_selector\": \"cached\",
    \"ref_file\": \"hg38\"
  },
  \"output_sort\": \"coordsorted\"
}"
Format 2
human + agent authoring
state:
  reference_source:
    reference_source_selector: cached
    ref_file: hg38
  output_sort: coordsorted
10,000+ ToolShed-served typed parameter schemas
Names, types, select options, conditionals, collections
gxwf validates offline in milliseconds
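The difference between the two serializations is easy to see in code: the native form stores the tool state as a JSON string inside JSON, while Format 2 keeps it as a nested mapping. The sketch below decodes the string from the slide and prints it in nested form. It is an illustration only; `to_format2_state` and `emit_yaml` are hypothetical helpers, not part of gxwf or gxformat2, and real native workflows may need more handling than a single decode.

```python
import json

# Native-style step: tool_state is a JSON document stored as a string.
native_step = {
    "tool_state": json.dumps({
        "reference_source": {
            "reference_source_selector": "cached",
            "ref_file": "hg38",
        },
        "output_sort": "coordsorted",
    })
}


def to_format2_state(step):
    """Decode the embedded JSON string into a nested 'state' mapping."""
    return {"state": json.loads(step["tool_state"])}


def emit_yaml(mapping, indent=0):
    """Tiny emitter for nested dicts of scalars (not a general YAML writer)."""
    lines = []
    pad = "  " * indent
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(emit_yaml(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


format2 = to_format2_state(native_step)
print("\n".join(emit_yaml(format2)))
```

A human or an agent can edit `ref_file` directly in the nested form; in the string form the same edit means decoding, changing, and re-encoding without breaking the escaping.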
gxwf: one contract, two runtimes
A shared workflow-state specification and fixture suite keep the Python and TypeScript implementations converging.
TypeScript / npm
published now
@galaxy-tool-util/* packages feed the CLI, web UI, reports, and VS Code extension.
Python / Galaxy
coming soon
Pydantic report models and Galaxy-side workflow-state tooling are staged for integration.
Shared truth
spec + tests
OpenAPI contracts, report-model JSON shapes, and declarative YAML fixtures keep the implementations honest.
Same reports, same validation concepts, different hosts: CLI, web operations, Galaxy, and VS Code.
Published CLI + docs
npm install -g @galaxy-tool-util/cli
workflow validation, cleanup, linting, conversion, roundtrip
Tool state, validated
An illegal select value, caught before a single job runs, with the legal options in the message:
state:
  format: BAMX # typo
$ gxwf validate wf.gxwf.yml
[0] call_peaks FAIL
format:
expected "BAM" | "BAMPE" | "BED",
actual "BAMX"
Names, types, select options, conditionals: the same schema the Galaxy UI uses. Diagnostics are structured (path + category), so an agent fixes in a tight loop instead of waiting on a failed job.
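A minimal sketch of what "structured (path + category)" buys an agent: the check below compares a select value against its legal options and returns a machine-readable diagnostic rather than free text. The schema shape, the category names, and `validate_state` are assumptions made for illustration and are not gxwf's actual data model or output format.

```python
# Hypothetical parameter schema for one tool (illustration only).
SCHEMA = {
    "format": {"type": "select", "options": ["BAM", "BAMPE", "BED"]},
}


def validate_state(state, schema):
    """Return a list of diagnostics, each with a path, a category, and a message."""
    diagnostics = []
    for key, value in state.items():
        spec = schema.get(key)
        if spec is None:
            diagnostics.append({
                "path": key,
                "category": "unknown_parameter",  # hypothetical category name
                "message": f"no parameter named {key!r}",
            })
        elif spec["type"] == "select" and value not in spec["options"]:
            expected = " | ".join(f'"{option}"' for option in spec["options"])
            diagnostics.append({
                "path": key,
                "category": "invalid_select_value",  # hypothetical category name
                "message": f'expected {expected}, actual "{value}"',
            })
    return diagnostics


for diagnostic in validate_state({"format": "BAMX"}, SCHEMA):
    print(diagnostic["path"], diagnostic["category"], diagnostic["message"])
```

Because the diagnostic names the exact path and lists the legal options, the repair is a one-field edit that can be re-validated immediately.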
Connections, validated
Connections aren’t just producer→consumer links; they carry Galaxy’s collection algebra.
$ gxwf validate pe-artic-variation.ga \
    --connections
Tool state: 25 validated, 0 skipped
Connections: OK — 46 ok, 0 invalid, 0 skip
A list wired into a single-dataset input implies map-over; an incompatible depth (e.g. list:paired → paired with no flatten) is rejected statically.
galaxy-workflows-vscode
An IDE for Galaxy workflows
Full .ga + gxformat2 coverage
Native Galaxy workflows and Format 2 workflows both get a real editor experience.
Schema-aware
Validation, hover docs, IntelliSense, formatting, outline, diagrams.
Thank you, David
Huge, repeated thanks to David López for building the extension that makes these IDE demos real.
github.com/davelopez/galaxy-workflows-vscode
IDE work: find the tool
ToolShed search inside VS Code resolves a human query to a versioned Galaxy tool.
IDE work: complete state keys
The extension completes Format 2 parameter names from the selected tool schema, including nested state.
IDE work: complete legal values
Select parameters expose their enum values in-place, before validation or execution.
Format 2 is good for agents because it is good for humans
More intuitive
YAML names, nested state, inputs, outputs, and steps read like the workflow humans already discuss.
Less context
The schema carries tool IDs, legal values, connection shape, and validation categories so prompts do not need to.
Robust tooling
CLI, IDE, browser, and Galaxy validation all report against the same typed workflow surface.
designed for humans, supercharging agents
Even when we ask agents to build workflows,
designing for humans supercharges agents.
Skills are the wrong source of truth
A hand-authored conversion skill works until the ecosystem moves.
Context flooding
Every run drags in conditionals, collections, tests, wrappers, and caveats.
Brittle composition
Paper, Nextflow, CWL, and interview paths duplicate the same workflow moves.
Prose caveats
“Remember to validate” is weaker than a schema and a command that must pass.
Compressed evidence
Corpus examples and design rationale get summarized until they stop being auditable.
Runtime captivity
A skill written for one agent surface does not become portable by hoping.
No human-scrutable source
When the skill is wrong, there is no richer upstream artifact to inspect and fix.
The skill can run; the maintainer still needs to audit why it says what it says.
Pipelines are journeys
A pipeline is not a giant prompt. It is an ordered Mold sequence with visible handoffs.
source-specific summary → target-specific design briefs → corpus comparison → draft implementation loop → tests, validation, execution, debug
Pipelines make the journey browseable for humans and executable for harnesses.
Interview → Galaxy
A conversation becomes a typed workflow draft, then a validated workflow.
Normalize an interview into a shared freeform summary.
Design Galaxy interface and data flow, then compare to IWC exemplars.
Loop over advance-galaxy-draft-step until no drafty step remains.
Validate, test, execute, and debug with deterministic tooling in the loop.
The same analysis, two directions
Example 2 extracted differential ATAC-seq from a completed run. A Foundry pipeline constructed the same analysis from the same initial prompt, with no history and no execution.
%%{init: {'theme':'base','themeVariables':{'fontFamily':'Atkinson Hyperlegible','primaryColor':'#25537b','primaryTextColor':'#ffffff','primaryBorderColor':'#2c3143','lineColor':'#58585a','fontSize':'15px'}}}%%
graph LR
input_0>"ATAC counts"]
input_1>"sample metadata"]
step_0["DESeq2 differential test"]
step_1["Clean table (NA filter)"]
step_2["Volcano plot"]
step_3["Filter significant peaks"]
step_4["Sort by log2FC"]
step_5["Top gained peaks"]
step_6["Top lost peaks"]
input_0 --> step_0
input_1 --> step_0
step_0 --> step_1
step_1 --> step_2
step_1 --> step_3
step_3 --> step_4
step_4 --> step_5
step_4 --> step_6
classDef input fill:#edf4fa,stroke:#25537b,color:#2c3143;
classDef core fill:#25537b,stroke:#2c3143,color:#ffffff;
class input_0,input_1 input;
class step_0 core;
count matrix + sample sheet → DESeq2 → NA-clean → volcano ∥ filter → sort → top gained / lost
every step a real, version-pinned Galaxy tool
tool_state schema-validated offline by gxwf
Diagram emitted by gxwf mermaid from the pipeline’s output workflow.
Extraction needs a completed run; construction needs only the intent. Both land on the same kind of validated, reproducible workflow.
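To show how little a diagram like the one above needs beyond labels and edges, here is a small sketch that prints the same Mermaid node and edge lines from plain lists. It is not how `gxwf mermaid` is implemented, and `to_mermaid` is a hypothetical helper; the labels and edges are copied from the slide's diagram, and the styling lines are omitted.

```python
INPUTS = ["ATAC counts", "sample metadata"]
STEPS = [
    "DESeq2 differential test",
    "Clean table (NA filter)",
    "Volcano plot",
    "Filter significant peaks",
    "Sort by log2FC",
    "Top gained peaks",
    "Top lost peaks",
]
EDGES = [
    ("input_0", "step_0"), ("input_1", "step_0"),
    ("step_0", "step_1"),
    ("step_1", "step_2"), ("step_1", "step_3"),  # volcano and filter branch in parallel
    ("step_3", "step_4"),
    ("step_4", "step_5"), ("step_4", "step_6"),
]


def to_mermaid(inputs, steps, edges):
    """Render inputs, steps, and edges as a left-to-right Mermaid graph."""
    lines = ["graph LR"]
    for index, label in enumerate(inputs):
        lines.append(f'    input_{index}>"{label}"]')
    for index, label in enumerate(steps):
        lines.append(f'    step_{index}["{label}"]')
    for source, target in edges:
        lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


diagram = to_mermaid(INPUTS, STEPS, EDGES)
print(diagram)
```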
Patterns are the reusable moves
1. Patterns MOC
Start from corpus-grounded maps, not a flat pile of recipes.
2. Collections MOC
A map-of-content routes the agent to the right collection operation.
3. Concrete recipe
Leaf patterns preserve when-to-use guidance, pitfalls, and exemplar links.
Patterns stay human-readable; casts can package the same evidence as runtime references.
Structured Drafting of Workflows
Because reproducibility is more important than ever and designing reproducibility infrastructure for humans supercharges agents,
Galaxy must meet this moment by doubling down on our values.
Thanks
Galaxy community
Nekrutenko lab
IWC
ToolShed contributors
Questions?
Galaxy Notebooks
Galaxy Notebooks: Galaxy-flavored markdown attached to histories.
Embed datasets, collections, and interactive visualizations in prose (including drag and drop).
Referenced outputs seed workflow extraction (with reports).
AI assistant can read history context and draft sections (including MCP support).
Every revision attributed: user / agent / restore.
Builds on what used to be called “Pages”. New in 26.1.
Make the knowledge base executable
Build actionable skills from inspectable, typed, synchronized source material.
Schemas
typed workflow artifacts
Drafts, summaries, tool state, tests, provenance.
Upstream specs
strictly synchronized
gxformat2, Galaxy collection semantics, CWL, tool XML.
CLI manuals
commands as contracts
gxwf, Planemo, validator outputs, sidecar metadata.
Research + patterns
corpus-grounded moves
IWC examples, design rationale, when-to-use guidance.
Knowledge stays inspectable for humans.
→
Casts become executable for agents.
That is Foundry.
Workflow draft format
Concrete now
inputs, outputs, step set, producer → consumer edges, branches, when: guards
Deferred explicitly
TODO_* sentinels for tool IDs, versions, and wrapper-defined ports
Intent carried forward
_plan_* fields tell the implementation Mold what the source evidence supports.
class: GalaxyWorkflowDraft
inputs:
  reads:
    type: collection
    collection_type: list:paired
steps:
  align_reads:
    tool_id: TODO_mapper
    tool_version: TODO
    _plan_context: "map paired reads to the reference"
    _plan_in:
      reads: "paired input collection"
      reference: "selected genome"
    in:
      reads: reads
      reference: TODO_reference_port
outputs:
  aligned_bam:
    outputSource: align_reads/TODO_bam_output
A fully resolved draft promotes to ordinary gxformat2 with no translation layer.
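A draft counts as concrete once no `TODO` sentinels and no `_plan_*` fields remain. The sketch below walks the draft above and lists every path that is still unresolved. The matching rule (a `/`-separated segment equal to `TODO` or starting with `TODO_`) and the `unresolved` helper are assumptions for illustration, not the rule gxwf applies.

```python
# The draft from the slide, as a Python mapping.
DRAFT = {
    "class": "GalaxyWorkflowDraft",
    "inputs": {"reads": {"type": "collection", "collection_type": "list:paired"}},
    "steps": {
        "align_reads": {
            "tool_id": "TODO_mapper",
            "tool_version": "TODO",
            "_plan_context": "map paired reads to the reference",
            "_plan_in": {
                "reads": "paired input collection",
                "reference": "selected genome",
            },
            "in": {"reads": "reads", "reference": "TODO_reference_port"},
        }
    },
    "outputs": {"aligned_bam": {"outputSource": "align_reads/TODO_bam_output"}},
}


def is_sentinel(text):
    """Assumed rule: any '/'-separated segment is 'TODO' or starts with 'TODO_'."""
    return any(seg == "TODO" or seg.startswith("TODO_") for seg in text.split("/"))


def unresolved(node, path=""):
    """Return dotted paths that still hold a sentinel or a planning field."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}" if path else key
            if key.startswith("_plan_"):
                found.append(here)  # planning field not yet removed
            else:
                found.extend(unresolved(value, here))
    elif isinstance(node, str) and is_sentinel(node):
        found.append(path)
    return found


for item in unresolved(DRAFT):
    print(item)
```

An empty result is the point at which the draft would be ready to promote; each listed path tells the implementing step exactly what is left to fill in.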
Draft tooling: validate in the loop
pick
gxwf draft-next-step
Deterministically identifies the next unresolved step.
resolve
discover or author
Find a Tool Shed wrapper first; author a wrapper only on fallthrough.
implement
advance draft step
Fill tool_state, ports, IDs, versions, and remove planning fields.
validate
draft-validate --concrete
Schema errors route back to the responsible authoring phase.
while gxwf draft-next-step workflow.gxwf.yml says draft:
    invoke advance-galaxy-draft-step
    gxwf draft-validate --concrete workflow.gxwf.yml
author → validate → fix, one workflow step at a time
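The loop above can be simulated in a few lines to show why deterministic step selection matters: the same draft always yields the same resolution order, and the loop stops on its own when nothing drafty remains. This is a toy model under stated assumptions. The functions stand in for `gxwf draft-next-step` and the `advance-galaxy-draft-step` skill without reproducing their behaviour, and the step names and tool IDs are invented.

```python
def next_draft_step(workflow):
    """Return the first step (declaration order) that is still a draft, else None."""
    for name, step in workflow["steps"].items():
        still_planned = any(key.startswith("_plan_") for key in step)
        if step["tool_id"].startswith("TODO") or still_planned:
            return name
    return None


def advance_step(step, resolved_tool):
    """Fill in the tool ID and drop planning fields (stand-in for the agent's work)."""
    concrete = {k: v for k, v in step.items() if not k.startswith("_plan_")}
    concrete["tool_id"] = resolved_tool
    return concrete


def run_loop(workflow, resolver, max_rounds=20):
    """Advance one step per round until no draft step remains."""
    order = []
    for _ in range(max_rounds):
        name = next_draft_step(workflow)
        if name is None:
            return order
        workflow["steps"][name] = advance_step(workflow["steps"][name], resolver[name])
        order.append(name)
    raise RuntimeError("draft did not converge")


# Invented example workflow and resolver table.
workflow = {
    "steps": {
        "trim": {"tool_id": "example_trimmer"},  # already concrete
        "align_reads": {"tool_id": "TODO_mapper", "_plan_context": "map reads"},
        "call_peaks": {"tool_id": "TODO_peak_caller"},
    }
}
resolver = {"align_reads": "example_mapper", "call_peaks": "example_peak_caller"}

print(run_loop(workflow, resolver))  # ['align_reads', 'call_peaks']
```

The `max_rounds` guard is a design choice for the sketch: a loop driven by an agent should fail visibly if a step never becomes concrete instead of spinning.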
Agents need a tight workflow authoring loop
Format 2 turns workflow construction into small, checkable edits instead of one giant serialized Galaxy JSON guess.
The agent can draft a step, validate tool state, validate connections, repair the exact path, and continue.
gxwf validate workflow.gxwf.yml
gxwf validate workflow.gxwf.yml --connections
gxwf draft-validate --concrete workflow.gxwf.yml
# error path -> targeted repair -> repeat
designed for humans, supercharging agents