MedOpenClaw
Full-study medical imaging agents with auditable evidence

MedOpenClaw enables agents to operate medical imaging viewers over complete studies.

Every viewer action, evidence artifact, and final answer can be replayed and audited across complete radiology studies and whole-slide pathology images.

  • 5 task families
  • 3 evaluation tracks
  • 2 medical viewers
  • Traces replayable by design

From selected images to complete studies

Medical imaging evaluation should include study search, viewer operation, and checkable evidence.

Many benchmarks start from pre-selected slices, crops, or patches. Real imaging workflows require searching full studies, navigating software state, comparing views or timepoints, and documenting evidence that can be audited.

Conventional image QA

  • Pre-selected diagnostically relevant 2D views.
  • Black-box final answers with limited process visibility.
  • Little pressure to preserve viewer state, coordinates, or evidence provenance.

MedOpenClaw runtime

  • Complete studies in 3D Slicer and QuPath.
  • Bounded, replayable action traces over viewer state and generated artifacts.
  • Structured evidence that can be scored against hidden masks, annotations, and labels.

Runtime architecture

A software-native runtime for full-study inspection and evidence capture.

MedOpenClaw sits between a backbone VLM and standard medical imaging viewers. Agents use predefined viewer, evidence, and analysis actions, while each step is recorded as a replayable trace.

Viewer Control

Drive 3D Slicer and QuPath through bounded actions: series selection, scrolling, windowing, panning, zooming, fusion, and viewport capture.
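
One way to read "bounded actions" is as a closed action vocabulary with per-parameter limits, checked before anything reaches the viewer. The sketch below is illustrative only: the action names, parameter ranges, and the `validate_action` helper are assumptions made for this example, not MedOpenClaw's actual schema or API.

```python
from dataclasses import dataclass

# Hypothetical action vocabulary: name -> {parameter: (min, max)}.
# These names and limits are assumptions, not the MedOpenClaw schema.
ACTION_BOUNDS = {
    "scroll": {"delta_slices": (-50, 50)},
    "zoom": {"factor": (0.1, 10.0)},
    "window": {"level": (-1024.0, 3071.0), "width": (1.0, 4096.0)},
    "capture_viewport": {},
}


@dataclass(frozen=True)
class Action:
    name: str
    params: dict


def validate_action(action: Action) -> list[str]:
    """Return a list of problems; an empty list means the action is in bounds."""
    if action.name not in ACTION_BOUNDS:
        return [f"unknown action: {action.name}"]
    bounds = ACTION_BOUNDS[action.name]
    problems = []
    # Every declared parameter must be present.
    for key in sorted(set(bounds) - set(action.params)):
        problems.append(f"missing parameter: {key}")
    # Every supplied parameter must be declared and within its range.
    for key, value in action.params.items():
        if key not in bounds:
            problems.append(f"unexpected parameter: {key}")
            continue
        low, high = bounds[key]
        if not low <= value <= high:
            problems.append(f"{key}={value} outside [{low}, {high}]")
    return problems
```

Rejecting an out-of-range or unknown action before execution keeps the recorded trace limited to steps that a replay can reproduce.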

Evidence Capture

Export key slices, RAS coordinates, whole-slide coordinates, masks, measurements, bookmarks, and state snapshots for deterministic scoring.
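
As a rough illustration of what an exported evidence artifact could look like, the sketch below defines a small record type and a canonical JSON serialization so that the same evidence always produces the same bytes. The `EvidenceRecord` fields, the `"RAS"`/`"WSI"` frame labels, and `to_canonical_json` are hypothetical names chosen for this example; the actual export format is not described here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical evidence artifact; field names are assumptions."""
    kind: str      # e.g. "key_slice", "point", "measurement"
    frame: str     # "RAS" (mm, patient space) or "WSI" (slide pixels)
    coords: tuple  # (r, a, s) for RAS, (x, y) for WSI
    step: int      # index of the trace step that produced the record
    note: str = ""

    def __post_init__(self):
        expected = {"RAS": 3, "WSI": 2}.get(self.frame)
        if expected is None:
            raise ValueError(f"unknown coordinate frame: {self.frame}")
        if len(self.coords) != expected:
            raise ValueError(f"{self.frame} evidence needs {expected} coordinates")


def to_canonical_json(record: EvidenceRecord) -> str:
    """Serialize with sorted keys and fixed separators for stable output."""
    return json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))


if __name__ == "__main__":
    rec = EvidenceRecord(kind="point", frame="RAS", coords=(12.5, -30.0, 44.0), step=3)
    print(to_canonical_json(rec))
```

Tagging each record with its coordinate frame avoids ambiguity when radiology and whole-slide evidence are scored by the same pipeline.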

Advanced Operations

Expose vetted segmentation, registration, resampling, quantitative analysis, and MONAI/VISTA3D workflows as auditable software operations.

Auditable trace experience

Replayable traces connect viewer actions, software state, evidence, and final answers.

A MedOpenClaw episode records what the agent saw, which operations it called, which artifacts were produced, and which evidence was available when the final answer was submitted.
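
How traces are stored is not specified here. One plausible design, sketched below under that caveat, is an append-only step log in which each step's digest covers the previous digest, so that any later edit to an earlier step is detectable on replay. The `EpisodeTrace` class and its field names are assumptions for illustration.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    """SHA-256 over a key-sorted JSON encoding of the step body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class EpisodeTrace:
    """Hypothetical append-only trace with a hash chain across steps."""

    def __init__(self):
        self.steps = []

    def record(self, action: dict, observation: str, artifacts=()) -> str:
        prev = self.steps[-1]["digest"] if self.steps else ""
        body = {
            "index": len(self.steps),
            "action": action,
            "observation": observation,  # e.g. an identifier for a viewport capture
            "artifacts": list(artifacts),
            "prev": prev,
        }
        digest = _digest(body)
        self.steps.append({**body, "digest": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every digest and check the chain is unbroken."""
        prev = ""
        for i, step in enumerate(self.steps):
            body = {k: v for k, v in step.items() if k != "digest"}
            if step["index"] != i or step["prev"] != prev:
                return False
            if _digest(body) != step["digest"]:
                return False
            prev = step["digest"]
        return True
```

With this layout, the evidence available at submission time is simply the set of artifacts listed in steps up to the final one, and `verify()` confirms that none of those steps were altered afterwards.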

1. Open full study: load a complete radiology volume or whole-slide image.
2. Inspect and operate: navigate slices, magnifications, fusion state, or analysis tools.
3. Capture evidence: record slices, coordinates, masks, ROIs, and viewer snapshots.
4. Score deterministically: check answers and evidence against hidden references.
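
To make the final scoring step concrete, the sketch below checks an answer against a hidden label and an evidence mask against a hidden reference mask. It uses the Dice overlap coefficient as a stand-in metric; the choice of Dice, the 0.5 threshold, and the `score_episode` function are assumptions for illustration, not the benchmark's published scoring rules. Masks are represented as sets of voxel indices to keep the example dependency-free.

```python
def dice(pred: set, ref: set) -> float:
    """Dice coefficient between two voxel-index sets: 2|A∩B| / (|A| + |B|)."""
    if not pred and not ref:
        return 1.0  # two empty masks agree trivially
    return 2 * len(pred & ref) / (len(pred) + len(ref))


def point_hits_mask(point, ref: set) -> bool:
    """True if a reported voxel index falls inside the reference mask."""
    return tuple(point) in ref


def score_episode(answer: str, label: str, pred_mask: set, ref_mask: set,
                  dice_threshold: float = 0.5) -> dict:
    """Score one episode; identical inputs always give identical scores."""
    overlap = dice(pred_mask, ref_mask)
    return {
        "answer_correct": answer.strip().lower() == label.strip().lower(),
        "dice": overlap,
        "evidence_ok": overlap >= dice_threshold,
    }


if __name__ == "__main__":
    ref = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
    pred = {(0, 0, 0), (0, 0, 1), (1, 1, 1)}
    print(score_episode("Glioma ", "glioma", pred, ref))
```

Because the score depends only on the submitted answer, the captured evidence, and the hidden references, rerunning the scorer on a stored trace reproduces the same result.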

Figure: Auditable MedOpenClaw execution traces

MedCopilot direction

The same runtime contract can support clinician-facing copilot interfaces.

MedCopilot is an example application built on top of MedOpenClaw, where generated actions, viewer states, and evidence artifacts remain visible for clinician inspection. We present it as a demonstration of traceable interaction, not as evidence of clinical deployment or workflow-efficiency gains.

Figure: MedCopilot-style viewer-native brain MRI trace

Citation

Paper and technical artifact

The paper provides the technical backing for the platform; this site foregrounds the runtime, benchmark, demos, and leaderboard.

@misc{shen2026medopenclaw,
  title={MedOpenClaw and MedFlow-Bench: Auditing Medical Agents in Full-Study Workflows},
  author={Shen, Weixiang and collaborators},
  year={2026},
  url={https://jakobshen.github.io/MedOpenClaw/}
}