Evaluation and Quality

Evaluation should not produce an abstract score. It should decide whether the output meets the acceptance criteria for release.
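One way to make this concrete is to model each acceptance criterion as a named pass/fail check and have the evaluator return a decision plus the list of failed criteria, rather than a number. The sketch below is a minimal illustration, not part of this project: `Criterion`, `Verdict`, `evaluate`, and the two sample criteria are all hypothetical names, and the page is assumed to be available as plain text.

```typescript
// Hypothetical types for illustration; not an API of this project.
interface Criterion<T> {
  id: string;
  description: string;
  check: (output: T) => boolean;
}

interface Verdict {
  ready: boolean;   // true only if every criterion passed
  failed: string[]; // ids of the criteria that did not pass
}

// Returns a release decision instead of an aggregate score.
function evaluate<T>(output: T, criteria: Criterion<T>[]): Verdict {
  const failed = criteria.filter((c) => !c.check(output)).map((c) => c.id);
  return { ready: failed.length === 0, failed };
}

// Sample criteria for a page represented as plain text (assumed shape).
const pageCriteria: Criterion<string>[] = [
  {
    id: "non-empty",
    description: "Page has content",
    check: (page) => page.trim().length > 0,
  },
  {
    id: "no-todo",
    description: "No TODO markers remain",
    check: (page) => page.indexOf("TODO") === -1,
  },
];

const verdict = evaluate("Intro text. TODO: add example.", pageCriteria);
```

Here `verdict.ready` is `false` and `verdict.failed` is `["no-todo"]`, so the result names the specific criterion to fix instead of reporting a number that still has to be interpreted.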

Evaluation layers

Layer	Question	Evidence
Single case	Did one input produce the right output?	Fixture, snapshot, assertion
Scenario	Did a workflow complete?	Harness scenario, trace
Regression	Did old behavior break?	Test suite, link audit
Release	Is this ready for public users?	Build, accessibility, content review
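The smallest layer in the table, the single case, can be sketched as a fixture paired with an assertion. The example below is an assumption-laden illustration: `Fixture`, `runFixtures`, and the `slugify` function under test are hypothetical and only stand in for whatever this project actually produces.

```typescript
// Hypothetical fixture shape: one input and the output it must produce.
interface Fixture {
  name: string;
  input: string;
  expected: string;
}

// Example function under test (an assumption for this sketch):
// builds a URL slug from a page title.
function slugify(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse non-alphanumeric runs into one dash
    .replace(/^-+|-+$/g, "");    // drop leading and trailing dashes
}

// Runs every fixture and returns the names of the ones that failed.
function runFixtures(fn: (input: string) => string, fixtures: Fixture[]): string[] {
  return fixtures.filter((f) => fn(f.input) !== f.expected).map((f) => f.name);
}

const fixtures: Fixture[] = [
  { name: "basic title", input: "Evaluation and Quality", expected: "evaluation-and-quality" },
  { name: "punctuation", input: "  MCP: Getting Started! ", expected: "mcp-getting-started" },
];

const failures = runFixtures(slugify, fixtures);
```

An empty `failures` array means every single case passed. Keeping the fixtures and rerunning the whole set on each change gives the regression layer for the same behavior.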

A good evaluator:

This project should run at least:

npm run docs:build
npm run docs:check-links

Manual checks:

Exercise: write five acceptance criteria for a new MCP tutorial page and turn two of them into automated checks.
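As a starting point, two criteria might be automated along the following lines. This is only a sketch under stated assumptions: the tutorial page is assumed to be a Markdown string, and both criteria ("exactly one level-1 heading" and "every fenced code block declares a language") as well as the function names and sample page are hypothetical examples, not requirements taken from this project.

```typescript
// Three backticks, written with escapes so this example does not contain
// a literal fence marker.
const FENCE = "\x60\x60\x60";

// Hypothetical criterion 1: the page has exactly one level-1 heading.
// Lines inside code fences are skipped so shell comments are not counted.
function hasSingleTitle(markdown: string): boolean {
  let inFence = false;
  let titles = 0;
  for (const line of markdown.split("\n")) {
    if (line.indexOf(FENCE) === 0) {
      inFence = !inFence;
      continue;
    }
    if (!inFence && /^# /.test(line)) {
      titles += 1;
    }
  }
  return titles === 1;
}

// Hypothetical criterion 2: every fenced code block declares a language.
function fencesHaveLanguage(markdown: string): boolean {
  let inFence = false;
  for (const line of markdown.split("\n")) {
    if (line.indexOf(FENCE) !== 0) {
      continue;
    }
    if (!inFence && line.trim() === FENCE) {
      return false; // opening fence with no language tag
    }
    inFence = !inFence;
  }
  return true;
}

// Hypothetical sample page used to exercise both checks.
const samplePage = [
  "# Build your first MCP server",
  "",
  "Install the dependencies:",
  "",
  FENCE + "bash",
  "# install packages",
  "npm install",
  FENCE,
].join("\n");

const samplePasses = hasSingleTitle(samplePage) && fencesHaveLanguage(samplePage);
```

For this sample, `samplePasses` is `true`. Criteria that need human judgment, such as whether the explanation is clear to a newcomer, would stay on the manual checklist.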