Defining Rubrics

Rubrics are the criteria your evaluation scores against.

Anatomy of a Rubric

Every rubric has:

Field | Description
Name | Short label (e.g. "Accuracy", "Tone")
Criteria | Natural-language description of what you're evaluating
Options | Two or more possible labels (e.g. "Accurate" / "Inaccurate"), each with a description
Best option | The label that counts as a pass. The score is the percentage of items that receive this label: with options "Good" / "Bad" and "Good" as the best option, a score of 90% means 90% of items were labeled "Good".
Scoring mode | Whether to evaluate the full response or individual claims
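
The fields above map naturally onto a small data structure. The following is a minimal Python sketch, not Kaleidoscope's actual schema: the class names `Rubric`, `Option`, and `ScoringMode`, the field names, and the `pass_rate` helper are all hypothetical. It also shows how the best option turns a list of labels into a percentage score.

```python
from dataclasses import dataclass
from enum import Enum


class ScoringMode(Enum):
    RESPONSE_LEVEL = "response-level"
    CLAIM_BASED = "claim-based"


@dataclass(frozen=True)
class Option:
    label: str
    description: str


@dataclass(frozen=True)
class Rubric:
    # Hypothetical field names mirroring the table above.
    name: str
    criteria: str
    options: tuple[Option, ...]
    best_option: str
    scoring_mode: ScoringMode = ScoringMode.RESPONSE_LEVEL

    def __post_init__(self) -> None:
        labels = [o.label for o in self.options]
        if len(labels) < 2:
            raise ValueError("a rubric needs two or more options")
        if self.best_option not in labels:
            raise ValueError("best_option must be one of the option labels")


def pass_rate(rubric: Rubric, labels: list[str]) -> float:
    """Fraction of scored items whose label equals the rubric's best option."""
    if not labels:
        return 0.0
    return sum(label == rubric.best_option for label in labels) / len(labels)


tone = Rubric(
    name="Tone",
    criteria="Is the response polite and professional?",
    options=(
        Option("Good", "Polite and professional"),
        Option("Bad", "Rude or careless"),
    ),
    best_option="Good",
)

# 9 of 10 items are labeled "Good", so the score is 0.9 (90%).
score = pass_rate(tone, ["Good"] * 9 + ["Bad"])
```

Validating `best_option` against the option labels at construction time is a design choice of this sketch; it catches a mistyped label before any scoring happens.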

Scoring Modes

Mode | How it works | Best for
Response-level | Judge evaluates the entire answer and picks one option | Tone, helpfulness, style
Claim-based | Answer is split into atomic claims; each claim is scored independently | Factual accuracy, hallucination

Claim-based scoring gives finer-grained results but costs more tokens (one judge call per claim).
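
To make the cost difference concrete, here is a rough sketch of the two modes. Everything in it is an assumption for illustration: `split_into_claims` is a naive sentence splitter standing in for whatever claim extraction Kaleidoscope uses, and `CountingJudge` is a stub rather than a real LLM call. How per-claim labels are aggregated is not described here, so the sketch simply returns them.

```python
import re
from typing import Callable

Judge = Callable[[str], str]  # takes text, returns an option label


def split_into_claims(answer: str) -> list[str]:
    # Naive stand-in: treat each sentence as one atomic claim.
    parts = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [p for p in parts if p]


def score_response_level(answer: str, judge: Judge) -> list[str]:
    # One judge call for the whole answer.
    return [judge(answer)]


def score_claim_based(answer: str, judge: Judge) -> list[str]:
    # One judge call per claim, each scored independently.
    return [judge(claim) for claim in split_into_claims(answer)]


class CountingJudge:
    """Stub judge that counts how often it is called (hypothetical)."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, text: str) -> str:
        self.calls += 1
        return "Inaccurate" if "Mars" in text else "Accurate"


answer = (
    "Paris is the capital of France. "
    "The Eiffel Tower is on Mars. "
    "It opened in 1889."
)

response_judge = CountingJudge()
response_labels = score_response_level(answer, response_judge)

claim_judge = CountingJudge()
claim_labels = score_claim_based(answer, claim_judge)
```

With this three-sentence answer, the response-level path makes a single judge call and yields one label for the whole answer, while the claim-based path makes three calls and pinpoints which claim is the problem.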

Note: Scoring mode is not currently configurable in the UI. The Accuracy preset uses claim-based scoring; all other rubrics use response-level scoring by default.

Rubric Types

Kaleidoscope organizes rubrics into two groups:

Preset Rubrics

Built-in templates you can optionally add to a target. Each preset comes with a purpose-built judge prompt, which we tuned over several rounds by aligning the judge's scores with human labels across different use cases.

Rubric | Criteria | Scoring Mode
Accuracy | Are the claims in the response supported by the provided context, or do they contain hallucinations? | Claim-based
Empathy | Does the response demonstrate empathy and emotional awareness appropriate to the user's situation? | Response-level
Verbosity | Is the response appropriately concise, or does it include unnecessary repetition, filler, or excessive detail? | Response-level

Custom Rubrics

Rubrics you define in natural language. The criteria you write are passed to the LLM judges.

When you add or update a rubric, Kaleidoscope automatically creates baseline judges for it using your configured providers. See Scoring and Judges for details.

Customizing the Judge Prompt

Each rubric has an editable judge prompt template that baseline judges use when scoring responses. To edit it, open the rubric on the rubrics page and click Customize prompt.

The editor lets you:

  • Rewrite the prompt to emphasize different aspects of the criteria
  • Save changes or cancel to discard them

Warning: Saving a new prompt on a rubric that already has scoring data (annotations, judge outputs, or overrides) resets all of that data. Kaleidoscope shows a confirmation warning before proceeding.

Baseline judges stay in sync with the rubric's prompt template, so changing it here affects all baseline judges for that rubric. To experiment with a completely independent prompt on a specific model, create a Custom Judge instead.
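
One way to picture the difference: baseline judges read the rubric's template at scoring time, while a custom judge carries its own copy. The sketch below is a conceptual model only. The class names, the `{criteria}` and `{response}` placeholders, and the model names are hypothetical and do not reflect Kaleidoscope's internal template syntax or API.

```python
from dataclasses import dataclass


@dataclass
class Rubric:
    name: str
    criteria: str
    prompt_template: str  # shared by every baseline judge of this rubric


@dataclass
class BaselineJudge:
    model: str
    rubric: Rubric

    def render(self, response: str) -> str:
        # Reads the rubric's template on every call, so edits propagate.
        return self.rubric.prompt_template.format(
            criteria=self.rubric.criteria, response=response
        )


@dataclass
class CustomJudge:
    model: str
    rubric: Rubric
    prompt_template: str  # independent of the rubric's template

    def render(self, response: str) -> str:
        return self.prompt_template.format(
            criteria=self.rubric.criteria, response=response
        )


rubric = Rubric(
    name="Tone",
    criteria="Is the response polite?",
    prompt_template="Judge this. Criteria: {criteria} Response: {response}",
)
baselines = [BaselineJudge("model-a", rubric), BaselineJudge("model-b", rubric)]
custom = CustomJudge(
    "model-a", rubric, "Be strict. Criteria: {criteria} Response: {response}"
)

# Editing the rubric's template changes what every baseline judge renders,
# but leaves the custom judge untouched.
rubric.prompt_template = "Focus on wording. Criteria: {criteria} Response: {response}"
baseline_prompts = [j.render("Thanks!") for j in baselines]
custom_prompt = custom.render("Thanks!")
```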