MetaEvaluator Base Class
The MetaEvaluator class is the central orchestrator for your evaluation workflow. It manages your evaluation data, task configuration, judges, and results within an organized project directory.
Quick Setup
Project Directory Structure
MetaEvaluator organizes each project in a directory with the following structure:
```
my_project/
├── main_state.json          # Project configuration and metadata
├── data/                    # Evaluation dataset
│   └── main_data.json       # Your evaluation data file
├── results/                 # LLM judge evaluation results
│   ├── run_*_judge1_*.json
│   └── run_*_judge2_*.json
├── annotations/             # Human annotation results
│   ├── annotation_run_*_annotator1_*.json
│   └── annotation_run_*_annotator2_*.json
└── scores/                  # Computed alignment metrics
    ├── score_report.html
    ├── classification_accuracy/
    ├── cohens_kappa/
    └── text_similarity/
```
Directory contents:
- `main_state.json`: Stores your project configuration (task schemas, data metadata, judge configurations)
- `data/`: Contains your evaluation dataset, referenced by `main_state.json`
- `results/`: Stores judge evaluation outputs (automatically created when running judges)
- `annotations/`: Stores human annotation data (automatically created when using the annotation interface)
- `scores/`: Contains computed metrics and comparison results
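If you want to inspect or pre-create this layout yourself, the skeleton can be scaffolded with plain `pathlib`. This is an illustrative sketch, not part of the MetaEvaluator API; MetaEvaluator creates these directories itself as you run judges, annotate, and score.

```python
from pathlib import Path

def scaffold_project(root: str) -> Path:
    """Create the empty directory skeleton shown above (illustrative only)."""
    project = Path(root)
    for sub in ("data", "results", "annotations", "scores"):
        # parents=True builds intermediate dirs; exist_ok makes reruns safe
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project

project = scaffold_project("my_project")
print(sorted(p.name for p in project.iterdir()))
# → ['annotations', 'data', 'results', 'scores']
```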
State Management: save_state()
Calling save_state() persists:
- Evaluation task configuration (schemas, columns, answering method)
- Data metadata and data files
- Judge configurations (model, provider, prompt files)
Results Persist Independently
Judge results, annotations, and scores are saved to their respective directories automatically and persist across sessions.
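Conceptually, `save_state()` serializes the project configuration into `main_state.json`. The sketch below shows the idea using assumed field names (`task`, `data`, `judges`); the real file format is defined by MetaEvaluator and may differ.

```python
import json
from pathlib import Path

def save_state(project_dir: str, task: dict, data_meta: dict, judges: list) -> Path:
    """Write project configuration to main_state.json.

    Field names here are illustrative assumptions, not the library's
    actual schema.
    """
    state = {"task": task, "data": data_meta, "judges": judges}
    path = Path(project_dir) / "main_state.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2))
    return path

save_state(
    "my_project",
    task={"schema": {"label": ["yes", "no"]}, "answering_method": "structured"},
    data_meta={"file": "data/main_data.json", "n_rows": 100},
    judges=[{"id": "judge1", "model": "gpt-4o", "provider": "openai", "prompt": "prompt.md"}],
)
```

Note that judge results, annotations, and scores are deliberately not part of this file: they live in their own directories and survive regardless of when you last called `save_state()`.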
Loading Projects: load=True
Passing load=True restores the saved state, so you don't have to redefine your evaluation task, data, and judges in every session.
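The `load` flag follows a common load-or-create pattern. Here is a self-contained toy stand-in (not MetaEvaluator's actual implementation) showing how a second session can pick up where the first left off:

```python
import json
from pathlib import Path

class Project:
    """Toy stand-in for MetaEvaluator's state handling (illustrative only)."""

    def __init__(self, project_dir: str, load: bool = False):
        self.dir = Path(project_dir)
        self.state: dict = {}
        if load:
            # Restore the previously saved task, data, and judge configuration.
            self.state = json.loads((self.dir / "main_state.json").read_text())
        # With load=False you would define the task, data, and judges yourself.

# First session: state gets saved to disk.
Path("demo_project").mkdir(exist_ok=True)
(Path("demo_project") / "main_state.json").write_text(json.dumps({"task": "toxicity"}))

# Later session: reload instead of redefining everything.
proj = Project("demo_project", load=True)
print(proj.state)  # → {'task': 'toxicity'}
```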