Load your evaluation dataset with the DataLoader, which supports multiple file formats. EvalData uses Polars DataFrames internally for fast, efficient data processing.
from meta_evaluator.data import DataLoader

data = DataLoader.load_csv(
    id_column="example_id",  # Optional - auto-generated if not provided
    name="my_evaluation",
    file_path="my_data.csv",
)
# If your EvalTask has:
task = EvalTask(
    prompt_columns=["user_input"],
    response_columns=["llm_response"],
    # ...
)

# Your CSV must contain these columns:
# user_input,llm_response
# "What is 2+2?","The answer is 4"
# "Hello there","Hi! How can I help you?"
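Before loading, it can help to confirm that the CSV header actually contains the columns the task expects. The following is a minimal sketch using only the Python standard library; `missing_columns` is a hypothetical helper for illustration and is not part of meta_evaluator.

```python
import csv
import io


def missing_columns(csv_file, required):
    """Return the required column names that are absent from the CSV header.

    Hypothetical helper for illustration; not part of meta_evaluator.
    """
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    return [col for col in required if col not in header]


# In-memory stand-in for a CSV file on disk.
sample = io.StringIO(
    "user_input,llm_response\n"
    '"What is 2+2?","The answer is 4"\n'
)
print(missing_columns(sample, ["user_input", "llm_response"]))  # prints []
```

An empty list means every prompt and response column named in the task is present in the file.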
# Option 1: Use existing ID column
data = DataLoader.load_csv(
    id_column="example_id",  # Must exist in your CSV
    name="eval",
    file_path="data.csv",
)

# Option 2: Auto-generate IDs (recommended for simple datasets)
data = DataLoader.load_csv(
    id_column=None,  # Creates "id" column: ["id-1", "id-2", "id-3", ...]
    name="eval",
    file_path="data.csv",
)
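The auto-generated scheme in Option 2 can be pictured with a short sketch. `generate_ids` is a hypothetical helper, not DataLoader's own code; it only mirrors the "id-1", "id-2", ... pattern shown in the comment above.

```python
def generate_ids(n_rows):
    """Build one ID per row, numbered from 1, in the "id-N" pattern.

    Hypothetical helper for illustration; not part of meta_evaluator.
    """
    return [f"id-{i}" for i in range(1, n_rows + 1)]


print(generate_ids(3))  # ['id-1', 'id-2', 'id-3']
```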
Data Name (name)
The name parameter provides a human-readable identifier for your dataset:
data = DataLoader.load_csv(
    name="customer_feedback_evaluation",  # Descriptive name
    file_path="data.csv",
)

# Name is used in:
# - Project serialization and loading
# - Logging and error messages
# - Result tracking and organization
File Path (file_path)
The file_path parameter specifies the location of your data file:
# Relative paths (relative to current working directory)
data = DataLoader.load_csv(
    name="eval",
    file_path="data/samples.csv",  # Relative path
)

# Absolute paths
data = DataLoader.load_csv(
    name="eval",
    file_path="/full/path/to/data.csv",  # Absolute path
)
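Because relative paths depend on the current working directory, it can be useful to see which absolute location a given file_path points to before loading. This is a small sketch with the standard library's pathlib; `resolve_data_path` is a hypothetical helper, and nothing here describes how DataLoader itself handles paths.

```python
from pathlib import Path


def resolve_data_path(file_path):
    """Return the absolute path that a relative or absolute file_path refers to.

    Hypothetical helper for illustration; not part of meta_evaluator.
    """
    return Path(file_path).resolve()


resolved = resolve_data_path("data/samples.csv")
print(resolved)                # absolute path under the current working directory
print(resolved.is_absolute())  # True
```

Printing the resolved path is a quick way to diagnose a "file not found" error when a script is launched from a different directory than expected.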
Stratified Sampling
Create a representative sample that preserves data distribution.
# Sample 20% while preserving topic distribution
sample_data = data.stratified_sample_by_columns(
    columns=["topic", "difficulty"],
    sample_percentage=0.2,
    seed=42,
)

# Add to MetaEvaluator
evaluator = MetaEvaluator(project_dir="quickstart_project", load=False)
evaluator.add_data(sample_data)
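To make the idea behind stratified sampling concrete, here is a pure-Python sketch that groups rows by the chosen columns and draws the same fraction from each group. It is not the implementation of stratified_sample_by_columns; the function name `stratified_sample`, the row-as-dict representation, and the choice to keep at least one row per group are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(rows, columns, sample_percentage, seed=None):
    """Sample the same fraction from every group of rows sharing `columns` values.

    Illustrative sketch only; not meta_evaluator's implementation.
    """
    rng = random.Random(seed)  # local generator so the seed does not affect global state
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in columns)
        groups[key].append(row)

    sample = []
    for members in groups.values():
        # Assumption: keep at least one row per group so no stratum disappears.
        k = max(1, round(len(members) * sample_percentage))
        sample.extend(rng.sample(members, k))
    return sample


rows = (
    [{"topic": "math", "difficulty": "easy", "n": i} for i in range(10)]
    + [{"topic": "history", "difficulty": "hard", "n": i} for i in range(10)]
)
picked = stratified_sample(rows, ["topic", "difficulty"], 0.2, seed=42)
print(len(picked))  # 4 (two rows from each of the two groups)
```

With a fixed seed the same rows are drawn on every run, which is what makes a sampled evaluation set reproducible.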