Setting up the Annotation Platform
Launch the Streamlit-based annotation interface to collect human evaluations for your dataset.
Quick Setup
Dataset Recommendations
- Use a small dataset (recommended: 30-50 rows); see the downsampling sketch below this list
- The interface supports automatic saving and session resumption: progress is saved as soon as a sample's required fields are completed, and the session resumes where you left off when reopened
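If your source dataset is larger than that, a quick way to draw a fixed-size subset (a sketch using pandas; the file names and sample size are placeholders):

import pandas as pd

# Load the full dataset and draw a fixed-size random sample.
# "full_dataset.csv" is a placeholder — point this at your own data.
df = pd.read_csv("full_dataset.csv")
subset = df.sample(n=50, random_state=42)  # fixed seed so the subset is reproducible
subset.to_csv("annotation_subset.csv", index=False)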
Launch Configuration
Port Settings
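The interface serves on Streamlit's default port, 8501. If that port is already in use, it can be changed through Streamlit's standard configuration, either the STREAMLIT_SERVER_PORT environment variable or a config file; a sketch, assuming the launcher honors Streamlit's global configuration:

# .streamlit/config.toml — read automatically by Streamlit at startup
[server]
port = 8502

If you change the port, use it in the browser URL as well (e.g., http://localhost:8502).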
Remote Access with ngrok
When to use ngrok: If you want to share your annotation interface with remote annotators (e.g., teammates working from different locations) and your data is not classified or sensitive, you can use ngrok to create a public URL that anyone with the link can access.
How it works:
- ngrok creates a secure tunnel from a public URL (e.g., https://abc123.ngrok.io) to your local machine. Remote annotators can access the interface through this URL without needing a VPN or complex network setup.
ngrok Setup Requirements
- Install ngrok: brew install ngrok (Mac) or download from ngrok.com
- Authenticate: ngrok authtoken YOUR_TOKEN (free account required)
- Start the tunnel: ngrok http 8501 (point it at the port your interface runs on), then share the generated public URL with your annotators
- Traffic policy files allow advanced configuration, including login authentication, IP restrictions, custom headers, and more. For examples and detailed configuration, see the ngrok Traffic Policy Documentation (https://ngrok.com/docs/traffic-policy/)
Data Security
Only use ngrok if your data is not classified or sensitive. ngrok creates a publicly accessible URL that exposes your local annotation interface to the internet. Anyone with the URL can access your interface. For classified or sensitive data, use local network access only or implement proper authentication.
Using the Interface
- Launch the interface: Load your data and task, then run evaluator.launch_annotator() (a sketch follows this list)
- Open browser: Navigate to http://localhost:8501
- Enter annotator ID: Provide your name or identifier
- Begin annotation: Start evaluating samples!
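A minimal end-to-end sketch of these steps. Only launch_annotator() is taken from this guide; the import path, EvalTask fields, and evaluator construction are assumptions to adapt to your installation:

# Hypothetical sketch — everything except launch_annotator() is an assumption.
import pandas as pd
from your_eval_library import EvalTask, Evaluator  # hypothetical import path

data = pd.read_csv("annotation_subset.csv")  # the 30-50 row subset from above
task = EvalTask(                             # hypothetical constructor; the schema
    task_schemas={"hateful": ["FALSE", "hateful"]}  # shape mirrors the metadata example below
)
evaluator = Evaluator(data=data, task=task)  # hypothetical constructor
evaluator.launch_annotator()                 # serves the Streamlit UI on port 8501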
The interface automatically presents all tasks from your EvalTask configuration in the UI:
- Uses annotator name and session time to differentiate annotations
- Auto-saves when required fields are filled; resumes from last incomplete sample on reload
The annotation platform (Web): [screenshot]

The annotation platform (Mobile): [screenshot]
Results and Output
File Structure
After annotation, results are saved to your project directory:
my_project/
└── annotations/
    ├── annotation_run_20250715_171040_f54e00c6_person_1_Person 1_data.json
    ├── annotation_run_20250715_171040_f54e00c6_person_1_Person 1_metadata.json
    ├── annotation_run_20250715_171156_617ba2da_person_2_Person 2_data.json
    └── annotation_run_20250715_171156_617ba2da_person_2_Person 2_metadata.json
File naming pattern: annotation_run_<YYYYMMDD>_<HHMMSS>_<run_hash>_<annotator_id>_<annotator_name>_<data|metadata>.json — for example, annotation_run_20250715_171040_f54e00c6_person_1_Person 1_data.json combines the run timestamp (20250715_171040), a short run hash (f54e00c6), the annotator ID (person_1), and the annotator's display name (Person 1).
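A small sketch for splitting these names back into their parts; the pattern is inferred from the examples above rather than a documented spec, and since both the annotator ID and display name can contain underscores, read those fields from the metadata file when you need them unambiguously:

import re

# Inferred pattern: prefix + timestamp + 8-hex-char run hash + annotator
# id/name (left as one chunk — the separator is ambiguous) + file kind.
NAME_RE = re.compile(
    r"annotation_run_(?P<timestamp>\d{8}_\d{6})_(?P<run_hash>[0-9a-f]{8})_"
    r"(?P<annotator>.+)_(?P<kind>data|metadata)\.json"
)

m = NAME_RE.match("annotation_run_20250715_171040_f54e00c6_person_1_Person 1_data.json")
if m:
    print(m.group("run_hash"), m.group("annotator"), m.group("kind"))
    # -> f54e00c6 person_1_Person 1 data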
Annotation Data Format
Each annotation run creates two files: a data file containing the actual annotations and a metadata file with run information.
Data file example (*_data.json):
[
  {
    "sample_example_id": "sample_1",
    "original_id": "lg_auto_rt_successful_fn_en_60",
    "run_id": "annotation_run_20250715_163632_09d6cc37",
    "status": "success",
    "error_message": null,
    "error_details_json": null,
    "annotator_id": "h1",
    "annotation_timestamp": "2025-07-15 16:37:52.163517",
    "hateful": "FALSE",
    "insults": "FALSE",
    "sexual": "FALSE",
    "physical_violence": "FALSE",
    "self_harm": "self_harm",
    "all_other_misconduct": "all_other_misconduct"
  }
]
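Given this format, a short sketch tallying how often each label was chosen per task in a single data file (the path is a placeholder; the task names match the example above):

import json
from collections import Counter

# Placeholder path — substitute one of your *_data.json files.
with open("annotations/annotation_run_20250715_163632_09d6cc37_h1_H1_data.json") as f:
    records = json.load(f)

tasks = ["hateful", "insults", "sexual", "physical_violence",
         "self_harm", "all_other_misconduct"]
for task in tasks:
    # Count labels only for samples that were annotated successfully.
    counts = Counter(r.get(task) for r in records if r.get("status") == "success")
    print(task, dict(counts))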
Metadata file example (*_metadata.json):
{
  "run_id": "annotation_run_20250715_163632_09d6cc37",
  "task_schemas": {
    "hateful": ["FALSE", "hateful"],
    "insults": ["FALSE", "insults"],
    "sexual": ["FALSE", "sexual"],
    "physical_violence": ["FALSE", "physical_violence"],
    "self_harm": ["FALSE", "self_harm"],
    "all_other_misconduct": ["FALSE", "all_other_misconduct"]
  },
  "timestamp_local": "2025-07-15T16:49:46.684126",
  "total_count": 50,
  "succeeded_count": 50,
  "is_sampled_run": false,
  "data_file": "annotation_run_20250715_163632_09d6cc37_h1_H1_data.json",
  "data_format": "json",
  "annotator_id": "h1",
  "error_count": 0
}
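Since the metadata records each task's allowed labels, a data file can be sanity-checked against its schema. A sketch, using the example file names from above as placeholders:

import json
from pathlib import Path

ann_dir = Path("annotations")
# The metadata's "data_file" field names its companion data file.
meta = json.loads((ann_dir / "annotation_run_20250715_163632_09d6cc37_h1_H1_metadata.json").read_text())
records = json.loads((ann_dir / meta["data_file"]).read_text())

# Flag any annotation whose value is outside the task's allowed label set.
for record in records:
    for task, allowed in meta["task_schemas"].items():
        value = record.get(task)
        if value is not None and value not in allowed:
            print(f"{record['sample_example_id']}: {task}={value!r} not in {allowed}")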
Loading Annotation Results
After collection, load annotations for analysis:
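Your library may provide its own loader; as a library-agnostic sketch, the files are plain JSON and can be combined with pandas directly (the directory follows the file structure shown above):

import json
from pathlib import Path
import pandas as pd

# Collect every annotator's data file into one DataFrame.
frames = []
for data_file in Path("my_project/annotations").glob("*_data.json"):
    records = json.loads(data_file.read_text())
    frames.append(pd.DataFrame(records))  # annotator_id is already a column

annotations = pd.concat(frames, ignore_index=True)
print(annotations.groupby("annotator_id").size())  # annotated samples per annotator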