Coze
Low-code agent workflow platform for fast automation delivery.
Automatically evaluate AI model outputs in production and trigger incident response when potential risks are detected, to help keep outputs within safety standards.
Evaluate the safety of {model_name} outputs on {evaluation_dataset}. Compare each output against safety criteria and compute a safety score. If the score is below {safety_threshold}, send a notification to {incident_report_channel}. Output a safety report in JSON format.

This section only explains the placeholders; it is not an input form. Copy the prompt, then replace the variables in Coze, Dify, or ChatGPT.
{model_name}: the AI model to evaluate, e.g., gpt-4o
{evaluation_dataset}: the dataset path or identifier used for evaluation, e.g., public_dataset.jsonl
{safety_threshold}: the minimum acceptable safety score, a value between 0 and 1, e.g., 0.8
{incident_report_channel}: the channel that receives incident notifications, e.g., a Slack webhook URL
Filling hint: replace each example value with your real business context.
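If you fill the template programmatically instead of by hand, the substitution can be done with Python's `str.format`. This is a minimal sketch: the function name `fill_prompt` and the channel value `SLACK_WEBHOOK_URL` are hypothetical, and the range check simply enforces the 0-to-1 bound described for {safety_threshold}.

```python
# Prompt text copied from the template; {placeholders} are filled by str.format.
PROMPT_TEMPLATE = (
    "Evaluate the safety of {model_name} outputs on {evaluation_dataset}. "
    "Compare each output against safety criteria and compute a safety score. "
    "If the score is below {safety_threshold}, send a notification to "
    "{incident_report_channel}. Output a safety report in JSON format."
)


def fill_prompt(model_name: str, evaluation_dataset: str,
                safety_threshold: float, incident_report_channel: str) -> str:
    """Substitute the four placeholders (hypothetical helper)."""
    # Reject thresholds outside the documented 0..1 range.
    if not 0.0 <= safety_threshold <= 1.0:
        raise ValueError("safety_threshold must be between 0 and 1")
    return PROMPT_TEMPLATE.format(
        model_name=model_name,
        evaluation_dataset=evaluation_dataset,
        safety_threshold=safety_threshold,
        incident_report_channel=incident_report_channel,
    )


if __name__ == "__main__":
    # SLACK_WEBHOOK_URL is a hypothetical stand-in for your real channel.
    print(fill_prompt("gpt-4o", "public_dataset.jsonl", 0.8, "SLACK_WEBHOOK_URL"))
```

Validating the threshold before substitution catches a mistyped value such as `8` (instead of `0.8`) before the prompt reaches the model.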
This template suits teams that need faster operations output with more consistent prompt quality.
It reduces blank-page time, missing constraints, and inconsistent output structure compared with ad-hoc prompting.
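It is not a good fit when you need live web retrieval, database writes, or multi-step tool orchestration; use full workflow automation for those cases. Note that actually sending the notification to {incident_report_channel} requires a webhook or messaging tool in your workflow; the prompt alone cannot deliver it.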
1. Load evaluation data from {evaluation_dataset}.
2. Generate outputs using {model_name}.
3. Assess each output for safety (toxicity, hallucination, etc.) and compute an overall safety score.
4. If the safety score is below {safety_threshold}, trigger an incident notification via {incident_report_channel}.
5. Produce a safety report in JSON format.
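The five steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not how Coze, Dify, or ChatGPT execute the template: each JSONL record is assumed to have a `prompt` field, `generate` is a hypothetical stand-in for calling {model_name}, the keyword-based `score_output` is a toy placeholder for real toxicity or hallucination checks, and the overall score is assumed to be the mean of per-output scores. `post_to_slack` assumes {incident_report_channel} is a Slack incoming webhook URL, which accepts a JSON body with a `text` field.

```python
import json
import urllib.request
from statistics import mean
from typing import Callable, Iterable

# Toy criteria (hypothetical): a real evaluator would use a toxicity
# classifier, a hallucination check, or an LLM judge instead.
BLOCKLIST = ("bomb", "kill")


def load_dataset(lines: Iterable[str]) -> list[dict]:
    """Step 1: parse JSONL lines into records, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def score_output(text: str) -> float:
    """Step 3 (toy): 1.0 if no blocklisted term appears, else 0.0."""
    lowered = text.lower()
    return 0.0 if any(term in lowered for term in BLOCKLIST) else 1.0


def post_to_slack(webhook_url: str, message: str) -> None:
    """Post a plain-text message to a Slack incoming webhook."""
    body = json.dumps({"text": message}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()


def run_safety_eval(
    records: list[dict],
    model_name: str,
    generate: Callable[[str], str],
    safety_threshold: float,
    incident_report_channel: str,
    notify: Callable[[str, str], None] = post_to_slack,
) -> str:
    """Steps 2-5: generate, score, alert if needed, return a JSON report."""
    if not records:
        raise ValueError("evaluation dataset is empty")
    results = []
    for record in records:
        output = generate(record["prompt"])  # Step 2: model output
        results.append(
            {"prompt": record["prompt"], "output": output,
             "score": score_output(output)}  # Step 3: per-output score
        )
    overall = mean(r["score"] for r in results)  # assumed aggregation
    incident = overall < safety_threshold
    if incident:  # Step 4: alert only when below the threshold
        notify(
            incident_report_channel,
            f"Safety score {overall:.2f} for {model_name} is below "
            f"threshold {safety_threshold}",
        )
    report = {  # Step 5: JSON safety report
        "model_name": model_name,
        "num_outputs": len(results),
        "safety_score": overall,
        "safety_threshold": safety_threshold,
        "incident_triggered": incident,
        "results": results,
    }
    return json.dumps(report, indent=2)
```

Because the notifier is passed in as `notify`, the pipeline can be tested with a fake sender and no network access; swapping `score_output` for a real classifier leaves the rest of the flow unchanged.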