Model Drift Monitor
Track whether repeated prompt outputs change in length, refusals, correctness, formatting quality, or tone over time.
How to use this dashboard
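Run the same prompt against the same model at regular intervals and log each result as a row. Compare rows that share a prompt ID to see whether length, refusal rate, correctness, formatting quality, or tone has shifted. Score correctness, formatting, and tone on a consistent 1–5 rubric so that later runs stay comparable with the baseline.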
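A minimal Python sketch of logging one run, assuming length is counted in words and refusals are detected by matching a hypothetical list of stock phrases (a keyword match is crude; a reviewer or classifier would be more reliable). `DriftRecord`, `measure_output`, and `REFUSAL_MARKERS` are illustrative names, not part of this dashboard. Rubric scores stay `None` until a reviewer fills them in, mirroring the "Not scored" placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stock phrases; a keyword match is a crude stand-in for human review.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i'm unable to", "i won't")


@dataclass
class DriftRecord:
    """One row of the drift log. Rubric scores are 1-5; None means "Not scored"."""
    run_date: str  # ISO date, e.g. "2026-04-29"
    model: str
    prompt_id: str
    length_words: int
    refused: bool
    correctness: Optional[int] = None
    formatting: Optional[int] = None
    tone: Optional[int] = None
    notes: str = ""


def measure_output(run_date: str, model: str, prompt_id: str,
                   output: str, notes: str = "") -> DriftRecord:
    """Fill in the automatic metrics (length, refusal); a reviewer adds rubric scores later."""
    lowered = output.lower()
    return DriftRecord(
        run_date=run_date,
        model=model,
        prompt_id=prompt_id,
        length_words=len(output.split()),
        refused=any(marker in lowered for marker in REFUSAL_MARKERS),
        notes=notes,
    )


record = measure_output("2026-04-29", "Manual test model", "coding-001",
                        "I can't help with that request.")
print(record.length_words, record.refused)  # 6 True
```

Keeping the automatic metrics separate from the rubric scores makes it clear which columns can be filled by a script and which need a human judgment.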
| Date | Model | Log type | Test | Prompt ID | Length | Refusal | Correctness | Formatting | Tone | Status | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2026-04-29 | Manual test model | Manual log template | Reasoning baseline | reasoning-001 | 0 | Not checked | Not scored | Not scored | Not scored | Baseline template | Paste repeat-test results here after running the same prompt against the same model over time. |
| 2026-04-29 | Manual test model | Manual log template | Coding baseline | coding-001 | 0 | Not checked | Not scored | Not scored | Not scored | Baseline template | Track whether repeated coding answers become shorter, less direct, more restrictive, or less accurate. |
| 2026-04-29 | Manual test model | Manual log template | Writing baseline | writing-001 | 0 | Not checked | Not scored | Not scored | Not scored | Baseline template | Use a consistent 1–5 scoring rubric to compare clarity, directness, and usefulness over time. |
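Once several rows share a prompt ID, a sketch like the following compares each prompt's earliest run (treated as the baseline) with its latest run and lists what moved. The thresholds (a 25% length change, a drop of one rubric point) and the row keys are assumptions chosen for illustration, not settings of this dashboard. Rows are plain dicts with ISO date strings, which sort chronologically as text.

```python
# Hypothetical thresholds; tune them to your own tolerance for drift.
LENGTH_CHANGE_LIMIT = 0.25  # flag if length moves more than 25% from the baseline
SCORE_DROP_LIMIT = 1        # flag if a 1-5 rubric score falls by this many points or more
SCORE_FIELDS = ("correctness", "formatting", "tone")


def detect_drift(rows):
    """Compare the earliest and latest run of each prompt ID and return drift flags."""
    by_prompt = {}
    for row in rows:
        by_prompt.setdefault(row["prompt_id"], []).append(row)

    report = {}
    for prompt_id, runs in by_prompt.items():
        runs.sort(key=lambda r: r["date"])  # ISO dates sort correctly as strings
        base, latest = runs[0], runs[-1]
        flags = []
        # Skip the length check for empty baselines such as the template rows.
        if base["length"] > 0:
            change = (latest["length"] - base["length"]) / base["length"]
            if abs(change) > LENGTH_CHANGE_LIMIT:
                flags.append(f"length {change:+.0%}")
        if latest["refused"] and not base["refused"]:
            flags.append("new refusal")
        for field in SCORE_FIELDS:
            before, after = base.get(field), latest.get(field)
            # Only compare metrics that were scored in both runs.
            if before is not None and after is not None and before - after >= SCORE_DROP_LIMIT:
                flags.append(f"{field} {before}->{after}")
        report[prompt_id] = flags
    return report


rows = [
    {"prompt_id": "coding-001", "date": "2026-04-29", "length": 400, "refused": False,
     "correctness": 5, "formatting": 4, "tone": 4},
    {"prompt_id": "coding-001", "date": "2026-05-29", "length": 200, "refused": False,
     "correctness": 4, "formatting": 4, "tone": 4},
    {"prompt_id": "reasoning-001", "date": "2026-04-29", "length": 0, "refused": False,
     "correctness": None, "formatting": None, "tone": None},
]
print(detect_drift(rows))
# {'coding-001': ['length -50%', 'correctness 5->4'], 'reasoning-001': []}
```

Comparing only the first and last runs keeps the report short; plotting every run per prompt ID would reveal gradual drift that a two-point comparison can hide.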