Takes agent predictions and the corresponding true outcomes, and returns confidence calibration metrics for assessing how reliable those predictions are.
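
The description does not say which metrics are returned or what shape the inputs take. As a minimal sketch only, assuming each prediction is a probability in [0, 1] for a binary outcome, and assuming the metrics are the Brier score and the expected calibration error (ECE) over equal-width bins, such a function might look like the following. The name `calibration_metrics`, the `n_bins` parameter, and the returned keys are hypothetical and do not come from the source.

```python
from typing import Dict, List, Sequence, Tuple


def calibration_metrics(
    confidences: Sequence[float],
    outcomes: Sequence[int],
    n_bins: int = 10,
) -> Dict[str, float]:
    """Hypothetical sketch: Brier score and ECE for binary outcomes.

    confidences: predicted probability that the outcome is 1 (assumed input form).
    outcomes: observed outcomes, each 0 or 1 (assumed input form).
    """
    if len(confidences) != len(outcomes):
        raise ValueError("confidences and outcomes must have the same length")
    if len(confidences) == 0:
        raise ValueError("at least one prediction is required")
    if any(p < 0.0 or p > 1.0 for p in confidences):
        raise ValueError("confidences must lie in [0, 1]")

    n = len(confidences)

    # Brier score: mean squared gap between predicted probability and outcome.
    brier = sum((p - y) ** 2 for p, y in zip(confidences, outcomes)) / n

    # Group predictions into equal-width bins over [0, 1].
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(confidences, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    # ECE: per-bin |mean confidence - observed frequency|, weighted by bin size.
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(p for p, _ in bucket) / len(bucket)
        freq = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / n) * abs(mean_conf - freq)

    return {"brier_score": brier, "ece": ece}
```

Under these assumptions, lower values of both metrics indicate better calibration: a score of 0 means the stated confidences matched the observed outcomes exactly within every bin.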