Optimizing proper loss functions is popularly believed to yield predictors with good calibration properties; the intuition being that for such losses, the global optimum is to predict the ground-truth probabilities, which is indeed calibrated. However, typical machine learning models are trained to approximately minimize loss over restricted families of predictors, which are unlikely to contain the ground truth. Under what circumstances does optimizing proper loss over a restricted family yield calibrated models? What precise calibration guarantees does it give? In this work, we give a rigorous answer to these questions. We replace global optimality with a local optimality condition stipulating that the (proper) loss of the predictor cannot be reduced much by post-processing its predictions with a certain family of Lipschitz functions. We show that any predictor satisfying this local optimality condition satisfies smooth calibration as defined in Kakade-Foster (2008) and Błasiok et al. (2023). Local optimality is plausibly satisfied by well-trained DNNs, which suggests an explanation for why they are calibrated from proper loss minimization alone. Finally, we show that the connection between local optimality and calibration error goes both ways: nearly calibrated predictors are also nearly locally optimal.
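The link between Lipschitz post-processing and calibration can be made concrete for squared loss (a proper loss). The sketch below is illustrative, not from the paper: the synthetic miscalibrated predictor, the particular witness function `w`, and the step size `eta` are all assumptions. For an update `v -> v + eta*w(v)` with a bounded witness `w`, the exact loss reduction is `2*eta*E[w(v)*(y - v)] - eta^2*E[w(v)^2]` (when no clipping occurs), so a large correlation `E[w(v)*(y - v)]` with some Lipschitz `w` — a smooth-calibration-style error term — means the predictor is not locally optimal.

```python
# Illustrative sketch (not the paper's construction): for squared loss,
# post-processing a miscalibrated predictor with a Lipschitz update
# reduces the loss by an amount governed by a calibration-error-like
# correlation term E[w(v) * (y - v)].
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
p = rng.uniform(0.05, 0.95, size=n)            # ground-truth probabilities
y = (rng.uniform(size=n) < p).astype(float)    # binary outcomes
v = np.clip(p + 0.1, 0.0, 1.0)                 # predictor with upward bias

def sq_loss(pred):
    return np.mean((y - pred) ** 2)

# A (trivially 1-Lipschitz) witness function of the prediction: the
# constant -1, which points toward correcting the upward bias.
w = -np.ones_like(v)

eta = 0.05
post = np.clip(v + eta * w, 0.0, 1.0)          # post-processed predictions

gain = sq_loss(v) - sq_loss(post)              # actual loss reduction
corr = np.mean(w * (y - v))                    # correlation with residuals
print(f"loss reduction:            {gain:.5f}")
print(f"2*eta*corr - eta^2*E[w^2]: {2 * eta * corr - eta**2 * np.mean(w**2):.5f}")
```

With these parameters no clipping is triggered, so the two printed quantities agree exactly; the loss reduction is positive, witnessing that this biased predictor fails local optimality.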