Retry loops are the #1 cost driver
Across Loop Diagnostics this quarter, unbounded retry logic accounts for more runaway spend than model choice. The fix is almost always a hard attempt cap wired upstream of the agent, not a model swap.
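A hard attempt cap can be as small as a wrapper around whatever invokes the agent. The sketch below is illustrative only: `run_with_cap`, `AttemptCapExceeded`, and the default of 3 attempts are hypothetical names and values, and it assumes every exception is worth retrying, which a real system would likely narrow.

```python
class AttemptCapExceeded(Exception):
    """Raised when the wrapped call has failed on every allowed attempt."""


def run_with_cap(call, max_attempts=3):
    """Invoke call() at most max_attempts times, then stop for good.

    `call` is a hypothetical zero-argument function standing in for the agent.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return call()
        except Exception as exc:  # assumption: any exception is retryable
            last_error = exc
    # The cap lives outside the agent, so the agent cannot loop past it.
    raise AttemptCapExceeded(
        f"gave up after {max_attempts} attempts"
    ) from last_error
```

Keeping the cap in the caller rather than inside the agent means a misbehaving agent cannot reset or ignore its own limit.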
Teams underestimate context window burn
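Each retry appends to an already long context, so the prompt grows with every attempt. By attempt 4, the prompt is 3x the original. Because every attempt re-sends the whole prompt, cumulative cost is convex in the number of attempts, not linear. Truncation or summarization before retry drops spend by 60-80% in most cases.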
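To see why the growth compounds, here is a toy model of prompt size per attempt. The numbers are assumptions chosen to match the "3x by attempt 4" figure (a 3,000-token prompt that gains 2,000 tokens per retry). They are not measured data, and `prompt_sizes` is a hypothetical helper.

```python
def prompt_sizes(base_tokens, growth_per_retry, attempts, cap=None):
    """Return the prompt size, in tokens, sent on each attempt.

    Assumes linear growth: every retry adds growth_per_retry tokens.
    If cap is set, the prompt is truncated to at most cap tokens
    before each attempt.
    """
    sizes = []
    for attempt in range(attempts):
        size = base_tokens + growth_per_retry * attempt
        if cap is not None:
            size = min(size, cap)
        sizes.append(size)
    return sizes


uncapped = prompt_sizes(3000, 2000, attempts=4)
capped = prompt_sizes(3000, 2000, attempts=4, cap=3000)

print(uncapped)                    # [3000, 5000, 7000, 9000]
print(sum(uncapped), sum(capped))  # 24000 12000
```

Under these assumptions, four attempts send 24,000 input tokens without truncation and 12,000 with it. The gap widens with each additional attempt, since the uncapped total grows quadratically while the capped total grows linearly. Actual savings depend on how fast the context grows and on output-token costs, which this sketch ignores.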
Hard caps alone are not enough
A kill switch with no fallback creates silent failures. The pattern that works: cap + graceful degradation path + alert to a human channel. Teams that skip the fallback see support tickets spike when the cap triggers.
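The cap + fallback + alert pattern might look like the following sketch. All names here (`run_with_fallback`, `fallback`, `alert`) are hypothetical. `alert` stands in for whatever posts to a human channel, and the sketch assumes the degraded response is cheap and cannot itself fail.

```python
def run_with_fallback(call, fallback, alert, max_attempts=3):
    """Try call() up to max_attempts times; on exhaustion, alert and degrade.

    call     -- hypothetical zero-argument function standing in for the agent
    fallback -- zero-argument function returning a degraded but usable result
    alert    -- function taking a message string, e.g. a chat or pager hook
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return call()
        except Exception as exc:  # assumption: any exception is retryable
            last_error = exc
    # Cap reached: tell a human first, then return the degraded result
    # so the caller never sees a silent failure.
    alert(f"attempt cap of {max_attempts} hit: {last_error!r}")
    return fallback()


def flaky():
    raise TimeoutError("upstream timed out")


alerts = []
result = run_with_fallback(
    flaky,
    fallback=lambda: "cached answer",
    alert=alerts.append,
)
print(result)       # cached answer
print(len(alerts))  # 1
```

Alerting before returning the fallback means the cap firing is always visible to someone, even when the end user only sees the degraded response.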
Next issue drops July 2026
Working topics: quota-aware routing patterns, multi-model fallback failure modes, and findings from the first wave of Loop Sprints.