How It Happened
What kicked it off
WWDC25 opened Apple's Foundation Models framework to third-party apps, so I wanted to try an offline version of the same document workflow, running entirely on Apple's on-device model.
What got hard
Each session with Apple's public on-device model is capped at 4096 tokens, and that single budget has to cover the instructions, the retrieved evidence, tool and schema overhead, and the answer itself. A long document cannot fit into one session under that cap, which is what pushed me into a recursive multi-session reasoning loop.
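
To make the budget pressure concrete, here is a minimal sketch of the arithmetic and of one possible shape for a recursive multi-session loop. It is written in Python for illustration only and does not use the Foundation Models API. The names evidence_budget, count_tokens, pack, and recursive_reduce are hypothetical, the whitespace word count is a stand-in for a real tokenizer, and summarize represents one fresh model session whose behavior is assumed rather than taken from the project.

```python
from typing import Callable, List

CONTEXT_LIMIT = 4096  # per-session cap described above


def evidence_budget(instructions: int, overhead: int, answer_reserve: int,
                    limit: int = CONTEXT_LIMIT) -> int:
    """Tokens left for retrieved evidence once the fixed costs are paid."""
    remaining = limit - instructions - overhead - answer_reserve
    if remaining <= 0:
        raise ValueError("fixed costs already exceed the context limit")
    return remaining


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokens.
    return len(text.split())


def pack(chunks: List[str], budget: int) -> List[List[str]]:
    """Greedily group chunks so each group fits one session's evidence budget."""
    groups: List[List[str]] = []
    current: List[str] = []
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if cost > budget:
            raise ValueError("a single chunk exceeds the evidence budget")
        if current and used + cost > budget:
            groups.append(current)
            current, used = [], 0
        current.append(chunk)
        used += cost
    if current:
        groups.append(current)
    return groups


def recursive_reduce(chunks: List[str], budget: int,
                     summarize: Callable[[List[str]], str]) -> str:
    """Run one fresh session per group, then recurse on the partial answers."""
    groups = pack(chunks, budget)
    if len(groups) > 1 and len(groups) == len(chunks):
        # No two chunks fit together, so another pass would not shrink anything.
        raise RuntimeError("partial answers are too large to combine")
    partials = [summarize(group) for group in groups]
    if len(partials) == 1:
        return partials[0]
    return recursive_reduce(partials, budget, summarize)
```

With illustrative numbers (not measurements from this project) of 300 tokens for instructions, 400 for tool and schema overhead, and 512 reserved for the answer, evidence_budget leaves 2884 tokens per session for evidence. Anything beyond that has to be spread across further sessions and merged, which is the role recursive_reduce plays in this sketch.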
Why it matters
It is the same document problem again, just in an offline, on-device form.