The problem
A stroke patient opens the app to do cognitive exercises. Behind that simple moment, four things need to coordinate simultaneously: which exercise to show next, how hard to make it, whether the patient is struggling right now, and whether they should stop entirely because they’re fatigued.
Get the sequencing wrong and you show a memory exercise to someone whose attention is already depleted. Miss a fatigue signal and you push a recovering patient past their cognitive limit. These aren’t feature bugs — they’re clinical failures.
What the system actually needed
Adaptive selection with clinical awareness. The app tracks nine exercise types across six cognitive domains — memory, attention, executive function, language, visuospatial, mathematical. The system needed to pick the next exercise based on which domains are neglected, which are weak, and what difficulty level matches the patient’s current trajectory.
Real-time intervention during gameplay. Not just between exercises — during them. If a patient hits three consecutive errors or goes inactive for 15 seconds, the system needs to intervene with a contextual hint. Not after the exercise. Now.
A hard stop that overrides everything. Fatigue in stroke recovery isn’t “the user is bored.” It’s a clinical signal. Response times slowing by 30%, error rates climbing, session duration exceeding the therapist’s prescribed limit — any of these should pause the session, regardless of what the coaching layer wants to do next.
How it became an orchestration layer
Three services run concurrently during every session, each observing the same patient state but owning different decisions:
AICoachService owns exercise selection. Before each exercise, it reads the patient’s score history, identifies neglected cognitive domains (not practiced in 7+ days), finds weak areas (average score below 60%), and determines difficulty level based on the performance trend — improving, stable, or declining. It outputs one decision: which exercise, at what difficulty.
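The selection logic above is a fixed priority order, which can be sketched directly. This is a minimal illustration, not the app's code: the `Attempt` record, domain names, and function names are assumptions; the thresholds (7+ days, 60%, trend direction) come from the text.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean

# Hypothetical record of one completed exercise; field names are assumptions.
@dataclass
class Attempt:
    domain: str    # e.g. "memory", "attention"
    score: float   # 0.0 - 1.0
    when: date

DOMAINS = ["memory", "attention", "executive", "language",
           "visuospatial", "mathematical"]

def pick_next(history: list[Attempt], today: date) -> tuple[str, str]:
    """Return (domain, difficulty) using the priority order from the text:
    neglected domains first, then weak domains, then trend-based difficulty."""
    by_domain = {d: [a for a in history if a.domain == d] for d in DOMAINS}

    # 1. Neglected: not practiced in 7+ days (never practiced counts too).
    neglected = [d for d, a in by_domain.items()
                 if not a or (today - max(x.when for x in a)).days >= 7]
    # 2. Weak: average score below 60%.
    weak = [d for d, a in by_domain.items() if a and mean(x.score for x in a) < 0.6]

    domain = neglected[0] if neglected else (weak[0] if weak else DOMAINS[0])

    # 3. Difficulty follows the recent performance trend in that domain.
    recent = sorted(by_domain[domain], key=lambda a: a.when)[-3:]
    if len(recent) < 2:
        difficulty = "stable"
    elif recent[-1].score > recent[0].score:
        difficulty = "harder"   # improving -> step up
    elif recent[-1].score < recent[0].score:
        difficulty = "easier"   # declining -> step down
    else:
        difficulty = "stable"
    return domain, difficulty
```

Note that the whole path is straight-line code: same history in, same exercise out, which is what makes the later "workflow, not agent" argument hold.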
AIGuidanceEngine owns in-exercise intervention. During gameplay, it monitors a stream of events — correct answers, errors, response times, inactivity. When it detects struggle (3+ consecutive errors, 15+ seconds of inactivity, or accuracy dropping 30%+ from baseline), it fires a contextual hint. It tracks whether its own hints are working — if guidance effectiveness drops, it backs off.
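The struggle triggers can be condensed into a small state machine fed by gameplay events. A minimal sketch, assuming a per-exercise detector with an accuracy baseline; the thresholds (3 consecutive errors, 15 s idle, 30% accuracy drop) are from the text, everything else (class name, event shape, the 5-answer minimum before the accuracy check) is an assumption:

```python
class StruggleDetector:
    """Fires a hint when any of the three struggle signals trips."""

    def __init__(self, baseline_accuracy: float):
        self.baseline = baseline_accuracy
        self.consecutive_errors = 0
        self.answers = 0
        self.correct = 0

    def on_answer(self, is_correct: bool) -> bool:
        """Feed one answer event; return True when a hint should fire."""
        self.answers += 1
        if is_correct:
            self.correct += 1
            self.consecutive_errors = 0
        else:
            self.consecutive_errors += 1
        return self._should_hint(idle_seconds=0)

    def on_idle(self, idle_seconds: float) -> bool:
        """Feed an inactivity event."""
        return self._should_hint(idle_seconds)

    def _should_hint(self, idle_seconds: float) -> bool:
        if self.consecutive_errors >= 3:         # 3+ consecutive errors
            return True
        if idle_seconds >= 15:                   # 15+ seconds of inactivity
            return True
        if self.answers >= 5:                    # assumed minimum sample size
            accuracy = self.correct / self.answers
            if accuracy <= self.baseline * 0.7:  # 30%+ drop from baseline
                return True
        return False
```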
FatigueDetectionService owns the kill switch. It runs a multi-factor model: response time trending (30%+ slowdown = mild fatigue), error rate acceleration, and hard session duration caps from the patient’s accessibility settings. It escalates through three levels — mild (suggest a break), moderate (recommend stopping), severe (end the session). Fatigue overrides both the coach and the guidance engine.
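The escalation ladder might look like the following. The three signals (response-time slowdown, error-rate acceleration, duration cap) and the 30% threshold are from the text; the function signature, the 1.5x error-acceleration ratio, and the signal-counting scheme are assumptions for illustration:

```python
from enum import Enum
from statistics import mean

class Fatigue(Enum):
    NONE = 0
    MILD = 1       # suggest a break
    MODERATE = 2   # recommend stopping
    SEVERE = 3     # end the session

def fatigue_level(response_times: list[float],
                  baseline_rt: float,
                  recent_error_rate: float,
                  earlier_error_rate: float,
                  session_minutes: float,
                  prescribed_limit_minutes: float) -> Fatigue:
    """Multi-factor fatigue estimate; each tripped signal escalates one level."""
    # The therapist's prescribed cap is a hard stop, regardless of other signals.
    if session_minutes >= prescribed_limit_minutes:
        return Fatigue.SEVERE

    signals = 0
    if response_times and mean(response_times) >= baseline_rt * 1.3:
        signals += 1   # 30%+ response-time slowdown
    if recent_error_rate > earlier_error_rate * 1.5:
        signals += 1   # error rate accelerating (1.5x is an assumed ratio)

    return [Fatigue.NONE, Fatigue.MILD, Fatigue.MODERATE][signals]
```

The key property to preserve in any real implementation is the first branch: the duration cap comes from the patient's accessibility settings and cannot be outvoted by healthy-looking response times.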
The coordination rule is simple: FatigueDetectionService has veto power. AICoachService picks the next exercise. AIGuidanceEngine operates within the exercise. No service calls another directly — they all read from and write to shared observable state. The UI observes all three.
Three product decisions and the reasoning
1. Workflow, not agent. The AICoachService looks like an agent — it perceives, reasons, and acts. But the decision path is fully deterministic: check neglected domains, check weak domains, check difficulty trend, output exercise. There’s no loop, no replanning, no tool selection. Making it agentic would add latency (LLM inference on every exercise transition) and variance (the model might skip a critical domain). For a clinical product, predictability isn’t a constraint — it’s the feature.
2. Fatigue as a first-class orchestration primitive, not a UI warning. Early design had fatigue detection as a banner notification — “You seem tired, consider taking a break.” Users dismissed it. The architectural decision was to promote fatigue from a suggestion to a hard orchestration signal that could stop exercise selection entirely. This changed the service hierarchy: FatigueDetectionService went from a leaf node to the root authority. The therapist’s prescribed session limits became system-level configuration, not advisory text.
3. Guidance self-monitoring to prevent learned helplessness. The AIGuidanceEngine tracks whether its hints actually help — does accuracy improve after a hint fires? If the patient’s performance doesn’t recover after guidance, the engine reduces hint frequency rather than escalating. This prevents a failure mode where constant hints teach the patient to wait for help instead of attempting the exercise. The guidance engine has its own feedback loop, separate from the coaching layer’s performance tracking.
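The back-off loop described above reduces to a simple tally of hint outcomes. A sketch under stated assumptions: the class name, the 5-outcome window, and the 50% effectiveness threshold are invented for illustration; the behavior (widen the gap between hints when they stop helping, rather than escalate) is from the text.

```python
class HintEffectiveness:
    """Tracks whether hints help; backs off when they don't."""

    def __init__(self):
        self.outcomes: list[bool] = []   # True = accuracy improved after a hint
        self.min_gap_between_hints = 1   # exercises to wait before next hint

    def record(self, accuracy_before: float, accuracy_after: float) -> None:
        self.outcomes.append(accuracy_after > accuracy_before)
        recent = self.outcomes[-5:]      # assumed rolling window of 5 hints
        if recent.count(True) / len(recent) < 0.5:
            # Most recent hints didn't help: back off instead of escalating.
            self.min_gap_between_hints += 1
        else:
            self.min_gap_between_hints = max(1, self.min_gap_between_hints - 1)
```

Keeping this loop separate from the coach's performance tracking matters: the coach asks "is the patient improving?", while this asks only "are my hints helping?", and conflating the two would let a patient's general decline mask ineffective guidance.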