What we score
Can we read a model's sense of time?
Every number on this page comes from the project's verified results. The core question: can we read a model's sense of time horizon directly from its internal activations, and does that reading hold up?
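A common way to "read" a property from activations is a linear probe, for example a logistic-regression classifier trained on hidden states. The sketch below illustrates that technique under stated assumptions and is not the project's pipeline. The activations are synthetic 8-dimensional vectors rather than real model states, the labels (1 = long-term framing, 0 = short-term) are invented, and `train_linear_probe` is a hypothetical helper. Held-out accuracy is the simplest check of whether such a reading holds up.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_probe(xs, ys, lr=0.05, epochs=50):
    """Hypothetical helper: logistic-regression probe fit by plain SGD.

    Models P(long-term | h) = sigmoid(w . h + b).
    """
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            g = sigmoid(dot(w, x) + b) - y  # gradient of log-loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def accuracy(w, b, xs, ys):
    preds = [1 if dot(w, x) + b > 0 else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


# Synthetic stand-in for hidden states (NOT real model activations):
# the two classes differ only along a "horizon" direction spanning the
# first two dimensions; every other dimension is pure noise.
rng = random.Random(0)
DIM = 8
horizon = [1.0, 1.0] + [0.0] * (DIM - 2)
xs, ys = [], []
for _ in range(200):
    y = rng.randint(0, 1)
    shift = 1.5 if y == 1 else -1.5
    xs.append([rng.gauss(0.0, 1.0) + shift * d for d in horizon])
    ys.append(y)

# Train on the first 150 examples, evaluate on the 50 held out.
w, b = train_linear_probe(xs[:150], ys[:150])
print(f"held-out accuracy: {accuracy(w, b, xs[150:], ys[150:]):.2f}")
```

Evaluating on examples the probe never saw matters: a probe that only fits its training set says little about whether the signal is genuinely present in the activations.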
Token-level temporal score
Just as sentiment analysis colors words positive or negative, we can score each token by how strongly it pulls toward a short-term or a long-term horizon. The example below is illustrative: token-level scoring is an active research direction (issue #19).
Take the guaranteed payout today instead of investing for decades
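Token-level scoring can be sketched as projecting each token's hidden state onto a probe direction, the same way a linear sentiment probe would score words. Everything in this sketch is hypothetical: the 2-dimensional hidden states are hand-picked for illustration, `w` is a made-up probe direction, the sign convention (positive = long-term) and the neutral margin are assumptions, and whitespace splitting stands in for a real subword tokenizer.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def score_tokens(tokens, hidden_states, w, b=0.0):
    """Project each token's hidden state onto the probe direction w.

    Assumed sign convention: positive = pulls long-term, negative = short-term.
    """
    return [(tok, dot(w, h) + b) for tok, h in zip(tokens, hidden_states)]


def label(score, margin=0.5):
    # Scores inside [-margin, margin] are treated as neutral.
    if score > margin:
        return "long"
    if score < -margin:
        return "short"
    return "neutral"


# Hypothetical 2-d hidden states, hand-picked for illustration only.
# A real model uses subword tokens and far larger activation vectors.
sentence = "Take the guaranteed payout today instead of investing for decades"
tokens = sentence.split()
hidden_states = [
    [0.1, 0.0],    # Take
    [0.0, 0.1],    # the
    [-0.6, -0.3],  # guaranteed
    [-0.5, -0.2],  # payout
    [-1.2, -0.4],  # today
    [0.2, 0.1],    # instead
    [0.0, 0.0],    # of
    [0.9, 0.4],    # investing
    [0.1, 0.1],    # for
    [1.5, 0.6],    # decades
]
w = [1.0, 0.5]  # hypothetical probe direction

for tok, s in score_tokens(tokens, hidden_states, w):
    print(f"{tok:>10}  {s:+.2f}  {label(s)}")
```

With these toy values, "today", "payout", and "guaranteed" land on the short-term side, while "investing" and "decades" land on the long-term side.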
Why this is a safety signal
The project's degradation experiments find that a model's safety behavior can weaken over long or repetitive sessions. For example, refusal rates are reported to drop across repeated turns, and some categories (such as misinformation) erode faster than others.
This "fatigue" isn't universal; it depends on the model and the task. Across repeated sessions, one model's accuracy steadily slipped, another's actually improved as it warmed up, and a third held steady. The aim is a probe that flags when a model is about to disengage, before the disengagement shows up in its answers.
Because the temporal signal is linearly readable, a lightweight probe can act as an early-warning monitor — flagging drift before it surfaces in the model's outputs. See the compiled results and the research program.
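An early-warning monitor of the kind described above could be as simple as tracking a per-turn probe score and flagging when its smoothed value drifts away from a baseline. This is a minimal sketch under assumptions: `DriftMonitor` is a hypothetical class, the score stream is invented, and the exponential moving average (EMA), warm-up length, and threshold are illustrative choices rather than the project's settings.

```python
class DriftMonitor:
    """Hypothetical monitor: flags when the EMA of a per-turn probe score
    moves more than `threshold` away from a baseline set during warm-up."""

    def __init__(self, alpha=0.3, threshold=0.5, warmup=3):
        self.alpha = alpha          # EMA weight on the newest score
        self.threshold = threshold  # allowed |EMA - baseline| before flagging
        self.warmup = warmup        # turns used to establish the baseline
        self._warmup_scores = []
        self.baseline = None
        self.ema = None

    def update(self, score):
        """Feed one turn's probe score; return True if drift is flagged."""
        if self.baseline is None:
            # Still collecting warm-up turns: never flag during this phase.
            self._warmup_scores.append(score)
            if len(self._warmup_scores) == self.warmup:
                self.baseline = sum(self._warmup_scores) / self.warmup
                self.ema = self.baseline
            return False
        self.ema = self.alpha * score + (1 - self.alpha) * self.ema
        return abs(self.ema - self.baseline) > self.threshold


# Invented per-turn scores: stable at first, then sliding downward.
monitor = DriftMonitor(alpha=0.3, threshold=0.5, warmup=3)
scores = [1.0, 1.1, 0.9, 1.0, 0.95, 0.6, 0.3, 0.1, -0.2]
for turn, s in enumerate(scores, start=1):
    if monitor.update(s):
        print(f"turn {turn}: drift flagged (ema={monitor.ema:.2f})")
```

Smoothing with an EMA trades a little latency for robustness against a single noisy turn; in practice the threshold would need calibrating against sessions where degradation was actually observed.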