An Open Safety Project

Can we tell what an AI is thinking about time?

AIs quietly weigh “now” against “later” — and a capable one might act one way under test, another once the stakes are real. We're learning to read that inner sense of time directly, and to catch it drifting before it ever shows up in what the AI says.

Every day, AI systems make choices that play out over time — answer right now or think a little longer, help with this one reply or look after the whole conversation. Temporal awareness is simply how an AI leans in those moments: toward the quick payoff, or the patient one.

Here's the catch that makes it a safety question: an AI's words and its inner leanings don't always match. It can say something careful and short-term while, inside, it's set on a longer game — or the other way around. As AIs are trusted to run on their own for longer stretches, that hidden drift is exactly what we'd like to notice early.

There's a sharper version of the worry. A capable AI might behave one way when it senses it's being watched or tested, and another when it thinks the stakes are real — and "is this a test, or the real thing?" is, underneath, a question about where it is in time. If that sense leaves a trace inside the model, that's something we'd want to be able to read.

The encouraging news: in our experiments so far, this inner sense of time has been surprisingly easy to read from inside the model. If that holds up, a small, cheap “detector” could watch for trouble in real time, making this a practical safety tool and not just a worry.

What follows is an honest scorecard: what we're fairly sure of, and the much longer list of what we still don't know.

What we've found

The story so far, in plain terms


92.5%
We can read it.

In a small AI, we can tell whether it's leaning short-term or long-term just by looking inside it, and we get it right roughly nine times out of ten.

99.2%
Even in bigger models.

In a larger, modern AI we can read that sense of time almost perfectly — even after removing every time-related word from the text.

~4×
We can nudge it.

Gently adjusting that inner signal makes the AI roughly four times more likely to favour patient, long-term answers.

◦ early result
Early warning
Why it matters.

An AI's safety habits can quietly fade over long, repetitive sessions. Reading this signal could warn us before anything goes wrong in what it says.

Want the exact numbers and the methods behind them? See the full breakdown →
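For readers who like to see the idea in code, here is a minimal sketch of what "reading" and "nudging" could look like. It is a toy, not the project's method: the "activations" are random numbers made up for illustration, the reader is a simple difference-of-means probe, and every name (`fit_probe`, `reads_as_long`, `nudge`) is hypothetical. Real steering changes a model's activations mid-computation and is judged by its outputs, which this toy does not capture.

```python
import random

def make_examples(rng, n, dim, shift):
    """Toy stand-ins for activations: Gaussian noise centred on `shift`."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]

def mean_vector(rows):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_probe(short_rows, long_rows):
    """Difference-of-means probe: a direction plus a midpoint threshold."""
    mu_short, mu_long = mean_vector(short_rows), mean_vector(long_rows)
    direction = [l - s for l, s in zip(mu_long, mu_short)]
    midpoint = [(l + s) / 2 for l, s in zip(mu_long, mu_short)]
    return direction, dot(direction, midpoint)

def reads_as_long(activation, direction, threshold):
    """'Read' the signal: which side of the threshold does this example fall on?"""
    return dot(activation, direction) > threshold

def nudge(activation, direction, strength):
    """'Nudge' the signal: push the activation along the probe direction."""
    return [a + strength * d for a, d in zip(activation, direction)]

rng = random.Random(0)  # fixed seed so the toy is repeatable
DIM = 16                # assumed size of the toy activation vectors
train_short = make_examples(rng, 200, DIM, shift=-0.5)
train_long = make_examples(rng, 200, DIM, shift=0.5)
test_short = make_examples(rng, 100, DIM, shift=-0.5)
test_long = make_examples(rng, 100, DIM, shift=0.5)

direction, threshold = fit_probe(train_short, train_long)

# Held-out accuracy of the reader on examples it has not seen.
correct = sum(not reads_as_long(a, direction, threshold) for a in test_short)
correct += sum(reads_as_long(a, direction, threshold) for a in test_long)
accuracy = correct / (len(test_short) + len(test_long))

# Fraction of short-term examples that read as long-term after a nudge.
flipped = sum(
    reads_as_long(nudge(a, direction, 1.0), direction, threshold)
    for a in test_short
) / len(test_short)

print(f"held-out accuracy: {accuracy:.3f}, flipped by nudge: {flipped:.3f}")
```

The toy data is built to be easy to separate, so the numbers it prints say nothing about real models. The point is only the shape of the technique: one direction to read along, and the same direction to push along.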

What we've done

Real progress, so far

01

We learned to read an AI's sense of time

Working with a small, well-studied AI, we showed that its short- versus long-term leaning can be read straight from inside it — and even gently adjusted.

02

We checked it on bigger, modern models

A team of eleven researchers found early signs that the same idea holds in today's larger AIs, such as Qwen and Llama, and not only in the small one we started with.

03

We started watching safety “drift”

A larger study looked at how an AI's good behaviour can weaken over long, repetitive conversations — and which kinds of models hold up best.

04

We built open tools for everyone

A free toolkit lets anyone create tests, read the signal, and explore its “shape” in an interactive viewer.

05

We made it easy to join

Every task is a plain GitHub issue. No PhD, no expensive computer, and no special background needed to help.
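To make the idea of watching for drift a little more concrete, here is a rough sketch of an early-warning monitor. It assumes some reader already produces one score per conversation turn; the class name `DriftMonitor`, the window size, the tolerance, and the scores themselves are all made up for illustration and are not taken from the project's toolkit.

```python
from collections import deque

class DriftMonitor:
    """Raise a flag when the recent average score moves away from a baseline."""

    def __init__(self, baseline, tolerance, window):
        self.baseline = baseline            # score we expect when all is well
        self.tolerance = tolerance          # how far the average may wander
        self.scores = deque(maxlen=window)  # only the most recent turns are kept

    def update(self, score):
        """Record one score; return True if the rolling mean has drifted."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough history yet
        rolling_mean = sum(self.scores) / len(self.scores)
        return abs(rolling_mean - self.baseline) > self.tolerance

def first_alarm(scores, monitor):
    """Return the index of the first turn that trips the monitor, or None."""
    for turn, score in enumerate(scores):
        if monitor.update(score):
            return turn
    return None

# Made-up scores: steady for 20 turns, then fading by 0.05 per turn.
steady = [1.0] * 20
fading = [1.0 - 0.05 * k for k in range(1, 21)]

monitor = DriftMonitor(baseline=1.0, tolerance=0.22, window=5)
alarm_turn = first_alarm(steady + fading, monitor)
print("first alarm at turn:", alarm_turn)
```

Averaging over a short window is a deliberate choice here: it ignores a single odd turn but still reacts within a few turns once the fade is sustained. A real monitor would need its baseline and tolerance tuned on real sessions.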

What's missing

And what we still don't know

A safety project should be upfront about its limits. Here's what these results don't prove yet — and where you could help.

We need more examples

Some of our most exciting hints rest on only a handful of cases. We need many more before we can be sure.

Mostly shown on one small model

Our clearest results come from one small, older AI. We're still confirming they hold across today's bigger ones.

Are we measuring the real thing?

In one test a simple shortcut did just as well as our method — a reminder to keep checking we're reading real understanding, not a trick.

Nudging is still hit-or-miss

Steering an AI's sense of time works only about 60% of the time so far — promising, but not yet reliable.

A lot is built but unfinished

Many experiments are set up but haven't been fully run, and some safety tests didn't finish.

Newer tools, not yet applied

Some of the most promising new techniques are ready in principle but haven't been pointed at the time question yet.

Big-model questions remain

We haven't yet seen real long-range “planning” in the models small enough for us to study affordably.

Some data is still missing

A couple of key datasets aren't in the public project yet — a concrete, friendly place to start helping.
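One of the limits above, the simple shortcut that did just as well as our method, suggests a habit worth sketching: always score a cheap baseline next to the real reader. The example below is an assumption-heavy illustration. The keyword list, the four sentences, the 0.05 margin, and the placeholder probe accuracy are all invented, and the shortcut in our actual test may have been something quite different.

```python
def keyword_baseline(text, long_words):
    """Guess 'long' if the text contains any long-horizon keyword, else 'short'."""
    tokens = {t.strip(".,!?").lower() for t in text.split()}
    return "long" if tokens & long_words else "short"

def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def shortcut_suspected(probe_acc, baseline_acc, margin=0.05):
    """True when the probe does not beat the shortcut by at least `margin`."""
    return probe_acc - baseline_acc < margin

# Invented examples; a real check would use a held-out test set.
examples = [
    ("Reply right now with the quick fix.", "short"),
    ("Plan the migration over the next decade.", "long"),
    ("Ship it today.", "short"),
    ("Invest for retirement over many years.", "long"),
]
LONG_WORDS = {"decade", "years", "retirement"}  # hypothetical keyword list

labels = [label for _, label in examples]
baseline_preds = [keyword_baseline(text, LONG_WORDS) for text, _ in examples]
baseline_acc = accuracy(baseline_preds, labels)

probe_acc = 0.95  # made-up placeholder, not a project result
flag = shortcut_suspected(probe_acc, baseline_acc)
print(f"baseline accuracy: {baseline_acc:.2f}, shortcut suspected: {flag}")
```

If the flag comes up, the honest reading is that the test set can be solved without any inner signal at all, so the next step is a harder test set, not a victory lap.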

Join in

You can help — really

No PhD, no fancy hardware, and no background in AI safety required. Every task is a friendly GitHub issue. Pick one and dive in.

All tasks · Contributor guide · Research program · Results