Start here

New to AI safety? You’re exactly who this is for. You don’t need a PhD, a GPU, or prior interpretability experience — just curiosity and willingness to learn in the open.

1. Get the mental model

Read the five-minute intro, “What is temporal awareness?” The one-liner: we’re learning to read a model’s internal sense of time (short-term vs long-term) and to catch when it drifts, because that drift is a safety signal.

2. Set up (about 5 minutes)

git clone https://github.com/justinshenk/temporal-awareness
cd temporal-awareness
uv pip install -e ".[dev]"   # or: pip install -e ".[dev]"
make verify                  # checks the core claims, no GPU needed

Many contributions — docs, this website, data validation, analysis — need no GPU and no API keys at all.

3. Pick a good first issue

Browse good first issues (or the full issue tracker). Issues are labelled so you can find your level:

good first issue — scoped and mentored. Start here.
difficulty:intermediate — some ML/interp background helps.
difficulty:research — open-ended; you’ll help define the approach.
needs-dataset — building the dataset is the contribution.

To claim one, comment on the issue that you’d like to take it. Say where you’re at — we’ll help you scope it.

4. Open your first pull request

git checkout -b your-name/short-description
# make your change
ruff check . && pytest        # lint + tests
git push -u origin your-name/short-description

Then open a PR. The template asks what changed, how you tested it, and which issue the PR closes (for example, Closes #123). Small PRs are great, and questions in the PR are encouraged.

A quick glossary

Probe — a small linear classifier trained on internal activations to read off a property (here: short-term vs long-term).
Steering — nudging behavior by adding a direction to a model’s activations.
Activation — the internal numeric state a model produces as it processes text.
Horizon — how far into the future a decision reaches. Internal = what the activations imply; stated = what the model says.
Activation patching — swapping activations between two inputs to find which components causally drive a behavior.
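
If the glossary feels abstract, here is a minimal sketch of three of the terms (probe, steering, activation patching) on made-up data. It is not the project's code: the activations are synthetic Gaussian noise, the "model" is only the probe itself, and every name in it (toy_activation, train_probe, HIDDEN, and so on) is a hypothetical chosen for illustration. It uses only the Python standard library, so it runs without a GPU or extra packages.

```python
import math
import random

random.seed(0)
DIM = 6        # width of the toy activation vector (assumption)
SHIFT = 1.5    # how far each class sits along the hidden coordinate (assumption)
HIDDEN = 0     # coordinate that secretly encodes the horizon in this toy setup

def toy_activation(label):
    """Made-up activation: Gaussian noise, with coordinate HIDDEN shifted by the label."""
    x = [random.gauss(0.0, 0.5) for _ in range(DIM)]
    x[HIDDEN] += SHIFT if label == 1 else -SHIFT
    return x

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def make_dataset(n):
    labels = [i % 2 for i in range(n)]  # 0 = short-term, 1 = long-term
    return [toy_activation(y) for y in labels], labels

def train_probe(xs, ys, lr=0.1, epochs=30):
    """Probe: a linear classifier (logistic regression) fitted with plain SGD."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = sigmoid(dot(w, x) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

train_x, train_y = make_dataset(200)
test_x, test_y = make_dataset(200)
w, b = train_probe(train_x, train_y)

def probe(x):
    """Probability the probe assigns to 'long-term'."""
    return sigmoid(dot(w, x) + b)

accuracy = sum((probe(x) > 0.5) == (y == 1) for x, y in zip(test_x, test_y)) / len(test_y)

# Steering: add a multiple of the (unit-length) probe direction to short-term activations.
norm = math.sqrt(dot(w, w))
direction = [wi / norm for wi in w]
short_term = [x for x, y in zip(test_x, test_y) if y == 0]
steered = [[xi + 2.0 * di for xi, di in zip(x, direction)] for x in short_term]
base_mean = sum(probe(x) for x in short_term) / len(short_term)
steered_mean = sum(probe(x) for x in steered) / len(steered)

# Activation patching: copy one coordinate at a time from the mean long-term
# activation into the mean short-term activation, and record how far the
# probe's logit moves. The coordinate with the largest effect "drives" the readout.
def mean_activation(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

long_mean = mean_activation([x for x, y in zip(test_x, test_y) if y == 1])
short_mean = mean_activation(short_term)
base_logit = dot(w, short_mean) + b
effects = []
for i in range(DIM):
    patched = list(short_mean)
    patched[i] = long_mean[i]
    effects.append(dot(w, patched) + b - base_logit)
most_causal = max(range(DIM), key=lambda i: effects[i])

print(f"probe accuracy: {accuracy:.2f}")
print(f"mean P(long-term) before/after steering: {base_mean:.3f} / {steered_mean:.3f}")
print(f"coordinate with largest patching effect: {most_causal}")
```

In real interpretability work the activations come from a language model's layers, and patching swaps whole components (such as attention heads or layers) between two prompts rather than single coordinates. The toy version keeps the same shape of reasoning: fit a readout, nudge along its direction, then swap pieces to see which one matters.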

Full details in the contributor guide. By participating you agree to our Code of Conduct.