Start here
New to AI safety? You’re exactly who this is for. You don’t need a PhD, a GPU, or prior interpretability experience — just curiosity and willingness to learn in the open.
1. Get the mental model
Read the five-minute intro: What is temporal awareness? The one-liner: we’re learning to read a model’s internal sense of time — short-term vs long-term — and to catch when it drifts, because that drift is a safety signal.
2. Set up (about 5 minutes)
git clone https://github.com/justinshenk/temporal-awareness
cd temporal-awareness
uv pip install -e ".[dev]" # or: pip install -e ".[dev]"
make verify # checks the core claims, no GPU needed
Many contributions — docs, this website, data validation, analysis — need no GPU and no API keys at all.
3. Pick a good first issue
Browse good first issues (or the full issue tracker). Issues are labelled so you can find your level:
- "good first issue": scoped and mentored. Start here.
- "difficulty:intermediate": some ML/interp background helps.
- "difficulty:research": open-ended; you’ll help define the approach.
- "needs-dataset": building the dataset is the contribution.
To claim one, comment on the issue that you’d like to take it. Say where you’re at — we’ll help you scope it.
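If you prefer a script to the web UI, the labels above can also be queried through GitHub's public REST API, using the "list repository issues" endpoint (`GET /repos/{owner}/{repo}/issues`) with a `labels` filter. The sketch below uses only the Python standard library. The helper names `issues_url` and `list_issues` are hypothetical, and unauthenticated requests are rate-limited by GitHub.

```python
import json
import urllib.parse
import urllib.request

REPO = "justinshenk/temporal-awareness"


def issues_url(label: str, repo: str = REPO) -> str:
    """Build the REST API URL for open issues carrying one label (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"labels": label, "state": "open", "per_page": 50},
        quote_via=urllib.parse.quote,  # encode spaces as %20 rather than '+'
    )
    return f"https://api.github.com/repos/{repo}/issues?{query}"


def list_issues(label: str) -> list[tuple[int, str]]:
    """Fetch (number, title) pairs for open issues with the given label (needs network)."""
    req = urllib.request.Request(
        issues_url(label), headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        items = json.load(resp)
    # This endpoint also returns pull requests; those carry a "pull_request" key.
    return [(it["number"], it["title"]) for it in items if "pull_request" not in it]
```

For example, `list_issues("good first issue")` returns the open starter issues, and `list_issues("needs-dataset")` returns the dataset-building ones.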
4. Open your first pull request
git checkout -b your-name/short-description
# make your change
ruff check . && pytest # lint + tests
git push -u origin your-name/short-description
Then open a PR. The template asks what changed and how you tested it, and asks you to link the issue (for example, Closes #123). Small PRs are great, and questions in the PR are encouraged.
A quick glossary
- Probe — a small linear classifier trained on internal activations to read off a property (here: short-term vs long-term).
- Steering — nudging behavior by adding a direction to a model’s activations.
- Activation — the internal numeric state a model produces as it processes text.
- Horizon — how far into the future a decision reaches. Internal = what the activations imply; stated = what the model says.
- Activation patching — swapping activations between two inputs to find which components causally drive a behavior.
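The probe, steering, and activation-patching entries can be made concrete with a toy example. The sketch below uses numpy on synthetic vectors and does not load a real model. The data, the hidden "time" direction, the two-layer network, and the names `train_probe`, `steer`, and `forward` are hypothetical illustrations, not the repository's code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16  # hypothetical activation width

# Synthetic "activations": long-term examples are shifted along one hidden direction.
true_dir = rng.normal(size=d_model)
true_dir /= np.linalg.norm(true_dir)
labels = rng.integers(0, 2, size=200)  # 0 = short-term, 1 = long-term
signs = labels * 2.0 - 1.0  # map labels to -1 / +1
acts = rng.normal(size=(200, d_model)) + 2.0 * np.outer(signs, true_dir)


# Probe: a linear (logistic-regression) classifier fitted with plain gradient descent.
def train_probe(X, y, lr=0.1, steps=500):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(long-term)
        err = p - y
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


def probe_predict(X, w, b):
    return (X @ w + b > 0).astype(int)


w, b = train_probe(acts, labels)
probe_acc = (probe_predict(acts, w, b) == labels).mean()


# Steering: add a scaled unit direction (here, the probe's weight vector) to activations.
def steer(X, direction, alpha):
    return X + alpha * direction / np.linalg.norm(direction)


short_acts = acts[labels == 0]
flipped = probe_predict(steer(short_acts, w, alpha=8.0), w, b).mean()

# Activation patching on a toy two-layer network.
W1 = rng.normal(size=(8, d_model))
w2 = rng.normal(size=8)


def forward(x, patch=None):
    h = np.maximum(W1 @ x, 0.0)  # hidden activations (ReLU), a fresh array
    if patch is not None:
        unit, value = patch
        h[unit] = value  # overwrite one hidden unit with a value from another run
    return w2 @ h, h


clean_x = acts[labels == 1][0]  # a long-term example
corrupt_x = short_acts[0]  # a short-term example
clean_out, clean_h = forward(clean_x)
corrupt_out, _ = forward(corrupt_x)
# Effect on the output of copying each clean hidden unit into the corrupted run.
effects = [forward(corrupt_x, patch=(i, clean_h[i]))[0] - corrupt_out for i in range(8)]
top_unit = int(np.argmax(np.abs(effects)))
print(f"probe accuracy {probe_acc:.2f}, steered to long-term {flipped:.2f}, top unit {top_unit}")
```

Using the probe's weight vector as the steering direction is just one simple choice for this sketch; a difference of class means is another common option. Because this toy network's output is linear in its hidden layer, the per-unit patching effects add up exactly to the clean-minus-corrupted output difference. In a real transformer, components interact and the effects need not add up this neatly.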
Full details in the contributor guide. By participating you agree to our Code of Conduct.