We study knowing but not doing: whether skill-adaptive models acquire knowledge of concepts at all skill levels but selectively externalize that knowledge depending on the conditioning signal. We use Maia-2 and chess as a model system, as skill levels are precisely quantified by Elo, concepts are formally definable, and move quality is objectively measurable.
We consider two competing hypotheses: (H1) the model dynamically adjusts internal concept awareness per skill level; (H2) concept awareness is consistent across skill levels, and the skill gap stems from differential externalization. We find strong evidence for H2.
conda env create -f environment.yml
conda activate maiainterp

Pretrained Maia-2 (Rapid) checkpoint: weights.v2.pt — download
SAE on residual streams: sae/best_jrsaes_2023-11-16384-1-res.pt — download
To test H1, we ask whether internal concept awareness (across a set of 172 fundamental and measurable chess concepts) in Maia-2 varies with the conditioned Elo level. We train linear probes per concept, per Elo level, per layer:
python train/train_probes.py --layer_key "transformer block 0 hidden states" --output_dir probes/layer0_efficient
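As a toy illustration of the probing step, the sketch below trains one linear probe on synthetic residual-stream activations. The dimensions, data, and probe are stand-ins for one (concept, Elo level, layer) cell; the repo's actual extraction and training pipeline lives in train/train_probes.py.

```python
# Hypothetical stand-in for one probe: a binary concept label that is
# linearly decodable from activations of shape (positions, d_model).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d_model, n = 64, 2000

# Synthetic activations with a planted linear concept direction.
concept_direction = rng.normal(size=d_model)
acts = rng.normal(size=(n, d_model))
labels = (acts @ concept_direction > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(acts, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Held-out probe accuracy = how decodable the concept is at this layer/Elo.
acc = probe.score(X_te, y_te)
```

Comparing this accuracy across conditioned Elo levels (at fixed concept and layer) is what distinguishes H1 from H2.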
Result. The internal representations encode chess concepts equally well regardless of the conditioned skill level, refuting H1.
We fine-tune only the policy head fc_1 on the Blundered Transitional Dataset. This is the most stringent test of H2: policy-head-only fine-tuning introduces no new knowledge to the model backbone, only recalibrates when that knowledge is acted upon.
cd extern/policy_distillation
python ft_per_concept_head_only.py

Configuration: extern/policy_distillation/finetune_config.yaml
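The freezing pattern can be sketched as follows. The `ToyModel` below is illustrative, not the Maia-2 architecture; only the policy-head name `fc_1` is taken from the repo.

```python
# Sketch of policy-head-only fine-tuning: freeze the backbone, train fc_1.
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self, d=32, n_moves=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(d, d), nn.ReLU())
        self.fc_1 = nn.Linear(d, n_moves)  # policy head

    def forward(self, x):
        return self.fc_1(self.backbone(x))

model = ToyModel()

# Freeze everything, then unfreeze only the policy head.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc_1.parameters():
    p.requires_grad = True

opt = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

# One toy training step: gradients flow only into fc_1.
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
```

Because the backbone's parameters never receive gradients, any behavioral change after fine-tuning must come from recalibrated readout, not new knowledge.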
We train a set of SAEs on Maia-2 and identify the SAE features most predictive of each concept, then surgically amplify those concept-relevant features in the residual stream at inference time, in the spirit of feature steering.
python extern/feature_steering/select_sae_features.py
python extern/feature_steering/sae_intervention.py --layer 0 --mode salient
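The amplification step can be sketched as below: encode the residual stream with the SAE, scale up the selected features, and add the decoded delta back in. All shapes, weights, and the feature indices are hypothetical; the actual intervention is driven by sae_intervention.py with the trained SAE checkpoint.

```python
# Sketch of inference-time amplification of concept-relevant SAE features.
import torch

torch.manual_seed(0)
d_model, d_sae = 16, 64

# Stand-ins for a trained SAE's encoder/decoder weights.
W_enc = torch.randn(d_model, d_sae)
W_dec = torch.randn(d_sae, d_model)
b_enc = torch.zeros(d_sae)

salient = [3, 17, 42]  # hypothetical concept-predictive feature indices
alpha = 4.0            # amplification factor (illustrative)

def steer(resid):
    """Amplify salient SAE features in a residual-stream activation."""
    f = torch.relu(resid @ W_enc + b_enc)     # SAE feature activations
    delta = torch.zeros_like(f)
    delta[..., salient] = (alpha - 1.0) * f[..., salient]
    return resid + delta @ W_dec              # add decoded amplification

resid = torch.randn(2, d_model)
steered = steer(resid)
```

Adding only the decoded delta (rather than reconstructing the full stream through the SAE) leaves the rest of the residual stream untouched, so the intervention is confined to the amplified features.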