This repository contains some of the code for our work on principled attribution in multi-step reasoning for thinking models.
Our research focuses on quantifying the causal importance of each sentence in a chain-of-thought: both its influence on the final answer and its influence on subsequent reasoning steps. We first estimate these effects through resampling-based interventions, and then explore whether they can be approximated using internal model signals.
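As an illustration of the resampling idea, the sketch below estimates a per-sentence importance score by comparing the distribution of final answers the model produces with and without a given sentence in the prefix. This is a simplified, hypothetical sketch, not the repository's actual implementation: `sample_fn` is a stand-in for sampling a completion from a model, the intervention here simply drops the sentence rather than resampling a replacement for it, and the distance metric (total variation over answers) is one choice among several.

```python
import random
from collections import Counter

def resample_importance(sentences, sample_fn, n_samples=100, seed=0):
    """Estimate each sentence's counterfactual importance by resampling.

    For each position i, compare the distribution of final answers when the
    model continues from the original prefix (sentences[:i+1]) versus the
    prefix with sentence i removed (sentences[:i]). `sample_fn(prefix, rng)`
    is a hypothetical stand-in for sampling a model completion; it returns
    the final answer as a string.
    """
    rng = random.Random(seed)
    scores = []
    for i in range(len(sentences)):
        # Answer distribution when sentence i is kept in the prefix.
        base = Counter(sample_fn(sentences[:i + 1], rng) for _ in range(n_samples))
        # Intervention: drop sentence i and let the model continue from sentences[:i].
        interv = Counter(sample_fn(sentences[:i], rng) for _ in range(n_samples))
        # Total-variation distance between the two answer distributions.
        answers = set(base) | set(interv)
        tv = 0.5 * sum(abs(base[a] - interv[a]) / n_samples for a in answers)
        scores.append(tv)
    return scores
```

With a toy `sample_fn` whose answer flips only when a particular "key" sentence is present, the score is high exactly at that sentence and near zero elsewhere, which is the behavior a sentence-level attribution method should recover.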
You can find a detailed presentation on this topic here. You can also find an interactive visualization here.