
interp-reasoning/principled-attribution


Principled attribution in multi-step reasoning for thinking models

This repository contains some of the code for our work on principled attribution in multi-step reasoning for thinking models.

Our research focuses on quantifying the causal importance of each sentence in a chain-of-thought, both in terms of its influence on the final answer and on subsequent reasoning steps. We first estimate this through resampling-based interventions, and then explore whether it can be approximated using internal model signals.
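As a rough illustration of the resampling idea, the sketch below scores each sentence by how much the final-answer distribution shifts when the sentence is kept versus resampled. It is a minimal sketch under stated assumptions, not the code in this repository: `sentence_importance`, `answer_distribution`, `total_variation`, and the toy `toy_sample_answer` are hypothetical names, the choice of total variation distance is an assumption, and the toy sampler stands in for real model rollouts. It covers only influence on the final answer, not on subsequent reasoning steps.

```python
import random
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical sampler type: given a prefix of sentences, roll out the rest
# of the reasoning and return a final answer. In practice this would call a model.
Sampler = Callable[[List[str], random.Random], str]


def answer_distribution(prefix: List[str], sample_answer: Sampler,
                        n_samples: int, rng: random.Random) -> Dict[str, float]:
    """Empirical distribution over final answers for rollouts from `prefix`."""
    counts = Counter(sample_answer(prefix, rng) for _ in range(n_samples))
    return {answer: count / n_samples for answer, count in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def sentence_importance(sentences: List[str], sample_answer: Sampler,
                        n_samples: int = 200, seed: int = 0) -> List[float]:
    """Score sentence i by the shift in answer distribution when it is kept
    (prefix includes sentence i) versus resampled (prefix stops before it)."""
    rng = random.Random(seed)
    scores = []
    for i in range(len(sentences)):
        resampled = answer_distribution(sentences[:i], sample_answer, n_samples, rng)
        kept = answer_distribution(sentences[:i + 1], sample_answer, n_samples, rng)
        scores.append(total_variation(kept, resampled))
    return scores


def toy_sample_answer(prefix: List[str], rng: random.Random) -> str:
    """Toy stand-in for a model: the answer is fixed once the 'key step'
    sentence is in the prefix, and a coin flip otherwise."""
    if any("key step" in sentence for sentence in prefix):
        return "42"
    return rng.choice(["42", "17"])


sentences = [
    "Restate the problem.",
    "key step: factor the expression.",
    "So the answer follows.",
]
scores = sentence_importance(sentences, toy_sample_answer)
for sentence, score in zip(sentences, scores):
    print(f"{score:.2f}  {sentence}")
```

In this toy setup the second sentence should receive the largest score, since it is the only one that changes the answer distribution. With a real model, the number of rollouts per sentence trades off estimate variance against compute.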

A detailed presentation on this topic is available here, and an interactive visualization is available here.

