This repository contains some of the code for our work on principled attribution in multi-step reasoning for thinking models.
Our research focuses on quantifying the causal importance of each sentence in a chain-of-thought: both its influence on the final answer and its influence on subsequent reasoning steps. We first estimate these effects through resampling-based interventions, and then explore whether they can be approximated using internal model signals.
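As an illustration of the resampling idea, the sketch below estimates a per-sentence importance score by comparing the distribution of final answers the model produces with and without a given sentence in the prefix. This is a simplified, hypothetical sketch, not the repository's actual implementation: `sample_fn` is a stand-in for sampling a completion from a model, the intervention here simply drops the sentence rather than resampling a replacement for it, and the distance metric (total variation over answers) is one choice among several.

```python
import random
from collections import Counter

def resample_importance(sentences, sample_fn, n_samples=100, seed=0):
    """Estimate each sentence's counterfactual importance by resampling.

    For each position i, compare the distribution of final answers when the
    model continues from the original prefix (sentences[:i+1]) versus the
    prefix with sentence i removed (sentences[:i]). `sample_fn(prefix, rng)`
    is a hypothetical stand-in for sampling a model completion; it returns
    the final answer as a string.
    """
    rng = random.Random(seed)
    scores = []
    for i in range(len(sentences)):
        # Answer distribution when sentence i is kept in the prefix.
        base = Counter(sample_fn(sentences[:i + 1], rng) for _ in range(n_samples))
        # Intervention: drop sentence i and let the model continue from sentences[:i].
        interv = Counter(sample_fn(sentences[:i], rng) for _ in range(n_samples))
        # Total-variation distance between the two answer distributions.
        answers = set(base) | set(interv)
        tv = 0.5 * sum(abs(base[a] - interv[a]) / n_samples for a in answers)
        scores.append(tv)
    return scores
```

With a toy `sample_fn` whose answer flips only when a particular "key" sentence is present, the score is high exactly at that sentence and near zero elsewhere, which is the behavior a sentence-level attribution method should recover.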
You can find a detailed presentation on this topic here. You can also find an interactive visualization here.