Learning In-context n-grams with Transformers: Sub-n-grams Are Near-stationary Points

Code for the paper at https://arxiv.org/abs/2508.12837.

This code is adapted from the implementation by Nichani et al. (2024), available at https://github.com/eshnich/transformers-learn-causal-structure.