This paper considers the Pointer Value Retrieval (PVR) benchmark introduced in [ZRKB21], where a `reasoning' function acts on a string of digits to produce the label. More generally, the paper considers the learning of logical functions with gradient descent (GD) on neural networks. It is first shown that in order to learn logical functions with gradient descent on symmetric neural networks, the generalization error can be lower-bounded in terms of the noise-stability of the target function, supporting a conjecture made in [ZRKB21]. It is then shown that in the distribution shift setting, when the data withholding corresponds to freezing a single feature (referred to as the canonical holdout), the generalization error of gradient descent admits a tight characterization in terms of the Boolean influence for several relevant architectures. This is shown on linear models and supported experimentally on other models such as MLPs and Transformers. In particular, this puts forward the hypothesis that for such architectures and for learning logical functions such as PVR functions, GD tends to have an implicit bias towards low-degree representations, which in turn gives the Boolean influence for the generalization error under quadratic loss.
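As a minimal illustrative sketch (not from the paper), the Boolean influence of a coordinate — the probability over a uniform input in {-1, 1}^n that flipping that coordinate changes the function's output — can be computed by exhaustive enumeration. The helper `influence` and the example function `maj3` are hypothetical names chosen for this sketch:

```python
from itertools import product

def influence(f, n, i):
    """Boolean influence of coordinate i: the fraction of inputs
    x in {-1, 1}^n for which flipping x_i changes f(x)."""
    changed = 0
    for x in product([-1, 1], repeat=n):
        x_flipped = list(x)
        x_flipped[i] = -x_flipped[i]
        changed += f(list(x)) != f(x_flipped)
    return changed / 2 ** n

# Example: 3-bit majority. Flipping one bit matters exactly when
# the other two bits disagree, so each coordinate has influence 1/2.
maj3 = lambda x: 1 if sum(x) > 0 else -1
print([influence(maj3, 3, i) for i in range(3)])  # → [0.5, 0.5, 0.5]
```

Under the canonical-holdout hypothesis above, influences of this kind would track the generalization error incurred when the corresponding feature is frozen during training.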