A Rate-Distortion Perspective on the Emergence of Number Sense in Unsupervised Generative Models
Abstract
Number sense is a core cognitive ability supporting various adaptive behaviors and is foundational for mathematical learning. Here, we study its emergence in unsupervised generative models through the lens of rate-distortion theory (RDT), a normative framework for understanding information processing under limited resources. We train β-Variational Autoencoders (β-VAEs) -- which embody key formal principles of RDT -- on synthetic images containing varying numbers of items, as commonly used in numerosity perception research. We systematically vary the encoding capacity and assess the models' sensitivity to numerosity and the robustness of the emergent numerical representations through a comprehensive set of analyses, including numerosity estimation and discrimination tasks, latent-space analysis, and assessments of generative ability and generalization to novel stimuli. In line with RDT, we find that behavioral performance in numerosity perception, as well as the ability to extract numerosity unconfounded by non-numerical visual features, scales with encoding capacity according to a power law. At high capacity, the unsupervised model develops a robust neural code for numerical information, with performance closely approximating that of a supervised model explicitly trained for visual enumeration; it exhibits strong generative abilities and generalizes well to novel images. At low capacity, by contrast, the model shows marked deficits in numerosity perception and representation. Finally, comparison with human data shows that models trained at intermediate capacity levels span the full range of human behavioral performance while still developing a robust emergent numerical code. In sum, our results show that unsupervised generative models can develop a number sense, and they demonstrate that rate-distortion theory provides a powerful information-theoretic framework for understanding how capacity constraints shape numerosity perception.
Summary
This paper explores how number sense emerges in unsupervised generative models, specifically β-Variational Autoencoders (β-VAEs), using a rate-distortion theory (RDT) framework. The research question is whether unsupervised learning, constrained by information capacity, can lead to the development of robust numerical representations that mimic human numerosity perception. The authors trained β-VAEs on synthetic images with varying numbers of items, systematically varying the encoding capacity (50 to 5000 nats) and assessing the models' performance on numerosity estimation and discrimination tasks. They also analyzed the models' latent spaces and their ability to generate novel images. The key findings are that numerosity perception performance, along with the ability to extract numerosity independently of other visual features, scales with encoding capacity following a power law. High-capacity models (5000 nats) developed robust neural codes for numerical information, approaching the performance of supervised models trained explicitly for enumeration; they also generalized well to novel stimuli and showed strong generative capabilities. Intermediate-capacity models (300-1000 nats) best matched human behavioral data, suggesting that capacity constraints can explain individual variability in numerosity perception. The research demonstrates that unsupervised generative models can develop number sense and that RDT is a useful framework for understanding how capacity limitations shape numerosity perception. This matters to the field because it provides a computational model, grounded in information theory, that explains how a fundamental cognitive ability can emerge without explicit supervision and how individual differences can be traced to differences in encoding capacity.
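The summary describes training β-VAEs with a target encoding capacity stated in nats. A standard way to implement such a target (the summary does not spell out the authors' exact loss) is the capacity-controlled β-VAE objective of Burgess et al. (2018), which penalizes the KL term for deviating from a capacity target C. The sketch below is a minimal NumPy illustration of that objective under this assumption; the function names and the γ value are hypothetical, not taken from the paper.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) in nats, summed over latent dims."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def capacity_vae_loss(recon_error, mu, logvar, capacity_nats, gamma=100.0):
    """Capacity-controlled objective: L = recon + gamma * |KL - C|.

    Pushing KL toward the target C (rather than toward 0) fixes the encoding
    rate, so sweeping C from 50 to 5000 nats sweeps the rate-distortion trade-off.
    """
    kl = gaussian_kl(mu, logvar)
    return recon_error + gamma * np.abs(kl - capacity_nats)

# Toy usage: an 8-dimensional latent code sitting exactly at the prior (KL = 0),
# so the loss is the reconstruction error plus the full capacity penalty.
mu, logvar = np.zeros(8), np.zeros(8)
loss = capacity_vae_loss(recon_error=1.0, mu=mu, logvar=logvar, capacity_nats=50.0)
```

In this framing, the capacity target C is the "rate" of RDT, and the reconstruction error is the "distortion"; the paper's 50-5000 nat sweep corresponds to moving along the rate axis.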
Key Insights
- Rate-Distortion Trade-off: The study demonstrates a clear rate-distortion trade-off in numerosity perception. As the encoding capacity of the β-VAE increases, the mean absolute error (MAE) in numerosity estimation decreases, and the Weber fraction (a measure of number acuity) in numerosity discrimination also decreases, both following a power law.
- High-Capacity Model Performance: The high-capacity β-VAE (5000 nats) achieved a Weber fraction of 0.11 in numerosity discrimination, close to human performance, and an MAE of 1.71 in numerosity estimation.
- Latent-Space Organization: PCA of the high-capacity model's latent space revealed a structured representation of numerosity, with numerosities ordered gradually along the first principal component, suggesting a "number line" representation.
- Disentanglement: The high-capacity model showed a marked disentanglement of numerosity from non-numerical magnitudes like item size and field radius, enabling robust generalization to novel datasets.
- Generative Ability: The high-capacity model could generate novel images with a specified number of items, demonstrating the ability to represent numerosity as an independently controllable visual property. Generation accuracy was higher for smaller numerosities.
- Low-Capacity Model Limitations: The low-capacity β-VAE (50 nats) showed impaired performance in numerosity perception tasks, with a higher MAE (5.08) and Weber fraction (0.39), and limited generalization to stimuli with varying item sizes.
- Human Data Alignment: Intermediate-capacity models (300-1000 nats) best matched human behavioral data on a numerosity comparison task, suggesting that capacity limitations can explain individual variability in human performance.
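The power-law scaling in the first insight can be illustrated with the two endpoint Weber fractions quoted above (0.39 at 50 nats, 0.11 at 5000 nats). A power law W = k·C^(−a) is linear in log-log coordinates, so the exponent falls out of a log-log linear fit. Note this two-point fit is purely illustrative; the paper fits across all capacity levels.

```python
import numpy as np

# Endpoint Weber fractions reported for the lowest- and highest-capacity models
capacity = np.array([50.0, 5000.0])   # encoding capacity in nats
weber    = np.array([0.39, 0.11])     # numerosity-discrimination Weber fraction

# log W = log k - a * log C, so the slope of a log-log fit is -a
slope, intercept = np.polyfit(np.log(capacity), np.log(weber), 1)
exponent = -slope   # ≈ 0.27: number acuity improves slowly with capacity
```

The shallow exponent (roughly 0.27 on these two points) matches the qualitative picture in the insights: a 100-fold increase in capacity buys only a ~3.5-fold improvement in Weber fraction.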
Practical Implications
- Cognitive Modeling: The RDT framework and the β-VAE models can be used to develop more realistic cognitive models of human number sense, incorporating capacity limitations and explaining individual differences.
- Developmental Dyscalculia: The findings suggest that capacity limitations may play a role in developmental dyscalculia, providing a potential target for interventions aimed at improving encoding efficiency.
- AI and Computer Vision: The research provides insights into how to develop AI systems that can perceive and reason about numbers in a more human-like way, potentially improving performance in tasks such as object counting and visual reasoning.
- Robotics: Robots operating in unstructured environments could benefit from algorithms that can robustly estimate numerosity from visual inputs, enabling them to make informed decisions about resource allocation and task prioritization.
- Future Research: Future work could explore the emergence of number sense in other types of neural networks, investigate the role of different types of capacity constraints, and examine the interaction between unsupervised and supervised learning in the development of numerical abilities.