This article has a precise aim: to showcase the capabilities of the GPT-5 model when applied to producing scientific studies at the doctoral level. The text is not intended as an original contribution to academic research but rather as a demonstration of the model’s ability to handle complex subjects, weaving together mathematics, theoretical physics, and neural network theory. The goal is to test how far GPT-5 can go in producing a rigorous analysis, complete with advanced formalisms and conceptual references, while maintaining clarity of exposition. For this reason, the article first offers a popularized explanation and then provides the complete study in its full version for download.
In three different scientific domains, one encounters the same scenario: information is present, but full access to what occurs in the internal layers of each system remains obstructed. Deep neural networks of the Transformer type, based on the attention mechanism, generate linguistic predictions with high accuracy. This mechanism allows each element of the sequence to evaluate the relative weight of the others and to determine which parts of the context are most relevant for the next prediction. In this way, the network manages long-range dependencies and processes sequences in parallel, overcoming the limits of recurrent or convolutional models. Despite this predictive effectiveness, it remains difficult to obtain a direct explanation of the network’s decisions, which is why it is often described as a true black box. In black hole physics, the horizon hides internal dynamics while still allowing the detection of external signals. In quantum mechanics, measurement acts on a state that may contain several possibilities simultaneously and transforms it into a concrete outcome according to probability distributions. This process interrupts superposition and selects a result, but it does not alter the general law that governs the system, which remains unitary and describes a deterministic evolution of the overall state when no measurement occurs. The study sets these three cases in parallel using the language of information, not to confuse them but to derive tools capable of producing computable quantities, precise limits, and controllable predictions. The issue, therefore, is not the superficial similarity between such different fields, but the usefulness of treating them through common concepts that clarify how much information is accessible, how much remains hidden, and how these quantities can be measured.
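The attention mechanism just described can be sketched in a few lines. The following is a minimal illustration of scaled dot-product attention on random toy matrices (the dimensions and values are invented for the example; a real Transformer adds learned projections, multiple heads, and causal masking):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: each row becomes a probability distribution.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Each position scores every other position by query-key similarity,
    # normalizes the scores, and mixes the value vectors accordingly.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq, seq) relevance scores
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
Q, K, V = (rng.standard_normal((seq_len, d_model)) for _ in range(3))
out, w = scaled_dot_product_attention(Q, K, V)
```

The rows of `w` are exactly the "relative weights" mentioned above: each one is a distribution over the other positions, stating how much of the context each position draws on.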
The analysis on the machine learning side begins with a concrete example. A Transformer receives a text sequence as input, computes attention weights that regulate interactions among positions, and produces logits, from which the probabilities of the next token are derived. Training minimizes a loss function, and the parameters evolve according to dynamics that include minibatch noise. Near a minimum, the curvature of the error function determines the amplitude of parameter fluctuations. All of this is described precisely on the mathematical level but does not clarify how much of the learned information remains available to an external observer. Hence arises the need to introduce tools that make measurable the network’s ability to carry and preserve information through its layers.
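The path from logits to next-token probabilities, and the loss that training minimizes, can be made concrete as follows (the toy vocabulary and logit values are invented for illustration):

```python
import numpy as np

def next_token_probs(logits):
    # The logits produced by the final layer become a probability
    # distribution over the vocabulary via softmax.
    e = np.exp(logits - logits.max())
    return e / e.sum()

def cross_entropy(logits, target_id):
    # The quantity training minimizes, averaged over (noisy) minibatches:
    # the negative log-probability assigned to the correct next token.
    return -np.log(next_token_probs(logits)[target_id])

logits = np.array([2.0, 0.5, -1.0, 0.1])  # toy vocabulary of four tokens
p = next_token_probs(logits)
loss = cross_entropy(logits, target_id=0)
```

Minibatch noise enters because each update uses a random subset of data, so the gradient of this loss fluctuates around its true value; near a minimum, the curvature of the loss decides how far those fluctuations push the parameters.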
To address this challenge, the study introduces the concept of an information bottleneck computed operationally. The computation graph is constructed, and an intermediate layer or the final output is selected. A cut is then defined that separates the boundary part, formed by inputs and the autoregressive cache, from the internal part of interest. Each connection crossed by this cut is assigned a weight derived from local quantities such as the Jacobian and the covariance of activations, with the addition of controlled noise that stabilizes the estimate. The sum of these weights constitutes the computational area of the cut, and the minimum among all possible cuts provides the minimal computational area separating boundary and target region. This is a concrete measure—non-intuitive but computable—that depends on local properties of the network and the statistical distribution of activations. The idea is that the area describes how much informational space must be traversed to carry signals from the boundary inward: a low value indicates that the network has made the passage more transparent, while a high value signals a more difficult barrier to cross.
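A schematic version of the construction, with invented numbers: here the computation graph is a plain layered chain, the candidate cuts are restricted for simplicity to boundaries between consecutive layers, and the per-edge weighting formula is a deliberate simplification of the Jacobian-and-covariance estimators the study describes, not their actual form:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy layered network: one local Jacobian and one batch of activations
# per layer (values invented for the sketch).
depth, width = 5, 6
jacobians = [rng.standard_normal((width, width)) / np.sqrt(width)
             for _ in range(depth)]
activations = [rng.standard_normal((200, width)) for _ in range(depth)]

def cut_area(J, acts, eps=1e-3):
    # Weight of the cut between two consecutive layers: each connection
    # crossing the cut contributes a term built from the local Jacobian
    # and the variance of the source activation, with a small noise
    # floor eps stabilizing the estimate.
    cov = np.cov(acts, rowvar=False) + eps * np.eye(acts.shape[1])
    return float(np.sum(J**2 * np.diag(cov)[None, :]))

areas = [cut_area(J, a) for J, a in zip(jacobians, activations)]
minimal_area = min(areas)  # minimum over the cuts considered
```

In the general case the minimum runs over all cuts of the graph separating boundary from target region, not just over depth, but the logic is the same: sum the weights crossing a cut, then take the smallest such sum.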
This area is connected to a theoretical result akin to a cut theorem. The mutual information between the inputs and downstream internal variables cannot exceed the value of the minimal computational area. From this follows a constraint on probes—small decoders trained to read internal states. When the minimal area is small, no probe can surpass a certain accuracy level, regardless of its quality. This consequence is significant because it shifts the focus from the performance of a probe to the limit imposed by the structure of the network itself. If a probe fails in the presence of a small area, the limit is intrinsic; if it fails with a large area, the cause lies in probe design or in the nature of the property being sought. In this way, it becomes possible to distinguish limits that depend on the model from those arising from analysis tools.
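One way to see how a cap on mutual information limits every probe is a Fano-style calculation. The sketch below is illustrative, not the study's bound: it assumes the probed property is balanced over a fixed number of classes and that the minimal area, read in bits, caps the relevant mutual information:

```python
import numpy as np

def fano_accuracy_bound(area_bits, num_classes):
    # Fano's inequality: for a balanced k-class property Y decoded from
    # an internal state Z with I(Y; Z) <= area_bits, any probe's error e
    # must satisfy  h(e) + e*log2(k-1) >= log2(k) - area_bits,
    # where h is binary entropy.  We solve for the smallest feasible e;
    # accuracy can then be at most 1 - e, regardless of probe quality.
    k = num_classes
    needed = max(np.log2(k) - area_bits, 0.0)
    errs = np.linspace(1e-6, 1 - 1e-6, 100_000)
    h = -errs * np.log2(errs) - (1 - errs) * np.log2(1 - errs)
    feasible = errs[h + errs * np.log2(k - 1) >= needed]
    min_err = feasible.min() if feasible.size else 1 - 1 / k
    return 1.0 - float(min_err)

# A small minimal area caps accuracy no matter how good the probe is.
tight = fano_accuracy_bound(area_bits=0.5, num_classes=8)
loose = fano_accuracy_bound(area_bits=2.5, num_classes=8)
```

The qualitative behavior matches the argument in the text: as the area shrinks, the ceiling on probe accuracy drops toward chance, and no improvement in probe design can cross it.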
To observe how this accessibility varies during training, the study employs two dynamic tools. The first evaluates the sensitivity of outputs to input perturbations through neural out-of-time-order correlators. It measures how the norm of the gradient of logits with respect to input grows with network depth. Initially, growth is rapid, while later it stabilizes as the Jacobian spectrum takes on characteristics typical of random matrices. This behavior allows for the description of a kind of controlled chaos regime in information propagation. The second tool is the entropy curve of logits during generation. One observes the variation of predictive dispersion at each step and across training epochs. Comparing this entropy with the minimal area profile makes it possible to distinguish phases in which the network compresses and organizes information from those in which it accumulates examples without extracting general rules. The combination of the two diagnostics provides a more complete dynamic picture of the model’s internal life, showing both how much information is transported and in what form it is maintained or transformed.
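Both diagnostics admit simple toy versions. The sketch below (architecture, gain, and dimensions all invented) tracks the norm of the input-output Jacobian of a deep tanh network layer by layer, as a stand-in for the neural out-of-time-order correlator, and computes the entropy of a logit vector as the measure of predictive dispersion:

```python
import numpy as np

rng = np.random.default_rng(2)

def logit_entropy(logits):
    # Second diagnostic: entropy (in bits) of the next-token
    # distribution, i.e. the predictive dispersion at one step.
    e = np.exp(logits - logits.max())
    p = e / e.sum()
    return float(-(p * np.log2(p + 1e-12)).sum())

def sensitivity_by_depth(x, weights):
    # First diagnostic (toy version): the Jacobian of the output with
    # respect to the input, accumulated layer by layer via the chain
    # rule; its norm measures how input perturbations grow with depth.
    norms, J = [], np.eye(x.size)
    for W in weights:
        pre = W @ x
        J = np.diag(1 - np.tanh(pre) ** 2) @ W @ J
        x = np.tanh(pre)
        norms.append(float(np.linalg.norm(J)))
    return norms

width, depth = 16, 12
weights = [rng.standard_normal((width, width)) * 1.2 / np.sqrt(width)
           for _ in range(depth)]
norms = sensitivity_by_depth(rng.standard_normal(width), weights)

# Sharper logits mean lower predictive dispersion.
flat, sharp = np.zeros(10), np.array([5.0] + [0.0] * 9)
```

Run over training checkpoints rather than a single random network, the first curve reveals the controlled-chaos regime described above, while the second separates phases of compression from phases of accumulation.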
Another discussed quantity is the effective temperature, which relates optimization noise to error geometry. It depends on the amplitude of minibatch noise, the learning rate, and the local curvature of the loss function. Measured alongside area and sensitivity, it allows for the identification of thresholds beyond which different learning regimes emerge. In regular tasks, the study shows that such thresholds coincide with the appearance of simpler, generalizing structures in place of fragmented memories. The notion of effective temperature thus adds a quantitative criterion for interpreting sudden changes often observed in training processes, such as grokking phases or transitions between memorization and generalization.
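A one-dimensional caricature makes the relation concrete. Assuming a quadratic loss with curvature h and additive gradient noise of standard deviation sigma (both assumptions of this toy model, not the study's setting), SGD settles into stationary fluctuations whose variance is roughly eta*sigma^2/(2h), so an equipartition reading h*var gives an effective temperature of about eta*sigma^2/2:

```python
import numpy as np

rng = np.random.default_rng(3)

def sgd_near_minimum(eta, curvature, noise_std, steps=100_000):
    # Minimal model of SGD near a minimum: quadratic loss with
    # curvature h, gradient corrupted by minibatch-like additive noise.
    theta, tail = 0.0, []
    for t in range(steps):
        grad = curvature * theta + noise_std * rng.standard_normal()
        theta -= eta * grad
        if t >= steps // 2:          # keep only the stationary tail
            tail.append(theta)
    return float(np.var(tail))

eta, h, sigma = 0.01, 1.0, 1.0
var = sgd_near_minimum(eta, h, sigma)
T_eff = h * var   # equipartition reading; ~ eta * sigma**2 / 2 here
```

The dependence matches the description in the text: a larger learning rate or noisier gradients raise the temperature, while higher curvature confines the fluctuations more tightly.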
Conceptual tools from physics reinforce this framework. At a black hole horizon, area is tied to entropy that scales with surface; in holographic theories, internal quantities are reconstructed from boundary minimal surfaces; in quantum mechanics, the entropy of a subsystem describes the information lost when tracing over the environment. The study does not confuse these situations but uses their structural correspondences to transfer methods across domains. The minimal surface of the computational graph thus becomes a rigorous object defined by derivatives and covariances. If predictions tied to this construction hold under empirical testing, the analogy gains value; if not, it must be reduced. The use of this parallelism is therefore bound to its ability to produce measurable and falsifiable results, not rhetorical suggestions.
The theoretical framework is accompanied by an experimental protocol designed to be reproducible. One selects a network layer or output, identifies the cut separating it from inputs, estimates connection weights using Jacobian-based techniques and logarithmic traces, sums contributions, and calculates the area. In parallel, the best possible probe is trained, and its loss and error are compared with limits derived from the area. Repeating the procedure during training yields the evolution of informational bottlenecks. On tasks with known, verifiable rules—such as formal languages or modular arithmetic—clear and unambiguous data are collected. The same pipeline allows estimation of sensitivity and effective temperature, thereby combining the description of static structure with that of learning dynamics. The aim is not limited to post hoc explanation but also to equipping researchers with tools to predict and guide the behavior of complex models.
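The protocol can be summarized as a measurement loop. Everything below is a placeholder skeleton: the helper names and stub values are invented stand-ins for the estimators described above, shown only to fix the shape of the procedure:

```python
import random

random.seed(0)

def estimate_minimal_area(model, layer):
    # Stub for the Jacobian/covariance cut estimate over the chosen layer.
    return 2.0 + random.random()

def best_probe_error(model, layer):
    # Stub for the error of the strongest probe one manages to train.
    return 0.1 + 0.05 * random.random()

def error_floor_from_area(area):
    # Placeholder: a real floor would come from a Fano-type inequality
    # tied to the area; this only mimics the qualitative shape.
    return max(0.0, 1.0 - area)

def measurement_loop(model, layer, epochs=3):
    history = []
    for epoch in range(epochs):
        # ... one epoch of ordinary training would run here ...
        area = estimate_minimal_area(model, layer)
        history.append({
            "epoch": epoch,
            "area": area,
            "probe_error": best_probe_error(model, layer),
            "error_floor": error_floor_from_area(area),
        })
    return history

log = measurement_loop(model=None, layer=-1)
```

Repeating the loop across checkpoints yields exactly the time series the text calls for: the evolution of the bottleneck, the probe's distance from its bound, and (once sensitivity and temperature estimators are added) the dynamic diagnostics alongside the static structure.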
A concluding aspect concerns the explanatory value of this methodology. To explain, in this context, means to demonstrate that there exists a low-complexity reconstruction channel between what we observe and what we seek to understand about the inside of the network. The minimal computational area quantifies the cost of this channel. With a high area, the problem cannot be solved with increasingly complex probes but by changing the question, the class of probes, or the architecture. With a low area, if probes fail, the cause must lie in the reading technique. This criterion enables the distinction between intrinsic limits and instrumental limits, eliminating interpretive ambiguities. Thus, the search for explanations no longer rests on isolated intuitions but on repeatable, comparable measures.
The comparison with physics further clarifies the boundaries of analogy. Quantum collapse is a phenomenon tied to measurement and decoherence, whereas computational collapse during decoding is a classical probabilistic update. The internal information of a black hole remains inaccessible for causal reasons, while that of a neural network can be made accessible within limits set by area and probe class. Analogies hold value when they involve entropies, minimal surfaces, and channel capacities; they cease to be useful when attempting to transfer ontological properties from one domain to another. This distinction is essential to maintaining a solid and verifiable approach.
The study proposes a broad and articulated path for addressing the black-box problem with precise scientific tools. There exists a computable quantity that sets an upper bound on internal decodability. Diagnostics are available that link stability, sensitivity, and predictive dispersion. An experimental procedure has been devised that makes these ideas testable on concrete, widely used models. For those working in artificial intelligence, this means distinguishing cases where it makes sense to ask for local explanations from cases where one should instead intervene on structure or training regime. For those engaged in information physics, it means rediscovering consolidated tools applied in a new context. The interest does not lie in the analogy itself but in the ability to produce criteria that improve understanding, prediction, and control. With the extension of these analyses and their systematic application, a field of study opens in which different disciplines can contribute to a shared vision, where opacity becomes the object of concrete measurement and no longer only of unresolved questions.
