Beliefs in Transformation: Visualizing How a Machine Modifies Its Certainties Through Experience
Interdisciplinary analysis and hands-on emulation of Rosenblatt's Perceptron (1958) and the Multilayer Perceptron. Visualization of machine learning, bias adjustment, and algorithmic uncertainty management, with an accessible approach for social sciences and law professionals.
Theoretical Framework and Application
0. Before we begin: a minimum vocabulary
For the non-technical reader: this section defines five terms that will appear throughout the article. If you already know them, feel free to skip it. Otherwise, it will take you two minutes and will make everything else easier to follow.
1. Input (or «sensor»). It is a piece of data the machine receives from the world. In our emulator, each cell of the left grid is an input: it is worth 1 if you painted it black, 0 if you left it blank. Think of them as the witnesses testifying in a trial: each one contributes a fragment of information.
2. Weight (or «potentiometer»). It is the credibility the machine assigns to each input. If a witness usually tells the truth, their testimony carries a lot of weight; if they tend to lie, their testimony weighs little (or even counts against). In the emulator, weights are shown in the right grid: a large green number means «this cell is strong evidence in favor of Cat»; a large red number means «this cell is strong evidence in favor of Dog». At the start, all weights are zero: the machine trusts no one yet.
3. Bias (or «initial prejudice»). It is the machine's predisposition before receiving any evidence. In law, it would be a judge's initial inclination before hearing the testimonies. A positive bias means «I tend to believe it is a Cat»; a negative bias, «I tend to believe it is a Dog». The bias is adjusted with each error, just like a judge who, after hearing several cases, recognizes their own prejudices and corrects them.
4. Sum of evidence. The machine takes each input, multiplies it by its weight, and adds everything together along with the bias. The result is a number: if it is very positive, the machine says «Cat»; if it is very negative, it says «Dog»; if it is close to zero, the machine doubts. This operation —multiplying and adding— is all the «intelligence» of the 1958 perceptron. There is no magic, no mystery: it is elementary arithmetic.
5. Confidence threshold. It is the standard of proof. If the threshold is 2, the machine requires at least 2 net evidence points to make a decision. Between -2 and +2, it will say «I don't know». It is the algorithmic equivalent of reasonable doubt in criminal law: if the evidence does not meet the standard, there is no conviction. In the emulator, you can move this threshold with a slider and see how the requirement changes.
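For readers who want to see the five terms working together, here is a minimal sketch in JavaScript. It is an illustration under stated assumptions, not the emulator's actual source: the function name `predict`, the label strings, and the example numbers are all hypothetical.

```javascript
// Hypothetical sketch: names and example values are illustrative,
// not taken from the emulator's source.
function predict(inputs, weights, bias, threshold) {
  // Sum of evidence: each input times its weight, plus the bias.
  let sum = bias;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  // Confidence threshold: decide only when the evidence meets the standard.
  if (sum >= threshold) return { sum, verdict: "Cat" };
  if (sum <= -threshold) return { sum, verdict: "Dog" };
  return { sum, verdict: "I don't know" };
}

// Three witnesses (inputs), their credibility (weights), a slight prior (bias).
const example = predict([1, 0, 1], [2, -3, 1], 0.5, 2);
console.log(example.sum, example.verdict); // 3.5 "Cat"
```

The second witness carries a strongly negative weight, but it did not testify (its input is 0), so it contributes nothing to the sum.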
1. Conceptual Foundations
This article constitutes an outreach complement to the technical work published by Gustavo Salvini on his personal website, titled «Fundamentos IA: Qué es el perceptrón» (June 2026). While that work addresses the mathematical formulation, Novikoff's convergence theorem, and the limitations demonstrated by Minsky and Papert, the purpose of these lines is to offer a practical and accessible approach for readers without technical training —lawyers, social scientists, humanists— who wish to understand, perceive, and directly experience the operation of the first machine capable of «learning» from experience.
In 1958, psychologist and neurobiologist Frank Rosenblatt published his foundational article in Psychological Review: «The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain». That work was not just theory: Rosenblatt had built a physical machine, the Mark I Perceptron, capable of distinguishing simple visual patterns through the automatic adjustment of its internal connections.
The central idea is extraordinarily simple and bears a fascinating resemblance to the way humans form our first impressions: the machine starts from a state of total ignorance (all its internal «potentiometers» set to zero, like a judge without prejudices). When shown an example and told the correct answer, if it gets it wrong, it adjusts its connections in the direction that corrects the error, just as we reinforce or weaken a belief after a new experience. It does not require «training a model» in the contemporary sense —there are no gradients, no backpropagation, no large volumes of data—: just two or three examples are enough for the machine to form a memory that is visualizable, transparent, and debatable.
2. Methodological Development
Two languages were evaluated for the emulator implementation: Python and JavaScript. The choice fell on JavaScript embedded in HTML for practical reasons linked to the intended context of use: low-resource computers, users without technical knowledge, and the need for an immediate visual perception of learning.
JavaScript allows a single file of less than 5 KB to run in any modern browser without prior installation. Python, although excellent for mathematical computation, would require the installation of an interpreter and, to achieve visual interactivity, additional libraries such as tkinter or pygame, which introduces unnecessary complexity for the outreach objective.
2.1. Initial Approach
The emulator replicates the three original components of the Mark I described by Rosenblatt: the S Units (sensory, a 3×3 grid that serves as a «retina»), the A Units (associative, the nine potentiometers with adjustable weights plus a bias value), and the R Unit (response, the binary output). The implemented learning rule is exactly Rosenblatt's (1958): if the machine is right, it does nothing; if it is wrong, it adds or subtracts the input values from the weights as appropriate, and also adjusts the bias, that «initial prejudice» that tips the scales before seeing any evidence.
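The learning rule just described can be sketched in a few lines. This is a hedged illustration rather than the emulator's code: the function name `teach`, the use of +1 for «Cat» and -1 for «Dog», and the decision to treat an undecided answer as an error are assumptions made for the example.

```javascript
// Hypothetical sketch of the error-correction rule; names are illustrative.
// Labels: +1 means «Cat», -1 means «Dog».
function teach(model, inputs, label) {
  const sum = inputs.reduce((acc, x, i) => acc + x * model.weights[i], model.bias);
  // Assumption: an undecided answer (guess 0) counts as an error.
  const guess = sum >= model.threshold ? 1 : sum <= -model.threshold ? -1 : 0;
  if (guess === label) return false; // right: do nothing
  for (let i = 0; i < inputs.length; i++) {
    model.weights[i] += label * inputs[i]; // add or subtract the input values
  }
  model.bias += label; // the bias moves in the same direction
  return true;
}

const model = { weights: [0, 0, 0], bias: 0, threshold: 2 };
const firstTry = teach(model, [1, 0, 1], +1);  // undecided, so it adjusts
const secondTry = teach(model, [1, 0, 1], +1); // now right, so nothing changes
console.log(firstTry, secondTry, model.weights, model.bias); // true false [1, 0, 1] 1
```

Only the weights of active inputs move: the middle weight stays at zero because its input was 0.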
2.2. Interactive Implementation
Below is the emulator running directly on this page. The weights are visualized in real time: green indicates that the machine has learned that this pixel is evidence in favor of the «Cat» category; red indicates evidence in favor of «Dog» (or against «Cat»); gray indicates indifference. The bias appears as an additional cell labeled b.
[Interactive emulator: Rosenblatt's Perceptron (1958). Draw a shape on the left grid, then teach the machine what it is. Left grid: S Units (input), click to activate or deactivate. Right grid: A Units (weights + bias), adjusted automatically by the machine.]
With threshold 2, the machine needs at least 2 net evidence points to decide. Between -2 and 2, it will say «I don't know» (reasonable doubt).
3. Case Studies: the Simple Perceptron
How to use it (Practical guide). This section is designed for any user without technical training. No programming knowledge is required: just interact with the emulator embedded above, following the steps described below.
3.1. Step 1 — Teach the first concept (the «Cat»)
In the left grid (S Units), click on the cells to draw a simple shape, for example an «L» (activate the entire left column and the bottom-right cell). Then press the green button «🐱 Teach it is a CAT». The machine, which at the start knows nothing, will get it wrong (or get it right by chance) and adjust its potentiometers. Observe the right grid: you will see the cells turn green precisely in the shape of the «L». Those greens are the machine's memory: «when I see pixels here, I will think of a Cat». The bias will also have been adjusted, like a judge who begins to lean towards one of the parties.
3.2. Step 2 — Teach the second concept (the «Dog»)
Press «Clear Drawing». Now draw a different shape, for example an inverted «T» (top row complete plus the center cell). Press the blue button «🐶 Teach it is a DOG». The machine will again adjust its weights and you will see cells appear in red: it has learned that certain pixels are evidence in favor of the Dog (or against the Cat). Negative weights are as informative as positive ones: they represent «exculpatory evidence».
3.3. Step 3 — Test learning and generalization
Clear the drawing and trace a new shape, one that resembles the original «L» a bit but is not identical. Press «🔍 TEST». The machine will sum the values of the active pixels multiplied by the weights it adjusted itself, add the bias, compare the result with its internal threshold, and issue a verdict. If the shape resembles the «L» more, it will say «Cat»; if it resembles the inverted «T» more, it will say «Dog». The machine has generalized from just two examples, like a student who infers a rule from a couple of cases.
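The three steps above can be replayed outside the browser. The sketch below is a hypothetical re-creation, not the emulator's source: it assumes a threshold of 2, treats «I don't know» as an error during teaching, and encodes each grid row by row from the top-left cell. The test shapes `nearL` and `diagonal` are examples chosen for illustration.

```javascript
// Hypothetical re-creation of the three steps; not the emulator's source.
const THRESHOLD = 2;
const model = { weights: Array(9).fill(0), bias: 0 };

const evidence = (cells) =>
  cells.reduce((sum, x, i) => sum + x * model.weights[i], model.bias);

const verdict = (cells) => {
  const sum = evidence(cells);
  return sum >= THRESHOLD ? "Cat" : sum <= -THRESHOLD ? "Dog" : "I don't know";
};

function teach(cells, label) {
  if (verdict(cells) === label) return; // right: nothing to adjust
  const sign = label === "Cat" ? 1 : -1;
  cells.forEach((x, i) => { model.weights[i] += sign * x; });
  model.bias += sign;
}

// Grids read row by row, top-left to bottom-right.
const shapeL   = [1, 0, 0,  1, 0, 0,  1, 0, 1]; // left column + bottom-right cell
const shapeT   = [1, 1, 1,  0, 1, 0,  0, 0, 0]; // top row + center cell
const nearL    = [1, 0, 0,  1, 0, 0,  1, 1, 0]; // left column + bottom-center cell
const diagonal = [1, 0, 0,  0, 1, 0,  0, 0, 1];

teach(shapeL, "Cat"); // Step 1
teach(shapeT, "Dog"); // Step 2
console.log(model.weights, model.bias);        // [0, -1, -1, 1, -1, 0, 1, 0, 1] 0
console.log(verdict(nearL), verdict(diagonal)); // "Cat" "I don't know"
```

Note that the top-left cell, which belongs to both drawings, ends with weight 0: having appeared once for each side, it counts as evidence for neither.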
3.4. What the user perceives: radical transparency
Unlike contemporary neural networks —which require large volumes of data, computational power, and whose internal weights are incomprehensible even to their designers— here the user sees how the machine thinks. The right grid is a literal representation of the perceptron's «mind»: each green or red cell is an acquired belief, and the bias is its initial predisposition. This transparency is precisely what was lost in the transition from the classical perceptron to modern deep learning.
3.5. Doubt in the machine: from 1958 to the present
If you draw a shape completely different from those taught —for example, a diagonal— and press «🔍 TEST», it is very likely that the machine will respond «🤷 I DON'T KNOW». This happens when the sum of evidence for and against almost cancels out, indicating insufficient data to make a decision, analogous to «reasonable doubt» in a judicial process.
The historical reality (1958): In Rosenblatt's original Mark I, uncertainty did not exist as an operative concept. Its activation function was a rigid «step»: if the mathematical sum was ≥ 0, the machine «fired» (said Cat); if it was < 0, it did not (said Dog). Faced with an ambiguous or unknown drawing, the machine did not doubt: it issued a verdict based on the slightest residual bias of its weights (for example, a sum of 0.001 was interpreted as absolute certainty of being a Cat). This absence of «I don't know» is comparable to a witness who never admits to not remembering a detail.
Historical consequences: This inability to express doubt had a high cost. When the perceptron faced complex problems or non-linearly separable data, it offered erroneous answers with total confidence, like a recommendation algorithm that always suggests content without measuring its reliability. The model's limitations, above all its restriction to linearly separable problems, were formalized by Minsky and Papert (1969), and their critique is widely credited with contributing to the first «Artificial Intelligence winter», during which much of the field's research funding was withdrawn.
The pedagogical modification: The incorporation of the «confidence threshold» and the slider in this emulator is a didactic extension, not a historical replica. Its purpose is to make the invisible visible: to demonstrate that the decision is not magical, but the result of a sum of evidence. We allow the machine to express insufficient data so that the user understands the difference between a well-founded decision and a conjecture. In fact, by moving the slider you can see exactly how the standard of proof changes.
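The contrast between the 1958 step function and the didactic threshold fits in two small functions. This is a sketch for illustration only; the function names are hypothetical and the sample sums are arbitrary.

```javascript
// 1958-style step activation: any sum >= 0 fires, with no room for doubt.
function stepVerdict(sum) {
  return sum >= 0 ? "Cat" : "Dog";
}

// Didactic extension: abstain whenever the sum falls inside the doubt band.
function thresholdVerdict(sum, threshold) {
  if (sum >= threshold) return "Cat";
  if (sum <= -threshold) return "Dog";
  return "I don't know";
}

for (const sum of [0.001, -0.4, 3]) {
  console.log(sum, stepVerdict(sum), thresholdVerdict(sum, 2));
}
```

For a sum of 0.001 the step function answers «Cat» while the threshold version abstains; only for a sum of 3 do both agree.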
Current perspective: In contemporary artificial intelligence, this notion is fundamental. Modern models do not emit dry binary labels, but probability distributions (for example, «85% Cat, 15% Dog»). In critical areas such as law, medicine, or public administration, cutting-edge systems incorporate abstention thresholds: if confidence falls below a certain level, the algorithm is programmed to delegate the decision to a human. This capacity to «know that one does not know» is today the basis of algorithmic due process and the explainability that society demands from machines.
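An abstention threshold over probabilities can be sketched as follows. The function name `decideOrDefer`, the shape of its return value, and the 0.8 cut-off are assumptions for the example; real systems choose such values according to the stakes of the decision.

```javascript
// Hypothetical abstention rule; the 0.8 cut-off is an arbitrary example value.
function decideOrDefer(probabilities, minConfidence) {
  // Pick the label with the highest probability.
  const [label, confidence] = Object.entries(probabilities)
    .sort((a, b) => b[1] - a[1])[0];
  return confidence >= minConfidence
    ? { action: "decide", label, confidence }
    : { action: "defer to a human", label, confidence };
}

console.log(decideOrDefer({ Cat: 0.85, Dog: 0.15 }, 0.8)); // decides: Cat
console.log(decideOrDefer({ Cat: 0.55, Dog: 0.45 }, 0.8)); // defers
```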
4. Implications and Scope
The emulator presented has a deliberately limited scope: it illustrates the foundational principle of machine learning, but does not solve real-world problems. This limitation is not a flaw in the implementation, but a mathematical property of the single-layer perceptron itself, demonstrated by Minsky and Papert in 1969.
1. Linear separability: the perceptron can only learn categories that can be separated by a straight line (or a hyperplane, in higher dimensions). The XOR problem (deciding whether two inputs differ) is impossible for it, as Salvini detailed in his article. In social terms, it is like trying to classify people by two traits that do not admit a simple boundary.
2. Novikoff's convergence theorem (1962): if the data are linearly separable, Rosenblatt's rule is guaranteed to find a correct solution in a finite number of adjustments. This mathematical elegance is what justified Rosenblatt's original enthusiasm.
3. Computational resources: the emulator performs nine multiplications and additions per prediction. It runs without issues on fifteen-year-old computers, low-resource netbooks, or old mobile devices, fulfilling the accessibility objective set out.
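The convergence guarantee in point 2 can be observed on a small separable problem. The sketch below is an illustration under assumptions: it uses the logical AND of two inputs as the data, labels of +1 and -1, and the rigid step at zero (no doubt band); the function name `trainUntilConverged` is hypothetical.

```javascript
// Hypothetical sketch: classic rule with a step at zero (no doubt band).
function trainUntilConverged(data, maxEpochs) {
  const w = [0, 0];
  let b = 0;
  let updates = 0;
  for (let epoch = 1; epoch <= maxEpochs; epoch++) {
    let errors = 0;
    for (const { x, y } of data) {
      const guess = w[0] * x[0] + w[1] * x[1] + b >= 0 ? 1 : -1;
      if (guess !== y) {
        // Wrong: move the weights and the bias toward the correct label.
        w[0] += y * x[0];
        w[1] += y * x[1];
        b += y;
        errors++;
        updates++;
      }
    }
    // A full pass without errors means a correct solution was found.
    if (errors === 0) return { converged: true, epoch, updates, w, b };
  }
  return { converged: false, epoch: maxEpochs, updates, w, b };
}

// AND is linearly separable: only (1, 1) belongs to the positive class.
const AND = [
  { x: [0, 0], y: -1 },
  { x: [0, 1], y: -1 },
  { x: [1, 0], y: -1 },
  { x: [1, 1], y: 1 },
];
const result = trainUntilConverged(AND, 100);
console.log(result);
```

The loop stops after a handful of passes, well before the limit of 100, which is what the theorem promises for separable data.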
From an interdisciplinary perspective, Rosenblatt's perceptron raises questions that go beyond computer science: what does it mean to «learn»? Can a machine have «beliefs» (its weights) that are modified by experience? These questions, posed in 1958 in a psychology journal, remain relevant today and are of particular interest to the social sciences and law, given the advance of algorithmic systems that make decisions with an impact on people.
5. Future Perspectives
The simple perceptron did not disappear: it evolved. Overcoming its limitations gave rise to the multilayer perceptron (MLP) and, eventually, to the architectures that dominate artificial intelligence today. Understanding the original perceptron is, in Salvini's words, «understanding the alphabet of deep learning».
- Projection 1 — Pedagogical value: emulators like the one presented here can be incorporated into digital literacy programs aimed at law, sociology, and humanities professionals, so that they understand the foundations of the systems they regulate.
- Projection 2 — Didactic extension: the code can be expanded to include additional categories, higher-resolution grids, or the visualization of the «AI winter» provoked by the criticisms of Minsky and Papert.
- Projection 3 — Ethical reflection: the transparency of the perceptron (whose weights are interpretable) contrasts with the opacity of contemporary models. This difference is central to current debates on algorithmic explainability and due process in automated decisions.
6. The XOR Problem: when a single opinion is not enough
For the non-technical reader: this section presents the problem that brought Rosenblatt's perceptron to crisis. It is explained with an everyday analogy before showing the mathematics. If you already know the XOR problem, you can go directly to section 7.
6.1. A judicial analogy
Let us imagine a judge who must decide whether a defendant is guilty or innocent. They have two testimonies available:
- Witness A: «I saw the defendant at the crime scene» (yes / no).
- Witness B: «The defendant has a verified alibi» (yes / no).
The reasonable rule is: the defendant is guilty only if exactly one of the testimonies is true. If both tell the truth (the defendant was seen at the scene and has an alibi), there is a contradiction and one cannot convict. If both lie (the defendant was not seen and there is no alibi), there is also not enough evidence. Only when one affirms and the other denies does the scale tip.
This rule —which in logic is called XOR (exclusive or)— seems easy to state. But the 1958 perceptron is incapable of learning it, no matter how many examples we give it. Minsky and Papert demonstrated this mathematically in 1969.
6.2. Why can't the simple perceptron do it?
Let us recall what the perceptron does: it takes the inputs, multiplies them by their weights, adds everything up, and compares it with a threshold. This is equivalent to drawing a straight line on a plane that separates the positive cases from the negative ones.
In the XOR problem, the four possible cases are:
- (0, 0) → 0 (innocent)
- (0, 1) → 1 (guilty)
- (1, 0) → 1 (guilty)
- (1, 1) → 0 (innocent)
If we plot these four points on a plane, we will see that the «guilty» cases are in opposite corners, and the «innocent» ones in the other two opposite corners. There is no straight line that separates the guilty from the innocent. Any line drawn will leave at least one point on the wrong side. That is why the simple perceptron, which can only draw straight lines, fails.
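The claim can be checked by trying many candidate lines. The sketch below is a brute-force illustration, not a proof: it assumes a small grid of integer weights and biases between -5 and 5 and the rule «fire when the sum is at least 0»; the function name `countSeparatingLines` is hypothetical.

```javascript
// Hypothetical brute-force check over a small grid of integer lines.
function countSeparatingLines(cases) {
  let count = 0;
  for (let w1 = -5; w1 <= 5; w1++) {
    for (let w2 = -5; w2 <= 5; w2++) {
      for (let bias = -5; bias <= 5; bias++) {
        // A line "works" only if it classifies all four cases correctly.
        const allCorrect = cases.every(
          ([a, b, expected]) => (w1 * a + w2 * b + bias >= 0 ? 1 : 0) === expected
        );
        if (allCorrect) count++;
      }
    }
  }
  return count;
}

// Each case is [A, B, expected output].
const XOR = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]];
const AND = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]];
console.log(countSeparatingLines(XOR)); // 0: no candidate line works
console.log(countSeparatingLines(AND)); // at least one line works
```

The search covers only a finite grid, but the result holds for any weights. XOR would require a negative bias (case 0, 0), w2 + bias ≥ 0 and w1 + bias ≥ 0 (the two guilty cases), and w1 + w2 + bias < 0 (case 1, 1). Adding the two middle conditions gives w1 + w2 + 2·bias ≥ 0, so w1 + w2 + bias ≥ -bias > 0, which contradicts the last condition.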
6.3. The comparison emulator
Below, a small interactive emulator allows you to verify this. Check boxes A and B (1 = true, 0 = false) and press «Test XOR». The numbers you will see (20, -20, -10, etc.) are the weights and biases that the network learned during prior training. Hover over any number to see what it represents. You will see that the simple perceptron gets at least one of the four combinations wrong, while the multilayer perceptron, which we will see in detail in the next section, gets all of them right.
[Interactive comparator: Simple Perceptron vs. Multilayer Perceptron on XOR. Check boxes A and B; the expected result is 1 only if exactly one is true. First panel: a single neuron, a single opinion (it cannot draw the correct boundary). Second panel: a team of coordinated specialists (hidden layer with 2 neurons).]
The solution to the XOR problem requires introducing intermediate neurons, a «hidden layer», that transform the input space into one where it is possible to draw a separating line. It is like going from a solitary judge to a team of specialists: each specialist evaluates a different aspect of the case, and a coordinator combines their opinions to reach the final answer. This is exactly what the Multilayer Perceptron does, which we will see next.
7. The Multilayer Perceptron: a team of specialists
For the non-technical reader: this section presents a visual emulator of the Multilayer Perceptron (MLP), the natural evolution of Rosenblatt's perceptron. You will see the complete neural network: the input neurons, two specialist neurons (hidden layer), and the output neuron that combines their opinions, with their connections and step-by-step calculations.
7.1. The architecture: from a solitary judge to a team of specialists
The Multilayer Perceptron introduces a simple but powerful innovation: intermediate neurons between the input and the output. Think of them as a group of specialists who analyze the information before a final decision is made:
- Input layer: the raw data (in our case, A and B).
- Hidden layer: two «specialists» who process the information. Each one examines the data from a different perspective and issues their own opinion.
- Output layer: the «coordinator» who combines the opinions and produces the final answer.
Each hidden and output neuron does exactly the same thing as the simple perceptron: it multiplies its inputs by their weights, adds everything up, and applies an activation function. The input neurons simply pass the raw data along. The difference is that now there are several neurons working as a team, each specialized in a different aspect of the problem.
7.2. The MLP visual emulator
Below, an interactive emulator shows the complete neural network. Check boxes A and B, press «Calculate», and observe how the signal travels from the input, passes through the two specialists in the hidden layer, and reaches the coordinator. The numbers that appear on each neuron are their activations: what that neuron «thinks» after processing the information it receives. Thicker connections indicate stronger weights; green ones are positive weights (evidence in favor), red ones are negative weights (evidence against).
[Interactive emulator: The Multilayer Perceptron (MLP), a team of specialists. Three columns: Input, Hidden Layer (Specialists), Output (Coordinator). The weights are pre-trained to solve XOR; hover over the numbers to see what they represent.]
Instructions: Check boxes A and/or B, then press «Calculate step by step». You will see how each neuron processes the information and how the team of specialists reaches the final answer.
7.3. What does each specialist learn?
If you experiment with the emulator, you will notice something fascinating: each neuron in the hidden layer specializes in a different aspect of the problem. In the XOR case with the emulator's pre-trained weights:
- Specialist 1 activates only if A=1 and B=0. It detects the case «A is true, B is false».
- Specialist 2 activates only if A=0 and B=1. It detects the case «B is true, A is false».
The coordinator combines both opinions: if either specialist activated, it emits «1» (XOR = 1); if neither activated, it emits «0» (XOR = 0). Because the two specialists detect mutually exclusive cases, they can never activate at the same time; when A and B are equal, both stay silent. The combination of specialized opinions solves what a single neuron could not. This is the fundamental idea of deep learning: many layers of simple transformations, combined, solve complex problems.
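The division of labor between the two specialists and the coordinator can be traced in code. The weights below are assumptions: the article mentions the values 20, -20, and -10, but the emulator's full weight set is not listed here, so the hidden-layer arrangement and the coordinator's weights (1, 1, and a bias of -5) are illustrative choices that reproduce the behavior described. ReLU and sigmoid are the activation functions the emulator is described as using.

```javascript
const relu = (z) => Math.max(0, z);
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Assumed weights: the article mentions 20, -20 and -10; the rest is illustrative.
function xorNetwork(a, b) {
  const specialist1 = relu(20 * a - 20 * b - 10);  // active only for A=1, B=0
  const specialist2 = relu(-20 * a + 20 * b - 10); // active only for A=0, B=1
  // Coordinator: adds both opinions and squashes the result into (0, 1).
  const certainty = sigmoid(specialist1 + specialist2 - 5);
  return { specialist1, specialist2, certainty, answer: certainty >= 0.5 ? 1 : 0 };
}

for (const [a, b] of [[0, 0], [0, 1], [1, 0], [1, 1]]) {
  const r = xorNetwork(a, b);
  console.log(a, b, r.specialist1, r.specialist2, r.certainty.toFixed(3), r.answer);
}
```

With these values the coordinator reports a certainty of about 0.993 when exactly one input is true and about 0.007 otherwise.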
7.4. The activation functions: ReLU and sigmoid
In the emulator, the neurons of the hidden layer use a function called ReLU (from English Rectified Linear Unit): if the sum is negative, the neuron «turns off» (is worth 0); if it is positive, the neuron «lets the value through» as is. It is like a specialist who only issues an opinion when they have evidence in favor; if they do not have it, they remain inactive.
The output neuron uses a function called sigmoid: it compresses any number into the range between 0 and 1, interpreted as a probability or degree of certainty. It is like a coordinator who does not issue a dry binary verdict, but a degree of certainty: «I am 92% sure the answer is 1». This capacity to express degrees of certainty is what differentiates the contemporary MLP from the rigid perceptron of 1958.
7.5. How are these weights trained?
In the emulator, the MLP weights are pre-trained: they are already adjusted to solve XOR correctly. In practice, these weights are obtained through an algorithm called backpropagation, which extends the error-correction idea behind Rosenblatt's rule to networks with multiple layers. The idea is similar: if the network gets it wrong, the error «propagates backwards» from the output toward the input, adjusting all the weights in the direction that reduces the error. But this algorithm was popularized only in 1986 by Rumelhart, Hinton, and Williams, almost three decades after Rosenblatt's article; that gap includes the first «AI winter» of the 1970s.
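A single backpropagation step on one example gives a feel for the mechanism. The sketch below is a simplified illustration, not the procedure used to obtain the emulator's weights: it assumes a 2-2-1 network with ReLU hidden neurons and a sigmoid output, a cross-entropy measure of error, a learning rate of 0.5, and arbitrary starting weights. All names are hypothetical.

```javascript
const relu = (z) => Math.max(0, z);
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Forward pass: input -> two hidden neurons (ReLU) -> one output (sigmoid).
function forward(p, x) {
  const z1 = p.W1.map((row, j) => row[0] * x[0] + row[1] * x[1] + p.b1[j]);
  const h = z1.map(relu);
  const yHat = sigmoid(p.W2[0] * h[0] + p.W2[1] * h[1] + p.b2);
  return { z1, h, yHat };
}

// Cross-entropy error for one example (an assumed choice of error measure).
function loss(p, x, y) {
  const { yHat } = forward(p, x);
  return -(y * Math.log(yHat) + (1 - y) * Math.log(1 - yHat));
}

// One backpropagation step: returns new parameters, leaves `p` untouched.
function backpropStep(p, x, y, lr) {
  const { z1, h, yHat } = forward(p, x);
  const dOut = yHat - y; // error at the output neuron
  // Error sent back to each hidden neuron; a silent ReLU receives none.
  const dHidden = p.W2.map((w, j) => (z1[j] > 0 ? dOut * w : 0));
  return {
    W1: p.W1.map((row, j) => row.map((w, i) => w - lr * dHidden[j] * x[i])),
    b1: p.b1.map((b, j) => b - lr * dHidden[j]),
    W2: p.W2.map((w, j) => w - lr * dOut * h[j]),
    b2: p.b2 - lr * dOut,
  };
}

// Arbitrary starting weights; the example is A=1, B=0 with expected output 1.
const before = { W1: [[0.5, -0.5], [-0.5, 0.5]], b1: [0, 0], W2: [0.5, 0.5], b2: 0 };
const after = backpropStep(before, [1, 0], 1, 0.5);
console.log(loss(before, [1, 0], 1), loss(after, [1, 0], 1)); // the second value is smaller
```

Only the neurons that took part in the answer are corrected: the second hidden neuron was silent for this input, so its incoming weights stay exactly as they were.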
📚 Further reading
- Gustavo Salvini, «Fundamentos IA: Qué es el perceptrón» (2026) — Complementary technical article (in Spanish)
- Rosenblatt, F. (1958). «The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain». Psychological Review, 65(6), 386–408.
- Minsky, M. & Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press.
- Novikoff, A. B. J. (1962). «On convergence proofs on perceptrons». Proceedings of the Symposium on the Mathematical Theory of Automata.
- Rumelhart, D. E., Hinton, G. E. & Williams, R. J. (1986). «Learning representations by back-propagating errors». Nature, 323, 533–536.
- «More Consideration for the Perceptron» — Contemporary review on arXiv (2024).