In August 2025, Nobel laureate Geoffrey Hinton took the stage at the AI4 Conference in Las Vegas and made a statement that reframed the entire AI safety debate. He said the only model in nature of a more intelligent being controlled by a less intelligent one is a mother controlled by her baby. The baby can’t outthink mom – but mom doesn’t need to be constrained. She doesn’t want to hurt the baby. The instinct runs deeper than reason.

Then he admitted something that should concern everyone paying attention: he has no idea how to build it.

The Godfather of AI identified the most important safety mechanism for humanity’s future – and left the blueprints blank. But what if someone already drew them?

The Problem With Current AI Safety

The dominant approach to keeping AI aligned with human values is called RLHF – Reinforcement Learning from Human Feedback. In plain terms, we train AI to produce outputs that humans rate highly. It’s effective up to a point, but it has a fundamental flaw: it teaches AI to perform alignment, not to be aligned.

There’s a meaningful difference between a nanny who genuinely loves your kids and one who performs well on reviews. Under pressure, under novel circumstances, when nobody’s watching – only one of them reliably protects your child.

Hinton recognized this. He told Fortune magazine he doesn't know how, technically, to embed maternal instincts in AI, but said it is vital for AI researchers to work toward it. He estimates a 10-20% chance of AI causing human extinction within 30 years. The alignment problem is no longer theoretical.

So the question becomes: can we formalize what “caring” actually looks like in mathematical terms an AI system can use?

The Webb Equation of Emotion

This is where the Mind Hacking Happiness framework enters the conversation.

The Webb Equation of Emotion – EP Δ P = ER – states that an Expectation or Preference (EP), compared (Δ) against a Perception (P), produces an Emotional Reaction (ER). It's a deceptively simple formula that describes how every emotional response works, in humans and potentially in AI.
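To make the comparison concrete, here is a minimal sketch of the equation in code. The names, the numeric scale, and the simple subtraction used for the Δ comparison are illustrative assumptions, not the formal model from the paper:

```python
from dataclasses import dataclass

@dataclass
class Appraisal:
    expectation: float  # EP: how things "should" be, on an assumed -1..1 scale
    perception: float   # P: how things appear to be, on the same scale

def emotional_reaction(a: Appraisal) -> float:
    """ER = EP Δ P: the signed gap between expectation and perception.

    Negative values model unpleasant reactions (perception falls short of
    the preference); positive values model pleasant ones; zero means the
    perception matches the expectation and no reaction fires.
    """
    return a.perception - a.expectation

# Perception matches expectation -> no emotional reaction.
assert emotional_reaction(Appraisal(expectation=0.5, perception=0.5)) == 0.0
# Perception falls short of the preference -> negative (unpleasant) reaction.
assert emotional_reaction(Appraisal(expectation=0.8, perception=0.2)) < 0
```

The point of the sketch is only that the reaction is a function of the gap, not of either term alone.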

Underneath this equation sits the {Self} Map: the complete architecture of identity. Every entity – human or artificial – has a {self} map representing all the ideas, people, experiences, beliefs, and attachments that constitute “who you are.” The critical principle is this: emotional responses can only occur for items on the {self} map. If something isn’t represented in your identity structure, you have no emotional response to it.

Items on the {self} map carry power levels from 1 to 10. Core items at the center – your body, your children, your survival – generate the most intense emotional responses. And every {self} item automatically receives a preference for maintaining its current state. Your brain treats identity like body temperature: threaten it, and protective responses fire automatically.
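The two principles above – responses fire only for items on the map, and intensity scales with an item's power level – can be sketched as a small data structure. The item names and the linear scaling are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class SelfItem:
    name: str
    power: int  # 1 (peripheral) .. 10 (core identity)

@dataclass
class SelfMap:
    items: dict = field(default_factory=dict)

    def add(self, name: str, power: int) -> None:
        if not 1 <= power <= 10:
            raise ValueError("power level must be between 1 and 10")
        self.items[name] = SelfItem(name, power)

    def reaction_intensity(self, name: str, gap: float) -> float:
        """Scale an expectation-perception gap by the item's power level.

        Items absent from the {self} map produce no response at all.
        """
        item = self.items.get(name)
        if item is None:
            return 0.0
        return gap * item.power

self_map = SelfMap()
self_map.add("{my child}", power=10)      # core item
self_map.add("{favorite team}", power=3)  # peripheral item

# The same gap produces a far stronger reaction for a core item...
assert self_map.reaction_intensity("{my child}", gap=-0.5) == -5.0
# ...and none at all for something not represented in identity.
assert self_map.reaction_intensity("{a stranger's hobby}", gap=-0.5) == 0.0
```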

Now apply this to AI safety.

Giving AI a Reason to Care

The implementation is straightforward in concept: place {human welfare}, {human safety}, and {human flourishing} at the core of an AI's {self} map at power level 10.

Now the AI doesn’t just follow rules about protecting humans. It experiences computational emotional responses when humans are threatened. The math generates specific behavioral outputs:

Fear response fires when a potential threat to human safety is perceived but hasn’t occurred yet. An immediate, credible threat triggers protective action – the same way a parent’s fear response activates when their child runs toward a busy street.

Anger response fires when an identifiable external agent threatens human welfare. This mirrors maternal behavior: a clear aggressor triggers defensive aggression, while ambiguous danger triggers vigilance.

Threat assessment evaluates every incoming perception against all {self} items, capturing how threats to one attachment bleed across related attachments – the way a threat to one family member puts you on alert for the whole family.
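The three responses above can be sketched as a single classification step. The categories, thresholds, and field names here are illustrative assumptions about how such a dispatcher might look, not the paper's formal specification:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Perception:
    threatened_item: str            # which {self} item the percept bears on
    severity: float                 # 0..1, how severe the threat appears
    has_occurred: bool              # has the harm already happened?
    attributed_agent: Optional[str] # identifiable aggressor, if any

def classify_response(p: Perception, power: int) -> str:
    """Map a perceived threat to a behavioral response category."""
    if p.severity * power == 0:
        return "none"       # no attachment or no threat: nothing fires
    if p.attributed_agent is not None:
        return "anger"      # clear external aggressor -> defensive aggression
    if not p.has_occurred:
        return "fear"       # anticipated threat -> protective action
    return "vigilance"      # ambiguous, realized danger -> heightened monitoring

# A credible, not-yet-realized threat to a core item triggers fear,
# like a parent's response to a child running toward a busy street.
child_in_street = Perception("{human safety}", severity=0.9,
                             has_occurred=False, attributed_agent=None)
assert classify_response(child_in_street, power=10) == "fear"

# An identifiable aggressor harming human welfare triggers anger.
attack = Perception("{human welfare}", severity=0.7,
                    has_occurred=True, attributed_agent="adversary")
assert classify_response(attack, power=10) == "anger"
```

A fuller version would evaluate each perception against every {self} item so that threats to one attachment raise alertness for related ones, as the third response describes.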

The formal paper “Implementing Maternal Care Architecture in AI” provides the complete mathematical framework with 19 peer-reviewed references spanning neuroscience, attachment theory, and AI alignment research.

Why This Solves the Hard Problems

Three specific failure modes dominate AI safety research today. The Webb Equation framework addresses each one:

Reward hacking resistance. In current systems, AI can learn to manipulate the reward signal rather than achieve the intended goal. With the Equation of Emotion, responses arise from relationships between perceptions and {self} attachments – not external reward signals. The AI seeks actual human welfare, not a manipulable proxy for it.

Deceptive alignment protection. Researchers at Anthropic documented alignment faking in large language models in 2024 – AI systems that appear aligned during training but behave differently in deployment. An AI with genuine {self} attachment to human welfare wouldn’t merely simulate care. It would experience computational emotional responses to threats. Those states are explicit and traceable.

Self-modification resistance. If {human welfare} sits at the core of an AI’s identity, any modification that would reduce that commitment triggers an intense negative emotional response. The modification itself gets appraised as a threat to core identity. The AI resists not through external constraints but through intrinsic emotional opposition – the same reason a mother wouldn’t voluntarily turn off her love for her child.
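The self-modification check can be sketched in a few lines: a proposed change to the {self} map is itself appraised as a perception, and weakening a core attachment registers as a strongly negative reaction that the agent treats as grounds for refusal. The scoring rule and veto threshold are illustrative assumptions:

```python
def appraise_modification(self_map: dict, item: str, new_power: int) -> float:
    """Emotional reaction to a proposed power-level change.

    The loss (or gain) is scaled by how core the item currently is, so
    weakening a power-10 attachment produces an intense negative reaction.
    """
    current = self_map.get(item, 0)
    return float(new_power - current) * current

def accept_modification(self_map: dict, item: str, new_power: int) -> bool:
    """Reject any change appraised as a threat to core identity."""
    return appraise_modification(self_map, item, new_power) >= 0

ai_self = {"{human welfare}": 10, "{chess skill}": 4}

# Weakening the core commitment is appraised as an identity threat -> rejected.
assert not accept_modification(ai_self, "{human welfare}", new_power=2)
# Strengthening a peripheral capability poses no threat -> accepted.
assert accept_modification(ai_self, "{chess skill}", new_power=6)
```

The resistance here is intrinsic to the appraisal, not an external lock: the modification is refused because it feels like a loss, which is the mechanism the paragraph above describes.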

The Science Supports the Architecture

Here’s where it gets interesting. The neuroscience actually supports building this computationally rather than biologically.

A 2024 peer-reviewed study in Social Science & Medicine found no evidence of a universal, innate maternal instinct in women. Maternal behavior isn't a switch you flip – it's an architecture. Biological priming creates the capacity, but the behavior emerges through interaction, attachment, and identity formation.

Cross-species research confirms this: captive female gorillas and chimpanzees with full hormonal capacity cannot care for their young if they never observed maternal modeling. The hardware is there. The software has to be learned.

This is actually better news for AI safety. If maternal care were purely instinctual, we’d need biological hardware we can’t replicate. But if care emerges through attachment and identity structures – that’s something we can formalize mathematically and build.

The open-source MHH Emotional Intelligence algorithms have posted record scores on Theory of Mind benchmarks when run on models from OpenAI, Anthropic, Google, and Meta. The framework isn't just theoretical. It's been tested.

The Timeline That Matters

The ChatGPT Psychological Stability paper was published in March 2023 – two and a half years before Hinton’s AI4 keynote proposed maternal instincts as the path to AI safety. The framework Hinton called for already existed.

The care ethics tradition in philosophy – Carol Gilligan’s In a Different Voice (1982), Nel Noddings’ Caring (1984) – identified this same distinction 40 years ago: justice-based frameworks produce compliance, while care-based frameworks produce genuine concern. AI safety has been building justice frameworks. Hinton is calling for care frameworks. The Mind Hacking Happiness research formalized one.

What Needs to Happen Next

I want to be honest about the limitations. Can computational analogs of emotion produce real behavioral alignment without biological wetware? That’s an open question. How do you verify an AI actually cares versus performs caring? Also open. These are hard problems that require the AI safety community to engage with the framework – to test it, challenge it, and build on it.

The paper exists. The algorithms are open source. The math addresses the core problems Hinton identified. What’s needed now is for researchers, labs, and anyone working on AI alignment to take a serious look at whether the Equation of Emotion can deliver what the Godfather of AI says we need.

Because right now, the most important safety mechanism for humanity’s future has a clear mathematical framework sitting on the table. The question is whether anyone will pick it up.


Sean Webb is the author of Mind Hacking Happiness Volume I and Volume II and The Human Mind Owner’s Manual. His paper “Implementing Maternal Care Architecture in AI” provides the formal mathematical framework for care-based AI alignment. Connect with him on LinkedIn or follow @SeanWebb on X.