Faith in the Machine: Do we want models that work or ones we understand?
The AI systems shaping our lives are getting harder to question.
Two weeks ago, OpenAI said it may release models even if they’re considered “critical risk”, especially if competitors have launched something similar.
This marks a sharp departure from its earlier promise not to release anything above a "medium risk" threshold, and it reflects a broader trend across the AI industry: quietly redefining what counts as "frontier" technology in order to bypass prior safety commitments.
This matters because the systems shaping our lives, from healthcare to finance to education, while becoming more powerful, are also becoming less accountable. Deep learning is often described as a kind of alchemy. It’s impressive, powerful, and not entirely understood, even by those who build it. It’s not magic, but it isn’t always science in the explanatory sense, either. It’s a form of epistemic compromise: we gave up transparency to gain performance.
It’s a bit like a magic eight ball. You shake it, you get an answer, maybe even the right one, but you have no real way of interrogating where it came from, or why you should trust it. But unlike a magic eight ball, the danger with AI isn’t that it makes mistakes. It’s that it makes mistakes persuasively.
So why does this matter?
I was chatting to a doctor friend in the US last week, and she was telling me about the AI medical diagnostics tool her practice has rolled out. The AI has been trained on millions of cases and has led to measurably better outcomes for patients.
Suggested treatment protocols just show up in the patient’s medical file. And sometimes the treatments are… unusual. My friend has more than once been perplexed by a recommended protocol; it’s not what she would have chosen.
Where it really gets dicey is when patients ask her why this approach was chosen over conventional protocol.
She can’t tell them. Neither can the developers. The system is accurate, often more so than the humans using it. But its logic is obscured.
This gap between accuracy and explainability is one of the tensions at the heart of modern AI.
We are building systems that outperform us in domains where performance matters most: healthcare, education, finance, law. Systems that can predict, classify, diagnose, optimise, often with astonishing precision, but when asked to explain themselves, they can’t.
Which leaves us with a harder question:
Is it enough for an AI system to be right, if we have no idea why it’s right?
What we’re actually trading off
At the core of the explainability debate is a simple, persistent tension: the more accurate a model becomes, the harder it is to understand.
In machine learning, accuracy refers to how well a model performs its task: a diagnostic model that correctly identifies cancer 95% of the time is more accurate than one that gets it right 87% of the time. Explainability, on the other hand, is about whether we can understand how the model reached its conclusion.
Can we trace the logic?
Can we identify the factors it relied on?
Can we review the decision and see that it makes sense?
Here’s the trade-off in practice: a deep neural network, trained on massive amounts of medical imaging data, might outperform human radiologists in detecting early signs of disease. But ask it why it flagged a specific scan, and it can’t give a meaningful answer. Not one we can easily interpret, anyway. This is why we call neural networks black boxes. Contrast this with a decision tree, a model that mimics step-by-step human reasoning. Its accuracy might be lower, but its logic is transparent. You can follow its trail: “If symptom A is present, and test B is above threshold, then outcome C.”
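To make that concrete, here’s a minimal sketch (scikit-learn, with made-up symptom and test features, not any real diagnostic model) of what a decision tree’s traceability looks like in practice: the learned rules can simply be printed and read.

```python
# A minimal sketch with hypothetical features: the point is that a decision
# tree's learned rules can be printed and read directly, step by step.
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up training data: [symptom_a present (0/1), test_b result]
X = [[1, 4.2], [1, 7.8], [0, 6.1], [0, 2.3], [1, 8.5], [0, 7.9]]
y = [0, 1, 0, 0, 1, 1]  # 1 = flag for follow-up, 0 = don't flag

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Prints the learned splits as an indented, human-readable rule tree
print(export_text(tree, feature_names=["symptom_a", "test_b"]))
```

You could hand that printout to a clinician or a patient. There is no equivalent printout for a deep neural network.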
It’s all well and good when you fall in the 95% where the model gets it right, but what if you’re in the 5% and have no way of checking the logic?
Now, if you’ve used ChatGPT’s premium subscription, you might have seen that it attempts to walk you through its steps. But here’s the thing: we have no idea whether that account is accurate. ChatGPT is what Nell Watson, author of Taming the Machine, would call a ‘stochastic parrot’: it can generate post-hoc explanations, but we have no means of verifying whether they reflect the logic it actually followed or whether it’s simply generating something that appears plausible.
Tools like SHAP and LIME attempt to reverse-engineer explanations for complex models. These tools highlight which input features contributed most to a given prediction. Others try to extract simplified rules or create counterfactuals that show what would have changed the outcome. But they’re still just approximations.
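For the curious, here’s roughly what that looks like with the SHAP library. The model and data below are placeholders, and the scores it produces are an estimate of feature influence, not a readout of the model’s actual reasoning.

```python
# A rough sketch on synthetic data: SHAP assigns each input feature a
# contribution score for a single prediction.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:1])  # explain a single prediction

print(shap_values)  # per-feature contributions for that one prediction
```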
Why does this matter?
Because the higher the stakes, the more important it is that we know how a conclusion was reached, so we can challenge it, audit it, or even just trust it.
Take the case of COMPAS.
COMPAS was an algorithm used in the U.S. criminal justice system to predict a defendant’s risk of reoffending. Judges used its scores to inform bail, sentencing, and parole decisions. On paper, the system was designed to improve objectivity and consistency. But defendants and their lawyers often had no way to understand or contest the score they received. The model was privately owned, so its internal logic was proprietary. And in 2016, investigative reporting revealed that COMPAS systematically assigned higher risk scores to Black defendants than to white defendants with similar criminal histories.
What makes this example so striking isn’t just the bias, it’s the opacity and the inability to challenge or dispute a decision. Even if COMPAS had been more accurate than a judge’s intuition (which is debatable), it didn’t allow for scrutiny. There was no mechanism to ask: “Which variables did it use? What mattered more, my prior convictions or my zip code? How do I appeal this?”
When we rely on systems we can’t explain, we lose more than transparency, we lose accountability. And when those systems are wrong, harmful, or biased, there’s no clear path for redress.
In domains like cancer detection or protein folding, accuracy can literally save lives. We chose systems that perform better because, in many contexts, being right matters more than being understandable.
But now those systems are being used not just to detect tumours or translate sentences: they’re being used to shape credit decisions, job screenings, educational assessments, and risk evaluations. And in those spaces, the cost of opacity is harder to justify.
Why it matters (even if you’re not in a "high-stakes" job)
It’s tempting to think of explainability as a problem for doctors, judges, or financial regulators. But the impact of black-box AI isn’t confined to high-stakes environments.
If you're a job seeker, there's a good chance your résumé is being screened by an algorithm. If you're a student, you might already be graded by software that claims to assess everything from your grammar to the “coherence” of your argument. And, if you're a content creator, analyst, or knowledge worker, you're probably using AI tools to draft emails, generate summaries, write articles, or “fact-check” your work.
But here’s the catch (and something woefully few people recognise): GenAI generates text based on probability.
So while something might sound convincing, whether it’s true or not is an entirely different story. And the more fluent the system becomes, the harder it is to tell the difference between knowledge and misplaced confidence.
Not all AI is created equal
When I said neural networks are more accurate, that typically applies to specialised AI, not general-purpose tools like ChatGPT or Claude. So if you’re thinking, “well, I’m happy with 95% accuracy,” that doesn’t apply to your run-of-the-mill ChatGPT usage.
A while ago, I asked ChatGPT to count the number of ‘R’s in ‘strawberry’. This is what it said:
Now, obviously, there are three Rs in ‘strawberry’, but ChatGPT isn’t counting letter by letter. The model bases its answer on patterns in its training data. This quirk became widely known, so I repeated the test this morning.
This time, it got it right.
Now, is it correct because it’s counting it correctly? Or have enough people told it that “strawberry” has three Rs?
So I repeated the experiment with a random string: “thlkdjfrrwldfgrlkdgrrppp”
This time it identified the Rs correctly, but somehow still managed to miscount the total.
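For contrast, this is what counting looks like when a program does it deterministically rather than predicting an answer that sounds right. A trivial sketch, but that’s the point:

```python
# Deterministic counting: every character is inspected and the answer can be
# traced. An LLM, by contrast, predicts a plausible-sounding continuation.
def count_letter(text: str, letter: str) -> int:
    return sum(1 for ch in text.lower() if ch == letter.lower())

print(count_letter("strawberry", "r"))                # 3
print(count_letter("thlkdjfrrwldfgrlkdgrrppp", "r"))  # 5
```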
All this to show that when it comes to non-specialised GenAI, what we’re seeing is neither accuracy nor explainability; it’s the appearance of both.
The false comfort of AI fluency
So to illustrate this a little further, I asked ChatGPT a question: “Why is inflation bad for the economy?”
A couple of seconds later, it spat out:
“Inflation erodes purchasing power, meaning consumers can buy less with the same amount of money. It also creates uncertainty, which can discourage investment and long-term planning by businesses and individuals.”
Sound about right, yeah?
Nope.
Most economists would say a degree of inflation (around 2%) is actually desirable. It encourages spending and investment, and helps prevent deflationary spirals. Central banks don’t aim for zero inflation.
So what that means is that ChatGPT very confidently asserted something that’s straight-up false. (Don’t @ me about it being a leading question. If we’re putting our faith in something, it should be able to push back.)
But here’s the thing: because the language was fluent, the logic was familiar, and the tone was confident, we’re more likely to mistake plausibility for truth, nod, copy it into our notes, and maybe drop it into a slide deck.
And unless you were already an expert, you probably didn’t know enough to question it.
So now that OpenAI has removed “persuasion and manipulation” from its list of critical risks, this issue is likely to get a lot worse. Once considered a major concern tied to election tampering, the spread of misinformation, and propaganda, it has now been lumped into post-release monitoring and terms-of-service enforcement.
Even when AI systems aren’t embedded in life-or-death decisions, they shape how we think, write, and understand the world.
Language models like GPT are trained to predict the next most likely word in a sequence, based on vast amounts of text. They generate responses that are statistically coherent, but that doesn’t mean they’re logically grounded. And yet, because they do this so fluently, they give off the strong impression of understanding.
This is once again the "stochastic parrot" problem we talked about earlier: the model doesn’t know what it’s saying; all it’s really doing is repeating and remixing patterns from its training data in ways that feel meaningful. And that fluency, ironically, makes it harder to notice when something’s off.
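As a toy illustration of that mechanism (nothing like a real transformer, just the final sampling step, with invented numbers): the model scores candidate next words and samples from the most probable ones. “Fluent” and “true” simply aren’t the same axis.

```python
# A toy "next token" step: made-up scores (logits) become probabilities, and a
# word is sampled. Fluency comes from these statistics, not from fact-checking.
import math
import random

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate continuations of "Inflation erodes ..."
candidates = ["purchasing", "trust", "bananas"]
logits = [4.1, 1.3, -2.0]  # invented scores from an invented model

probs = softmax(logits)
next_word = random.choices(candidates, weights=probs, k=1)[0]
print([(w, round(p, 3)) for w, p in zip(candidates, probs)], "->", next_word)
```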
The issue isn’t just factual error. It’s that the model gives you an answer that sounds plausible, authoritative, and correct, even when it isn’t. You don’t get a flashing warning sign when something is off. Instead, you get a very persuasive answer that it thinks you want to hear. And unless you already know better, you’re likely to believe it.
So when you factor in the growing dangers of cognitive off-loading (I talk about that more here) along with growing AI fluency, what we’re seeing is decreasing resilience to misinformation.
Is AI making us stupid?
For the past two years, I’ve used generative AI nearly every day. It’s efficient, versatile, and frankly, addictive. Yet, a creeping fear has taken hold: Is this making me intellectually lazy? What began as a tool for augmenting my work is starting to feel like an intellectual crutch, and for someone whose greatest fear is losing mental acuity, this is …
Whether you’re in healthcare, hiring, education, policy, or everyday knowledge work, AI is becoming harder to ignore: faster decisions, broader insights, improved outcomes.
The question I’m left with is this: should we value systems that give us the right answer, even if we can’t question them? Or is the ability to ask why fundamental to any decision we’re expected to accept?
What to do with this information?
Am I saying that all AI is bad and there’s no point in using it? No. Even non-specialised GenAI tools like Claude and ChatGPT have access to a wealth of knowledge and are incredibly good at helping you organise your thoughts.
So the solution we’re really looking at is a human-in-the-loop approach.
This means starting with your own ideas. It’s a bit like writing; it’s easier to edit than to write a draft. But at least for me personally, struggling to find the right angle or express an idea is often how I find something worth writing about.
That doesn’t mean I don’t use AI. My favourite approach is to brain-dump my ideas in voice notes and then get ChatGPT to organise them. If you’re a copywriter reading this, it’s a little like treating it as your junior writer: you outline, lay out the strategy, and get it to produce a rough draft that you then fact-check and read line by line, as opposed to passively outsourcing the work.
Critical thinking has never been more important. If AI can neither guarantee outcomes nor offer true explainability, the onus to interrogate its outputs falls on us. Otherwise we risk losing both transparency and agency.
Wrapping up
It’s not all doom and gloom. What AI can do is extraordinary. It has already saved lives, and the impact on productivity is undeniable.
And for the longest time, I considered myself a tech optimist; as an entrepreneur, it’s part of the territory. But the rampant use of tools like ChatGPT by people who don’t understand what’s happening under the hood? Stir in the erosion of safeguards, and that leaves me deeply concerned.
Humans have a long history of putting faith in things we don’t fully understand: religion, astrology, Buzzfeed quizzes that tell you what kind of bread you are.
And that’s the risk with AI. It becomes the next magic eight ball.
Only this time, it knows exactly what to say to convince you.
Explainability matters not because any one of these decisions will ruin your life, but because taken together, they create an environment where you can be sorted, ranked, and evaluated without context, recourse, or clarity. And the more we normalise this kind of invisible decision-making, the harder it becomes to demand something better.







