What Makes AI Good? Not the Model — the Evidence Chain
By Igor Lima
Why enterprise AI trust depends on context, source control, permissions, evidence and human oversight — not just model intelligence.
1. The AI trust question is changing
Most conversations about AI trust still focus on hallucinations.
That matters, but in operational environments the question is becoming more specific.
It is no longer just:
Can the AI answer the question?
It is:
How do we know the AI used the right context, from the right internal source, with the right permissions, the right version and the right details?
Author's Perspective
Good AI is not the AI that sounds confident. Good AI is the AI that can prove what it knows.
In most organisations, information already exists somewhere:
- SharePoint
- Teams
- Jira
- ServiceNow
- Salesforce
- Confluence
- PDFs
- Knowledge articles
- Customer records
The challenge is not simply connecting AI to those systems.
The challenge is proving that the AI retrieved the right information, understood it correctly and stayed within the boundaries of what it was allowed to use.
2. A good model is not enough
Many AI conversations still focus on the model.
Which model is smarter?
Which model reasons better?
Which model has the biggest context window?
These things matter, but they are not enough.
In a business environment, even a strong model can give a poor answer if the context around it is wrong.
Research note
NIST describes trustworthy AI through characteristics such as reliability, safety, security, accountability, transparency, explainability, privacy and fairness.
A business AI agent is not just a model.
It is a chain of decisions:
- What sources were searched?
- What documents were retrieved?
- Which version was used?
- Which claims were generated?
- Which tool was called?
- Which action was approved?
If any part of that chain fails, the final answer may still sound professional — but it may not be trustworthy.
Author's Perspective
The problem is not only hallucination. The problem is context failure.
3. More context does not automatically mean better AI
Retrieval-Augmented Generation, or RAG, is often presented as the answer to hallucination.
Instead of asking the model to rely only on what it already knows, we give it documents, records or data and ask it to answer from that evidence.
That is powerful.
But it does not automatically make the answer correct.
Research note
Microsoft describes RAG as a pattern for grounding AI responses in organisational content by retrieving relevant information and bringing it into the model's context.
RAG helps, but it creates a new question:
Was the retrieved context actually the right context?
The right context can improve an answer.
The wrong context can make a good model worse.
That matters when AI is reading old incident updates, customer emails, engineering notes, policy documents or meeting summaries.
Some of that information may be outdated, informal, incomplete, contradictory or not approved for customer communication.
Author's Perspective
A grounded AI answer is only as good as the evidence it was grounded in.
4. The evidence chain
When an AI assistant produces an answer, I want to know more than the final response.
I want to know the evidence trail.
For example, if an AI assistant says:
“This customer has three open high-priority incidents, renewal is due in September and the account is at risk.”
The business should be able to inspect:
- Where the incident count came from
- Whether the incidents are still open
- Which system defined the priority
- Where the renewal date came from
- Whether the CRM record is current
- Whether the account risk came from data or assumption
- Whether the user was allowed to access the information
This is the difference between a useful AI answer and a governable AI workflow.
Research note
Google Cloud's grounding check can return support scores and citations showing how much an answer candidate agrees with a given set of facts.
Citations are helpful.
But citations alone are not enough.
Author's Perspective
A citation tells us where an answer might have come from. An evidence chain tells us whether the answer can be trusted.
5. The Context Trust Chain
To me, a good enterprise AI agent needs a Context Trust Chain.
This means checking six things.
Source trust
Which systems are authoritative?
CRM may be the source of truth for renewal dates. ServiceNow or Jira may be the source of truth for incidents. SharePoint may be the source of truth for policies.
If two sources conflict, the AI should not simply choose the one that sounds most relevant.
Ingestion trust
Was the information captured correctly?
If documents are badly scanned, poorly chunked, missing metadata or indexed without version control, the AI may retrieve incomplete or misleading information.
The model may look like the problem, but the real failure may be the data pipeline behind it.
Permission trust
Was the user allowed to access the information?
An AI assistant connected to internal systems should not retrieve everything the technology can technically see.
It should retrieve only what the user is allowed to access.
Research note
Microsoft Azure AI Search supports document-level access control so organisations can enforce fine-grained permissions from ingestion through query execution in RAG, enterprise search and agentic AI systems.
Author's Perspective
If the AI can retrieve information the user should not see, the AI is not good — even if the answer is accurate.
Retrieval trust
Did the AI retrieve the right context?
It may retrieve the right document but the wrong section.
It may use an old customer update instead of the latest incident status.
It may treat an informal engineering note as an approved customer message.
Research note
RAGAS includes metrics such as context precision, context recall and faithfulness to evaluate retrieval quality and answer grounding.
A good AI system needs to evaluate retrieval quality separately from answer quality.
Answer trust
Did the AI use the context correctly?
Even with the right sources, the final answer can still introduce unsupported claims.
A good AI should preserve names, dates, numbers and status correctly. It should highlight uncertainty and avoid filling gaps with invented detail.
Tool trust
Should the AI be allowed to act?
This becomes critical with AI agents.
A chatbot that answers a question is one thing.
An agent that can update a CRM record, send an email, create a Jira ticket or trigger a workflow introduces a different level of risk.
Author's Perspective
The question is not only whether the AI answered correctly. It is whether the AI was allowed to act in that context.
6. Agentic AI needs operational control
AI agents are powerful because they can use tools.
They can search systems, read files, update records, send messages, create tickets or call APIs.
That can remove a lot of friction.
It can also create risk.
Research note
OWASP identifies prompt injection as a major LLM application risk, where crafted inputs can manipulate model behaviour and potentially lead to unauthorised access, data exposure or compromised decision-making.
This matters because retrieved content is not always neutral.
An email, document or ticket comment could contain instructions the AI should not follow.
A good AI agent must treat retrieved content as evidence, not as instructions.
Author's Perspective
When AI agents connect to internal systems, the trust problem moves from answer quality to operational control.
7. What this means in operational environments
In customer support, service management and operational teams, the use cases are obvious.
AI can help with:
- Summarising long incident histories
- Preparing customer updates
- Finding approved workarounds
- Identifying missing information
- Classifying urgency
- Routing issues to the right team
- Preparing account briefings
- Drafting follow-up emails
These are valuable use cases.
But they only become good AI workflows if the system can show the evidence behind the answer.
For example, before sending an AI-generated customer update, the user should know:
- Which incident record was used
- Whether the status is current
- Whether the workaround is approved
- Whether the root cause is confirmed
- Whether sensitive internal notes were excluded
- Whether human approval is required
This is where AI becomes genuinely useful.
Not because it replaces the service manager.
Because it reduces the friction involved in gathering context, validating information and preparing a decision.
Author's Perspective
The most successful AI workflows do not remove people. They make the evidence easier to inspect.
Before deploying agents, organisations need to answer:
- What workflow are we improving?
- What outcome are we measuring?
- Which source is authoritative?
- What data can the AI access?
- What actions can it take?
- What needs approval?
- How do we audit the decision?
Author's Perspective
Most organisations do not have an AI model problem. They have an evidence, workflow and governance problem.
9. A practical prototype
To explore this idea, I have been thinking about a practical prototype around escalation management.
The objective would not be to replace the person handling the escalation.
The objective would be to help them inspect the evidence faster.
The prototype would ask a simple question:
What if an AI assistant could draft a recommendation and also show the evidence chain behind every important claim?
The AI would gather information from synthetic internal sources such as CRM records, Jira tickets, incident timelines, knowledge articles, customer updates and policy documents.
It would then produce two outputs.
The recommendation
This would include:
- Situation summary
- Customer impact
- Suggested customer update
- Recommended next action
- Missing information
- Risk level
- Human approval requirement
The evidence chain
This would show:
- Sources searched
- Sources retrieved
- Source type
- Last updated date
- Permission status
- Claims supported by each source
- Unsupported claims
- Conflicting information
- Sensitive information excluded
The purpose would be simple.
Not just to show that AI can generate an answer.
But to show whether the answer can survive inspection.
Conclusion
The question of what makes AI good is becoming more important.
A good AI is not simply the most advanced model, the most polished chatbot or the fastest agent.
A good AI is one that can show where its answer came from, respect permissions, identify uncertainty, stay within safe boundaries and leave a clear trail for human review.
For enterprise AI, trust will not come from confidence alone.
It will come from evidence.
Author's Perspective
Good AI is not the AI that sounds right. Good AI is the AI that can prove what it knows, show what it does not know and stay within the boundaries of what it is allowed to do.
Want to explore a similar workflow?
Try the AI demos or explore how practical AI workflows can support operations, knowledge retrieval and human-in-the-loop decision making.