1. The AI trust question is changing

Most conversations about AI trust still focus on hallucinations.

That matters, but in operational environments the question is becoming more specific.

It is no longer just:

Can the AI answer the question?

It is:

How do we know the AI used the right context, from the right internal source, with the right permissions, the right version and the right details?

Author's Perspective

Good AI is not the AI that sounds confident. Good AI is the AI that can prove what it knows.

In most organisations, information already exists somewhere:

SharePoint
Teams
Jira
ServiceNow
Salesforce
Confluence
Email
PDFs
Knowledge articles
Customer records

The challenge is not simply connecting AI to those systems.

The challenge is proving that the AI retrieved the right information, understood it correctly and stayed within the boundaries of what it was allowed to use.

2. A good model is not enough

Many AI conversations still focus on the model.

Which model is smarter?

Which model reasons better?

Which model has the biggest context window?

These things matter, but they are not enough.

In a business environment, even a strong model can give a poor answer if the context around it is wrong.

Research note

NIST describes trustworthy AI through characteristics such as reliability, safety, security, accountability, transparency, explainability, privacy and fairness.

Source: NIST AI Risk Management Framework

A business AI agent is not just a model.

It is a chain of decisions:

What sources were searched?
What documents were retrieved?
Which version was used?
Which claims were generated?
Which tool was called?
Which action was approved?

If any part of that chain fails, the final answer may still sound professional — but it may not be trustworthy.

Author's Perspective

The problem is not only hallucination. The problem is context failure.

3. More context does not automatically mean better AI

Retrieval-Augmented Generation, or RAG, is often presented as the answer to hallucination.

Instead of asking the model to rely only on what it already knows, we give it documents, records or data and ask it to answer from that evidence.

That is powerful.

But it does not automatically make the answer correct.

Research note

Microsoft describes RAG as a pattern for grounding AI responses in organisational content by retrieving relevant information and bringing it into the model's context.

Source: Microsoft Azure AI Search

RAG helps, but it creates a new question:

Was the retrieved context actually the right context?

The right context can improve an answer.

The wrong context can make a good model worse.

That matters when AI is reading old incident updates, customer emails, engineering notes, policy documents or meeting summaries.

Some of that information may be outdated, informal, incomplete, contradictory or not approved for customer communication.

Author's Perspective

A grounded AI answer is only as good as the evidence it was grounded in.

4. The evidence chain

When an AI assistant produces an answer, I want to know more than the final response.

I want to know the evidence trail.

For example, if an AI assistant says:

“This customer has three open high-priority incidents, renewal is due in September and the account is at risk.”

The business should be able to inspect:

Where the incident count came from
Whether the incidents are still open
Which system defined the priority
Where the renewal date came from
Whether the CRM record is current
Whether the account risk came from data or assumption
Whether the user was allowed to access the information

This is the difference between a useful AI answer and a governable AI workflow.

Research note

Google Cloud's grounding check can return support scores and citations showing how much an answer candidate agrees with a given set of facts.

Source: Google Cloud Grounding Check

Citations are helpful.

But citations alone are not enough.

Author's Perspective

A citation tells us where an answer might have come from. An evidence chain tells us whether the answer can be trusted.

5. The Context Trust Chain

To me, a good enterprise AI agent needs a Context Trust Chain.

This means checking six things.

Source trust

Which systems are authoritative?

CRM may be the source of truth for renewal dates. ServiceNow or Jira may be the source of truth for incidents. SharePoint may be the source of truth for policies.

If two sources conflict, the AI should not simply choose the one that sounds most relevant.

Ingestion trust

Was the information captured correctly?

If documents are badly scanned, poorly chunked, missing metadata or indexed without version control, the AI may retrieve incomplete or misleading information.

The model may look like the problem, but the real failure may be the data pipeline behind it.

Permission trust

Was the user allowed to access the information?

An AI assistant connected to internal systems should not retrieve everything the technology can technically see.

It should retrieve only what the user is allowed to access.

Research note

Microsoft Azure AI Search supports document-level access control so organisations can enforce fine-grained permissions from ingestion through query execution in RAG, enterprise search and agentic AI systems.

Source: Microsoft Azure AI Search

Author's Perspective

If the AI can retrieve information the user should not see, the AI is not good — even if the answer is accurate.

Retrieval trust

Did the AI retrieve the right context?

It may retrieve the right document but the wrong section.

It may use an old customer update instead of the latest incident status.

It may treat an informal engineering note as an approved customer message.

Research note

RAGAS includes metrics such as context precision, context recall and faithfulness to evaluate retrieval quality and answer grounding.

Source: RAGAS Evaluation Metrics

A good AI system needs to evaluate retrieval quality separately from answer quality.

Answer trust

Did the AI use the context correctly?

Even with the right sources, the final answer can still introduce unsupported claims.

A good AI should preserve names, dates, numbers and status correctly. It should highlight uncertainty and avoid filling gaps with invented detail.

Tool trust

Should the AI be allowed to act?

This becomes critical with AI agents.

A chatbot that answers a question is one thing.

An agent that can update a CRM record, send an email, create a Jira ticket or trigger a workflow introduces a different level of risk.

Author's Perspective

The question is not only whether the AI answered correctly. It is whether the AI was allowed to act in that context.

6. Agentic AI needs operational control

AI agents are powerful because they can use tools.

They can search systems, read files, update records, send messages, create tickets or call APIs.

That can remove a lot of friction.

It can also create risk.

Research note

OWASP identifies prompt injection as a major LLM application risk, where crafted inputs can manipulate model behaviour and potentially lead to unauthorised access, data exposure or compromised decision-making.

Source: OWASP Top 10 for Large Language Model Applications

This matters because retrieved content is not always neutral.

An email, document or ticket comment could contain instructions the AI should not follow.

A good AI agent must treat retrieved content as evidence, not as instructions.

Author's Perspective

When AI agents connect to internal systems, the trust problem moves from answer quality to operational control.

7. What this means in operational environments

In customer support, service management and operational teams, the use cases are obvious.

AI can help with:

Summarising long incident histories
Preparing customer updates
Finding approved workarounds
Identifying missing information
Classifying urgency
Routing issues to the right team
Preparing account briefings
Drafting follow-up emails

These are valuable use cases.

But they only become good AI workflows if the system can show the evidence behind the answer.

For example, before sending an AI-generated customer update, the user should know:

Which incident record was used
Whether the status is current
Whether the workaround is approved
Whether the root cause is confirmed
Whether sensitive internal notes were excluded
Whether human approval is required

This is where AI becomes genuinely useful.

Not because it replaces the service manager.

Because it reduces the friction involved in gathering context, validating information and preparing a decision.

Author's Perspective

The most successful AI workflows do not remove people. They make the evidence easier to inspect.

Before deploying agents, organisations need to answer:

What workflow are we improving?
What outcome are we measuring?
Which source is authoritative?
What data can the AI access?
What actions can it take?
What needs approval?
How do we audit the decision?

Author's Perspective

Most organisations do not have an AI model problem. They have an evidence, workflow and governance problem.

9. A practical prototype

To explore this idea, I have been thinking about a practical prototype around escalation management.

The objective would not be to replace the person handling the escalation.

The objective would be to help them inspect the evidence faster.

The prototype would ask a simple question:

What if an AI assistant could draft a recommendation and also show the evidence chain behind every important claim?

The AI would gather information from synthetic internal sources such as CRM records, Jira tickets, incident timelines, knowledge articles, customer updates and policy documents.

It would then produce two outputs.

The recommendation

This would include:

Situation summary
Customer impact
Suggested customer update
Recommended next action
Missing information
Risk level
Human approval requirement

The evidence chain

This would show:

Sources searched
Sources retrieved
Source type
Last updated date
Permission status
Claims supported by each source
Unsupported claims
Conflicting information
Sensitive information excluded

The purpose would be simple.

Not just to show that AI can generate an answer.

But to show whether the answer can survive inspection.

AI evidence chain inspector demo

Conclusion

The question of what makes AI good is becoming more important.

A good AI is not simply the most advanced model, the most polished chatbot or the fastest agent.

A good AI is one that can show where its answer came from, respect permissions, identify uncertainty, stay within safe boundaries and leave a clear trail for human review.

For enterprise AI, trust will not come from confidence alone.

It will come from evidence.

Author's Perspective

Good AI is not the AI that sounds right. Good AI is the AI that can prove what it knows, show what it does not know and stay within the boundaries of what it is allowed to do.

What Makes AI Good? Not the Model — the Evidence Chain

1. The AI trust question is changing

2. A good model is not enough

3. More context does not automatically mean better AI

4. The evidence chain

5. The Context Trust Chain

Source trust

Ingestion trust

Permission trust

Retrieval trust

Answer trust

Tool trust

6. Agentic AI needs operational control

7. What this means in operational environments

9. A practical prototype

The recommendation

The evidence chain

Conclusion

Want to explore a similar workflow?