Digital Insights

How Trust Scores Work During AI Training

I was recently asking ChatGPT about the impact of content on the Reddit platform. It got me thinking about AI training and what is considered a ‘trusted’ source. The following is the response, with some great insights.


Trust scores aren’t a single “number,” but every major AI model uses multi-layered weighting systems during training to decide what counts as high-value knowledge and what gets downweighted or ignored. Here’s the straight explanation — no fluff.

How Trust Scores Work During AI Training

Think of trust scoring as signal weighting. Not every piece of text is treated equally.
Some content becomes a strong training signal. Some content becomes background noise. Some content gets thrown out.
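
To make the weighting idea concrete, here is a minimal sketch in Python, assuming each training example carries a per-example trust weight that scales its contribution to the loss. The function and the numbers are illustrative, not any lab’s actual pipeline.

```python
def weighted_training_signal(examples):
    """examples: list of (loss, trust_weight) pairs for one batch."""
    total, weight_sum = 0.0, 0.0
    for loss, weight in examples:
        if weight == 0.0:          # excluded content contributes nothing
            continue
        total += loss * weight     # strong signal vs. background noise
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0

# A high-authority document (1.0) shapes the update far more
# than a sponsored post (0.05). Weights here are hypothetical.
batch = [(2.3, 1.0), (1.9, 0.2), (2.1, 0.05)]
print(weighted_training_signal(batch))
```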

These are the main factors models use to judge trustworthiness.

Source Reliability Scoring

Models (and their data pipelines) assign higher weight to content from:

  • Peer-reviewed papers
  • Government data
  • Reputable news
  • Authoritative publishers
  • High-quality reference sites

Lower weight goes to:

  • Ads
  • Sponsored content
  • Social media opinion
  • Low-signal discussions
  • Spam / SEO-bait sites

For example, Reddit is considered medium-trust. The AI engines consider it useful, but not authoritative. During training, high-authority text shapes the model more heavily.
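
As a rough sketch, you can picture those tiers as a base-weight lookup. The categories and numbers below are invented for illustration; real pipelines are far more granular and unpublished.

```python
# Hypothetical base weights per source tier (illustrative only).
SOURCE_WEIGHTS = {
    "peer_reviewed": 1.0,
    "government": 1.0,
    "reputable_news": 0.8,
    "reference_site": 0.7,
    "reddit": 0.4,        # medium-trust: useful, not authoritative
    "sponsored": 0.05,
    "ad": 0.0,            # excluded
}

def base_weight(source_type: str) -> float:
    # Unknown sources default to low trust rather than exclusion.
    return SOURCE_WEIGHTS.get(source_type, 0.1)
```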

Internal Consistency Scoring

The text gets evaluated like this:

  • Does it contradict known facts in the high-authority corpus?
  • Does it contain verifiable claims?
  • Is the writing coherent or chaotic?
  • Are arguments logical or wildly inconsistent?

Content that contradicts strong, well-established knowledge gets downweighted.
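
A toy version of that check, assuming a tiny ‘high-authority’ fact store; production systems would use learned classifiers rather than exact string matching.

```python
# Claims mapped to whether the high-authority corpus holds them true.
HIGH_AUTHORITY_FACTS = {
    "water boils at 100 c at sea level": True,
    "the earth is flat": False,
}

def consistency_multiplier(claims):
    """Downweight text whose claims contradict established knowledge."""
    contradictions = sum(
        1 for claim in claims
        if HIGH_AUTHORITY_FACTS.get(claim.lower()) is False
    )
    return 0.5 ** contradictions  # each contradiction halves the weight

print(consistency_multiplier(["The earth is flat"]))  # 0.5
```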

Sponsored content tends to get downweighted because it:

  • Makes unverified claims
  • Pushes a narrative
  • Often includes emotional or marketing language

Pattern + Redundancy Scoring

If thousands of independent sources say the same thing → high trust.

If only marketing materials say it → low trust.

Models look for convergent truth across the dataset.

Sponsored content rarely converges with neutral sources, so it rarely gains weight.
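
Here is a toy convergence score along those lines: a claim repeated across many independent domains gains weight, while a claim that only ever appears on one marketing domain stays near zero. Everything here is illustrative.

```python
def convergence_weight(claim_sources):
    """claim_sources: list of domains where the same claim appears."""
    independent = len(set(claim_sources))  # distinct domains, not raw repeats
    return min(1.0, independent / 100)     # saturates once widely corroborated

print(convergence_weight(["nih.gov", "bbc.com", "nature.com"]))  # 0.03
print(convergence_weight(["brandsite.com"] * 500))               # still 0.01
```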

Metadata Scoring

Even if you never see it, training data comes with context signals:

  • Domain reputation
  • Link patterns
  • Post history
  • Age of domain
  • Author credibility (when known)
  • Presence of affiliate tags
  • “Sponsored,” “#ad,” or promotional markers

All of these influence the trust weighting before training even begins.
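
Folded into code, those context signals might act as a prior on a document’s weight, roughly as in the sketch below. The field names and multipliers are invented.

```python
def metadata_prior(meta: dict) -> float:
    w = meta.get("domain_reputation", 0.5)    # 0..1, e.g. from link patterns
    if meta.get("sponsored_marker"):          # "Sponsored", "#ad"
        w *= 0.1
    if meta.get("affiliate_tags"):
        w *= 0.5
    if meta.get("domain_age_years", 10) < 1:  # very young domains trusted less
        w *= 0.7
    return w

print(metadata_prior({"domain_reputation": 0.9, "sponsored_marker": True}))  # ≈ 0.09
```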

Moderation Signals

On platforms like Reddit:

  • Highly upvoted, non-controversial threads → more trust
  • Highly downvoted, flagged, or mod-removed content → downweighted or excluded

Sponsored content tends to have:

  • Lower engagement
  • More skepticism
  • More negative signals → lower trust scores
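
A toy multiplier for those moderation signals, using hypothetical Reddit-style fields:

```python
def moderation_multiplier(post: dict) -> float:
    if post.get("removed_by_mod"):
        return 0.0                        # excluded outright
    ratio = post.get("upvote_ratio", 0.5)
    reports = post.get("report_count", 0)
    return max(0.0, ratio - 0.05 * reports)

print(moderation_multiplier({"upvote_ratio": 0.95}))                    # 0.95
print(moderation_multiplier({"upvote_ratio": 0.4, "report_count": 6}))  # ≈ 0.1
```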

Model Self-Consistency (post-training)

During reinforcement stages (RLHF or similar), evaluators check:

  • Does the model repeat unverified marketing claims?
  • Does it spread misinformation?
  • Does it give unsupported assertions?

When the model “trusts” bad data, it gets corrected.

This feedback further suppresses unreliable or sponsored signals.

Deduplication and Noise Filtering

Massive datasets often contain:

  • Duplicates
  • Ads scraped accidentally
  • Spam
  • SEO-stuffed affiliate content

De-dupe filters act like an additional trust layer.
Sponsored content is often:

  • Duplicate across many sites
  • Template-based
  • Identified as LLM-generated or spammy

These get automatically deprioritized.
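
A minimal exact-duplicate filter looks like the sketch below; real pipelines also use fuzzy techniques such as MinHash to catch near-duplicates and templated sponsored copy.

```python
import hashlib

def dedup(docs):
    """Keep one copy of each normalized document."""
    seen, kept = set(), []
    for doc in docs:
        normalized = " ".join(doc.lower().split())
        key = hashlib.sha256(normalized.encode()).hexdigest()
        if key not in seen:   # templated copy collapses to one instance
            seen.add(key)
            kept.append(doc)
    return kept

print(len(dedup(["Buy now!", "buy  NOW!", "An original post."])))  # 2
```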

Putting It All Together

During training, a Reddit post with “#ad” or sponsored language will:

  • Not be excluded (it’s still public content)
  • But be heavily downweighted compared to trusted sources

A government report might get weight = 1.0
A well-researched article might get weight = 0.6
A normal Reddit comment might get weight = 0.2
A sponsored post might get weight = 0.05
An ad = 0 (excluded)

That’s the basic trust landscape.
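
Chaining the sketches above, a document’s final training weight could be the product of its per-factor scores. The multiplicative combination is my assumption for illustration; real systems may combine signals very differently.

```python
def final_weight(base, consistency, convergence, metadata, moderation):
    return base * consistency * convergence * metadata * moderation

# A sponsored Reddit post: still included, but heavily downweighted.
print(final_weight(base=0.4, consistency=0.5, convergence=0.1,
                   metadata=0.1, moderation=0.8))  # ≈ 0.0016
```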

Bottom line

AI models don’t “trust” sponsored content.
They ingest it, but treat it like background noise.
It almost never influences model knowledge, and it never shows up as a cited source.

Are All Those Backlinks Still Valuable in GEO / AI Search?

You and your team have been working tirelessly on getting those high-value SEO links: looking for opportunities, building relationships, and helping your site authority maintain its position or grow.

For me, I partnered with many PR teams to get links to the website. I used to have a saying when a site would give a link but make it NOFOLLOW: the search engines might not follow it, but human beings will.

As go the human beings, so go the robots.

Do Backlinks Matter in AI Search?

In a new AI Search Backlinks Study by SEMrush, the initial assessment is that “backlinks are important for AI visibility, but quality and authority matter far more than quantity.”

The team looked at 1,000 domains using their AI Visibility Toolkit to determine what influence backlink-related signals had in AI-generated responses. They included ChatGPT, ChatGPT with Search, Gemini, Google AI Overviews and Perplexity in the platforms they evaluated.

The review compared AI Mentions (how many times a domain was referenced across AI-generated answers) and AI Share of Voice (a domain’s visibility in AI-generated answers for non-branded queries) against key backlink signals, including total links, number of referring domains, and Authority Score.
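
If you want to replicate the flavor of this analysis on your own domain list, a rank correlation between Authority Score and AI mentions is the natural starting point. The numbers below are made up for the sketch; see the SEMrush study for the real data.

```python
from scipy.stats import spearmanr

authority = [72, 45, 88, 30, 61, 93, 55]    # Authority Scores (illustrative)
mentions  = [140, 40, 260, 12, 90, 310, 95]  # AI mentions (illustrative)

rho, p = spearmanr(authority, mentions)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```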

Authority Score had the Biggest Impact on AI Citations

Domains with higher Authority Scores (a metric heavily influenced by backlink quality and referring domain diversity) are more likely to appear in AI-generated answers. Although Authority Score was the best-performing signal, its overall impact on performance was modest.

[Figure: Authority Score vs. AI citations (SEMrush)]

That means that a lot of small links won’t move the needle much. To move up, you need to earn backlinks from higher-authority, topically relevant sites. This is where PR comes into play. Earned media placements from well-known publications, leading blogs, or industry directories carry higher authority and credibility, so they make a bigger impact on performance.

Link Quality Drives AI Visibility More Than Volume

Sometimes . . . more is just more. It is a mantra that I have always thought was important in link building. Lots of links from OK sites won’t buy you that much, and they take a lot of effort to maintain over time.

In the AI Search world, the SEMrush research study shows that the quality of backlinks plays a major role in how often a domain appears in AI-generated answers.

[Figure: AI Mentions by Authority Tier (SEMrush)]

If your domain has high-quality backlinks, that usually translates into appearing more often in AI-generated answers, especially once it crosses key authority thresholds.

Why do I say usually? The effect varies across models: ChatGPT Search and Perplexity weigh link quality the least, while standard ChatGPT (without Search on) shows the strongest correlation.

Nofollow Links Carry Almost as Much Weight as Follow Links

Do you remember all that time you spent slightly bummed out that you landed a great link, but it was NOFOLLOW? Well, that tide has turned in the AI world.

I had always wondered why Google and other search engines even had such a ridiculous rule. It was like saying: I think your content is valuable and I will link to you, but I don’t like you enough to let the search engine like you.

[Figure: Follow vs. Nofollow links in AI search (SEMrush report)]

Don’t forget: getting a few high-value links is better than getting many links of mid or low quality. So basically, getting mentioned by an authoritative and relevant source is more important than chasing quick gains.

To read the full study and get details about the analysis, see: Do Backlinks Still Matter in AI Search? Insights from 1,000 Domains [Study].

Will LLM Perception be the new Gatekeeper?

With the rapid growth of AI Search and Generative Engine Optimization (GEO), the search world has been in a scramble to make sure they are getting their content ‘seen’ by users of these new tools including ChatGPT, Gemini, Claude, Perplexity, Copilot and others.

The State of AI Search

Most of the conversation has been around the evolution of content planning.

It is definitely a good time to dive deep into the research, read every blog and attend as many webinars as possible to get details and insights about the way GEO and AI search will work.

For the most part, there is a big similarity to advanced tactics for SEO, including creating topic clusters, link building with earned content, and using schema throughout a website.

All of these things track pretty closely from SEO into the transition to GEO.

The Potential Game Changer

I recently found an article about LLM perception match: The hurdle before fanout and why it matters. It talks in detail about LLM perception match (LPM). At first, I thought it was going to talk about how LLMs, ChatGPT specifically, have developed brand perception and the elements they take into account.

I was stopped in my tracks by the opening sentence. “If LLMs don’t see you as a fit, your content gets ignored.”

It is like AI is vetting you to determine whether it wants to invite you to the party: a persistent evaluation of your brand / company that decides whether you get included in citations. It asks questions like who you are, what you offer, and how well you fit the user’s need.

How You’re Evaluated

Before you are even recommended, LLMs use a variety of sources to formulate a perception of your products and your brand. According to an AI visibility study from SEMrush and other research, there are several ranking factors, including:

  • Website content
  • Brand mentions (including associated sentiment)
  • Customer reviews
  • Competitor comparison
  • Structured data
  • Content freshness

This perception is persistent and synthesized. If it doesn’t align with the user’s intent, persona, or expectations, your brand is excluded before fanout ever begins.

According to Jessica Bowman, who wrote the article about LLM Perception Match, “Simply put, LLM perception match is the gatekeeper. Without it, content quality and SEO don’t matter – you won’t even be in the running.”

Auditing Your Brand in LLMs

So with this potential barrier to LLM and AI Search inclusion, what do you do?

Take the time to audit your brand and determine perception based on the results. Is the AI engine saying your products are hard to use, your features are behind the times, or that you used to be the market leader but are lagging in development?

One thing about ChatGPT or other AI engines is that they won’t hold back or sugar-coat the reality about your marketplace perception. If there are weaknesses in your supply chain, product offering or customer experience, AI will surface that.

It might make for difficult conversations with senior leadership and operations. This change is a signal that the days are over when you could write great content for a crummy company and still win the search game.
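
If you want to script that audit rather than paste prompts by hand, here is a minimal sketch using the official openai Python package. The prompts and the model name are my own choices, and it assumes an OPENAI_API_KEY in your environment.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

AUDIT_PROMPTS = [
    "What is {brand} known for, and who is it best suited for?",
    "What are the most common criticisms of {brand}?",
    "How does {brand} compare to its main competitors?",
]

def audit_brand(brand: str) -> list[str]:
    """Collect the model's unvarnished perception of a brand."""
    answers = []
    for template in AUDIT_PROMPTS:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[{"role": "user", "content": template.format(brand=brand)}],
        )
        answers.append(resp.choices[0].message.content)
    return answers
```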

What It Means for Brands

It means that working on search and PR means being in lockstep with senior management and directly integrated with operations. It also means that companies that have been able to avoid looking at their end-to-end performance will start to become invisible in the search world . . . at least until ChatGPT monetizes its tool and allows people to buy their way back in.

Podcast Transcripts are not a Luxury but a Necessity

There is no doubt that podcasts are on the rise. According to Backlinko:
🔹Over 584.1 million podcast listeners worldwide
🔹55% of the US population 12+ listen to a podcast at least once a month
🔹Podcast ad spending is estimated to hit $4.46 billion in 2025

So I wondered if AI engines were indexing the audio. I asked that question to ChatGPT, Gemini, CoPilot, Claude and Perplexity. Of those AI engines, ONLY Google (Gemini) was able to crawl audio files.

Google was the standout because it has its own speech-to-text (STT) technology called Google Cloud Speech-to-Text, an AI service designed to convert audio into text transcriptions. You can also see it at work in YouTube.

All of the other AI engines were NOT able to crawl audio files. Specifically, ChatGPT said:

🗨I don’t pull in raw podcast audio. When I “browse,” I’m limited to text that’s already on the web—so if a podcast episode has a published transcript or detailed show notes, I can read those, but I don’t stream or transcribe the audio myself. My underlying training may include some publicly available podcast transcripts, but not the audio files.🗨

What does that mean? If you are not:
🔹Creating web pages for your podcast
🔹Providing full transcript text
🔹Making your brand / website the owner of that transcript content

then your information is not going to be included in AI search results, with the potential exception of Gemini.
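
One low-cost way to get there is to transcribe the audio yourself and publish the text on a page you own. Here is a sketch using the open-source openai-whisper package; the file names and markup are placeholders.

```python
import whisper  # pip install openai-whisper

# Transcribe locally, then publish the text on your own domain.
model = whisper.load_model("base")
result = model.transcribe("episode-042.mp3")  # hypothetical file

page = f"""<article>
  <h1>Episode 42: Show Notes and Full Transcript</h1>
  <p>{result['text']}</p>
</article>"""

with open("episode-042-transcript.html", "w") as f:
    f.write(page)
```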

Data is the Love Language of Search

It doesn’t matter if you are talking about ‘traditional’ search or ‘AI’ search; providing technology crawlers with your content as data has so many advantages.

It is the difference between having a gallon of crude oil that still has to be refined before it can go in your gas tank, or just putting gas in the car. Schema is like giving Google and ChatGPT gasoline instead of crude oil.

How Content is Interpreted

Although these technologies have come a long way in understanding content on a website, they still can’t see the content the way a person does; it gets interpreted from the code used. When a human ‘sees’ online content, they can use structure, color, and memory to determine the message.

Crawlers and AI engines are smart and can figure most things out with machine learning and AI. But why leave that to chance?

Why Generate Schema

By generating schema for your focus content online, you give search engines and LLMs extra help interpreting your content. It helps create a deeper understanding of type, meaning, and intent. Providing content as data also allows crawlers to be more efficient because they are built to understand schema markup. It isn’t left to interpretation.

As I like to say, I don’t like to let Google interpret anything for me.

Adding Schema Markup

You can do this by adding semantic Schema Markup to your web pages. Schema Markup was developed by the big search engines like Google, Yahoo, and Bing as a shared dictionary of structured data. Because it follows a standardized vocabulary, crawlers can understand the content more rapidly and confidently, which gives search engines a higher degree of confidence that they are presenting content that better aligns with a user’s search intent.
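
To tie the last two sections together, here is a sketch that emits JSON-LD for a podcast episode with its transcript attached. It assumes schema.org’s PodcastEpisode type (part of the pending vocabulary) and the transcript property on AudioObject; all names and URLs are placeholders.

```python
import json

episode = {
    "@context": "https://schema.org",
    "@type": "PodcastEpisode",
    "name": "Episode 42: Digital Insights",
    "url": "https://example.com/podcast/episode-42",
    "associatedMedia": {
        "@type": "AudioObject",
        "contentUrl": "https://example.com/audio/episode-42.mp3",
        "transcript": "Full transcript text goes here...",
    },
}

# Drop this <script> tag into the page's <head> or <body>.
print(f'<script type="application/ld+json">{json.dumps(episode, indent=2)}</script>')
```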