How to define KPIs for LLMs and track what actually matters
Search has changed, and the reporting hasn't caught up. CMOs and marketing leaders are being asked the same questions in every boardroom: why aren't we showing up in AI, what should we be tracking, and how do we measure visibility in a channel that no longer hands us clicks?
This session covers what LLMs actually rely on, the metrics that matter, the ones to ignore, and a real example of moving from position six to position one in LLM visibility in eight weeks.
Yaser Ayub
Founder of Rayze, helping businesses scale through SEO and GEO for better LLM visibility.
Published:
April 22, 2026
Guest Speaker
Fred Laurent is the co-founder of Inlinks and Waikay, which he built alongside Dixon Jones. He is a software engineer specialising in SEO and semantics, and has been working in the field since 2010. He lives in Nice in the south of France, and is also a saxophone player.
Main Host
Yaser is a marketer with 18 years of experience across SEO and paid search, including senior roles at Rocket Internet and in fintech. His focus now is SEO, GEO and AEO — the discipline of staying visible as search moves into AI. Rayze have been operating for six years and are rebranding as Rayze Digital this week. Yaser lived in Berlin, has two daughters, and is a big Arsenal fan.
Has search really changed?
Ten years ago, everyone wanted to rank number one. Five years ago, the focus shifted to featured snippets and voice search. The goal of ranking first hasn't gone away, but behaviour has fundamentally shifted with the rollout of LLMs like ChatGPT, Gemini, Perplexity and Claude. People now use these tools to find information and answer questions, which means the journey itself has changed.
Everyone is in a race to understand how to rank in LLMs — how to get to number one, how to increase share of voice, how to improve brand visibility. The reporting, though, hasn't kept up. That question — what do we actually track? — is what prompted this webinar.
Anyone working in search will recognise the "crocodile effect" graph: impressions climbing while clicks flatline or fall. The explanation lies in the rollout of AI Overviews and similar features, which have created zero-click searches. Your brand might be mentioned or cited without anyone clicking through. That is exactly why reporting on AI visibility has become so important.
Why there is no Google Search Console for AI
Google Search Console works because it is built around keywords. Tracking how you rank, how many clicks you get, and for which queries, is a logical way to measure traditional search. Prompts behave differently. They can run into thousands of words, which makes a clean overview of the prompts users are typing almost impossible.
Bing have released an AI Overviews report showing the citations a site has received in Gemini. It is one small step, but a useful one.
Three questions you've probably been asked
Three questions keep coming up, and most teams struggle to answer them:
Why is our brand not cited?
Why are our clicks down?
How do we show up for X?
Some of the variables behind these questions sit outside any team's control. But the shift is also a real opportunity for brands willing to get ahead of it.
What LLMs actually rely on
The signals have evolved, but the older ones — trust, authority and relevance — still matter. That is because LLMs are made of two parts. The first is the training data, which behaves like a read-only memory for each model. The second is web search, which the model uses to ground its answers.
Being included in LLM answers is difficult to engineer because so many factors are at play. Google was much simpler by comparison. ChatGPT, for example, has real cost considerations. Enabling web search adds cost per query, so there is effectively a decision being made at the model level about whether a given prompt needs web search, whether the user is paying, and whether the best possible answer is the priority. The result is that a brand can appear in ChatGPT 4.1 and be entirely absent from ChatGPT 5.2.
What is around 90% certain is that visibility in LLMs is mostly tied to brand authority. The bigger you are, and the more recognition you have across Google, Bing and elsewhere, the more likely you are to be mentioned. It is not guaranteed, though, and the results can be surprising. LLMs are still in their infancy. Three years ago, ChatGPT 3.5 was not particularly good. Today, when you run visibility studies at scale, the models still hallucinate on specific topics. In niche industries, the "competitors" an LLM associates with your brand can be genuinely unexpected.
Good SEO fundamentals still underpin ranking in LLMs — the tools evolve, the discipline stays the same. The bigger a brand is, the more likely it is to be included in training data, and training data is starting to look like a new position zero. If you are already in the training set and your competitors are not, you have a real competitive advantage.
What is training data?
Training data is the body of information a model has been trained on. Each time ChatGPT releases a new model, a specific dataset is used to build it. If a website launches after the training step of a given model, that site will not be represented in it. The only route in at that point is through web search.
Different LLMs take different approaches here. ChatGPT already knows a lot of online businesses at the training data stage — even local businesses like the restaurant near you. Gemini, by contrast, has very little of this in its training data and relies on web search and Google Maps data to fill the gap.
The core metrics that matter
The measurement framework comes down to a handful of questions:
What does AI know about our brand across the major LLMs?
How often is the brand mentioned for relevant queries?
What is our share of voice compared to competitors?
Which prompts are worth tracking — particularly open-ended ones involving "compare", "best" or "recommend"?
Which pages on our site are driving citations? Gemini and Bing's overview report now surface this.
How does this sit alongside traditional metrics like traffic and rankings?
The goal is to make AI visibility reportable in the same way search performance always has been.
Analysing what AI actually knows about your brand
The starting point is analysing each LLM response to a simple question: what do you know about my brand? The question sounds straightforward. The answers often are not. An LLM will tell you when the company was founded, where it is headquartered, and which features the product offers — and you may suddenly notice that some of those features do not exist. That is a classic hallucination, and it is the first thing to fix.
If the representation of your brand in LLMs is wrong, the consequences are real. A user might subscribe expecting a feature that does not exist, and end up frustrated. A single incorrect article on the web can be enough to cause this.
A useful example is Majestic. At one point, ChatGPT was responding that "Majestic SEO has closed and shut down all its activities," which is simply not true. The model had confused the brand with other companies using "Majestic" in their name, some of which had closed. That association was then fed back into further hallucinations. Fixing this kind of issue often means tracing the origin — sometimes a single AI-generated review or post — and contacting the site owner directly. Most of the time, the work is manual. Tools like Inlinks make it possible to see which URLs are feeding the LLMs.
Because LLMs are statistical, the same question needs to be asked several times to surface every potential hallucination. This fact-check against real brand, product and service information is the first thing worth doing.
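As a sketch of that repeated-sampling idea: collect several answers to the same brand question, then flag any claim that does not appear consistently across them. The claim-extraction step and the answers below are hypothetical stand-ins for real LLM responses.

```python
from collections import Counter

def flag_unstable_claims(responses, min_frequency=0.8):
    """Given several answers to the same brand question, separate claims
    that recur consistently from ones that appear only sometimes.
    Unstable claims are the candidates to fact-check against real
    brand, product and service information.

    `responses` is a list of claim sets; in practice each set would be
    extracted from one LLM answer (the extraction step is assumed here).
    """
    counts = Counter(claim for claims in responses for claim in set(claims))
    n = len(responses)
    stable = {c for c, k in counts.items() if k / n >= min_frequency}
    unstable = set(counts) - stable
    return stable, unstable

# Toy example: five answers to "what do you know about my brand?"
answers = [
    {"founded in 2012", "HQ in Nice", "offers API access"},
    {"founded in 2012", "HQ in Nice"},
    {"founded in 2012", "HQ in Nice", "offers white-label reports"},
    {"founded in 2012", "HQ in Nice", "offers API access"},
    {"founded in 2012", "HQ in Nice", "offers API access"},
]
stable, unstable = flag_unstable_claims(answers)
# Claims in `unstable` (here, the white-label one and the API claim,
# which only appears in three of five answers) go on the review list.
```

The 0.8 threshold is arbitrary; the point is simply that a claim appearing in one answer out of five deserves scrutiny before it is treated as the model's settled view of the brand.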
Topic mapping and entity gaps
The next step is looking at which topics are being associated with your brand in LLM responses. Based on a prompt like "recommend the best SEO tools," Waikay analyses the topics each brand is being associated with.
On its own, that isn't actionable. What makes it useful is comparing those responses against a knowledge graph of the topics covered across your own site — the blog posts, service pages, and so on. This entity-mapping approach surfaces two things: the entities that are correctly mentioned, and the ones that are missing.
In practice, this is very powerful. If you run a financial services business with a strong focus on banking, and you find that "electronic invoicing" is not being mentioned — or not being mentioned enough — by the LLMs, you have identified a weakness. That weakness can be addressed with standard SEO actions: strengthening internal linking to your invoicing pages, producing more content, or lifting content that has been buried in a hard-to-find part of the site.
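At its core, the entity-gap comparison is a set difference between two topic lists. The minimal sketch below assumes the hard parts have already been done by a tool like Waikay: extracting entities from LLM responses on one side, and building a knowledge graph of the site's own topics on the other.

```python
def entity_gaps(llm_topics, site_topics):
    """Compare the entities LLMs associate with a brand against the
    entities covered on the brand's own site.

    Returns (covered, missing_from_llms, missing_from_site):
      - covered: topics both the LLMs and the site agree on
      - missing_from_llms: site topics the LLMs never mention -- the
        gaps to close with internal linking and new or updated content
      - missing_from_site: topics the LLMs raise that the site does not
        cover, worth reviewing for accuracy or opportunity
    """
    covered = llm_topics & site_topics
    missing_from_llms = site_topics - llm_topics
    missing_from_site = llm_topics - site_topics
    return covered, missing_from_llms, missing_from_site

# Hypothetical financial-services example from the text
llm_topics = {"banking", "payments", "loans"}
site_topics = {"banking", "payments", "electronic invoicing"}
covered, to_promote, to_review = entity_gaps(llm_topics, site_topics)
# "electronic invoicing" lands in to_promote: the site covers it,
# but the LLMs are not yet associating it with the brand.
```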
This is a real shift from how SEO used to work. Google was largely a black box. It was hard to know why you ranked for some keywords and not others. With LLMs, the model effectively talks back to you. Analysed with the right tools, those responses make it possible to see what is working, what isn't, and what to do about it.
Influencing visibility — it's not just about citations
The dominant narrative right now is that the way to improve LLM visibility is to collect more citations. That is only part of the story. LLMs rely on training data, web search and a content-first view of the world. New content is only one lever. Content updates, content improvements and structural changes to the site often matter just as much.
Internal linking is one example. User experience is another. Google now factors UX signals into how it ranks pages, which in turn affects how a site is cited in LLMs. If your UX is weak, your rankings suffer as a direct consequence, and your LLM visibility follows.
A free tool worth looking at is Microsoft Clarity, which offers heat-mapping and a useful AI prompt feature. It pairs well with Google Analytics data.
A real example: from position six to position one
Inlinks specialises in entity-based SEO — they effectively invented the category. When the team first set up prompt tracking in Waikay, the first prompt they tracked was "recommend some entity-based SEO tools." Inlinks came in at number six. The tools sitting above it — SEMrush, Ahrefs and Moz — are excellent SEO platforms, but not entity-based SEO tools.
Industry ranking here means the number of times a brand appears in LLM responses, divided by the total number of brands mentioned.
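Reading that definition as a share of all brand mentions across the tracked responses, the metric can be sketched in a few lines. The response data is illustrative, not real Waikay output.

```python
def industry_share(responses, brand):
    """Share-of-voice style metric: appearances of one brand across a
    set of LLM responses, divided by the total number of brand mentions
    across those responses. Ranking brands by this value gives the
    "industry ranking" described above.
    """
    total_mentions = sum(len(brands) for brands in responses)
    brand_appearances = sum(brands.count(brand) for brands in responses)
    return brand_appearances / total_mentions if total_mentions else 0.0

# Brands extracted from three hypothetical responses to the same prompt
responses = [
    ["SEMrush", "Ahrefs", "Inlinks"],
    ["Inlinks", "WordLift"],
    ["SEMrush", "Moz", "Inlinks"],
]
share = industry_share(responses, "Inlinks")
# Inlinks appears 3 times out of 8 total brand mentions: 0.375
```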
Several things were done to close the gap:
A post was published explaining why SEMrush, Ahrefs and Moz — great SEO tools in their own right — are not entity-based SEO tools, because they do not use named entities in their analysis.
Using the named-entity analysis of LLM responses, an action plan was built in Waikay based on the entity gaps between those responses and the Inlinks knowledge graph.
One of the main gaps identified was content around entity-based SEO for local businesses — specifically, how local businesses could benefit from the approach. A new guide was published to supplement the existing one.
Internal linking was strengthened.
Within eight weeks, Inlinks moved from position six to position one, becoming the most frequently cited tool for "entity-based SEO."
The wider competitive landscape says something about the state of the models, though. Several of the other tools being cited — Clearscope, MarketMuse, Surfer SEO — do not really specialise in entity-based SEO. The tools that genuinely do are Inlinks, WordLift and Kalicube. The others are either being associated with the topic incorrectly or are simply well-known enough to appear. It shows how LLMs still hallucinate in niche categories, and how that hallucination has real commercial consequences. If a potential customer asks ChatGPT for an entity-based SEO tool and the model recommends SEMrush, and that customer already has a SEMrush subscription, Inlinks have lost them.
What to track and what to ignore
Worth tracking:
Citation frequency
Share of voice against competitors
Prompt visibility
Performance in AI answers
Worth ignoring — or at least treating with scepticism:
AI traffic estimates, which are currently unreliable
Prompt volume metrics, which internal studies suggest are largely fabricated and used as marketing metrics by some tools; what matters is how many clicks you are actually getting from LLMs
Social mentions without context — the mentions themselves can be useful, particularly because different LLMs pull from different social networks, but the data is only valuable when it is contextualised
The goal is to keep the measurement framework simple. Define a clear set of KPIs, build a dashboard around them, and accept that the landscape will keep shifting quickly.
Q&A
Q: How different are the rules used by Google AI Overviews versus the other LLMs?
AI Overviews mostly surface informational content, but they sit on the same underlying platform as Google's AI mode and Gemini, and the results are more stable than those produced by ChatGPT. The exact ruleset isn't public. One clear difference is that Google's training data is much thinner than ChatGPT's — Google, as always, does a lot of its heavy lifting at search time.
Q: Are you looking at the difference between citations (content used as a source) and mentions (the brand referenced without necessarily being linked)?
Yes. When your content is used as a source, the model considers it trustworthy. Google and ChatGPT also lean on different types of sources. Google tends to rely on less specialised sources — more established sites like G2 or Wikipedia. ChatGPT draws from a wider, more specialised pool.
People don't typically click through to sources, which means mentions are the metric that matters most commercially. Citations tell you where you should be mentioned and where content may need to be reviewed. Mentions tell you whether your brand is showing up when a user is actually looking for a product or service.
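The citation/mention distinction can be made concrete with a simple classifier over a single response. The response structure here (answer text plus a list of cited URLs) is an assumption about how a tracking tool might expose the data, not any specific tool's API.

```python
def classify_visibility(answer_text, cited_urls, brand, domain):
    """Classify one LLM response for one brand:
      - 'citation': the brand's domain appears among the cited sources
      - 'mention':  the brand name appears in the answer text itself
    A response can be both, either, or neither.
    """
    cited = any(domain in url for url in cited_urls)
    mentioned = brand.lower() in answer_text.lower()
    return {"citation": cited, "mention": mentioned}

# Hypothetical response: the brand is named, but its own site is not
# among the sources -- a mention without a citation.
result = classify_visibility(
    "Inlinks is a strong choice for entity-based SEO.",
    ["https://www.searchenginejournal.com/some-roundup/"],
    brand="Inlinks",
    domain="inlinks.com",
)
```

Aggregated over many tracked prompts, the mention rate is the commercially important number, while the citation flags point at which pages are (or are not) being used as sources.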
Q: Is entity-focused content — writing on chosen topics with lots of FAQs, chunkable content, and clear short sections — the best way for LLMs to understand a brand?
Broadly, yes. There is still a lot being discovered. ChatGPT, for example, appears to prioritise the beginning and end of a blog post over the inner content. The most important principle is clarity. Abbreviations and ambiguous sentences cause problems — either the model misunderstands, or it generates a hallucination. Google was much better at handling abbreviations because it was keyword-led. With LLMs, the question is whether the content is genuinely understandable.
Q: What is prompt visibility?
Prompt visibility is whether your brand appears for a specific prompt. The challenge is that you can't really replicate a real user prompt — a genuine deep-research conversation can run to thousands of words and is almost impossible to recreate. The practical approach is to rely on shorter, representative prompts. For a fish restaurant in New York, something like "recommend some fish restaurants in New York in this neighbourhood." For an SEO agency, "recommend some good SEO agencies for small businesses in Paris." These are the short prompts where your competitors will appear, and where you are expected to appear too. If you show up for the short prompts, you can generally expect to show up for the longer ones.
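The practical approach above reduces to a small loop: run each short, representative prompt, check whether the brand appears in the answer, and report the hit rate. The `get_answer` callable and the canned answers below are hypothetical stand-ins for a real LLM call.

```python
def prompt_visibility(tracked_prompts, get_answer, brand):
    """Fraction of tracked prompts whose answer mentions the brand.
    `get_answer` is a stand-in for a real LLM call (hypothetical);
    matching is a simple case-insensitive substring check.
    """
    hits = sum(
        1 for p in tracked_prompts
        if brand.lower() in get_answer(p).lower()
    )
    return hits / len(tracked_prompts)

# Canned answers stand in for real LLM responses
canned = {
    "recommend some entity-based SEO tools":
        "Inlinks, WordLift and Kalicube are worth a look.",
    "recommend some good SEO agencies for small businesses in Paris":
        "Agency A and Agency B are popular choices.",
}
visibility = prompt_visibility(list(canned), canned.get, "Inlinks")
# Brand appears for 1 of the 2 tracked prompts: 0.5
```

In practice the same prompt would be run several times per model, since identical prompts can produce different answers, and the rate averaged across runs.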
Q: How does Waikay's methodology differ from Profound's?
Profound's approach is essentially prompt tracking at scale, through scraping. Waikay takes a more granular approach — analysing individual LLM responses and producing insights from those, rather than relying on volume. That is how the content gap analysis works.
Q: Where can we keep up with developments across the key LLMs?
LinkedIn is the most practical source. There is so much being published that it is hard not to run into it. Newsletters are also useful, and there are several good ones worth subscribing to.
Final thoughts
We are roughly where SEO was in 2005. LLMs are still in their infancy, everything is moving quickly, and AI-generated content always needs to be reviewed by a human. Google's recent spam updates, which have hit aggressive AI content hard, are a reminder of that. The pace of change is also what makes this an exciting time to be working in search.