
Most businesses still have no idea whether they show up when someone asks ChatGPT about their industry, or whether Perplexity cites them for the questions their customers are actually asking. They optimise for Google rankings while AI systems quietly route informational queries to competitors who passed a completely different set of filters. The problem is not awareness: most marketing teams know answer engine optimisation (AEO) is happening. The problem is that they have no structured way to diagnose their current position.
An AEO audit gives you that diagnosis. It is not a single tool run or a one-click report. It is a six-stage methodology that moves from observable symptoms — are you getting cited at all? — to root causes: missing answer structure, absent structured data, weak entity signals, blocked crawlers. Each stage produces a concrete checklist item you can act on. I have run this process across dozens of sites in our portfolio and it consistently surfaces the same failure patterns, which I will walk through in detail here.
One honest caveat before we start: passing every item on this audit checklist is necessary but not sufficient. If a well-established competitor has been writing cleaner answer-first content on the same topics for three years and already has a stronger citation record, their authority advantage will not disappear because you fixed your robots.txt or added FAQ schema. Structure is the floor. Authority, earned through consistent, expert content that AI models have ingested and cited repeatedly, is the ceiling. This audit tells you whether your floor is solid. Building the ceiling takes longer and is covered in the companion guide at /resources/aeo-methodology/.
What Does an AEO Citation Baseline Test Actually Measure?
A citation baseline test measures whether AI systems name, link, or paraphrase your site when answering a fixed set of queries relevant to your business. You fire the same prompt set at ChatGPT, Perplexity, Claude, and Google AI Overviews, log every response, and score each model separately — because citation patterns differ significantly across platforms.
The mechanics are straightforward but easy to do sloppily. Start by defining a prompt set of 20 to 40 queries — not just branded queries, but the informational and comparison questions your target customers ask before they make a decision. For a law firm, that might be questions like 'what should I do after a car accident in Florida' or 'how long does a personal injury case take.' For a SaaS company it might be 'what is the best project management tool for remote teams.' Write prompts as natural language questions, not keyword strings, because that is how AI systems receive them.
Run each prompt in a fresh session — do not carry conversation history across queries, or you will contaminate results with context the model accumulated from prior turns. Log the full response text, note any sources cited or linked, and flag whether your brand or domain appears anywhere in the output, even without a direct link. Do this for at least ChatGPT (web-browsing enabled), Perplexity (both default and focus modes), Claude (with web access if available in your region), and Google AI Overviews via a standard Google search. Record results in a spreadsheet with one row per prompt per model.
At the end of this pass you will have a citation rate for each platform: the percentage of queries in which your site appears. In our portfolio, new clients typically start at zero to ten percent citation rate on non-branded informational queries. Sites that have been intentionally optimised for AEO for six months or more generally hit thirty to fifty percent or above on their core topic cluster. The gap between your baseline and that range tells you how much structural and content work lies ahead. Cross-reference this with /blog/how-to-measure-aeo/ for the ongoing tracking methodology you will use after the audit.
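The per-platform citation rate described above is simple arithmetic over the spreadsheet rows. Here is a minimal Python sketch, assuming each logged row has been reduced to a prompt, a model name, and a true/false flag for whether the brand or domain appeared. The field names and sample rows are illustrative, not taken from a real audit.

```python
from collections import defaultdict

# Hypothetical log: one row per prompt per model, as in the spreadsheet.
# "cited" is True when the brand or domain appears anywhere in the response.
rows = [
    {"prompt": "how long does a personal injury case take", "model": "ChatGPT", "cited": True},
    {"prompt": "what should I do after a car accident in Florida", "model": "ChatGPT", "cited": False},
    {"prompt": "how long does a personal injury case take", "model": "Perplexity", "cited": False},
    {"prompt": "what should I do after a car accident in Florida", "model": "Perplexity", "cited": False},
]

def citation_rates(rows):
    """Return {model: percentage of logged prompts where the site was cited}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        totals[row["model"]] += 1
        if row["cited"]:
            hits[row["model"]] += 1
    return {model: 100.0 * hits[model] / totals[model] for model in totals}

print(citation_rates(rows))
```

Keeping the rates separate per model, rather than pooling them, matches the point above that citation patterns differ across platforms.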
How Do You Audit Content Structure for Answer Engine Readiness?
A content structure audit checks whether each page delivers a standalone, direct answer to a clearly stated question before adding context, caveats, and depth. AI systems extract answer-card responses from content that leads with the answer. Pages that bury the answer in paragraph three, after two paragraphs of preamble, are structurally invisible to the extraction layer.
Pull your top twenty traffic pages and your top twenty pages targeting AEO-priority queries. For each page, check three structural requirements. First, does the page title or the nearest H2 above each key section phrase the topic as a question? AI citation tends to favour content where the question and answer are co-located and semantically tight. A section titled 'Our Approach to SEO' is weaker than 'What Does an SEO Agency Do?' Second, do the first 40 to 60 words after each question H2 give a complete, standalone answer, one that makes sense without reading what follows? That 40-to-60-word range is not arbitrary; it aligns with the excerpt length AI systems typically surface in answer cards and cited snippets.
Third, does the page include a dedicated FAQ section with at least five to eight question-and-answer pairs, where each answer is self-contained? FAQ sections are high-value extraction targets for AI systems. They present structured Q&A pairs in a format that is cheap to parse, and they cover the long tail of follow-up questions a user might ask. Pages without FAQ sections miss a reliable citation opportunity. I see this mistake constantly — sites with good pillar content and zero FAQ sections, leaving easy wins on the table.
Score each page out of three. A page that passes all three tests scores a three. Anything below a two is a rewrite priority. Document your findings in a content structure scorecard with columns for page URL, question H2 count, lead-answer compliance, and FAQ present or absent. This scorecard becomes the input to your editorial sprint. The full framework for writing answer-first content is in /blog/aeo-ranking-factors/, where I break down the evidence for why structure correlates with citation frequency.
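The three-point score can be roughed out in code before a human reviews each page. The sketch below is a first-pass filter only. It assumes you have already extracted each section's H2 text and first paragraph, it treats "ends with a question mark" as a proxy for a question H2, and it checks only the word count of the lead answer, not whether the answer is actually complete. The function name, input shape, and the five-pair FAQ threshold (the lower bound mentioned above) are my assumptions.

```python
def score_page(sections, faq_pairs):
    """Score a page out of three.

    sections:  list of (h2_text, first_paragraph) tuples for the page
    faq_pairs: number of question-and-answer pairs in the FAQ section
    """
    question_sections = [s for s in sections if s[0].strip().endswith("?")]
    # Test 1: every key section heading is phrased as a question.
    question_h2s = bool(sections) and len(question_sections) == len(sections)
    # Test 2: each question H2 is followed by a 40-to-60-word lead paragraph.
    lead_answers = bool(question_sections) and all(
        40 <= len(paragraph.split()) <= 60 for _, paragraph in question_sections
    )
    # Test 3: a dedicated FAQ section with at least five pairs.
    faq_present = faq_pairs >= 5
    return {
        "question_h2s": question_h2s,
        "lead_answers": lead_answers,
        "faq_present": faq_present,
        "score": sum([question_h2s, lead_answers, faq_present]),
    }

# Illustrative input: one question H2 with a 45-word lead paragraph, six FAQ pairs.
example = score_page([("What Does an SEO Agency Do?", " ".join(["word"] * 45))], 6)
print(example["score"])
```

Anything the script scores below two still needs a manual read, because a 50-word paragraph can hit the length target without answering the question.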
What Structured Data Schemas Matter Most for AEO?
The four schemas that directly support AEO visibility are FAQPage, Organization, Person (for author markup), and BreadcrumbList. FAQPage tells AI systems which content on your page is intended as a Q&A answer. Organization and Person establish entity identity for your brand and authors. BreadcrumbList signals content hierarchy. Missing any of these is a gap with a known fix.
Start your structured data audit with a crawl of every page using a tool that renders JavaScript, because raw HTML crawlers miss dynamically injected schema. Export all JSON-LD blocks and validate them against the official schema.org FAQPage specification at https://schema.org/FAQPage. The FAQPage type sits in the hierarchy Thing > CreativeWork > WebPage > FAQPage and expects mainEntity properties of type Question, each with an acceptedAnswer. A common error I see is sites that mark up their FAQ section with FAQ schema but use incorrect property names: 'answer' instead of 'acceptedAnswer', or 'text' omitted from the Answer node. These errors break nothing visible on the page, so the markup fails silently on the live site: you think the schema is live when it is not doing anything. Google's Rich Results Test is where they surface.
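The property-name errors described above are easy to catch with a small script before you reach for the Rich Results Test. This is a minimal sketch, not a full schema.org validator. It assumes each exported JSON-LD block is a single top-level FAQPage object (no @graph wrapper), and it checks only the properties named in this section. The function name is hypothetical.

```python
import json

def faqpage_errors(jsonld_text):
    """Return a list of problems found in one FAQPage JSON-LD block."""
    data = json.loads(jsonld_text)
    if data.get("@type") != "FAQPage":
        return ["top-level @type is not FAQPage"]
    questions = data.get("mainEntity") or []
    if isinstance(questions, dict):  # tolerate a single Question object
        questions = [questions]
    if not questions:
        return ["mainEntity is missing or empty"]
    errors = []
    for i, question in enumerate(questions, start=1):
        if question.get("@type") != "Question":
            errors.append(f"item {i}: @type is not Question")
        if not question.get("name"):
            errors.append(f"item {i}: Question has no name")
        answer = question.get("acceptedAnswer")
        if not isinstance(answer, dict):
            hint = " (found 'answer' instead)" if "answer" in question else ""
            errors.append(f"item {i}: acceptedAnswer is missing{hint}")
            continue
        if not answer.get("text"):
            errors.append(f"item {i}: Answer has no text")
    return errors

# Illustrative block with the 'answer' vs 'acceptedAnswer' mistake.
broken = json.dumps({
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "What is an AEO audit?",
        "answer": {"@type": "Answer", "text": "A six-stage diagnostic."},
    }],
})
print(faqpage_errors(broken))
```

An empty list means the block passed these checks only; it still belongs in the Rich Results Test afterwards.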
Organization schema should be present on your homepage and service pages at minimum. It should include name, url, logo, address (for local businesses), sameAs links pointing to your verified social profiles and any Knowledge Panel entities, and a description of 150 words or more. The sameAs array is especially important for entity disambiguation — it tells AI systems and knowledge graphs that your website, LinkedIn page, Google Business Profile, and Crunchbase entry all refer to the same real-world organisation.
For Person schema on author profiles and bylined blog posts, include name, jobTitle, worksFor, url, and sameAs linking to the author's LinkedIn profile or verified third-party bio. AI systems increasingly use author identity as a proxy for expertise signals — E-E-A-T is not just a Google concept, it is a pattern that language models have learned from the training corpus. An author page with rich Person schema and verifiable credentials is meaningfully different from a generic 'written by the team' attribution. Run all schema blocks through Google's Rich Results Test after deployment and capture screenshots of passing and failing tests as audit evidence.
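To make the Organization and Person requirements concrete, here is a sketch that builds a Person node as a Python dictionary, prints it as JSON-LD, and flags missing properties on an incomplete Organization node. Every value is a placeholder (Jane Example, Example Co, example.com), and the required-property lists simply mirror the ones given in the two paragraphs above rather than any official minimum.

```python
import json

# Property lists taken from the article text, not from a formal specification.
REQUIRED = {
    "Organization": ["name", "url", "logo", "address", "sameAs", "description"],
    "Person": ["name", "jobTitle", "worksFor", "url", "sameAs"],
}

def missing_properties(node):
    """List the audited properties that are absent or empty on a schema node."""
    required = REQUIRED.get(node.get("@type"), [])
    return [prop for prop in required if not node.get(prop)]

# Placeholder author markup with every audited property filled in.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Example",
    "jobTitle": "Head of Search",
    "worksFor": {"@type": "Organization", "name": "Example Co"},
    "url": "https://example.com/authors/jane-example/",
    "sameAs": ["https://www.linkedin.com/in/jane-example/"],
}

# Placeholder organisation markup with two gaps: empty sameAs, no description.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://example.com/",
    "logo": "https://example.com/logo.png",
    "address": {"@type": "PostalAddress", "streetAddress": "1 Example Street"},
    "sameAs": [],
}

print(json.dumps(person, indent=2))
print(missing_properties(organization))
```

The printed Person object is what would sit inside a script tag of type application/ld+json on the author page.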
How Do You Check Entity Presence and Consistency for AI Visibility?
Entity presence audits verify that your brand is represented consistently across all the data sources AI systems use to resolve organisational identity: your website, Google Business Profile, Wikidata, major directories, and social profiles. Inconsistencies in name, address, phone number, or category create disambiguation failures — the AI system is not confident enough in the entity match to cite you.
The most practical starting point is a NAP audit — Name, Address, Phone. Pull your listings from the top 15 directories: Google Business Profile, Yelp, Bing Places, Apple Maps, Facebook, LinkedIn, BBB, Manta, Foursquare, and any industry-specific directories relevant to your vertical. Check that the business name, address, and phone number are identical across all of them, character for character. Do not use 'St.' on one listing and 'Street' on another. Do not use a tracking phone number on your website that differs from the number on Google Business Profile. These look like minor inconsistencies to a human; to a knowledge graph reconciliation process, they are evidence that two different entities might be involved.
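Because the standard here is character-for-character identity, the NAP comparison can be a plain string equality check against one canonical record. A minimal sketch, assuming you have exported each directory listing into a dictionary; the business details below are invented for illustration.

```python
def nap_mismatches(canonical, listings):
    """Return (source, field, found_value) for every field that differs from canonical.

    Comparison is deliberately exact: 'St.' and 'Street' count as a mismatch.
    """
    mismatches = []
    for listing in listings:
        for field in ("name", "address", "phone"):
            if listing[field] != canonical[field]:
                mismatches.append((listing["source"], field, listing[field]))
    return mismatches

# Hypothetical canonical record and two directory exports.
canonical = {"name": "Example Law Group", "address": "12 Main Street", "phone": "+1 555 0100"}
listings = [
    {"source": "Google Business Profile", "name": "Example Law Group",
     "address": "12 Main Street", "phone": "+1 555 0100"},
    {"source": "Yelp", "name": "Example Law Group",
     "address": "12 Main St.", "phone": "+1 555 0199"},
]

for source, field, found in nap_mismatches(canonical, listings):
    print(f"{source}: {field} is {found!r}, expected {canonical[field]!r}")
```

I would resist normalising abbreviations before comparing, since the point of the audit is to find exactly those differences.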
Beyond NAP, check whether your brand has a Wikidata entry. Wikidata is one of the primary structured knowledge bases that large language models incorporate during training, and a well-sourced Wikidata entry that links to your website, LinkedIn, and social profiles significantly strengthens entity recognition. If you do not have a Wikidata entry, you should evaluate whether your organisation meets the notability threshold to create one. Typically, a business with media coverage in verifiable publications qualifies.
Check your sameAs array in your Organization schema and cross-reference it against the actual profiles it points to. Dead links or mismatched brand names in sameAs arrays are common technical debt items in sites that have been through multiple rebrands. Also audit your Google Business Profile category assignments — the primary and secondary categories should match the topical focus of your website content. An IT services firm with 'Computer Repair' as its primary GBP category and blog content entirely about enterprise cloud architecture is sending conflicting entity signals to every AI system that reads both data sources.
How Do You Audit Technical Crawlability for AI Bots?
A technical crawlability audit for AI bots checks your robots.txt file for rules that block known AI crawler user agents, verifies that server response times are under two seconds for key pages, and confirms that your HTML is clean enough for AI systems to extract structured content without JavaScript execution. Blocked or slow pages simply do not make it into AI systems' retrieval pools.
Fetch your robots.txt file directly at yourdomain.com/robots.txt and review every Disallow rule. The AI crawler tokens you need to confirm are not blocked are: GPTBot (OpenAI's training crawler, user agent string Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3; +https://openai.com/gptbot), PerplexityBot (Perplexity's indexing crawler, documented at https://docs.perplexity.ai/docs/resources/perplexity-crawlers), ClaudeBot (Anthropic's crawler), and Google-Extended (Google's robots.txt token for Gemini training and grounding, documented at https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers). If your robots.txt has a broad 'Disallow: /' rule under User-agent: * and does not re-allow these specific agents, you are blocking all of them.
A common mistake is sites that implemented aggressive bot-blocking rules after a DDoS incident or a scraping complaint, then never reviewed those rules for AI crawlers. A rule written to stop a content scraper can also block GPTBot. I recommend reviewing robots.txt with a specific AI-crawler lens at least once per quarter, because new AI systems launch and add crawlers regularly. Also check for Cloudflare or AWS WAF rules that block by user agent pattern. Perplexity's official documentation notes that their PerplexityBot and Perplexity-User agents may be blocked by WAF rules even when robots.txt is permissive, and recommends explicitly allowlisting by both user agent string and published IP range.
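The robots.txt review can be partly automated with Python's standard urllib.robotparser module. The sketch below parses a robots.txt body and reports which of the four tokens named above would be refused for a given URL. The sample file and example.com URL are illustrative, and robotparser's matching is simpler than what each crawler actually implements, so treat the result as a first pass rather than a verdict.

```python
from urllib.robotparser import RobotFileParser

# The four tokens named in this section.
AI_AGENTS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]

def blocked_ai_agents(robots_txt, url="https://example.com/"):
    """Return the AI crawler tokens that this robots.txt body disallows for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]

# Illustrative file: a blanket disallow that re-allows only GPTBot.
SAMPLE = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""

print(blocked_ai_agents(SAMPLE))
```

With this sample, PerplexityBot, ClaudeBot, and Google-Extended fall through to the wildcard group and are reported as blocked, which is the exact failure mode described above.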
Beyond robots.txt, run a server response test on your top 20 pages using a tool like GTmetrix or WebPageTest from a US East server. Time to First Byte above 800ms is a risk factor — retrieval-augmented generation systems that fetch pages at query time will time out or deprioritise slow responders. Check that critical content pages render in clean HTML without requiring JavaScript execution to display body text. Use curl or a developer tools 'view source' check to confirm that your main content is present in the raw HTML response. Pages that require client-side rendering to display content are structurally opaque to crawlers that do not execute JavaScript.
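The 'view source' check can be scripted once you have saved the raw HTML response, for example from curl. This sketch uses Python's standard html.parser to pull out text while ignoring script and style content, then checks whether a phrase from your main content is present. The two HTML strings are invented examples of a server-rendered page and a client-rendered shell.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

def visible_text(raw_html):
    """Return whitespace-normalised text present in the raw HTML without running JavaScript."""
    extractor = _TextExtractor()
    extractor.feed(raw_html)
    extractor.close()
    return " ".join(" ".join(extractor.chunks).split())

def phrase_in_raw_html(raw_html, phrase):
    """True when the phrase appears in the page text before any client-side rendering."""
    return " ".join(phrase.split()).lower() in visible_text(raw_html).lower()

# Hypothetical responses: one server-rendered, one an empty client-side shell.
server_rendered = "<html><body><h1>AEO audit</h1><p>An AEO audit is a six-stage methodology.</p></body></html>"
client_rendered = '<html><body><div id="root"></div><script>render("An AEO audit is a six-stage methodology.")</script></body></html>'

print(phrase_in_raw_html(server_rendered, "six-stage methodology"))
print(phrase_in_raw_html(client_rendered, "six-stage methodology"))
```

Pick a phrase from the lead answer of the page rather than from the navigation or footer, so a pass tells you the content itself is in the response.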
How Do You Identify Competitive Citation Gaps in AI Search?
A competitive citation gap analysis runs your same prompt set against AI systems but specifically notes which competitors are cited in your place. Every response where a competitor appears and you do not is a citation gap. Analysing which pages and content types earn those competitor citations reveals the specific content, structure, or authority investments needed to close the gap.
Take your 20-to-40 prompt set from the baseline test and re-run it, this time logging every source cited in the response — not just yours. Compile the cited domain list. You will likely find that two to four competitors dominate your citation landscape. For each of those competitors, pull the specific pages that are being cited. What is the format of those pages — long-form guide, FAQ page, comparison article, glossary entry? What is the approximate word count? Do they use question H2s? Do they lead with a direct answer? This forensic analysis tells you which content formats AI systems have learned to trust on your topic.
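Compiling the cited domain list is a counting exercise once every response's sources are logged. A minimal sketch, assuming each logged response carries a list of cited URLs: it counts how often each other domain is cited in responses where yours is absent. The domains and field names are placeholders.

```python
from collections import Counter
from urllib.parse import urlparse

def domain_of(url):
    """Return the hostname of a URL with any leading 'www.' removed."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def citation_gaps(responses, own_domain):
    """Count competitor domains cited in responses that do not cite own_domain."""
    gaps = Counter()
    for response in responses:
        domains = {domain_of(url) for url in response["cited_urls"]}
        domains.discard("")
        if own_domain in domains:
            continue  # you were cited here, so this response is not a gap
        gaps.update(domains)
    return gaps

# Hypothetical log of three responses.
responses = [
    {"prompt": "p1", "model": "Perplexity",
     "cited_urls": ["https://www.competitor-a.example/guide", "https://competitor-b.example/faq"]},
    {"prompt": "p2", "model": "Perplexity",
     "cited_urls": ["https://example.com/blog/post", "https://competitor-a.example/x"]},
    {"prompt": "p3", "model": "ChatGPT",
     "cited_urls": ["https://competitor-a.example/glossary"]},
]

gaps = citation_gaps(responses, "example.com")
print(gaps.most_common())
```

The domains at the top of that list are the ones whose cited pages deserve the format and structure review described above.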
Next, do a structured data comparison. Check whether cited competitor pages have FAQPage schema, Organization or Person author markup, and BreadcrumbList. If your pages match them on content format but lack structured data, that is a high-confidence fix. If they have materially more detailed, more authoritative, or more frequently updated content than yours, you are looking at a longer content investment rather than a technical quick fix.
Finally, check the citation frequency over time by running the same prompts monthly and tracking changes. Citation share of voice fluctuates as AI models update and as new content is indexed. A competitor who suddenly starts appearing in citations that you had owned in prior months has likely published new content, earned fresh backlinks, or improved their structured data. This ongoing tracking is what separates an audit — a point-in-time snapshot — from an AEO programme. The measurement cadence is covered in detail at /blog/how-to-measure-aeo/. For the full framework including content strategy decisions, see /blog/aeo-ranking-factors/.
Where Does an AEO Audit Fall Short — and What Does It Not Fix?
An AEO audit diagnoses structural and technical deficits. It cannot manufacture authority. If your site lacks third-party citations, expert authorship, and a track record of being cited by other authoritative sources, fixing your robots.txt and adding FAQ schema will improve your floor but will not close the gap against a well-established competitor. Authority accrues over time and cannot be audited into existence.
I want to be direct about this because I see too many agencies present AEO audits as if completing the checklist is the end of the work. It is the beginning. The audit identifies the table-stakes requirements — your site must be crawlable, your content must be structured, your entities must be consistent. Meeting those requirements is the minimum to be eligible for AI citations. But eligibility is not selection. AI systems make probabilistic choices about which sources to cite based on patterns in their training data, real-time retrieval quality signals, and the accumulated citation history of a domain.
A site that has been consistently cited by other authoritative domains for three years, that has published expert-authored content with verifiable credentials, and that has a track record of accurate, well-structured answers has earned a citation-authority advantage that a newly audited site will not overcome in 30 days. Our own portfolio data consistently shows that newly optimised sites take four to six months of regular content production and citation-building activity before their AI citation rates meaningfully lift above baseline. The audit accelerates that timeline by removing preventable structural barriers — but it does not replace the time investment.
The implication for prioritisation is practical: complete the technical items from this audit first, because they have the highest ROI relative to effort. A robots.txt rule that blocks AI crawlers takes one afternoon to fix, and the fix immediately makes your content eligible for crawling. Content rewrites and new FAQ sections are next, because they improve the extraction surface on content you already have. Entity and structured data work follows. Competitive citation gap content (new articles designed to displace competitor citations) comes last, because it requires the most production time and has the longest feedback loop. Use the /services/aeo/ page as your reference for what a managed AEO programme looks like when the audit findings are translated into a sustained execution roadmap.
What Are the 10 Priority Checklist Items from an AEO Audit?
These ten items represent the highest-impact, most commonly failed checks across the AEO audits I run. Work through them in order. The first five are technical or structural fixes that clear the path for AI crawlers and extraction systems. The last five cover entity signals, server performance, and the citation baseline you will track over time.
- Verify robots.txt explicitly allows GPTBot, PerplexityBot, ClaudeBot, and Google-Extended — do not rely on a blanket wildcard allow if other Disallow rules are present.
- Confirm WAF rules (Cloudflare, AWS WAF) are not silently blocking AI crawler IP ranges even when robots.txt is permissive — check Perplexity's published IP allowlist documentation.
- Audit all top-20 content pages for question H2s — each major section should be phrased as a question the target reader would ask.
- Check that each question H2 is followed by a 40-to-60-word lead answer that is self-contained and accurate without reading further.
- Validate FAQPage JSON-LD schema on all blog posts and service pages — run each through Google's Rich Results Test and fix any property-name errors.
- Audit Organization schema for completeness: name, url, logo, address, description, and a sameAs array pointing to all verified profiles.
- Add Person schema to all author profile pages and bylined posts — include name, jobTitle, worksFor, and a sameAs link to the author's LinkedIn or verified third-party bio.
- Run a NAP consistency check across the top 15 business directories and reconcile any name, address, or phone mismatches.
- Test server Time to First Byte on top-20 pages — flag anything above 800ms for performance review.
- Run your full prompt set on ChatGPT, Perplexity, Claude, and Google AI Overviews and build a citation share-of-voice baseline that you re-run monthly to track progress.
Sources and further reading
These are the primary sources referenced in this article. Each is an authoritative documentation page or publication we verified before citing.
- schema.org FAQPage specification — Authoritative reference for the FAQPage structured data type, including the correct property hierarchy (Thing > CreativeWork > WebPage > FAQPage) and required mainEntity/acceptedAnswer properties used in structured data validation.
- OpenAI's official GPTBot documentation — Official OpenAI documentation confirming GPTBot's user agent string, its purpose for training generative AI foundation models, and how site owners can allow or disallow it in robots.txt.
- Perplexity crawler documentation — Official Perplexity documentation covering the PerplexityBot and Perplexity-User agents, their user agent strings, robots.txt behavior, and WAF allowlisting instructions for Cloudflare and AWS.
- Google's common crawlers documentation — Official Google Search Central documentation explaining the Google-Extended robots.txt token, which controls whether a site's content can be used for Gemini model training and grounding.


