You built 500 pages. Google only knows about 300.
A common surprise when auditing a site: the sitemap lists fewer pages than the site actually has, or includes pages that shouldn't be there. Either way, you've got a coverage gap — and Google is making decisions with incomplete information.
What an XML sitemap does
An XML sitemap is a file that lists the URLs you want search engines to know about. It's not a ranking factor — it's a discovery aid. Think of it as handing Google a map of your site.
```xml
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page-one</loc>
    <lastmod>2026-03-15</lastmod>
  </url>
</urlset>
```
Googlebot still crawls by following links, but the sitemap helps it find pages that might not be well-linked internally — especially on large or new sites.
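The format above is simple enough to parse with standard tooling. Here's a minimal sketch using Python's standard library — the `parse_sitemap` helper is illustrative, not part of any sitemap tooling:

```python
import xml.etree.ElementTree as ET

# The sitemap protocol namespace; every <url> entry lives under it.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def parse_sitemap(xml_text: str) -> list[dict]:
    """Extract <loc> and <lastmod> from each <url> entry in a urlset."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append({
            "loc": url.findtext("sm:loc", namespaces=NS),
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
        })
    return entries

sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page-one</loc>
    <lastmod>2026-03-15</lastmod>
  </url>
</urlset>"""

entries = parse_sitemap(sitemap)
```

This is the same extraction any audit tool performs before comparing the sitemap's claims against a real crawl.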
When sitemaps matter most
- Large sites — with thousands of pages, some will inevitably be poorly linked
- New sites — not many external or internal links yet
- Sites with frequent content changes — the `lastmod` tag tells Google what to re-crawl
- JavaScript-heavy sites — where Google might struggle to discover links
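When content changes often, generating the sitemap from your page data keeps `lastmod` honest instead of hand-maintained. A rough sketch — the `build_sitemap` helper and its input shape are hypothetical:

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages) -> str:
    """pages: iterable of (url, last_modified date) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # lastmod uses the W3C date format (YYYY-MM-DD)
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")

xml = build_sitemap([("https://example.com/page-one", date(2026, 3, 15))])
```

Regenerating this file on deploy (or on a schedule) means the dates Google sees always reflect real changes.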
Common sitemap problems
| Problem | Impact |
|---|---|
| Pages missing from sitemap | Google might never discover them |
| Sitemap includes noindex pages | Wastes crawl budget on pages you don't want indexed |
| Sitemap includes 404 or redirected URLs | Signals a poorly maintained site |
| Sitemap not referenced in robots.txt | Google might not find the sitemap itself |
| Stale lastmod dates | Google loses trust in your sitemap data |
| Sitemap exceeds 50,000 URLs or 50MB | Needs to be split into multiple sitemaps |
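For that last problem, the protocol's answer is multiple sitemap files tied together by a sitemap index. A sketch of the splitting logic — the helper names are illustrative:

```python
def chunk_urls(urls, limit=50_000):
    """Split a URL list into sitemap-sized chunks (the protocol caps each file at 50,000 URLs)."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]

def build_sitemap_index(sitemap_urls) -> str:
    """Return a <sitemapindex> document listing each child sitemap file."""
    entries = "\n".join(
        f"  <sitemap><loc>{u}</loc></sitemap>" for u in sitemap_urls
    )
    return (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</sitemapindex>"
    )

# 120,000 URLs split cleanly into three child sitemaps
chunks = chunk_urls([f"https://example.com/p/{i}" for i in range(120_000)])
index = build_sitemap_index(
    [f"https://example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)]
)
```

You then submit only the index file; Google discovers the children from it.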
What should (and shouldn't) be in your sitemap
Include:
- All indexable pages (returning 200, not noindexed)
- Canonical versions of URLs only
- Pages you want Google to prioritize
Exclude:
- Pages blocked by robots.txt
- Noindexed pages
- Redirected URLs (3xx)
- Error pages (4xx, 5xx)
- Duplicate content (non-canonical versions)
- Login-required pages
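These include/exclude rules reduce to a simple filter over crawl data. A minimal sketch, assuming a hypothetical page record with `url`, `status`, `noindex`, `requires_login`, and `canonical` fields:

```python
def should_include(page: dict) -> bool:
    """Return True if a crawled page belongs in the sitemap (record shape is hypothetical)."""
    return (
        page["status"] == 200                  # excludes 3xx redirects and 4xx/5xx errors
        and not page["noindex"]                # noindexed pages send contradictory signals
        and not page["requires_login"]         # Googlebot can't reach gated content
        and page["url"] == page["canonical"]   # canonical versions only
    )

page = {
    "url": "https://example.com/page-one",
    "status": 200,
    "noindex": False,
    "requires_login": False,
    "canonical": "https://example.com/page-one",
}
```

Running every crawled URL through a filter like this is what keeps a generated sitemap clean as the site changes.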
How to audit sitemap coverage
A proper audit compares your sitemap against what's actually on the site:
- Pages on site but not in sitemap — missed coverage
- Pages in sitemap but returning errors — 404s, 500s, redirects
- Noindex pages in sitemap — contradictory signals
- Sitemap accessibility — is it reachable and properly formatted?
- robots.txt reference — does your robots.txt point to the sitemap?
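The first two checks boil down to set arithmetic between crawled URLs and sitemap URLs. A minimal sketch — the function name and report shape are illustrative:

```python
def audit_coverage(crawled_urls, sitemap_urls) -> dict:
    """Compare what the crawler found against what the sitemap claims."""
    crawled, listed = set(crawled_urls), set(sitemap_urls)
    return {
        # on the site but absent from the sitemap: missed coverage
        "missing_from_sitemap": crawled - listed,
        # in the sitemap but not found on the site: likely 404s, redirects, or stale entries
        "not_found_on_site": listed - crawled,
    }

report = audit_coverage(
    crawled_urls=["https://example.com/a", "https://example.com/b"],
    sitemap_urls=["https://example.com/b", "https://example.com/old"],
)
```

Each URL in `not_found_on_site` would then get a status check to classify it as a 404, a redirect, or a server error.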
Kaitico compares its crawl results against your sitemap during every audit, flagging coverage gaps, error URLs, and inconsistencies between what your sitemap claims and what your site actually serves.