robots.txt: The Gatekeeper of Your Site

robots.txt controls which pages search engines can crawl. Learn how it works, common mistakes that block important pages, and how to audit yours.

A tiny text file with enormous power

There's a file sitting at the root of your website — yoursite.com/robots.txt — that tells every search engine crawler what it can and can't access. It's been around since 1994, it's just plain text, and a single wrong line can make your entire site disappear from Google.

How robots.txt works

When Googlebot arrives at your site, the very first thing it does is check /robots.txt. The file contains rules like:

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/

Sitemap: https://example.com/sitemap.xml

This says: "All crawlers can access everything except /admin/ and /private/. And here's where the sitemap is."

Simple, right? The problem is that small mistakes have big consequences.

The mistakes that keep happening

Accidentally blocking the whole site. This is more common than you'd think:

User-agent: *
Disallow: /

That single slash after Disallow blocks everything. Every page. Your entire site goes dark in Google. It's typically added during development to keep a staging site out of search results, and then someone forgets to remove it before launch.
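The blast radius of that single slash is easy to see with Python's stdlib parser (the paths are hypothetical):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("User-agent: *\nDisallow: /".splitlines())

# Every path on the site is now off-limits to every crawler:
for path in ("/", "/index.html", "/blog/post", "/products/42"):
    print(path, rp.can_fetch("Googlebot", f"https://example.com{path}"))
# All four print False
```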

Blocking CSS and JavaScript. Old advice said to block your CSS and JS files. That's terrible advice now. Google needs to render your pages to understand them, and blocking these resources means Google sees a broken page.

Blocking important sections by accident. Disallow rules are prefix matches, so a Disallow: /blog meant to block /blog-drafts/ will also block /blog/ — your entire blog.
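The prefix behavior is simple to demonstrate with the same stdlib parser (hypothetical URLs):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("User-agent: *\nDisallow: /blog".splitlines())

# Disallow: /blog matches any path that *starts with* /blog:
print(rp.can_fetch("Googlebot", "https://example.com/blog-drafts/x"))  # False (intended)
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))      # False (not intended!)
```

The fix is the trailing slash: Disallow: /blog-drafts/ matches only that directory.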

No robots.txt at all. Without one, crawlers access everything (which might be fine) but you lose control over crawl budget optimization and can't point crawlers to your sitemap.

What robots.txt can and can't do

What it can do:

  - Prevent crawling of a URL
  - Control crawl budget allocation
  - Block specific crawlers
  - Point to your sitemap

What it can't do:

  - Prevent indexing (use noindex for that)
  - Remove pages already in the index
  - Guarantee protection of sensitive data
  - Override a noindex directive

This is a critical distinction: robots.txt blocks crawling, not indexing. If other sites link to a page you've blocked in robots.txt, Google might still index the URL — it just won't know what's on it.

Checking your robots.txt

Every site should periodically verify that:

  1. The file exists and is accessible at /robots.txt
  2. Important pages aren't accidentally blocked
  3. CSS and JavaScript files aren't blocked
  4. The sitemap URL is included and correct
  5. No overly broad Disallow rules are catching pages they shouldn't
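Checks 2 and 5 can be automated. Here's a minimal sketch using only the standard library; the find_blocked helper and the URLs are illustrative, not part of any real tool:

```python
from urllib import robotparser

def find_blocked(robots_txt, urls, agent="Googlebot"):
    """Return the subset of urls that this robots.txt blocks for the given agent."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [u for u in urls if not rp.can_fetch(agent, u)]

# An overly broad rule: /blog was meant to be /blog-drafts/.
robots_txt = "User-agent: *\nDisallow: /admin/\nDisallow: /blog"
important = [
    "https://example.com/",
    "https://example.com/blog/post",
    "https://example.com/pricing",
]
print(find_blocked(robots_txt, important))
# ['https://example.com/blog/post']
```

Run a list of your most important URLs through a check like this whenever robots.txt changes, and an accidental block surfaces immediately instead of weeks later in your traffic reports.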

Kaitico checks your robots.txt during every audit — verifying it's accessible, parsing the rules, and flagging any directives that might be blocking important content from search engines.

Want to check your site for this issue?

Kaitico scans your entire site and finds all SEO issues in minutes.

Start Free Audit