A tiny text file with enormous power
There's a file sitting at the root of your website — yoursite.com/robots.txt — that tells every search engine crawler what it can and can't access. It's been around since 1994, it's just plain text, and a single wrong line can make your entire site disappear from Google.
How robots.txt works
When Googlebot arrives at your site, the very first thing it does is check /robots.txt. The file contains rules like:
```
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
```
This says: "All crawlers can access everything except /admin/ and /private/. And here's where the sitemap is."
Simple, right? The problem is that small mistakes have big consequences.
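You can sanity-check rules like these with Python's standard-library `urllib.robotparser`. One caveat, purely about this sketch: Python's parser applies rules in file order (first match wins) rather than Google's longest-match rule, so the redundant `Allow: /` line is left out here to keep the two interpretations from diverging:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())  # parse the rules as if fetched from /robots.txt

print(rp.can_fetch("*", "https://example.com/blog/post"))    # True: not disallowed
print(rp.can_fetch("*", "https://example.com/admin/login"))  # False: under /admin/
print(rp.site_maps())  # ['https://example.com/sitemap.xml']
```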
The mistakes that keep happening
Accidentally blocking the whole site. This is more common than you'd think:
```
User-agent: *
Disallow: /
```
That single slash after Disallow blocks everything. Every page. Your entire site goes dark in Google. It usually happens when the rule is added during development to keep a staging site out of Google, and nobody remembers to remove it before launch.
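Two characters are enough to reproduce the disaster. A quick check with Python's `urllib.robotparser` shows the blanket Disallow shutting out even the homepage (the domain here is just a placeholder):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])  # the accidental staging-site config

# Every URL on the site is now off-limits to every crawler:
print(rp.can_fetch("Googlebot", "https://example.com/"))           # False
print(rp.can_fetch("Googlebot", "https://example.com/products/"))  # False
```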
Blocking CSS and JavaScript. Old SEO advice said to block your CSS and JS files. That's terrible advice now: Google renders your pages to understand them, and blocking these resources means Google sees a broken page.
Blocking important sections by accident. A Disallow: /blog meant to block /blog-drafts/ will also block /blog/ — your entire blog.
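Disallow rules are prefix matches, which is easy to verify with `urllib.robotparser`. The hypothetical `Disallow: /blog` catches every path that starts with those five characters:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /blog"])

# All of these start with "/blog", so all are blocked:
for path in ("/blog-drafts/post", "/blog/my-article", "/bloggers"):
    print(path, rp.can_fetch("*", f"https://example.com{path}"))  # all False
```

The intended rule is `Disallow: /blog-drafts/` — the trailing slash stops the prefix from swallowing `/blog/`.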
No robots.txt at all. Without one, crawlers access everything (which might be fine) but you lose control over crawl budget optimization and can't point crawlers to your sitemap.
What robots.txt can and can't do
| Can do | Can't do |
|---|---|
| Prevent crawling of a URL | Prevent indexing (use noindex for that) |
| Control crawl budget allocation | Remove pages already in the index |
| Block specific crawlers | Guarantee protection of sensitive data |
| Point to your sitemap | Override a noindex directive |
This is a critical distinction: robots.txt blocks crawling, not indexing. If other sites link to a page you've blocked in robots.txt, Google might still index the URL — it just won't know what's on it.
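If the goal is to keep a page out of search results entirely, the page has to stay crawlable so Google can see a noindex directive, for example:

```html
<!-- In the page's <head>. If robots.txt blocks this page,
     Google can never fetch it and never sees this tag. -->
<meta name="robots" content="noindex">
```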
Checking your robots.txt
Every site should periodically verify that:
- The file exists and is accessible at /robots.txt
- Important pages aren't accidentally blocked
- CSS and JavaScript files aren't blocked
- The sitemap URL is included and correct
- No overly broad Disallow rules are catching pages they shouldn't
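A minimal audit along these lines can be sketched with `urllib.robotparser`. Everything here is illustrative — `audit_robots`, the path list, and the example domain are assumptions, not a real API:

```python
from urllib.robotparser import RobotFileParser

def audit_robots(robots_txt: str, must_be_crawlable: list[str]) -> dict:
    """Report important paths blocked for all crawlers, and whether a sitemap is declared."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    blocked = [path for path in must_be_crawlable
               if not rp.can_fetch("*", f"https://example.com{path}")]
    return {"blocked": blocked, "has_sitemap": rp.site_maps() is not None}

# An overly broad rule catches the blog, and no sitemap is declared:
report = audit_robots("User-agent: *\nDisallow: /blog\n",
                      ["/", "/blog/", "/assets/app.css"])
print(report)  # {'blocked': ['/blog/'], 'has_sitemap': False}
```

In production you'd fetch the live file instead of parsing a string, via `rp.set_url(...)` and `rp.read()`.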
Kaitico checks your robots.txt during every audit — verifying it's accessible, parsing the rules, and flagging any directives that might be blocking important content from search engines.