What is robots.txt?

robots.txt is a plain text file placed at a website’s root to tell crawler bots which areas may be crawled. It is typically available at a URL such as https://example.com/robots.txt.

It is not a security mechanism. Admin panels, personal data, and sensitive files should not be protected with robots.txt because the file is public and malicious bots can ignore its rules.

How Does It Work?

robots.txt contains rules for crawlers:

User-agent: *
Disallow: /admin/
Allow: /blog/
Sitemap: https://example.com/sitemap.xml

User-agent names the crawler that a group of rules applies to (* matches any crawler), Disallow lists path prefixes that should not be crawled, and Allow can define exceptions. The Sitemap line points search engines to the XML sitemap.
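
As a rough illustration, the example rules above can be checked programmatically. The sketch below uses Python's standard urllib.robotparser module; the crawler name ExampleBot and the tested URLs are assumptions made for the example, not part of any real site.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this article, held in a string so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /blog/
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url, user_agent="ExampleBot"):
    """Return True if the parsed rules permit user_agent to crawl url.

    "ExampleBot" is a hypothetical crawler name used only for illustration.
    """
    return parser.can_fetch(user_agent, url)


admin_allowed = is_allowed("https://example.com/admin/settings")
blog_allowed = is_allowed("https://example.com/blog/post-1")

# site_maps() needs Python 3.8 or newer; it returns the Sitemap URLs found in the file.
sitemaps = parser.site_maps()

print(admin_allowed, blog_allowed, sitemaps)
```

Python's parser applies the first matching rule in file order, which may differ from how a particular search engine resolves overlapping Allow and Disallow rules, so treat the result as an approximation rather than a guarantee of crawler behavior.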

What to Watch For

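robots.txt controls crawling, but it does not always prevent indexing. If a URL is linked from other sites, a search engine may show the URL in results without crawling its content. Page-level noindex or access control is needed when content must stay out of search. For noindex to work, the page must remain crawlable: a crawler that is blocked by robots.txt never fetches the page and so never sees the directive.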
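
One way to verify that a page actually signals noindex is to inspect its response. The sketch below is a simplified check under stated assumptions: the function name has_noindex is hypothetical, the headers and HTML are passed in as plain values rather than fetched, and crawler-specific forms such as a "googlebot: noindex" header value are not handled.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(headers, html):
    """Return True if the X-Robots-Tag header or a robots meta tag contains noindex.

    headers: mapping of response header names to values.
    html: the response body as a string.
    """
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    meta = _RobotsMetaParser()
    meta.feed(html)
    return "noindex" in header_directives or "noindex" in meta.directives
```

A call such as has_noindex({"X-Robots-Tag": "noindex, nofollow"}, "") covers the header case, while passing the page HTML covers the meta tag case.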

In sitemap and technical SEO work, robots.txt helps keep crawlers from spending crawl budget on filter pages, carts, internal search results, and test areas. A mistaken Disallow: / line can stop the entire site from being crawled.
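
A simple safeguard against a mistaken Disallow: / is to test a draft robots.txt against a list of URLs that must stay crawlable before publishing it. The following is a minimal sketch using Python's standard urllib.robotparser; the function name blocked_urls, the sample rules, and the URL list are all assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the subset of urls that the given robots.txt text disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical list of pages that should always remain crawlable.
IMPORTANT_URLS = [
    "https://example.com/",
    "https://example.com/blog/post-1",
]

# Intended rules: keep crawlers out of the cart and internal search only.
INTENDED = "User-agent: *\nDisallow: /cart/\nDisallow: /search\n"

# Mistaken rules: a single slash blocks every path on the site.
MISTAKEN = "User-agent: *\nDisallow: /\n"

print(blocked_urls(INTENDED, IMPORTANT_URLS))
print(blocked_urls(MISTAKEN, IMPORTANT_URLS))
```

Running a check like this in a deployment pipeline and failing when the returned list is non-empty would catch the blanket rule before crawlers see it.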