Robots.txt Audit Checklist

A misconfigured robots.txt file can have significant consequences for your site’s visibility and security. To make sure your website is presenting its best self to search engines, a regular robots.txt audit is essential. This checklist will guide you through the key areas to inspect, helping you identify potential issues impacting your website’s organic performance.

File Presence & Location

  • Is the robots.txt file placed at the root of the domain (e.g., https://www.example.com/robots.txt)?
  • Does the robots.txt file return a valid 200 HTTP status code (not 404 or 403)?
  • Is the file accessible to all major search engine bots (e.g., Googlebot, Bingbot)?
  • If the site is accessible via both HTTP and HTTPS, is a robots.txt file available for each protocol? (Note: websites should always be served over secure HTTPS.)
  • If the site is served from non-standard ports, is a separate robots.txt file provided per port? (Note: the standard ports are 80 for HTTP and 443 for HTTPS.)
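
A quick scripted check can confirm the file lives at the root and returns a 200 status code. The sketch below uses Python’s standard library and a placeholder domain; swap in your own hostname:

    # Minimal sketch: HEAD request against the robots.txt URL (placeholder domain).
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError

    url = "https://www.example.com/robots.txt"
    try:
        with urlopen(Request(url, method="HEAD")) as response:
            print(url, response.status)  # expect 200
    except HTTPError as err:
        print(url, err.code)  # a 404 or 403 here means crawlers cannot read the file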

Blocking Directives

  • Are any important pages or directories (e.g., /blog/, /products/) unintentionally disallowed?
  • Are pagination URLs (e.g., ?page=) or filter parameters blocked without an alternative for discovery? (For more, see our Pagination Audit Checklist).
  • Are internal search results pages (e.g., /search?q=) blocked to avoid index bloat?
  • Are sensitive pages (e.g., /admin/, /checkout/, user dashboards) properly disallowed?
  • Are low-value or duplicate URLs with tracking/session parameters (e.g., ?ref=, ?sessionid=) disallowed?
  • Are file types like .gif, .pdf, or .doc blocked where appropriate using patterns such as Disallow: /*.pdf$?
  • Is Disallow: / used with extreme caution to avoid blocking the entire site unintentionally?
  • Are duplicate filters or conflicting parameters prevented from creating crawlable variants?
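
The snippet below is only an illustrative sketch of the kinds of blocking rules listed above; the paths and parameter names are placeholders and need to match your own URL structure before use:

    User-agent: *
    # Keep sensitive areas out of the crawl
    Disallow: /admin/
    Disallow: /checkout/
    # Internal search results and tracking/session parameters (placeholder names)
    Disallow: /search?q=
    Disallow: /*?ref=
    Disallow: /*sessionid=
    # Block PDFs only, using the end-of-line anchor
    Disallow: /*.pdf$
    # Important sections such as /blog/ and /products/ stay crawlable by default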

Syntax, Formatting & Wildcards

  • Are all directives correctly spelt and formatted (e.g., User-agent:, Disallow:, Allow:)?
  • Are user-agent groups clearly defined, with associated rules grouped together?
  • Are wildcards (*) and end-of-line anchors ($) used only where supported and required?
  • Are improper uses of wildcards (e.g., Disallow: /product*) avoided to prevent overblocking?
  • Are comments introduced with # and separated from actual directives?
  • Are inconsistent casing or spacing errors (e.g., user agent instead of User-agent) avoided?
  • Is there a standardised order for URL parameters to avoid unintentional duplication?
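
As an illustration of clean formatting, a well-structured file keeps each user-agent group together, uses comments sparingly, and reserves wildcards and anchors for cases that actually need them (the rules below are placeholders):

    # Default rules for all crawlers
    User-agent: *
    Disallow: /*?sort=
    Disallow: /*.docx$

    # A separate, clearly grouped block for one specific crawler
    User-agent: Googlebot-Image
    Disallow: /previews/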

Testing & Validation

  • Has the file been tested using Google Search Console’s robots.txt report (the successor to the standalone robots.txt Tester)?
  • Do test results confirm that key URLs are blocked or allowed as intended?
  • Has the file been validated after updates to confirm syntax and logic correctness?
  • Is real-time testing of critical faceted, paginated, and media URLs performed to ensure crawlability?
  • Are blocked resource requests checked for negative impact on how pages are rendered by Google?
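
Alongside Google’s own tooling, a scripted check can confirm that key URLs resolve to the intended allow or block decision. The sketch below uses Python’s standard-library robotparser with placeholder URLs; note that it does not replicate Google’s wildcard handling exactly, so treat it as a rough cross-check:

    # Rough cross-check of allow/block decisions (placeholder domain and URLs).
    from urllib.robotparser import RobotFileParser

    parser = RobotFileParser()
    parser.set_url("https://www.example.com/robots.txt")
    parser.read()  # fetches and parses the live file

    for url in [
        "https://www.example.com/blog/",           # should be allowed
        "https://www.example.com/admin/login",     # should be blocked
        "https://www.example.com/media/file.pdf",  # depends on your PDF rule
    ]:
        allowed = parser.can_fetch("Googlebot", url)
        print("ALLOWED" if allowed else "BLOCKED", url)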

Crawl Management & Efficiency

  • Are faceted navigation or filtered result URLs (e.g., ?colour=, ?size=) disallowed if they don’t offer a unique value?
  • Are infinite or duplicate crawl paths (e.g., calendars, sort options, redundant paths) effectively blocked?
  • Are staging, testing, or development environments blocked via robots.txt or HTTP authentication?
  • Is crawl budget protected by preventing the crawling of low-priority or duplicate pages?
  • Are noindex directives used in combination with crawl allowance where appropriate (e.g., for thin but discoverable pages)?
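
A sketch of the crawl-management ideas above might look like the following; the parameters and paths are placeholders, and note that any page meant to carry a noindex directive is deliberately left crawlable so the directive can be seen:

    User-agent: *
    # Faceted and sort parameters that add no unique value (placeholder names)
    Disallow: /*?colour=
    Disallow: /*?size=
    Disallow: /*?sort=
    # Infinite calendar paths (placeholder path)
    Disallow: /events/calendar/
    # Thin-but-discoverable pages are NOT disallowed here; they carry a
    # noindex meta tag instead, which crawlers can only see if the page
    # remains crawlable.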

Resource Accessibility & Rendering

  • Are JavaScript, CSS, font, and image directories unblocked if required for page rendering and indexing?
  • Has blocked resource testing confirmed there is no negative effect on Google’s ability to render the site?
  • Are all critical frontend assets visible and functional when the page is rendered as Googlebot?
  • Are robots.txt rules designed to support mobile-first rendering and indexing?
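
If rendering-critical assets have ended up behind a broad block, an explicit Allow rule can re-open them. The directory names below are placeholders for your own asset structure:

    User-agent: *
    Disallow: /static/
    # Re-allow the asset types Googlebot needs to render pages
    Allow: /static/css/
    Allow: /static/js/
    Allow: /static/fonts/
    Allow: /static/images/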

Sitemap & Bot Guidance

  • Is the sitemap location specified at the end of the file using the Sitemap: directive?
  • Is the sitemap accessible and up to date?
  • Are specific user-agents (e.g., Googlebot-Image, AdsBot-Google) provided with tailored instructions where needed?
  • Are bot-specific rules avoided unless genuinely needed, to keep the file simple?
  • Are crawler directives used intentionally to guide bots to high-value content and away from low-value areas?
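
A minimal sketch of sitemap and bot-specific guidance, with placeholder URLs and rules:

    # Tailored rules for one crawler, only because they are genuinely needed
    User-agent: AdsBot-Google
    Disallow: /landing-page-tests/

    # Default rules for everyone else
    User-agent: *
    Disallow: /internal/

    # Sitemap reference, conventionally placed at the end of the file
    Sitemap: https://www.example.com/sitemap.xml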

Monitoring & Maintenance

  • Is the robots.txt file reviewed regularly as the site evolves (e.g., new sections, URL structures)?
  • Are crawl stats in Google Search Console monitored for signs of blocked or under-crawled areas?
  • Are server logs reviewed periodically to identify unexpected crawler access patterns?
  • Is there version control or change tracking for the robots.txt file?
  • Are site migrations or URL structure changes accompanied by robots.txt updates?
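
As one way to spot unexpected crawler access patterns, a short log summary can show which paths a given bot requests most often. The sketch below assumes an Nginx/Apache combined log format (request path in the seventh field) and a placeholder log path:

    # Rough sketch: count the paths Googlebot requests most often.
    from collections import Counter

    counts = Counter()
    with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
        for line in log:
            if "Googlebot" in line:
                fields = line.split()
                if len(fields) > 6:
                    counts[fields[6]] += 1  # request path in combined log format

    for path, hits in counts.most_common(20):
        print(f"{hits:6d}  {path}")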