Robots.txt Audit Checklist

A misconfigured robots.txt file can have significant consequences for your site’s visibility and security. To make sure your website is presenting its best self to search engines, a regular robots.txt audit is essential. This checklist will guide you through the key areas to inspect, helping you identify potential issues impacting your website’s organic performance.

File Presence & Location

  • Is the robots.txt file placed at the root of the domain (e.g., https://www.example.com/robots.txt)?
  • Does the robots.txt file return a valid 200 HTTP status code (not 404 or 403)?
  • Is the file accessible to all major search engine bots (e.g., Googlebot, Bingbot)?
  • If the site is accessible via both HTTP and HTTPS, is a robots.txt file available for each protocol? (Note: websites should always be served over secure HTTPS.)
  • If the site is served from non-standard ports, is a separate robots.txt file provided per port? (Note: the standard ports are 80 for HTTP and 443 for HTTPS.)
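
A quick scripted check can confirm the file lives at the root and returns a 200 status code. The sketch below uses Python’s standard library and a placeholder domain; swap in your own hostname:

    # Minimal sketch: HEAD request against the robots.txt URL (placeholder domain).
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError

    url = "https://www.example.com/robots.txt"
    try:
        with urlopen(Request(url, method="HEAD")) as response:
            print(url, response.status)  # expect 200
    except HTTPError as err:
        print(url, err.code)  # a 404 or 403 here means crawlers cannot read the file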

Blocking Directives

  • Are any important pages or directories (e.g., /blog/, /products/) unintentionally disallowed?
  • Are pagination URLs (e.g., ?page=) or filter parameters blocked without an alternative for discovery? (For more, see our Pagination Audit Checklist).
  • Are internal search results pages (e.g., /search?q=) blocked to avoid index bloat?
  • Are sensitive pages (e.g., /admin/, /checkout/, user dashboards) properly disallowed?
  • Are low-value or duplicate URLs with tracking/session parameters (e.g., ?ref=, ?sessionid=) disallowed?
  • Are file types like .gif, .pdf, or .doc blocked where appropriate using patterns such as Disallow: /*.pdf$?
  • Is Disallow: / used with extreme caution to avoid blocking the entire site unintentionally?
  • Are duplicate filters or conflicting parameters prevented from creating crawlable variants?
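
The snippet below is only an illustrative sketch of the kinds of blocking rules listed above; the paths and parameter names are placeholders and need to match your own URL structure before use:

    User-agent: *
    # Keep sensitive areas out of the crawl
    Disallow: /admin/
    Disallow: /checkout/
    # Internal search results and tracking/session parameters (placeholder names)
    Disallow: /search?q=
    Disallow: /*?ref=
    Disallow: /*sessionid=
    # Block PDFs only, using the end-of-line anchor
    Disallow: /*.pdf$
    # Important sections such as /blog/ and /products/ stay crawlable by default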

Syntax, Formatting & Wildcards

  • Are all directives correctly spelt and formatted (e.g., User-agent:, Disallow:, Allow:)?
  • Are user-agent groups clearly defined, with associated rules grouped together?
  • Are wildcards (*) and end-of-line anchors ($) used only where supported and required?
  • Are improper uses of wildcards (e.g., Disallow: /product*) avoided to prevent overblocking?
  • Are comments introduced with # and separated from actual directives?
  • Are inconsistent casing or spacing errors (e.g., user agent instead of User-agent) avoided?
  • Is there a standardised order for URL parameters to avoid unintentional duplication?
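
As an illustration of clean formatting, a well-structured file keeps each user-agent group together, uses comments sparingly, and reserves wildcards and anchors for cases that actually need them (the rules below are placeholders):

    # Default rules for all crawlers
    User-agent: *
    Disallow: /*?sort=
    Disallow: /*.docx$

    # A separate, clearly grouped block for one specific crawler
    User-agent: Googlebot-Image
    Disallow: /previews/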

Testing & Validation

  • Has the file been tested using Google Search Console’s robots.txt report (the successor to the standalone robots.txt Tester)?
  • Do test results confirm that key URLs are blocked or allowed as intended?
  • Has the file been validated after updates to confirm syntax and logic correctness?
  • Is real-time testing of critical faceted, paginated, and media URLs performed to ensure crawlability?
  • Are blocked resource requests checked for negative impact on how pages are rendered by Google?
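
Alongside Google’s own tooling, a scripted check can confirm that key URLs resolve to the intended allow or block decision. The sketch below uses Python’s standard-library robotparser with placeholder URLs; note that it does not replicate Google’s wildcard handling exactly, so treat it as a rough cross-check:

    # Rough cross-check of allow/block decisions (placeholder domain and URLs).
    from urllib.robotparser import RobotFileParser

    parser = RobotFileParser()
    parser.set_url("https://www.example.com/robots.txt")
    parser.read()  # fetches and parses the live file

    for url in [
        "https://www.example.com/blog/",           # should be allowed
        "https://www.example.com/admin/login",     # should be blocked
        "https://www.example.com/media/file.pdf",  # depends on your PDF rule
    ]:
        allowed = parser.can_fetch("Googlebot", url)
        print("ALLOWED" if allowed else "BLOCKED", url)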

Crawl Management & Efficiency

  • Are faceted navigation or filtered result URLs (e.g., ?colour=, ?size=) disallowed if they don’t offer a unique value?
  • Are infinite or duplicate crawl paths (e.g., calendars, sort options, redundant paths) effectively blocked?
  • Are staging, testing, or development environments blocked via robots.txt or HTTP authentication?
  • Is crawl budget protected by preventing the crawling of low-priority or duplicate pages?
  • Are noindex directives used in combination with crawl allowance where appropriate (e.g., for thin but discoverable pages)?
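
A sketch of the crawl-management ideas above might look like the following; the parameters and paths are placeholders, and note that any page meant to carry a noindex directive is deliberately left crawlable so the directive can be seen:

    User-agent: *
    # Faceted and sort parameters that add no unique value (placeholder names)
    Disallow: /*?colour=
    Disallow: /*?size=
    Disallow: /*?sort=
    # Infinite calendar paths (placeholder path)
    Disallow: /events/calendar/
    # Thin-but-discoverable pages are NOT disallowed here; they carry a
    # noindex meta tag instead, which crawlers can only see if the page
    # remains crawlable.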

Resource Accessibility & Rendering

  • Are JavaScript, CSS, font, and image directories unblocked if required for page rendering and indexing?
  • Has blocked resource testing confirmed there is no negative effect on Google’s ability to render the site?
  • Are all critical frontend assets visible and functional when the page is rendered as Googlebot?
  • Are robots.txt rules designed to support mobile-first rendering and indexing?
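
If rendering-critical assets have ended up behind a broad block, an explicit Allow rule can re-open them. The directory names below are placeholders for your own asset structure:

    User-agent: *
    Disallow: /static/
    # Re-allow the asset types Googlebot needs to render pages
    Allow: /static/css/
    Allow: /static/js/
    Allow: /static/fonts/
    Allow: /static/images/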

Sitemap & Bot Guidance

  • Is the sitemap location specified at the end of the file using the Sitemap: directive?
  • Is the sitemap accessible and up to date?
  • Are specific user-agents (e.g., Googlebot-Image, AdsBot-Google) provided with tailored instructions where needed?
  • Are bot-specific rules avoided unless genuinely needed, to keep the file simple?
  • Are crawler directives used intentionally to guide bots to high-value content and away from low-value areas?
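
A minimal sketch of sitemap and bot-specific guidance, with placeholder URLs and rules:

    # Tailored rules for one crawler, only because they are genuinely needed
    User-agent: AdsBot-Google
    Disallow: /landing-page-tests/

    # Default rules for everyone else
    User-agent: *
    Disallow: /internal/

    # Sitemap reference, conventionally placed at the end of the file
    Sitemap: https://www.example.com/sitemap.xml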

Monitoring & Maintenance

  • Is the robots.txt file reviewed regularly as the site evolves (e.g., new sections, URL structures)?
  • Are crawl stats in Google Search Console monitored for signs of blocked or under-crawled areas?
  • Are server logs reviewed periodically to identify unexpected crawler access patterns?
  • Is there version control or change tracking for the robots.txt file?
  • Are site migrations or URL structure changes accompanied by robots.txt updates?
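
As one way to spot unexpected crawler access patterns, a short log summary can show which paths a given bot requests most often. The sketch below assumes an Nginx/Apache combined log format (request path in the seventh field) and a placeholder log path:

    # Rough sketch: count the paths Googlebot requests most often.
    from collections import Counter

    counts = Counter()
    with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
        for line in log:
            if "Googlebot" in line:
                fields = line.split()
                if len(fields) > 6:
                    counts[fields[6]] += 1  # request path in combined log format

    for path, hits in counts.most_common(20):
        print(f"{hits:6d}  {path}")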