Overview
Before launching a website, it’s important to review both your canonical URLs and your robots.txt file. These two elements help search engines understand which pages to index, prevent duplicate content issues, and avoid accidentally blocking important pages from appearing in search results.
This guide will walk you through how to check canonical tags and robots.txt to ensure your site is SEO-ready.
Step 1: What Are Canonical URLs and Why They Matter
What Is a Canonical URL?
A canonical URL is the preferred version of a page, declared with a <link rel="canonical"> tag in the HTML <head> when duplicate or near-duplicate versions of that page exist. It tells search engines which URL should be treated as the authoritative source. For example, https://yourdomain.com/sample-page/ and https://yourdomain.com/sample-page/?utm_source=newsletter can both declare https://yourdomain.com/sample-page/ as their canonical URL.
Why It’s Important:
• Prevents duplicate content issues across similar pages
• Consolidates link equity to the correct version of the page
• Helps search engines understand which URL to index and rank
Step 2: How to Check Canonical URLs
Option 1: Manual Check
1. Open any page of your site in your browser.
2. Right-click and select “View Page Source”.
3. Search (Ctrl+F or Cmd+F) for rel="canonical".
4. Confirm that the canonical tag points to the correct version of the page (usually the full absolute URL, including the trailing slash if your site uses one).
Example:
<link rel="canonical" href="https://yourdomain.com/sample-page/" />
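If you have more than a handful of pages, the same check can be scripted. Below is a minimal sketch in Python (assuming the requests library is installed and using yourdomain.com as a placeholder) that fetches a page and prints the canonical URL it declares:
import requests
from html.parser import HTMLParser

class CanonicalParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag found."""
    def __init__(self):
        super().__init__()
        self.canonical = None
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = self.canonical or attrs.get("href")

url = "https://yourdomain.com/sample-page/"  # placeholder page URL
parser = CanonicalParser()
parser.feed(requests.get(url, timeout=10).text)
print(url, "->", parser.canonical)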
Option 2: Use an SEO Tool
Use a tool like:
• Screaming Frog – Crawl your site and review the “Canonical” column.
• Rank Math (in WordPress) – Automatically sets canonical URLs. Go to Rank Math → Titles & Meta to confirm canonical behavior.
📌 Check for:
• Self-referencing canonicals (each page should point to itself unless there’s a specific reason otherwise)
• No canonicals pointing to staging or dev URLs
• No canonicals pointing to unrelated or incorrect pages
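These checks are easy to automate once you have collected the canonical URL for each page (for example with the sketch in Option 1). The following rough sketch assumes a hand-built dict of page URL to canonical URL and a placeholder list of staging hostnames:
from urllib.parse import urlparse

# Page URL -> canonical URL found in its <head> (values here are illustrative)
canonicals = {
    "https://yourdomain.com/": "https://yourdomain.com/",
    "https://yourdomain.com/blog/": "https://staging.yourdomain.com/blog/",
}
staging_hosts = {"staging.yourdomain.com", "dev.yourdomain.com"}  # assumed staging hostnames

for page, canonical in canonicals.items():
    if not canonical:
        print("MISSING canonical:", page)
    elif urlparse(canonical).hostname in staging_hosts:
        print("STAGING canonical:", page, "->", canonical)
    elif canonical.rstrip("/") != page.rstrip("/"):
        print("NOT self-referencing:", page, "->", canonical)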
Step 3: What Is Robots.txt and Why It Matters
What Is Robots.txt?
The robots.txt file is a plain text file located at the root of your domain (yourdomain.com/robots.txt). It tells search engine bots which pages or folders they are allowed or disallowed from crawling.
Why It’s Important:
• Controls what search engines can access
• Keeps crawlers away from sensitive or duplicate content (note that blocking crawling does not by itself keep a page out of the index; use a noindex tag for that)
• Ensures critical pages aren’t blocked from crawling
Step 4: How to Check Your Robots.txt File
1. View Your Robots.txt File
Go to:
https://yourdomain.com/robots.txt
2. Review for Unintended Blocking
Look for Disallow rules and verify nothing important (like /wp-content/, /blog/, or /products/) is being blocked unintentionally.
Example of a standard, safe setup:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://yourdomain.com/sitemap_index.xml
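You can also sanity-check the live file programmatically with Python’s built-in urllib.robotparser. The sketch below (yourdomain.com is a placeholder) confirms that representative pages are still crawlable and that the admin area stays blocked:
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://yourdomain.com/robots.txt")  # placeholder domain
rp.read()  # fetches and parses the live robots.txt

# Pages that should remain crawlable under a setup like the one above
for url in ("https://yourdomain.com/", "https://yourdomain.com/blog/"):
    print(url, "crawlable:", rp.can_fetch("*", url))

# The admin area should stay blocked
print("https://yourdomain.com/wp-admin/", "crawlable:",
      rp.can_fetch("*", "https://yourdomain.com/wp-admin/"))
Note that urllib.robotparser applies rules in the order they appear in the file, so it can treat an Allow override such as admin-ajax.php more strictly than Google, which follows the most specific matching rule; treat Google Search Console as the authoritative check.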
📌 Common Mistakes to Avoid:
• Disallow: / – Blocks your entire site from being crawled
• Blocking important sections like /blog/ or /shop/
• Forgetting to remove staging-era blocks in robots.txt (and leftover noindex tags on your pages) after going live
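As a final guard against these mistakes, you can fetch the live file and scan it for red flags. A rough sketch, again assuming the requests library and a placeholder domain:
import requests

robots = requests.get("https://yourdomain.com/robots.txt", timeout=10).text  # placeholder domain
lines = [line.strip() for line in robots.splitlines()]

if "Disallow: /" in lines:
    print("WARNING: a blanket 'Disallow: /' is blocking the whole site")
if not any(line.lower().startswith("sitemap:") for line in lines):
    print("WARNING: no Sitemap: line found")
if "staging" in robots.lower():
    print("WARNING: robots.txt mentions a staging URL")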
Step 5: Final Launch Checklist
✅ Each page has a correct self-referencing canonical tag
✅ No canonicals pointing to old or dev URLs
✅ Robots.txt does not block important pages or directories
✅ Your sitemap is referenced in the robots.txt file
✅ Test with Google Search Console’s URL Inspection tool and robots.txt report to confirm key pages can be crawled and indexed
Conclusion
Reviewing your canonical tags and robots.txt file is a critical part of your pre-launch SEO checklist. These settings help search engines crawl and index your site correctly, avoid duplicate content issues, and improve your overall visibility in search results. Make sure everything is clean and correct before going live.