I've been working with my client, Gremlin Inc, to launch a content library covering Chaos Monkey. We have a PPC page that is blocked via robots.txt, and its URL structure is *very* similar to one of the subpages of this asset; the only difference is a subfolder. Yet Google is blocking both pages from organic search. It looks like this may be a bug in the way Google processes URLs against robots.txt, but I wanted to open this up to the community and see if anyone has dealt with something similar.
Here’s the robots.txt:
Notice line 10:
Disallow: /chaos-monkey-simian-army/
The Simian Army subpage of the guide we’ve created has a URL that’s supposed to be indexed:
/chaos-monkey/the-simian-army/
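For what it's worth, standard robots.txt matching says these two paths shouldn't collide: a Disallow rule is a prefix match against the URL path. Here's a quick sanity check using Python's built-in robotparser, with example.com standing in for the real domain and a stripped-down robots.txt containing only the rule in question. This is Python's stock parser, not Google's, so treat it as a sketch:

```python
from urllib import robotparser

# Stripped-down robots.txt reproducing only line 10 of the live file.
rules = [
    "User-agent: *",
    "Disallow: /chaos-monkey-simian-army/",
]

parser = robotparser.RobotFileParser()
parser.parse(rules)

# The PPC page matches the Disallow prefix, so it is blocked:
print(parser.can_fetch("*", "https://example.com/chaos-monkey-simian-army/"))
# -> False

# The guide subpage never matches that prefix, so it should be crawlable:
print(parser.can_fetch("*", "https://example.com/chaos-monkey/the-simian-army/"))
# -> True
```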
But Google is blocking this result:
And we have to opt in to extra results just to get to it:
When I try to submit the page in Search Console, it tells me robots.txt is blocking the page:
When we test using the old Webmaster Tools robots.txt tester, it says the URL should be allowed:
So is Google applying some kind of fuzzy matching to near-identical URLs when it processes subfolders and dashes in paths that robots.txt is supposed to block?
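For reference, the Robots Exclusion Protocol treats a Disallow value as a literal, character-by-character prefix of the URL path; nothing in the spec tokenizes on hyphens or folder boundaries. A minimal sketch of that rule (the is_blocked helper is mine, purely for illustration):

```python
def is_blocked(url_path: str, disallow_path: str) -> bool:
    """Literal prefix match, as the Robots Exclusion Protocol specifies."""
    return url_path.startswith(disallow_path)

# The PPC page starts with the disallowed prefix, so it is blocked:
print(is_blocked("/chaos-monkey-simian-army/", "/chaos-monkey-simian-army/"))    # True

# The guide subpage diverges at the 14th character ("/" vs "-"), so it isn't:
print(is_blocked("/chaos-monkey/the-simian-army/", "/chaos-monkey-simian-army/"))  # False
```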
Please leave a comment if you’ve seen something similar.
END OF DAY UPDATE:
We removed line 10 from robots.txt, resubmitted it using the old Webmaster Tools, ran Fetch and Render, and submitted the page. We are back in, with a description:
Definitely looks like a bug from where I am sitting…
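One last thought: if we ever need to reinstate the block on the PPC page, a more defensive setup might pair the Disallow with an explicit Allow for the guide's subtree. Google does support Allow directives (the most specific rule wins), though the carve-out below is my own suggestion, again checked against Python's stock parser rather than Googlebot itself:

```python
from urllib import robotparser

# Hypothetical alternative: keep the PPC page blocked but carve out the
# guide explicitly. (Google uses most-specific-rule precedence; Python's
# parser checks rules in order, so the Disallow is listed first here.)
rules = [
    "User-agent: *",
    "Disallow: /chaos-monkey-simian-army/",
    "Allow: /chaos-monkey/",
]

parser = robotparser.RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "https://example.com/chaos-monkey-simian-army/"))
# -> False: still blocked

print(parser.can_fetch("*", "https://example.com/chaos-monkey/the-simian-army/"))
# -> True: explicitly allowed
```

The Allow line is technically redundant under correct prefix matching, but it makes the intent explicit if a parser, or Google, ever gets fuzzy about it.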
Featured image via Bergen Offentlige Bibliotek