
John-Henry T. Scherck

The ARR is not my son, but I will raise it.


Can robots.txt tell the difference between a slash and a dash?

November 1, 2018 by jhtscherck 3 Comments

I’ve been working with my client, Gremlin Inc, to launch a content library covering Chaos Monkey. We have a PPC page that is blocked via robots.txt, and its URL is *very* similar to the URL of one of the library's subpages. The URLs differ mainly by a subfolder, yet Google is blocking both pages from organic search. This looks like it may be a bug in the way Google matches URLs against robots.txt rules, but I wanted to open this up to the community and see if anyone has dealt with something similar.

Here’s the robots.txt:

Gremlin Robots.txt

Notice line 10:

Disallow: /chaos-monkey-simian-army/

The Simian Army subpage of the guide we’ve created has a URL that’s supposed to be indexed:

/chaos-monkey/the-simian-army/

But Google is blocking this result:

And we have to opt in for extra results just to get to it:

blocked results for gremlin's simian army page that should be indexed.

When I try to submit the page in Search Console, it tells me robots.txt is blocking the page:

When we test using the old Webmaster Tools robots.txt tester, it says the URL should be allowed:

old WMT saying the URL is okay

So is Google applying some kind of fuzzy match to near-identical URLs, treating dashes and subfolder slashes as interchangeable, when it processes URLs that are supposed to be blocked by robots.txt?
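For comparison, here is a minimal sketch of what plain prefix matching says about these two paths, using Python's standard-library `urllib.robotparser`. It is not Google's parser, the two-line robots.txt is a reduced stand-in for the real file (only the Disallow rule from line 10 is taken from it), and `example.com` and the `is_allowed` helper are placeholders made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Reduced stand-in for the real robots.txt: only the Disallow rule is from the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /chaos-monkey-simian-army/
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, user_agent="Googlebot"):
    """Return True if `path` may be fetched under simple prefix matching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder host; only the path is matched against rules.
    return parser.can_fetch(user_agent, "https://example.com" + path)


if __name__ == "__main__":
    for path in ("/chaos-monkey-simian-army/", "/chaos-monkey/the-simian-army/"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Under this kind of matching, a rule blocks a URL only when the URL path starts with the rule's path, so the PPC page is blocked and the guide subpage is allowed. That agrees with what the old robots.txt tester reports.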

 

Please leave a comment if you’ve seen something similar.

 

END OF DAY UPDATE:

We removed line 10 from robots.txt, resubmitted robots.txt using the old Webmaster Tools, used Fetch and Render, and submitted the page. We are back in the results, with a description:

result showing up properly

Definitely looks like a bug from where I am sitting…

 

 

Image via Bergen Offentlige Bibliotek


Filed Under: Technical SEO BS


Comments

  1. Steven van Vessum says

    November 2, 2018 at 8:27 am

    That’s interesting John-Henry. Have you reached out to John Mueller?

    • jhtscherck says

      November 7, 2018 at 10:27 pm

      Hey Steven, yes I did reach out to him on Twitter but I haven’t heard back. Hopefully this will be solved soon.

  2. neeraj pandey says

    November 3, 2018 at 5:13 pm

    What's interesting here is that two different Google tools are giving two different results. One says the page is blocked by robots.txt and the other says it’s allowed, so clearly one of them has a bug. I've seen another kind of bug in URL Inspection, so I believe the robots.txt checker is correct.
