Forum Moderators: Robert Charlton & goodroi


Recent Googlebot robots.txt handling - Are they pulling my leg or what?


Kendo

10:10 pm on Sep 22, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Am trying to get a web site back to the front page of its search results and have rebuilt it three times to improve SEO and phone-friendliness, each time making it better. I also hired an SEO expert to refine things and work on its ranking. That part is working OK, but with amusing results: each week we check rankings on a set of keywords by doing searches in India, the USA and Australia and counting how far back each keyword appears. What is amusing is the inconsistency each week, because instead of moving forward at a steady pace, the rankings jump back and forth like Mexican jumping beans. There is also a huge difference between the localities.

But that is not why I am posting here today.

I had stopped my AdWords account for some time, but I have found it most useful in the past for tracking keywords and their usage, so I re-enabled the account and revised the ads and landing pages. After about two weeks I started getting emails about "Ad disapproved" due to its landing page not working.

Well, all landing pages worked perfectly and could be verified from several locations around the globe, so I contacted them and reported the anomaly. To which they replied... "Get a web designer to fix my web site".

So I sent them screenshots of my main landing page taken from different locations around the world. To which they replied... "See my network administrator to fix the problem".

So then I sent them a link to click from their end which would send me an email confirming the activity and IP address. They must have clicked that link, or someone on their end did, because their IP address was logged and that was useful in determining that we had no IP ranges blocked that would prevent Google access.

So then I sent them a screenshot taken from their very own mobile-friendly testing pages. That's right... they reckon that from their workstation they cannot reach my web site, but their site and mobile testing tools can!

Now the latest excuse is that robots.txt is blocking them. This is what was in my robots.txt...

User-agent: *
Allow: /

Nothing more and nothing less. But they reckon that it should be like this...

User-agent: Googlebot
Disallow:
User-agent: Googlebot-image
Disallow:

Are they pulling my leg or what? They are suggesting that each search engine must be specifically allowed, and that a generic allowance is no longer acceptable?
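For what it's worth, the two robots.txt variants can be sanity-checked with Python's stdlib parser. This is only a sketch: urllib.robotparser is not Google's parser, and the example URL is just a placeholder, but for files this simple both variants should come back as allowed for Googlebot.

```python
# Sanity-check both robots.txt variants with Python's stdlib parser.
# Note: urllib.robotparser is NOT Google's parser, but for files this
# simple the verdicts should agree with a standard interpretation.
from urllib import robotparser

def googlebot_allowed(robots_lines, url="http://example.com/page.htm"):
    rp = robotparser.RobotFileParser()
    rp.parse(robots_lines)          # parse() accepts an iterable of lines
    return rp.can_fetch("Googlebot", url)

# The original file: one generic allow-everything rule.
original = ["User-agent: *", "Allow: /"]

# The file AdWords support suggested: per-bot empty Disallow rules
# (an empty Disallow means "nothing is disallowed").
suggested = ["User-agent: Googlebot", "Disallow:",
             "User-agent: Googlebot-image", "Disallow:"]

print(googlebot_allowed(original))   # -> True
print(googlebot_allowed(suggested))  # -> True
```

Both files permit Googlebot to fetch everything under this parser, which makes the "robots.txt is blocking us" claim look odd.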

goodroi

10:08 pm on Sep 23, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



In July Google announced they were switching to a very strict interpretation of the robots.txt protocol starting on Sept 1st. In general the best SEO results usually come from giving Google what they want in the manner they want it, i.e. optimizing your site to make it friendly to search engines. The harder you make it for Google, the less likely you are to be happy with how Google handles your site.

not2easy

10:22 pm on Sep 23, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I can't say for certain, but some time ago when I needed to disallow a folder yet allow the .js and .css files within it, I had to first "Disallow" the folder, then "Allow" those extensions for it to work as expected. Since they are making adjustments I would not be certain without revisiting their instructions, but that may be why they don't like the Allow: by itself.
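For the record, the pattern described there looks something like this (the folder name /assets/ is just a placeholder, not from the original post; Google documents the * wildcard and $ end-anchor, and under its longest-match rule the longer Allow patterns win over the shorter Disallow):

User-agent: *
Disallow: /assets/
Allow: /assets/*.css$
Allow: /assets/*.js$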

tangor

10:42 pm on Sep 23, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



All of this sounds like "filtering" by g to find out who will play and who is not a company worker.

Almost like union tactics. Sigh.

I have no exclusive "g" entries in my robots.txt and so far they have not ignored me ... or stopped hammering my site with endless crawls. :)

Pick and choose your battles, but battle if you must if g does not act right!

Kendo

11:51 pm on Sep 23, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Their bots are all over the site every day, all day... all over all of my sites, and they all had the same robots.txt. Every page in their sitemaps has been spidered, and then some more which they could only have found by using an index that was years old. It was their ad-checker that failed. But ad support people also complained that the site was unreachable via their mobile phones, so I think fun and games were at play also.

tangor

11:58 pm on Sep 23, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@Kendo ... all you can do is be the best you can be. What g does is in g ... and you have no way to change them.

The chuckles continue as g changes the game over and over and over and over and over and over and over.... (someone stop me!) over and over and ...

RedBar

12:57 am on Sep 24, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Honestly, do you really expect the juveniles at G to know what they're doing ... of course, this is ASSUMING any response you get from them is from a human and not some sort of pathetic AI .!.!.!

Hahahaha .. The web used to be so much easier before Google decided to "organise" it :-(

not2easy

4:34 am on Sep 24, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



In case it is important enough to look into: the Google instructions [developers.google.com] for robots.txt syntax still describe the behavior I mentioned above.
At a group-member level, in particular for allow and disallow directives, the most specific rule based on the length of the [path] entry trumps the less specific (shorter) rule. In case of conflicting rules, including those with wildcards, the least restrictive rule is used.
where they show the example (with formatting that makes it more readable):
http://example.com/page.htm
    allow: /$
    disallow: /
    Verdict: disallow
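To make that precedence rule concrete, here is a toy sketch of longest-match-wins with an Allow tie-break. This is my own illustration, not Google's code, and it supports only the '$' end anchor (no '*' wildcards):

```python
# Toy sketch of Google's documented precedence rule for robots.txt:
# the matching rule with the longest [path] wins; when conflicting
# rules tie, the least restrictive (Allow) wins. Not Google's code.

def rule_matches(path, pattern):
    # Supports only the '$' end anchor, not '*' wildcards.
    if pattern.endswith("$"):
        return path == pattern[:-1]
    return path.startswith(pattern)

def verdict(path, rules):
    # rules: list of (directive, pattern) pairs, e.g. ("allow", "/$")
    best = None  # (pattern length, is_allow) -- longest wins, allow breaks ties
    for directive, pattern in rules:
        if rule_matches(path, pattern):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    if best is None:
        return "allow"  # no rule matches: crawling is permitted
    return "allow" if best[1] else "disallow"

rules = [("allow", "/$"), ("disallow", "/")]
print(verdict("/page.htm", rules))  # -> disallow (matches only "disallow: /")
print(verdict("/", rules))          # -> allow ("/$" is the longer match)
```

For /page.htm the "allow: /$" rule does not match (the page is not exactly "/"), so the shorter "disallow: /" is the only match and the verdict is disallow, reproducing the example above.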


Also note that on July 1, 2019 some things were changed in how Google uses robots.txt. The part of the article that lists the order of precedence for specific user-agents is:
https://developers.google.com/search/reference/robots_txt#order-of-precedence-for-user-agents
(pasted rather than linked because the '#' breaks our links).