
Recent Googlebot robots.txt handling - Are they pulling my leg or what?

     
10:10 pm on Sep 22, 2019 (gmt 0)

Preferred Member from AU 

10+ Year Member Top Contributors Of The Month

joined:May 27, 2005
posts:481
votes: 22


I am trying to get a web site back to the front page of its search results and have rebuilt it three times to improve SEO and phone friendliness, each time making it better. I also hired an SEO expert to refine things and work on its ranking. That part is working OK, apart from some amusing results: each week we check rankings on a set of keywords by doing searches from India, the USA and Australia and counting how far back each keyword sits. What is amusing is the inconsistency each week, because instead of moving forward at a steady pace they jump back and forth like a Mexican jumping bean. There is also a huge difference between the localities.

But that is not why I am posting here today.

I had stopped my Adwords account some time ago, but I have found it most useful in the past for tracking keywords and their usage, so I re-enabled the account and revised the ads and landing pages. After about two weeks I started getting "Ad disapproved" emails claiming that the landing pages were not working.

Well, all the landing pages worked perfectly and could be verified from several locations around the globe, so I contacted them and reported the anomaly. To which they replied... "Get a web designer to fix your web site".

So I sent them screenshots of my main landing page taken from different locations around the world. To which they replied... "See your network administrator to fix the problem".

So then I sent them a link to click from their end, one which would send me an email confirming the activity and IP address. They must have clicked that link, or someone on their end did, because their IP address was logged, and that was useful in confirming that we had no IP ranges blocked that would prevent Google access.

So then I sent them a screenshot taken from their very own mobile-friendly testing pages. That's right... they reckon that from their workstations they cannot reach my web site, but their own site and mobile testing tools can!

Now the latest excuse is that robots.txt is blocking them. This is what was in my robots.txt...

User-agent: *
Allow: /

Nothing more and nothing less. But they reckon that it should be like this...

User-agent: Googlebot
Disallow:
User-agent: Googlebot-image
Disallow:

Are they pulling my leg or what? They are suggesting that each search engine must be specifically allowed, and that a generic allowance is no longer acceptable?
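For what it is worth, my understanding has always been that an allow-all file under the original robots.txt convention is written with an empty Disallow, and that Allow: is a later extension (one which Google does support). A quick sketch of the classic form:

User-agent: *
Disallow:

As far as I can tell, both that and my Allow: / should mean the same thing: nothing is blocked.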
10:08 pm on Sept 23, 2019 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 21, 2004
posts:3531
votes: 400


In July Google announced they were switching to a very strict interpretation of the robots.txt protocol starting on Sept 1st. In general, the best SEO results come from giving Google what they want in the manner they want it, i.e. optimize your site to make it friendly to search engines. The harder you make it for Google, the less likely you are to be happy with how Google handles your site.
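For example, the July announcement said Google would stop honoring unsupported and unpublished rules such as noindex, nofollow and crawl-delay in robots.txt as of Sept 1st. If your file leaned on anything like this, it is now ignored (the paths here are just placeholders):

User-agent: *
# None of these are honored by Google since Sept 1, 2019:
Noindex: /private/
Nofollow: /private/
Crawl-delay: 10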
10:22 pm on Sept 23, 2019 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4569
votes: 367


I can't say for certain, but some time ago when I needed to disallow a folder yet allow the .js and .css files within it, I had to first "Disallow" the folder and then "Allow" those extensions for it to work as expected. Since they are making adjustments I would not be certain without revisiting their instructions, but that may be why they don't like the Allow: on its own.
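It looked something like this (the folder name is just an example):

User-agent: *
# Block the whole folder first...
Disallow: /includes/
# ...then the longer, more specific Allow rules win for these files:
Allow: /includes/*.css
Allow: /includes/*.js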
10:42 pm on Sept 23, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10606
votes: 1128


All of this sounds like "filtering" by g to find out who will play and who is not a company worker.

Almost like union tactics. Sigh.

I have no exclusive "g" entries in my robots.txt and so far they have not ignored me ... or stopped hammering my site with endless crawls. :)

Pick and choose your battles, but battle if you must if g does not act right!
11:51 pm on Sept 23, 2019 (gmt 0)

Preferred Member from AU 

10+ Year Member Top Contributors Of The Month

joined:May 27, 2005
posts:481
votes: 22


Their bots are all over the site every day, all day... all over all of my sites, and they all had the same robots.txt. Every page in their sitemaps has been spidered, plus more pages that they could only have found by using an index that was years old. It was their ad-checker that failed. But the ad support people also complained that the site was unreachable via their mobile phones, so I think fun and games were at play there too.
11:58 pm on Sept 23, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10606
votes: 1128


@Kendo ... all you can do is be the best you can be. What g does is up to g ... and you have no way to change them.

The chuckles continue as g changes the game over and over and over and over and over and over and over.... (someone stop me!) over and over and ...
12:57 am on Sept 24, 2019 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member redbar is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Oct 14, 2013
posts:3371
votes: 564


Honestly, do you really expect the juveniles at G to know what they're doing ... of course, this is ASSUMING any response you get from them is from a human and not some sort of pathetic AI .!.!.!

Hahahaha .. The web used to be so much easier before Google decided to "organise" it :-(
4:34 am on Sept 24, 2019 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4569
votes: 367


In case it is important enough to look into: the behaviour I described above has not changed in the Google instructions [developers.google.com] for robots.txt syntax.
At a group-member level, in particular for allow and disallow directives, the most specific rule based on the length of the [path] entry trumps the less specific (shorter) rule. In case of conflicting rules, including those with wildcards, the least restrictive rule is used.
where they show the example (with formatting that makes it more readable):
http://example.com/page.htm
    allow: /$
    disallow: /
    Verdict: disallow
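There is also the mirror case in their examples, where the longer allow rule wins (quoting from memory, so check the page itself):

http://example.com/page
    allow: /p
    disallow: /
    Verdict: allow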


Also note that on July 1, 2019 some things were changed in how Google uses robots.txt. The part of the article that lists the order of precedence for specific user-agents is:
https://developers.google.com/search/reference/robots_txt#order-of-precedence-for-user-agents
(pasted rather than linked because the '#' breaks our links).
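The gist of that section, as I read it, is that a crawler obeys only the single most specific group that matches its name and ignores the rest. A sketch (the disallowed paths are made up):

User-agent: googlebot-news
Disallow: /news-archive/

User-agent: googlebot
Disallow: /archive/

User-agent: *
Disallow: /cgi-bin/

# Googlebot News follows only the googlebot-news group, regular
# Googlebot follows only the googlebot group, and every other
# crawler falls through to the * group.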