Forum Moderators: Robert Charlton & goodroi


Googlebot cannot access JS and CSS files / Google Warning

         

born2run

3:03 pm on Jul 28, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi, I got this message from Google Search Console (Webmaster Tools):

====
Google systems have recently detected an issue with your homepage that affects how well our algorithms render and index your content. Specifically, Googlebot cannot access your JavaScript and/or CSS files because of restrictions in your robots.txt file

Further the message says:

Use the "Fetch as Google" feature to identify those resources that robots.txt directives are blocking.
====

I tried entering the URL (homepage) into the Fetch as Google page, but where do I look for the resources that robots.txt directives are blocking?

I need to find out what is being blocked by robots.txt that is causing this error message. Kindly assist. Thanks!

Robert Charlton

5:49 pm on Jul 28, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



born2run - There was a SERoundtable post by Barry Schwartz this morning which discusses how these warnings have been going around, and it appears to cover your question. Google has been saying for years not to block your CSS and JavaScript files, and while you may think you've complied, it turns out that on some CMS systems you may also be blocking include files by default.

The SERoundtable post is....

New Google Warning: Googlebot Cannot Access CSS & JS
Jul 28, 2015 - by Barry Schwartz
[seroundtable.com...]

Possibly relevant to your situation....
This is not a penalty notification, but a warning that if Google cannot see your whole site, it may result in poorer rankings.

If you get this message, talk to your developers and discuss what you can do, if you need to do anything. Use the fetch and render tool to diagnose the issue more deeply as well.

And, an important update that Barry added...
Update: I should add, that many many WordPress sites are getting this notification because their /wp-includes/ folder is blocked by robots.txt. Plus there are many popular CMS solutions that block their include files by default.

Please let us know what fixes it.

Barbados

6:22 pm on Jul 28, 2015 (gmt 0)

10+ Year Member



Every WordPress site on our server has just got this message. It almost seems like a veiled threat from Google.

teokolo

6:50 pm on Jul 28, 2015 (gmt 0)

10+ Year Member



Fetch as Google -> FETCH AND RENDER -> at the bottom of the report you'll see the resources blocked by robots.txt, or similar problems.
As you may know, you can re-allow a subset of a restricted area with the "Allow" directive, e.g.:

Disallow: /wp-content/
Allow: /wp-content/uploads/

This will block everything under the wp-content directory except the uploads subdirectory.
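For anyone who wants to sanity-check this locally, here is a minimal Python sketch of the longest-match precedence Google documents for robots.txt: of all rules whose value is a prefix of the path, the longest wins, and a tie goes to Allow. This is a simplification for illustration, not a full robots.txt parser; the rule list is just the example above.

```python
def is_allowed(path, rules):
    """Apply Google-style precedence: of all rules whose value is a
    prefix of the path, the longest wins; ties go to Allow."""
    best_kind, best_len = "allow", -1  # no matching rule means allowed
    for kind, value in rules:
        if path.startswith(value):
            if len(value) > best_len or (len(value) == best_len and kind == "allow"):
                best_kind, best_len = kind, len(value)
    return best_kind == "allow"

# The example above: block wp-content, re-allow the uploads subdirectory.
rules = [
    ("disallow", "/wp-content/"),
    ("allow", "/wp-content/uploads/"),
]

print(is_allowed("/wp-content/plugins/script.js", rules))   # False (blocked)
print(is_allowed("/wp-content/uploads/photo.png", rules))   # True (allowed)
```

Note that the Allow wins for the uploads path not because it comes later in the file, but because its value is longer; some older parsers go by first-match order instead, which is one reason different tools can give different answers for the same robots.txt.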

not2easy

7:00 pm on Jul 28, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I got this warning back in April when I switched themes to responsive. I edited the robots.txt file to give Google access to the files they want to see. They do not get wide-open access; they complain about not being allowed to crawl the captcha generator scripts. Sorry, but that will not happen. What I did change:

Disallow: /wp-content/plugins/
to
Disallow: /wp-content/plugins/
Allow: /wp-content/plugins/theme-name/

I am still getting the message, but now under blocked resources, it is only their very own AdSense that is listed. They should fix their own robots.txt.

I would note that, afaik, only Google uses the Allow: syntax, so Bing et al. would need to develop something for their bots.

lucy24

7:23 pm on Jul 28, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Another is analytics. If you use GA, all the files live on their own servers and they make the rules. But if you use something like piwik that lives on your own site, then ### right I'm not going to let anyone crawl the script; the file is huge and has zero effect on page display and they know it. And you're not going to tell me the googlebot doesn't recognize a wp URL when it sees one. (Query: How does it even know /wp-includes/ exists? That's not something you link to, is it? Is it just guessing, like all those Ukrainians asking for wp-admin?)

born2run

11:49 pm on Jul 28, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yeah, thanks guys. I set Disallow for everything and Allow for specific folders, and things are fine now according to Fetch and Render in Search Console.

It's pretty odd they are flagging it now after so many days... anyhoo, for now things are OK. Thanks again.

TheMadScientist

5:02 pm on Jul 29, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Barry posted about this again today [seroundtable.com...] and has the following "quick fix" solution listed from Gary Illyes:

User-Agent: Googlebot
Allow: .js
Allow: .css

There were some comments there about it not working, but reading how Google handles robots.txt specifications, here: [developers.google.com...] it will work:

Only one group of group-member records is valid for a particular crawler. The crawler must determine the correct group of records by finding the group with the most specific user-agent that still matches. All other groups of records are ignored by the crawler. The user-agent is non-case-sensitive. All non-matching text is ignored (for example, both googlebot/1.2 and googlebot* are equivalent to googlebot). The order of the groups within the robots.txt file is irrelevant.

-- From about 2/3 of the way down the page.

Sample situations:
URL                              allow:     disallow:   Verdict

http://example.com/page          /p         /           allow
http://example.com/folder/page   /folder/   /folder     allow
http://example.com/page.htm      /page      /*.htm      undefined
http://example.com/              /$         /           allow
http://example.com/page.htm      /$         /           disallow

-- From almost at the bottom of the page.

So not only will the more specific Googlebot group of records override a generic block (User-agent: *); the more specific Allow: .js and Allow: .css should also remove the blocks on the specific files, much the same as the Allow: /folder/ will "override" the Disallow: /folder.
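To make the quoted precedence concrete, here is a short Python sketch of the documented longest-match rule (an illustration, not Google's actual parser): * matches any run of characters, a trailing $ anchors the match at the end of the URL path, and the longest matching pattern wins. It reproduces the clear-cut verdicts from the sample table; the /page vs /*.htm row is left out because Google itself marks it undefined.

```python
import re

def rule_matches(pattern, path):
    """True if a robots.txt pattern matches the path: '*' matches any
    run of characters, a trailing '$' anchors the match at the end."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in core)
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

def verdict(path, allow, disallow):
    """Longest matching pattern wins; a tie goes to allow."""
    best_kind, best_len = "allow", -1
    for kind, pattern in (("allow", allow), ("disallow", disallow)):
        if rule_matches(pattern, path) and len(pattern) > best_len:
            best_kind, best_len = kind, len(pattern)
    return best_kind

print(verdict("/page", "/p", "/"))                      # allow
print(verdict("/folder/page", "/folder/", "/folder"))   # allow
print(verdict("/", "/$", "/"))                          # allow
print(verdict("/page.htm", "/$", "/"))                  # disallow
```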

teokolo

12:53 pm on Jul 31, 2015 (gmt 0)

10+ Year Member



I tried both


Allow: .js
Allow: .css


AND

Allow: /*.js
Allow: /*.css


Neither of them works; Google's robots.txt tester shows the CSS and JS files as blocked even when using this syntax.

TheMadScientist

1:19 pm on Jul 31, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Interesting...

1.) Did you put those under a user-agent: googlebot grouping?

2.) If you did, have you checked server logs to make sure the tester is aligned with what they do and they're really not visiting?

I'm wondering mostly because Gary Illyes says it works, and if it didn't, their own docs would be wrong: they say they go by the longer of the matching paths, and .js/.css implicitly matches anything up until .js or .css. So for Googlebot not to be accessing those files while the docs stay accurate, someone would have to have the full path to the .js or .css file blocked and then be trying to allow it. (Similar to their "undefined" example above.)

EG
Disallow: /some-directory/another-directory/some-page
Allow: .js

Where the .js file is located at /some-directory/another-directory/some-page.js or something along those lines.

Anyway, if anyone can provide more info, it would probably be good/helpful for others to know more about what does and doesn't work for sure.

(I don't have the time or the patience to go test it right now, but maybe someone else will.)

Barbados

1:21 pm on Jul 31, 2015 (gmt 0)

10+ Year Member



I posted on Barry's article that this is something for Google to pick up with WordPress (although I realise it's not only WP that has this problem). If Google specifically can't get into WP, and WP makes up a claimed 24% of the whole web, then frankly this is not something that most WP website owners will have any clue about. Anything short of a structural fix will cause huge issues.

As not2easy noted above "They do not get wide open access". No need for it at all....

teokolo

1:29 pm on Jul 31, 2015 (gmt 0)

10+ Year Member



@TheMadScientist I didn't add User-agent: Googlebot, thank you.

I confirm this works:
User-Agent: Googlebot
Allow: .js
Allow: .css


And this doesn't work:
User-Agent: *
Allow: .js
Allow: .css

TheMadScientist

1:35 pm on Jul 31, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Cool -- Thanks for checking it out and sharing the info!

lucy24

6:48 pm on Jul 31, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



this doesn't work

By the robots.txt protocol, each robot only reads one set of rules. If it finds something addressed to that robot by name, like Googlebot, then it doesn't read rules for * ("all others", not "everyone all the time"). Think of * as the ELSE in an IF loop.