
Googlebot cannot access JS and CSS files / Google Warning

     
3:03 pm on Jul 28, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 19, 2004
posts:659
votes: 8


Hi, I got this message from Google Search Console (Webmaster Tools):

====
Google systems have recently detected an issue with your homepage that affects how well our algorithms render and index your content. Specifically, Googlebot cannot access your JavaScript and/or CSS files because of restrictions in your robots.txt file

Further, the message says:

Use the "Fetch as Google" feature to identify those resources that robots.txt directives are blocking.
====

I tried entering the URL (homepage) into the Fetch as Google page, but where do I look for the resources that robots.txt directives are blocking?

I need to find out what is being blocked by robots.txt that is causing this error message. Kindly assist. Thanks!
5:49 pm on July 28, 2015 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:11902
votes: 294


born2run - There was a Search Engine Roundtable post by Barry Schwartz this morning discussing how these warnings have been going around, and it appears to cover your question. Google has been saying for years not to block your CSS and JavaScript files, and while you may think you've complied, it turns out that on some CMS systems you may also be blocking include files by default.

The SERoundtable post is....

New Google Warning: Googlebot Cannot Access CSS & JS
Jul 28, 2015 - by Barry Schwartz
[seroundtable.com...]

Possibly relevant to your situation....
This is not a penalty notification, but a warning that if Google cannot see your whole site, it may result in poorer rankings.

If you get this message, talk to your developers and discuss what you can do, if you need to do anything. Use the fetch and render tool to diagnose the issue more deeply as well.

And, an important update that Barry added...
Update: I should add that many, many WordPress sites are getting this notification because their /wp-includes/ folder is blocked by robots.txt. Plus, there are many popular CMS solutions that block their include files by default.
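
For reference, the sort of default rule behind this on a WordPress-type install typically looks something like the following (check your own robots.txt rather than assuming these exact paths):

# example paths only - yours may differ
User-agent: *
Disallow: /wp-includes/
Disallow: /wp-content/plugins/

Removing or narrowing those Disallow: lines so Googlebot can reach the scripts and stylesheets it requests is the general direction of the fix.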

Please let us know what fixes it.
6:22 pm on July 28, 2015 (gmt 0)

New User from GB 

joined:Feb 27, 2015
posts: 27
votes: 1


Every WordPress site on our server has just got this message. It almost seems like a veiled threat from Google.
6:50 pm on July 28, 2015 (gmt 0)

Junior Member

5+ Year Member

joined:Dec 7, 2009
posts: 61
votes: 0


Fetch as Google -> FETCH AND RENDER -> at the bottom of the report you'll see resources blocked by robots.txt or similar problems.
As you may know, you can allow a subset of a restricted area with the "Allow" directive.
e.g.:
Disallow: /wp-content/
Allow: /wp-content/uploads/

This blocks everything under the wp-content directory except the uploads subdirectory.
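
For completeness, those lines need to sit inside a user-agent group, so a minimal file along these lines (directory names just as illustration) would be:

User-agent: *
Disallow: /wp-content/
Allow: /wp-content/uploads/

Google goes by the most specific (longest) matching rule, which is why the Allow: for the subdirectory wins over the broader Disallow:.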
7:00 pm on July 28, 2015 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3451
votes: 181


I got this warning back in April when I switched to a responsive theme. I edited the robots.txt file to give Google access to the files they want to see. They do not get wide-open access; they complain about not being allowed to crawl the captcha generator scripts. Sorry, but that will not happen. What I did change:

Disallow: /wp-content/plugins/
to
Disallow: /wp-content/plugins/
Allow: /wp-content/plugins/theme-name/

I am still getting the message, but now under blocked resources, it is only their very own AdSense that is listed. They should fix their own robots.txt.

I would note that afaik only Google uses the Allow: syntax. So Bing, et al would need to develop something for their bots.
7:23 pm on July 28, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14036
votes: 522


Another is analytics. If you use GA, all the files live on their own servers and they make the rules. But if you use something like Piwik that lives on your own site, then ### right I'm not going to let anyone crawl the script; the file is huge, has zero effect on page display, and they know it. And you're not going to tell me Googlebot doesn't recognize a WP URL when it sees one. (Query: how does it even know /wp-includes/ exists? That's not something you link to, is it? Is it just guessing, like all those Ukrainians asking for wp-admin?)
11:49 pm on July 28, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 19, 2004
posts:659
votes: 8


Yeah, thanks guys. I set the Disallow: rules broadly and added Allow: rules for the specific folders, and things are fine now according to Fetch and Render (Fetch as Google).
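
In case the detail helps anyone, the general shape was broad Disallow: rules with Allow: lines for the specific subfolders Googlebot needs; the paths below are placeholders rather than my exact ones:

# placeholder paths - substitute your own
User-agent: *
Disallow: /wp-includes/
Allow: /wp-includes/js/
Disallow: /wp-content/plugins/
Allow: /wp-content/plugins/active-theme/

After that, Fetch and Render stopped listing those resources as blocked.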

It's pretty odd they are flagging it now after so many days... anyhoo for now things are ok. Thanks again.
5:02 pm on July 29, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 14, 2008
posts:2910
votes: 62


Barry posted about this again today [seroundtable.com...] and has the following "quick fix" solution listed from Gary Illyes:

User-Agent: Googlebot
Allow: .js
Allow: .css

There were some comments there about it not working, but reading how Google handles the robots.txt specification here: [developers.google.com...] it will work:

Only one group of group-member records is valid for a particular crawler. The crawler must determine the correct group of records by finding the group with the most specific user-agent that still matches. All other groups of records are ignored by the crawler. The user-agent is non-case-sensitive. All non-matching text is ignored (for example, both googlebot/1.2 and googlebot* are equivalent to googlebot). The order of the groups within the robots.txt file is irrelevant.

-- From about 2/3 of the way down the page.

Sample situations:

URL                              allow:     disallow:   Verdict
http://example.com/page          /p         /           allow
http://example.com/folder/page   /folder/   /folder     allow
http://example.com/page.htm      /page      /*.htm      undefined
http://example.com/              /$         /           allow
http://example.com/page.htm      /$         /           disallow


-- From almost at the bottom of the page.

So, not only will the more specific Googlebot group override the generic block (User-agent: *), but the more specific Allow: .js and Allow: .css should also lift the blocks on those specific files, much the same way the Allow: /folder/ "overrides" the Disallow: /folder in the examples above.
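
Put together, a robots.txt along these lines is what I'd expect to clear the warning while leaving an existing generic block in place (the disallowed paths are just examples):

# example paths only
User-agent: *
Disallow: /wp-includes/
Disallow: /wp-content/plugins/

User-agent: Googlebot
Allow: .js
Allow: .css

Worth noting: because Googlebot picks the group with the most specific matching user-agent, it would read only the second group here and ignore everything in the first one, not just the .js/.css blocks.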
12:53 pm on July 31, 2015 (gmt 0)

Junior Member

5+ Year Member

joined:Dec 7, 2009
posts: 61
votes: 0


I tried both


Allow: .js
Allow: .css


AND

Allow: /*.js
Allow: /*.css


Neither of them works; the Googlebot robots.txt tester shows the CSS and JS files as blocked even when using this syntax.
1:19 pm on July 31, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 14, 2008
posts:2910
votes: 62


Interesting...

1.) Did you put those under a user-agent: googlebot grouping?

2.) If you did, have you checked your server logs to make sure the tester matches what Googlebot actually does and that it really isn't requesting those files?

I'm wondering mostly because Gary Illyes says it works, and for it not to work their own docs would have to be wrong: they say the longer of the matching rules wins, and the path in Allow: .js or Allow: .css implicitly covers anything up to the .js or .css. So, for Googlebot not to be accessing the files while their docs are accurate, someone would have to have the full path to the .js or .css blocked and then be trying to allow it. (Similar to their "undefined" example above.)

EG
Disallow: /some-directory/another-directory/some-page
Allow: .js

Where the .js file is located at /some-directory/another-directory/some-page.js or something along those lines.

Anyway, if anyone can provide more info, it would be helpful for others to know what does and doesn't work for sure.

[I don't have the time or the patience to go test it right now, but maybe someone else will.]
1:21 pm on July 31, 2015 (gmt 0)

New User from GB 

joined:Feb 27, 2015
posts: 27
votes: 1


I posted on Barry's article that this is something for Google to pick up with WordPress (although I realise that it's not only WP that has this problem). If Google specifically can't get into WP, and WP makes up a claimed 24% of the web, then frankly this is not something most WP website owners will have any clue about. Anything short of a structural fix will cause huge issues.

As not2easy noted above "They do not get wide open access". No need for it at all....
1:29 pm on July 31, 2015 (gmt 0)

Junior Member

5+ Year Member

joined:Dec 7, 2009
posts: 61
votes: 0


@TheMadScientist I didn't add User-agent: Googlebot, thank you.

I confirm this works:
User-Agent: Googlebot
Allow: .js
Allow: .css


And this doesn't work:
User-Agent: *
Allow: .js
Allow: .css
1:35 pm on July 31, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 14, 2008
posts:2910
votes: 62


Cool -- Thanks for checking it out and sharing the info!
6:48 pm on July 31, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14036
votes: 522


this doesn't work

By the robots.txt protocol, each robot only reads one set of rules. If it finds something addressed to that robot by name, like Googlebot, then it doesn't read the rules for * ("all others", not "everyone all the time"). Think of * as the ELSE branch of an IF statement.
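
A quick illustration of the one-group rule (paths made up):

# made-up paths, for illustration only
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /

Googlebot obeys only the first group, so it can crawl everything except /private/, while every other compliant robot is blocked from the whole site. That's also why an Allow: .js under User-agent: * does nothing for Googlebot if a Googlebot group exists elsewhere in the file.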