I'm just doing some tests, unblocking all JS and CSS that render my pages.
I have read some related complaints in the comments on the Yoast article from people using Google Web Fonts who are seeing Google messages about blocked CSS, when the only blocking is on the Web Fonts site itself. I don't use their Web Fonts, but I would probably look into that if I did.
This sounds both sensible and easy to do.
But before I rush in and remove 'templates' from my robots.txt, which would let Googlebot access the CSS files and render the page properly, can anyone see any downsides or risks to doing this, or are there only potential benefits?
For example, could a site that is not affected by 'above the fold' penalties, or is not well designed for mobile devices, suddenly find it is exposing shortcomings that were previously invisible? (I don't think these two apply to me, but I wonder if there are other risks.)
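For reference, the kind of robots.txt change being weighed here looks something like the sketch below. The 'templates' path comes from the post above; the Allow lines are one common pattern for opening up just the CSS and JS, and since precedence rules vary between crawlers (Google documents longest-match-wins), it is worth verifying the result with Fetch and Render rather than taking this on trust:

```
# Before: everything under /templates/ is invisible to crawlers
User-agent: *
Disallow: /templates/

# After (sketch): open just the stylesheets and scripts.
# For Googlebot the longer Allow patterns should win over the
# shorter Disallow -- confirm with Fetch and Render.
User-agent: Googlebot
Allow: /templates/*.css$
Allow: /templates/*.js$
Disallow: /templates/
```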
What reasons would a webmaster have to block such files from Google?
Some people block everything that they don't think Google has a good reason to see.
Personally, I don't know what the problem would be for Google to see the CSS file I'm using.
|What reasons would a webmaster have to block such files from Google? |
I am blocking .js files. The reason is that certain links create URLs with a date parameter. This produces many URLs that are identical except for the date parameter.
I know that I could use parameter handling in WMT or robots.txt to block such URLs, but it seemed a cleaner solution not to even let Google know such URLs exist.
In fact they are also blocked by robots.txt in case Google picks up some of them via other means (e.g. an external link), but creating the links in a .js file that is blocked from Google means they are never exposed to Google in the first place. Otherwise, every time Google crawled the page, it might see a different URL.
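As an aside, the duplicate-URL problem described above can also be handled by canonicalising URLs on the server side before they are emitted. A toy Python sketch of the idea (the parameter name and URLs here are hypothetical, not from the poster's site):

```python
# Toy sketch: strip a volatile query parameter so URLs that differ
# only by, e.g., a date collapse to one canonical form.
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def strip_param(url, param):
    """Return url with the named query parameter removed."""
    parts = urlparse(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    return urlunparse(parts._replace(query=urlencode(query)))

a = strip_param("http://example.com/page?id=7&date=2014-06-21", "date")
b = strip_param("http://example.com/page?id=7&date=2014-06-22", "date")
print(a == b)  # both collapse to http://example.com/page?id=7
```

This does not replace robots.txt or WMT parameter handling; it just stops the duplicate URLs from being generated in the first place.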
I may decide to unblock it on one of my sites and see if there is any impact on traffic/ranking.
Blocking JS and CSS related to the style of a website might be a problem though, for the reasons given.
The irony is that many people probably blocked access to avoid having Panda count stylesheets and scripts as "thin content".
I wonder if marking the content as "noindex" with an X-Robots-Tag in the HTTP header has the same adverse effect. It's not blocking it per se, but it wouldn't be in the cache, if that's where Panda was expecting it to be.
|to avoid having Panda count stylesheets and scripts as "thin content". |
Oh, come on. Google's mind may not work like yours and mine, but they know a stylesheet when they see one.
They spy on everything, lol. Do you really think you can block spying with robots.txt? Come on.
They even know what you eat, so forget about robots.txt.
Funny... I am actually seeing an increase in traffic after Panda 4, and everything within the wp-content folder is blocked by robots.txt. I don't see any relationship between Panda 4 and the CSS/JS blocking.
|do you really think you can block spying with robots.txt |
As far as anyone knows, the Googlebot's visits are plainly visible in your site logs. Now, if you wanted to venture into tinfoil-helmet territory, you could postulate that they're paying off the major hosts to have certain things omitted from logs. Heh, heh. But this kinda falls apart if it's your own server.
@indyank, I'm in a similar position with Joomla, which also blocks the relevant directories by default.
The link with Panda 4 seems tenuous, but it's not impossible that this CSS/JS blocking could explain other, smaller changes that take place from time to time, for example in mobile search. Have you considered removing the directories from robots.txt anyway to see if, for example, the mobile version of your site gets better visitor numbers?
Why would it be a Panda-related issue if it's more to do with layout? If they can't load the CSS file(s), they can't determine whether your site is "top heavy" with ads, or what the text colour of the block at the bottom of your page is (a big difference if it's white on a white background versus black on white!). They want to see the page as a human sees it, i.e. they need to load the CSS to render the page as a human would.
They have removed the Yoast article, but it is still in the cache. I wonder why it was removed.
I've just tried using fetch and render on Webmaster Tools. It gave the URL a "partial" pass.
It says that fonts.google and two of Google's AdSense .js files were blocked by robots.txt.
Whose robots.txt blocked it? It wasn't mine, as I do not have anything blocked in my robots.txt file.
Any ideas? And is it something to be concerned about?
|Oh, come on. Google's mind may not work like yours and mine, but they know a stylesheet when they see one. |
They know a tag page when they see one as well, yet there are 1001+ WordPress plugins to noindex it so Panda won't count it as duplicate content.
|to avoid having Panda count stylesheets and scripts as "thin content". |
You might want to delete your robots.txt file, too, if it's only a few lines long. :-)
|It says that fonts.google and two of Google's Adsense .js files were blocked by robots.txt |
Whose robots.txt blocked it?
Didn't this come up in an earlier thread? Honestly, google, how hard would it be to give a pass to your own robot in your own robots.txt? :)
Don't all the standard CMS "themes" come with standard-named CSS files? So even if a search engine can't crawl them, it pretty well knows what's there. That's assuming, for the sake of discussion, that there aren't vast numbers of sites pretending to use a standard CMS by making the HTML look just like WordPress or Joomla, and intentionally giving their personal stylesheets conventional names, just to pull the wool over g###'s eyes.
|You might want to delete your robots.txt file, too, if it's only a few lines long. |
Header set X-Robots-Tag "noindex"
Not because of thin content, but because I don't trust them not to index it otherwise. The only reason CSS is exempt is that it's in a different envelope.
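The Header line quoted above is an Apache mod_headers directive; as written it would send the noindex header on every response. A minimal .htaccess sketch that scopes it to script files only (the file pattern is illustrative, not from the poster) might look like:

```apache
# Illustrative .htaccess sketch (requires mod_headers):
# send "X-Robots-Tag: noindex" only for .js responses,
# leaving HTML and CSS responses without the header.
<FilesMatch "\.js$">
    Header set X-Robots-Tag "noindex"
</FilesMatch>
```

Unlike a robots.txt Disallow, this still lets Googlebot fetch and render the file; it only asks that the file itself be kept out of the index.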
Getting back to the topic of the Yoast article, it helps to read the article and to note the particular types of pages this problems appears to have affected.
As Joost describes the page he used as an example....
|Now, iPhoned makes money from ads. It doesn't have a ridiculous amount of them, but because it uses an ad network a fair amount of scripts and pixels get loaded. My hypothesis was: if Google is unable to render the CSS and JS, it can't determine where the ads on your page are. In iPhoned's case, it couldn't render the CSS and JS because they were accidentally blocked in their robots.txt after a server migration. |
As I check the page linked from the article, it doesn't look like it should trigger an above-the-fold ad algo; but looking at the (only partial) screen capture of how Fetch as Googlebot saw the page, and also at the page source code, I can see how blocking CSS and JS in this case could well cause Google to misinterpret the page.
The article also quotes Maile Ohye's recent comments regarding how Google uses the new Fetch and render capability....
|We recommend making sure Googlebot can access any embedded resource that meaningfully contributes to your site's visible content or its layout |
If this is part of Panda 4 (we're not sure of that, but it does come close in time), how does this mesh with the "kinder, gentler" aspects proclaimed for the algo? I'm guessing that it may be kinder and gentler because it's correcting for an ad layout that wasn't visibly distracting and shouldn't have been hit; but it can only do this if it "sees" the layout, and for that the bot needs to be able to see CSS and JS.
IMO, Google's "rendering" is there to make a finer discrimination, not to screw you. YMMV.
Does this mean that all JS should be open to Googlebot? In my book, no. But if you're in this kind of situation where you've been hit by a large drop and have a layout that might look better to an above-the-fold algo if the page were rendered properly, you might want to unblock the code affecting potentially applicable areas.
I block my CSS and JS, and I got a 150% rise from Panda 4. It's more likely the page layout algo, which I'm sure runs constantly, so a good crawl and away you go.
However, my page renders similarly and it's booming at the minute, so I've no intention of changing.
As for why block it: originally it was to stop Google knowing what plugins I used for SEO. Google causes these issues by keeping us in the dark.
We lost 30% of our traffic on Panda 4 across multiple language sites. All sites have very similar layouts. Approximately 20% of the above-the-fold content is taken up by a Google Maps image using their JS plugin. I tried running an example web page through "fetch and render" and the map is completely missing; there is just the background colour without any content or image. The Google Maps plugin apparently blocks Googlebot in its own robots.txt. Not sure if people have had similar issues with Google Maps.
And the map not showing looks even worse on the mobile version of our site.
I say again: why are people confusing CSS with content? Panda is about content, no? So why are we saying that if you block .css files, you run the risk of a Google content filter (Panda) punishing you? It's far more likely to be a layout-related issue, because CSS is layout! And if you deny Google the chance to see not only your actual layout but also text colours and sizes, then of course Google isn't going to trust your site as much. How can they know a block of text on a page isn't white on a white background if they can't render the CSS?
Hmmm. I just did a GWT "Fetch and render" of the home page, and received this message:
1. http://translate.google.com/translate_a/element.js?cb=googleTranslateElementInit - Script Denied by robots.txt
2. http://pagead2.googlesyndication.com/pagead/show_ads.js - Script Denied by robots.txt [Google AdSense]
3. http://edge.quantserve.com/quant.js - Script Denied by robots.txt
5. http://www.example.com/logo2.jpg - Image Denied by robots.txt
6. http://www.example.com/ArrowOrange.gif - Image Denied by robots.txt
So, how does one allow Googlebot access? Why is this suddenly a problem, particularly 1 & 2? All the others were image files.
Is this really a problem? GWT otherwise reports no fetch problems.
[edited by: aakk9999 at 10:42 pm (utc) on Jun 21, 2014]
[edit reason] Unlinked sample URLs [/edit]
The issue is that Googlebot can't see the content most humans see on the web. If this fetch and render thing forms the basis of Panda 4, then it's clearly quite flawed.
@IanCP, you are seeing exactly what others have complained about - it isn't your robots.txt that is blocking access, it is their own.
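You can confirm where a block comes from by parsing the third party's own robots.txt. A small Python stdlib sketch (the Disallow rule here is an illustrative stand-in for whatever the ad server actually publishes; fetch the real file to be sure):

```python
# Sketch: test whether a crawler may fetch a third-party resource,
# given that site's robots.txt rules. Rules below are stand-ins.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /pagead/",  # hypothetical rule on the ad server
])

blocked = rp.can_fetch("Googlebot", "http://pagead2.googlesyndication.com/pagead/show_ads.js")
other = rp.can_fetch("Googlebot", "http://pagead2.googlesyndication.com/other/file.js")
print(blocked, other)  # the /pagead/ path is disallowed, the other path is not
```

The point being: nothing in your own robots.txt can change that result; the block is enforced by the third party's file.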
Are we really sure what Panda 4 is? The reason I ask is that a few sites I manage (for others), which use neither CSS nor JS, saw a drop on the roll-out of Panda 4...
Inquiring minds want to know. Rather than speculate and chase unsubstantiated ghosties.