
Cache renders excluded CSS?


glengara

12:59 pm on May 22, 2006 (gmt 0)




If a page's style rules are in an external CSS file that robots.txt excludes Gbot from, how would you expect the cache to render the page?
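For illustration, the sort of exclusion I mean would look something like this in robots.txt (the /css/ path is just an example):

    User-agent: Googlebot
    Disallow: /css/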

Receptional Andy

1:04 pm on May 22, 2006 (gmt 0)



The cached copy includes a base href tag, which means the CSS will be fetched from your server by the visitor's browser when they view the cache.
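Roughly speaking (with example.com standing in for the real site), the cached copy gets a line like this added near the top of its HTML:

    <base href="http://www.example.com/page.html">

so any relative URL in the page resolves against the original site rather than against Google's cache server.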

glengara

4:09 pm on May 22, 2006 (gmt 0)




So the cache doesn't give a Gbot view?

Receptional Andy

8:30 am on May 23, 2006 (gmt 0)



Well, it depends what that means. The HTML will be exactly as Googlebot retrieved it (plus their branded bit at the top). However, to make the page display as close to the original as possible when visitors look at the cache, Google adds in the base href line. So, while the HTML content of the page is the same, external images, scripts and CSS referenced in the HTML will be called from the site's web server as normal (this works in most cases - base href isn't perfect).

The 'cached text' link will give a closer idea of how Google 'sees' the page.

Bear in mind that Google doesn't actually 'see' a page; rather, it interprets the HTML code. Googlebot doesn't render pages at all, so CSS is largely irrelevant to it.

Of course, the ability to view a page as closely as possible to a visitor's experience would be very useful to Google, so who's to say how far they've got in attempting it?

glengara

10:20 am on May 23, 2006 (gmt 0)




"So, while the HTML content of the page is the same, external images, scripts and CSS referenced in the HTML will be called from the site's web server as normal (in most cases, base href isn't perfect)."

So is there a legitimate reason to use robots.txt to exclude Gbot, other than the possibility of "customised" CSS files? ;-)

Receptional Andy

10:26 am on May 23, 2006 (gmt 0)



So is there a legitimate reason to use robots.txt to exclude Gbot, other than the possibility of "customised" CSS files? ;-)

Well, the most appealing reason is that serving CSS (JavaScript, images, etc.) to spiders is wasted bandwidth.

Of course, there's always the possibility that the search engines might try to interpret CSS to check for spamming, in which case blocking access to such files could be deemed 'suspicious'.

My opinion is that CSS, JavaScript and indeed almost any externally linked files are not for spiders, although if you're the cautious type then I wouldn't exclude them.

glengara

7:37 pm on May 23, 2006 (gmt 0)




Thanks Andy. So, if I understand it, Gbot won't get the CSS file when it spiders the page, but a request for the cached copy will?

Receptional Andy

10:16 am on May 24, 2006 (gmt 0)



That's right (although IIRC I did hear some reports that Googlebot was retrieving excluded files, just not displaying them in results - but I don't know whether that's true).

So when Gbot visits, it checks robots.txt; if it's excluded from example.css (or from a directory containing a stylesheet) then it shouldn't retrieve the file.

When a searcher clicks on the cache of example.com, the base href tag instructs their browser to resolve all relative URLs against http://www.example.com/, so the CSS is called just as if they had visited www.example.com directly.
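To illustrate with a made-up stylesheet reference: a relative link in the cached HTML such as

    <link rel="stylesheet" type="text/css" href="styles/main.css">

would, thanks to the base href, be fetched by the visitor's browser from http://www.example.com/styles/main.css - straight from the original server, not from Google.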

So in this example, at no point does Google view the content of the CSS file, although for someone viewing the cache, their browser will download the CSS in order to render the page.