
Google SEO News and Discussion Forum

Matt Cutts asks webmasters: let googlebot crawl js and css
tedster
12:55 am on Mar 27, 2012 (gmt 0)

In a new "public service announcement" video, Matt Cutts asks webmasters to remove robots.txt disallow rules for js and css files. He says that Google will understand your pages better and be able to rank you more appropriately.

[youtube.com...]

 

enigma1
8:00 pm on Apr 12, 2012 (gmt 0)

The bot knows of the existence of the URL, but (theoretically) not the contents.

That only applies if the bot accesses robots.txt and then crawls some pages. But it doesn't apply when the bot gets redirected or finds a link to that page from another site, in which case it may access it directly. At least in the past, when I tested it, the bot did access the contents. If they followed robots.txt strictly, I would expect them to fetch it before every page request, since it's another file that can change at any time.

Andy Langton
8:06 pm on Apr 12, 2012 (gmt 0)

But it doesn't apply when the bot gets redirected or finds a link to that page from another site, in which case it may access it directly


Not quite - it won't apply if spiders have not retrieved the robots.txt file at all, or if the version they have is out of date. But the method of discovery of the URL makes no difference. The spider should check all references to a URL against the corresponding robots.txt file. I believe Google always asks for robots.txt first before requesting URLs from a new site.

Final comment - spiders from the major search engines normally "queue" URLs for spidering as they discover them, rather than spidering each one the moment it's found, whether it arrived via a redirect or otherwise. So, once they're in the queue, they're handled alongside any other URLs for that site, and again the discovery mechanism makes no difference.
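
A minimal sketch of that behaviour, using Python's stdlib robots.txt parser. The 24-hour cache window and the example URLs are assumptions for illustration, not anything Google has published:

import time
from urllib.robotparser import RobotFileParser

ROBOTS_TTL = 24 * 60 * 60  # assumed cache lifetime in seconds

rp = RobotFileParser("http://example.com/robots.txt")

def allowed(url, user_agent="Googlebot"):
    # Re-fetch robots.txt only when the cached copy has gone stale,
    # rather than before every single page request.
    if rp.mtime() == 0 or time.time() - rp.mtime() > ROBOTS_TTL:
        rp.read()  # fetch and parse the live file; this stamps the fetch time
    return rp.can_fetch(user_agent, url)

# Discovery mechanism makes no difference: everything in the queue
# gets the same check against the corresponding robots.txt.
queue = ["http://example.com/page.html", "http://example.com/scripts/menu.js"]
crawlable = [u for u in queue if allowed(u)]

The staleness check is the middle case above: a spider doesn't re-fetch robots.txt before every request, so for a while it can act on an out-of-date copy.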

jmccormac
8:21 pm on Apr 12, 2012 (gmt 0)

How is that supposed to work?

@rlange
Google can properly render the page in Preview with snippets so that the user never has to leave Google's page. If I were being ultra cynical, I'd expect them to start promoting some kind of zoom-in function for that. Of course, I may be wrong.

Regards...jmcc

lucy24
10:48 pm on Apr 12, 2012 (gmt 0)

when the bot gets redirected

But bots don't "get redirected" like humans. They can choose not to follow the redirect. Back when GWT (Google Webmaster Tools) listed the names of pages that were blocked by robots.txt, I always had a fistful of titles that it only knew about through redirects from non-blocked (because non-existent) pages.

That's assuming you give the googlebot 24 hours or so to assimilate a new robots.txt before putting any redirects in place.
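
A sketch of that distinction, using the third-party requests library. Here allowed() is the robots.txt check from the earlier sketch and queue_url() is a hypothetical stand-in for a real crawler's queue:

from urllib.parse import urljoin
import requests

def fetch(url):
    # Ask for the page but do not follow redirects automatically,
    # the way a browser would.
    resp = requests.get(url, allow_redirects=False, timeout=10)
    if resp.is_redirect:
        target = urljoin(url, resp.headers["Location"])
        # The bot "chooses": the redirect target is treated as a newly
        # discovered URL, checked against robots.txt and queued rather
        # than chased on the spot.
        if allowed(target):
            queue_url(target)
        return None
    return resp.text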

enigma1
8:56 am on Apr 13, 2012 (gmt 0)

They can choose not to follow the redirect.

They will follow. Give them some bait: for example, append some parameters so the URL looks like a "new link", which speeds things up. You can test it using a couple of different domains.

Alex997
3:46 pm on Apr 19, 2012 (gmt 0)

I unblocked my css and js files a week or so ago, and all my SERPs have suddenly started to slide! I am not using any custom blackhat stuff, just WordPress.

OK, so my WMT link count has gone down too in this time, but I can't help feeling that this change may have something to do with it.

I will watch the next couple of weeks very carefully.

Robert Charlton
6:03 am on Apr 20, 2012 (gmt 0)

I unblocked my css and js files a week or so ago, and all my SERPs have suddenly started to slide! I am not using any custom blackhat stuff, just WordPress.

OK, so my WMT link count has gone down too in this time, but I can't help feeling that this change may have something to do with it.

Maybe your link partners didn't want to have anything to do with a site with spiderable css and decided to delete their backlinks to you. ;)

Sounds more like coincidence to me. Good luck.

diberry
6:20 pm on Apr 29, 2012 (gmt 0)

I agree with Martinibuster. If we're blocking these crawls to improve the user experience but are then asked to unblock them for Google's benefit, Google is sending really mixed messages. It's kind of, "Forget the algo, just build your site for visitors - by which we mean, build your site the way WE would build it if it were ours."

Given what Google did to Yelp (http://techcrunch.com/2010/08/26/google-places-yelp-stoppelman-awkward/), I'm sure not going to trust them to treat my little sites even half that well.

Frankly, I'd sooner just block Google bot entirely.

lucy24
9:49 pm on Apr 29, 2012 (gmt 0)

Unanswerable question:

Do g### and other search engines sincerely believe that a significant number of users have secret php functions disguised behind the name of common analytics packages like piwik or-- perish the thought-- GA itself? Are they correct in this belief?

If no, then stay the ### out. If yes, let's figure out a solution that doesn't involve handing out 403s to well-established search engines.

And, sorry, google, but I've found by direct experience that images in certain directories are used solely as hotlink fodder. You're free to show them with the page in Translate or Preview, but I don't want them showing up in Image Search.
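
That kind of selective blocking is what per-agent groups in robots.txt are for. A sketch with hypothetical paths, run through Python's stdlib parser to show how a compliant bot should read it:

from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: Googlebot-Image
Disallow: /hotlink-fodder/

User-agent: *
Disallow: /piwik/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# The analytics directory is off-limits to everyone; only Image
# Search is kept out of the hotlink-prone image directory.
print(rp.can_fetch("Googlebot", "http://example.com/piwik/piwik.php"))             # False
print(rp.can_fetch("Googlebot-Image", "http://example.com/hotlink-fodder/a.jpg"))  # False
print(rp.can_fetch("Googlebot", "http://example.com/style.css"))                   # True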

lucy24
7:57 am on May 1, 2012 (gmt 0)

Follow-up: I had a belated "D'oh!" moment. It occurred to me that even when g### is allowed to help itself to css and js, that doesn't tell it what a page looks like to the visitor. It only tells it what the page looks like to google.

I double-checked with something I've looked at before. I have a couple of pages whose text content depends partly on whether the user has one specific font installed. (Yes, there's a direct relationship between interest in the page and likelihood of having the font.) This is done with a quick bit of javascript. On one page you can see it at any scale because it shows up in the higher-level headers.

If I look at the page with Google Preview, I don't see the page as I myself would see it. I see it only the way Google sees it.

... and then I detour into severe puzzlement because I can't for the life of me figure out where they got the Preview from. Usually it shows up in logs. This time it didn't-- and if I can believe Spotlight, the last time a human previewed this page was over a month ago. They don't cache previews that long. Even translations-- text alone-- are only kept for a few hours.

:: insert emoticon expressive of bewilderment, bafflement and general wtf-ness ::

Vague thought: Do Previews and Translations roll over, meaning that as long as someone asks for a preview less than some-amount-of-time after the previous request, it stays cached forever? If so, I'm missing a ### of a lot of traffic. I always assumed that people go straight to this page without bothering about a Preview.


I did answer part of my own previous question. Google Analytics lives at Google, right? Not on your own server. So they know that you don't have anything else hiding behind those links. It's only other people's analytics programs that they have to be locked out of.

diberry
3:59 pm on May 1, 2012 (gmt 0)

So if we unblock them from js, what would they be able to do with/learn from our other analytics programs that run with js scripts? I mean, I just dumped Google Analytics to stop them from knowing so much about my sites, LOL.

lucy24
7:28 pm on May 1, 2012 (gmt 0)

what would they be able to do with/learn from our other analytics programs that run with js scripts?

In the case of piwik it isn't the js itself; that part is just a bit of code that in turn points to the meat of the analytics at piwik.php. But if I let them into the js they will follow its link to php (this is direct observation, not a guess) and frankly I don't want to speculate what information google would be able to extract from analytics. Maybe the googlebot just gets logged as a spurious visitor, but maybe they run wild and learn all kinds of things that a human browser wouldn't be able to learn.

Both the js and the php live in the /piwik/ directory, which is roboted-out. Generally the googlebot by that name follows robots.txt, but in this case for some reason they don't. Are they making some arcane exception for linked files as opposed to primary requests? Or are they simply snooping?

Besides: executing the .js may be on their nickel, but downloading it is on mine, at 20K a pop.
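
That kind of direct observation is easy to reproduce from your own logs. A sketch that flags Googlebot requests inside a roboted-out directory, assuming Apache's combined log format; the paths are placeholders, and since a user-agent string can be spoofed, verify hits with reverse DNS before drawing conclusions:

import re

LOG_FILE = "/var/log/apache2/access.log"  # hypothetical path
BLOCKED_PREFIX = "/piwik/"                # the roboted-out directory

# Matches the request line and the final quoted field (the user
# agent) of a combined-format log entry.
entry = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" .* "(?P<ua>[^"]*)"$')

hits = 0
with open(LOG_FILE) as fh:
    for line in fh:
        m = entry.search(line)
        if m and "Googlebot" in m.group("ua") and m.group("path").startswith(BLOCKED_PREFIX):
            hits += 1
            print(line.rstrip())
print(hits, "Googlebot requests under", BLOCKED_PREFIX)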

johnmoose
5:25 pm on May 2, 2012 (gmt 0)

This is why they want access:

[googlewebmastercentral.blogspot.com ]

So that coding can become a ranking factor too.... or not?
