Googlebot getting caught in robots.txt spider trap
starchild
posted 11:14 am on Aug 1, 2011 (gmt 0)

Hi,

I saw today that Googlebot got caught in a spider trap it shouldn't have reached, as that directory is blocked via robots.txt.

I know of at least one other person this has happened to recently.

Why is Googlebot ignoring robots.txt?
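
Before blaming Googlebot, it's worth ruling out a robots.txt rule that doesn't actually cover the trap directory. A minimal sketch using Python's standard-library robots.txt parser; www.example.com and /bot-trap/ are placeholders for your own domain and trap path:

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt (Python 3 standard library).
rp = RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()

# False means the rules block the trap for compliant crawlers;
# True means the trap would catch rule-following bots too.
print(rp.can_fetch("Googlebot", "http://www.example.com/bot-trap/"))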

 

lucy24
posted 1:41 am on Sep 6, 2011 (gmt 0)

Serving the right error codes for both planned and unplanned outages is something that few sites get completely right.

OK, now I'm trying to wrap my brain around the idea of having control over what gets served up during an unplanned, uhm, anything. Is there a definitive thread that explains it? "Error code" doesn't seem to be a fruitful search string ;) (16,600 hits, constrained to this site, goes beyond "fruitful" into "rotting on the ground". Squelch.)

g1smd
posted 7:18 am on Sep 6, 2011 (gmt 0)

Serving a "site temporarily offline for updating" message with "200 OK" with or without 301 redirecting all site URLs to an error page, is a big bad idea.

DNS failure, server meltdown, etc will just timeout and return no website. Serving "can't connect to database" with "200 OK" is asking for trouble; serving 503 is much better. No idea if there is a definitive list.
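
To make that concrete, here is a minimal sketch (not from the thread) of a bare-bones WSGI app, standard library only, that answers 503 with a Retry-After hint while the database is unreachable; the db_available flag stands in for a real connectivity check:

from wsgiref.simple_server import make_server

def app(environ, start_response):
    db_available = False  # stand-in for a real database connectivity check
    if not db_available:
        # 503 tells crawlers the outage is temporary; Retry-After hints
        # (in seconds) when to come back -- no "200 OK" error page.
        start_response("503 Service Unavailable",
                       [("Content-Type", "text/plain"),
                        ("Retry-After", "3600")])
        return [b"Site temporarily offline for updating.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Normal page content.\n"]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()

Googlebot treats a 503 as "try again later" and leaves the indexed URL alone, whereas a 200 on an error page can get the error text indexed in its place.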

graeme_p
posted 5:37 pm on Sep 6, 2011 (gmt 0)

@draftzero, that seems to imply that the page is not crawled for search purposes, which is not what the conversation above assumes. If that is really what they are doing, there is no problem.

@g1smd, part of the problem is that some CMSs get it wrong. I think WordPress used to, but it was fixed.

@lucy, on another thread you said your site was entirely static HTML, so you have nothing to worry about: I have never come across a web server getting this wrong; it's badly written CMSs and scripts that do.

levo
posted 11:12 pm on Sep 7, 2011 (gmt 0)

My code just caught 5 Google IPs. Request headers are:

User-agent: urlresolver
Host: www.domain.com
Accept-Encoding: gzip


and IPs are 74.125.42.80/2/3/4/5

Any idea what "urlresolver" is for? Something like the Facebook URL linter?
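
One way to check whether those addresses really belong to Google is the forward-confirmed reverse DNS test Google recommends for verifying Googlebot. A rough Python sketch (is_google_ip is just an illustrative name):

import socket

def is_google_ip(ip):
    try:
        host = socket.gethostbyaddr(ip)[0]  # reverse DNS lookup
    except socket.herror:
        return False
    # Genuine Google crawlers reverse-resolve under these domains.
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    # Forward-confirm: the name must resolve back to the same IP,
    # otherwise anyone could fake the reverse record.
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except socket.gaierror:
        return False

print(is_google_ip("74.125.42.80"))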

lucy24
posted 3:53 am on Sep 8, 2011 (gmt 0)

There's a thread about it.

[webmasterworld.com...]
