Google bot crippled my site

bot opened a zillion sessions of the shopping cart

jpavery

1:33 pm on Sep 17, 2003 (gmt 0)

Hi Folks,
Yesterday Googlebot slowed our site to a crawl. Googlebot apparently opened millions of sessions of our shopping cart.

Our site is dynamic and DB-driven... we have session IDs and IPs in the URL, so I suspect that was part of the problem. But Google has successfully crawled our site for the past two years, so I'm a little perplexed about what to do.

Other than redesigning our site, does anyone have suggestions that would allow Google to crawl without bringing us to our knees?

Thanks,
JP

bcolflesh

1:48 pm on Sep 17, 2003 (gmt 0)

Disallow the cart pages in your robots.txt file.
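
A minimal robots.txt along these lines would keep compliant crawlers out of the cart. The `/cart/` and `/checkout/` prefixes are assumptions; use whatever paths your cart scripts actually live under. The sketch uses Python's standard `urllib.robotparser` only to check which URLs such a file would block (note that it does plain prefix matching, not Google's wildcard extensions).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /cart/ and /checkout/ are placeholder paths,
# not taken from the original poster's site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /checkout/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for path in ["/products/widget.html", "/cart/add?item=42", "/checkout/step1"]:
        url = "http://www.example.com" + path
        # Catalogue pages stay crawlable; cart and checkout pages are blocked.
        print(path, "allowed" if rp.can_fetch("Googlebot", url) else "blocked")
```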

Slade

1:58 pm on Sep 17, 2003 (gmt 0)

If the cart pages have data you want to be spidered, that's not a very helpful suggestion.

Look at testing for Googlebot's user-agent or IP block, and make the links you generate stable for Googlebot. (Always generate the same URL to get from this page to the next.)
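
A rough sketch of the idea, assuming a Python request handler: detect the crawler by user-agent substring or IP prefix, then leave per-visit state out of the links it sees and emit query parameters in a fixed order. The substring list, the `192.0.2.` prefix (a documentation-only range, not Googlebot's real address block) and the `sid` parameter name are hypothetical placeholders. The same treatment would apply to a client IP embedded in the URL.

```python
from urllib.parse import urlencode

# Hypothetical detection data for illustration only; maintain your own lists.
CRAWLER_UA_SUBSTRINGS = ("googlebot",)
CRAWLER_IP_PREFIXES = ("192.0.2.",)  # TEST-NET-1 placeholder, not a real crawler range


def is_crawler(user_agent: str, remote_ip: str) -> bool:
    """True if the user-agent or the client IP looks like a known crawler."""
    ua = user_agent.lower()
    return any(s in ua for s in CRAWLER_UA_SUBSTRINGS) or remote_ip.startswith(CRAWLER_IP_PREFIXES)


def page_link(path: str, params: dict, session_id: str, crawler: bool) -> str:
    """Build a link; crawlers get a stable URL with no per-visit session ID."""
    query = dict(sorted(params.items()))  # fixed order: same page -> same URL
    if not crawler:
        query["sid"] = session_id  # hypothetical session parameter name
    return path + ("?" + urlencode(query) if query else "")


if __name__ == "__main__":
    bot = is_crawler("Googlebot/2.1 (+http://www.google.com/bot.html)", "203.0.113.5")
    print(page_link("/catalog.php", {"page": 2, "cat": "tools"}, "abc123", bot))
```

User-agent strings are easy to spoof, so the IP test is the stricter of the two checks.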

percentages

2:15 pm on Sep 17, 2003 (gmt 0)

>Other than redesigning our site, does anyone have suggestions that would allow Google to crawl without bringing us to our knees?

Make sure your TOS says the site is only to be accessed by true users and then hit 'em with a lawsuit for not reading and obeying it ;)

I'm all for changing robots.txt so that it has to explicitly say a site may be crawled by spiders... that should bring the SEs' index sizes back down to something manageable... LOL ;)

Heck, the law says that if some uninvited Joe decides to wander through my home in the middle of the day, I am perfectly entitled to blast him with a 12-gauge... same should apply to spiders online :)

Seriously though, Google needs to put a little more thought into what its bot is doing with regard to dynamic URLs. This is not your problem to fix, it is theirs.

trillianjedi

2:53 pm on Sep 17, 2003 (gmt 0)

>This is not your problem to fix, it is theirs.

Google specifically asks webmasters not to serve Googlebot URLs containing a session ID (SID).

Fairly easy to do this by user-agent.
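
Since the real damage here was millions of cart sessions, the most direct version of this is to never start a session for Googlebot at all. A minimal sketch with a toy in-memory store; `SessionStore` and `handle_request` are hypothetical stand-ins for whatever your cart software actually provides.

```python
import secrets
from typing import Dict, Optional


class SessionStore:
    """Toy in-memory session store (hypothetical; stands in for the cart's real one)."""

    def __init__(self) -> None:
        self.sessions: Dict[str, dict] = {}

    def start(self) -> str:
        sid = secrets.token_hex(8)
        self.sessions[sid] = {"cart": []}
        return sid


def handle_request(store: SessionStore, user_agent: str, sid: Optional[str]) -> Optional[str]:
    """Return the session ID to embed in links, or None for crawlers."""
    if "googlebot" in user_agent.lower():
        return None  # no session created: Googlebot gets plain, SID-free URLs
    if sid in store.sessions:
        return sid  # returning visitor keeps the existing session
    return store.start()  # new human visitor gets a fresh session


if __name__ == "__main__":
    store = SessionStore()
    for _ in range(1000):
        handle_request(store, "Googlebot/2.1", None)
    print(len(store.sessions))  # crawler traffic created no sessions
```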

>Seriously though, Google needs to put a little more thought into what its bot is doing with regard to dynamic URLs.

I agree with that, though - 99% of SIDs actually appear as "SID=" at the end of the URL, so I have no idea why they can't just ignore everything from "SID=" onwards.
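
The normalisation being proposed is easy to express, whether or not Googlebot actually does it. A sketch in Python, assuming the session ID is a parameter named exactly `SID` (any case) preceded by `?`, `&` or `;`; names like `PHPSESSID` would not be caught.

```python
import re

# Truncate a URL at the first "SID=" parameter (case-insensitive), including
# the "?", "&" or ";" separator in front of it, as suggested above.
_SID_TAIL = re.compile(r"[?&;]sid=.*$", re.IGNORECASE)


def strip_sid(url: str) -> str:
    """Drop everything from the SID parameter onwards."""
    return _SID_TAIL.sub("", url)


if __name__ == "__main__":
    crawled = [
        "/shop.php?item=7&SID=a1b2",
        "/shop.php?item=7&SID=c3d4",
        "/shop.php?item=7",
    ]
    # Three "different" URLs collapse to a single page once the SID is ignored.
    print({strip_sid(u) for u in crawled})
```

The catch is the "at the end of the URL" assumption: if the SID sits in the middle of the query string, truncating from it also throws away real parameters such as `page=2`, which is probably one reason a crawler cannot apply this rule blindly.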

kaled

4:25 pm on Sep 17, 2003 (gmt 0)

>Make sure your TOS says the site is only to be accessed by true users and then hit 'em with a lawsuit for not reading and obeying it ;)

I don't see that any such statement is necessary. It behoves all search engines to manage their robots in a manner that is not detrimental to the sites they are visiting. If I write a program that brings a website to its knees, it's likely to be called a denial-of-service (DoS) attack, and if I'm caught I should rightly end up in court. I do not believe that a defence based on the site's TOS would be effective.

Kaled.

rfgdxm1

4:51 pm on Sep 17, 2003 (gmt 0)

>I don't see that any such statement is necessary. It behoves all search engines to manage their robots in a manner that is not detrimental to the sites they are visiting. If I write a program that brings a website to its knees it's likely to be called a D.O.S. attack and if I'm caught I should rightly end up in court. I do not believe that a defence based on the site's T.O.S. would be effective.

IANAL, but assuming it was unintentional, I'd say not likely in the US. If I were a juror, I'd never rule against Google. Anyone savvy enough to serve up dynamic URLs should know about robots.txt.

Net_Wizard

6:54 pm on Sep 17, 2003 (gmt 0)

You have two options if you still want Google to index those URLs:

1. Email Google about your situation and ask whether they can slow the crawl rate on your site.

2. Or... get a more powerful server and a reliable host. Think of it this way: if those spider hits were unique human visits, you would still have the same problem.