
Deep Crawl Madness

What's he doing in there?!

12:04 pm on Apr 17, 2003 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member




We've had five googlebots deepcrawling our site since last night (when they spent a happy couple of hours) and again this morning (they've been there for the last 3 hours).

This site is not big by any stretch (maybe 100 pages max).

What's he up to? Following links out, indexing another site, and then coming back to us? Presumably, if that's the case, that's a good sign?

TJ

12:49 pm on Apr 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think google tries to spread the bandwidth out a little bit. Some people were complaining that the spiders can hit sites pretty hard if they aren't throttled.
12:55 pm on Apr 17, 2003 (gmt 0)

10+ Year Member



I'm getting a ton of deepbots crawling my dynamic content (go bot go!). Now, I'm very new to all this G-bot obsessiveness, so maybe my next observation is nothing unusual, but I'm curious what others can tell me: you see, I'm getting freshbot popping in at the same time.
1:00 pm on Apr 17, 2003 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Yeah, we have the same - freshbot is in there alongside deepcrawl bots from 27 (at last count) different IPs.

Fortunately, we have big bandwidth, otherwise the site would probably be brought down by all this activity. It's going deep too - down through all layers and links so far. I think it will index every single page at this rate. They've been there for *hours* now.

I assume this is a good thing, but I'm fairly new to this.

The site has only been online for 3 weeks, but thanks to freshbot we got high in the SERPs pretty quickly (No. 1 for one main key search term two days after launch), and I think we've been linked to by quite a lot of decent sites (we have really good content).

But grey pagerank at the moment - hoping next dance we show at least a 4.

TJ

1:06 pm on Apr 17, 2003 (gmt 0)

WebmasterWorld Senior Member vitaplease is a WebmasterWorld Top Contributor of All Time 10+ Year Member



>I assume this is a good thing, but I'm fairly new to this.

A very good thing - just think, others charge money to go out spidering (without offering you the vast number of visitors afterwards)!

Anyone seen the spiders in the movie "Minority Report"?

I would have thought Google sponsored that movie :)

1:33 pm on Apr 17, 2003 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Looks like my first successful site, then.

Brett was right, it seems - we concentrated purely on content and let the web do the rest of the work for us.

That's what I'll do in future - just build good content sites, with good page titles and forget about the rest.

One link is all you need to get started; the rest is just patience (although, having said that, this has all happened very fast for us!).

TJ

3:33 pm on Apr 17, 2003 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Well, he's still in there, along with 26 of his buddies (all different google IPs).

Is this normal? About 7 hours crawling on a site of <300 pages (rough calculation).

He's maxed out my bandwidth (even with all image directories etc. blocked). I don't mind, it's got to be good for us in the long run, but my users are going to wonder why a page takes 10 seconds to load at the moment!

Any way of getting google to back off the gas a little?

TJ
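
[Note: Googlebot does not honor the non-standard Crawl-delay directive that some other spiders support, so in practice the options were to block the heaviest directories outright in robots.txt, as TJ did with his image directories, or to contact Google directly. A minimal sketch, with made-up directory names:

User-agent: Googlebot
Disallow: /images/
Disallow: /downloads/]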

3:37 pm on Apr 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is Googlebot really maxing out your pipe? Sounds a bit odd.

I've never seen deepcrawl go hit/hit/hit/hit; it's always hit - go away and have a cup of tea - hit....

3:51 pm on Apr 17, 2003 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Seems to have calmed down now - I was trying to look at throttle stats and his progress at the same time.

I suspect the maxing out may actually have been several people hitting an image-intensive page all at the same time - so I think you're right, not google.

I'm quite fascinated by all this - I didn't realise the depth to which our site was going to be crawled (about 16 more pages and he'll have the entirety of the site). I think I got my internal link strategy right!

Thanks for all the help,

TJ

9:50 pm on Apr 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



:::Some people were complaining that the spiders can hit the sites pretty hard if they aren't throttled a bit.

Yah, that was me! It almost crashed my server during the last deepcrawl! It was bombing me and took the server CPU load average up to over 20! I did way too much SEO on my vBulletin boards!

10:38 pm on Apr 17, 2003 (gmt 0)

10+ Year Member



I've got the opposite problem. About 36 hours ago, Deepbot and FreshBot came in and each picked off about 5 pages (just the top-level ones like webpage.asp, cart.asp, login.asp), and I haven't seen them since.

Is there good reason for this creeping sense of dread I feel, or is it just first-timer's jitters?

Darryl.

3:33 pm on Apr 18, 2003 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Well, it's still going on......... 60 googlebots currently on the site, hitting at a rate of about one a second.

Can anyone point me to some resources on stopping the deepcrawler being so aggressive on bandwidth? This is ridiculous. We're currently running at about 76% of capacity, and 90% of that is damn googlebot!

I never thought I'd see the day I was annoyed by a robot crawling one of our sites!

TJ

3:39 pm on Apr 18, 2003 (gmt 0)

10+ Year Member



:::Yah, that was me! It almost crashed my server during the last deepcrawl! It was bombing me and took the server CPU load average up to over 20! I did way too much SEO on my vBulletin boards!

The ironies that webmasters have to go through.

3:58 pm on Apr 18, 2003 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



We have an optimised forum, but there are hardly any postings on it.

The WebmasterWorld site must take a battering in that respect!

TJ

6:00 pm on Apr 18, 2003 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member




Still got 60 bots, and they've been in there for hours now. The damn little guy has downloaded several hundred megabytes (and I didn't think this site was that big?) and created a logfile that's 103MB in size (I'm not kidding).

Is this for real?

TJ

7:17 pm on Apr 18, 2003 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Hmmm.... OK, google has now downloaded 430MB of my site..... which consists of, errrr, maybe 10MB?

What's going on here? Someone must know? GoogleGuy?

The bot is stuck in recursion, methinks, and all his mates are joining him.....

TJ

7:29 pm on Apr 18, 2003 (gmt 0)

WebmasterWorld Senior Member rfgdxm1 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



>Hmmm.... OK, google has now downloaded 430MB of my site..... which consists of, errrr, maybe 10MB?

Perhaps you upset someone at Google and they are doing a denial of service attack on you? ;)

7:33 pm on Apr 18, 2003 (gmt 0)

10+ Year Member



>Hmmm.... OK, google has now downloaded 430MB of my site..... which consists of, errrr, maybe 10MB?

Maybe the bots have fallen in love with your website and are now fighting for ownership. Sorry mods.

7:48 pm on Apr 18, 2003 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



OK, discovered the problem......... actually, we discovered a little flaw in googlebot by mistake.

Sorry to have delayed anyone else's index crawling by preoccupying the little guys!

I telephoned google and they were really helpful (10/10 for that, google).

They are working on the problem now - they have already phoned me back once. They are naturally very interested in having a copy of my logfiles - now >200MB in size.

Hmmm.... perhaps they're worth money? lol

I suspect right about now we rank about a PR10.

LOL

TJ

7:50 pm on Apr 18, 2003 (gmt 0)

10+ Year Member



So that's where they've been... seriously little deepbot activity here so far this round.
9:27 pm on Apr 18, 2003 (gmt 0)

10+ Year Member



Funny story TrillianJedi! :)
9:41 pm on Apr 18, 2003 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Not so funny when google eats its way through half a gig of paid-for bandwidth in a few hours.... that's my $$$

lol

We've found the googlebot bug, and google are being quite helpful though.

Mind you, we're helping them sort out their damn robot at the moment!

TJ

9:45 pm on Apr 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



::::They are naturally very interested in having a copy of my logfiles - now >200mb in size.

Do you know how to create a Googlebot log in your public_html directory? You wouldn't want to e-mail it to them! Here's what you do: get into telnet and run these commands, after changing the paths to the correct directories. Then send the Google log URLs to them and let them fetch the files themselves. I'll bet their internet connections are much faster!

cat /logs/web.log | grep 216.239.46 > /public_html/deep.log
cat /logs/web.log | grep 64.68.82 > /public_html/fresh.log
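
[Note: grep can also read the file directly, which skips the extra cat process and does the same thing:

grep 216.239.46 /logs/web.log > /public_html/deep.log]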

:::Is there good reason for this creeping sense of dread I feel, or is it just first-timer's jitters?

That's normal. There are seven days left in the deepcrawl. My big sites are being bombed, while my smaller sites haven't got a lot of hits yet. Last month it was in the second half of the deepcrawl that it got to my smaller sites. Small being around 500-700 files vs. around 10,000-20,000 files.

[edited by: Jesse_Smith at 9:49 pm (utc) on April 18, 2003]

9:47 pm on Apr 18, 2003 (gmt 0)

WebmasterWorld Senior Member rfgdxm1 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



What was the nature of the bug in general terms?
9:47 pm on Apr 18, 2003 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



It's now nearly 300MB! Couldn't email it anyway - I'll set up an FTP account for them if they want it.

Or print it out and mail it to them.

lmao

TJ

9:52 pm on Apr 18, 2003 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



@ rfgdxm1

A certain forum script in a certain configuration leads google to believe that we have a site of infinite depth and content.

Would be great if it actually stopped at some point. PR11 maybe?!

The old "recursion: see recursion" style definition....

It's chomping through the full 3Mbit connection at the moment. I only just managed to squeeze in to get the backup out.

I don't really know enough about the backend stuff to fully understand this. My partner in crime is dealing with it. I get to make the phone calls. I got the bum deal, I think!

TJ
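
[Note: a quick way to spot this sort of crawl loop is to count which URLs the deepbot requests most often - a runaway session ID or recursive path shows up as thousands of near-identical entries. A sketch, assuming a combined-format access log where the request path is the seventh whitespace-separated field, using the same 216.239.46 deepbot range given above:

grep 216.239.46 /logs/web.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -20]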

9:57 pm on Apr 18, 2003 (gmt 0)

10+ Year Member



I'm seeing the same behavior and have a question about it: is googlebot ignoring robots.txt?

She's been running around my cgi-bin for the longest time, though I have it excluded.
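
[Note: for anyone double-checking their own setup, the standard cgi-bin exclusion being described looks like this - robots.txt must sit in the site root, and the trailing slash matters:

User-agent: *
Disallow: /cgi-bin/]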

10:00 pm on Apr 18, 2003 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



It did not ignore our robots.txt, but because the bot seems to keep a queue of URLs to check, we can't get rid of it now, even if we blocked google in robots.txt.

I don't want to pull the server down, because we currently have users on the site (no doubt experiencing a slight slowdown!).

I ended up telephoning google when the bandwidth started to get into my "now you pay" section...

I would go to google.com, hit "contacts", and get some numbers handy if you think this might become a problem for you.

TJ

10:39 pm on Apr 18, 2003 (gmt 0)

WebmasterWorld Senior Member rfgdxm1 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



>I ended up telephoning google when the bandwidth started to get into my "now you pay" section...

Send the bill to Google.

10:45 pm on Apr 18, 2003 (gmt 0)

10+ Year Member



Question:
The deepbot has been hanging around my site for a couple of days now but seems to be crawling extremely slowly. My site has a low PR (it's very new) but has about 5,000 or so pages. This is my second deepcrawl and I was hoping google would gobble all of it... but judging by the frequency of its hits, it won't. Is this a good way to judge how much the bot likes me? Should it be going faster?

Thanks,
Kevin
