Google News Archive Forum

This 37 message thread spans 2 pages; this is page 1.
Deep Crawl Madness
What's he doing in there?!
trillianjedi
msg:186960
12:04 pm on Apr 17, 2003 (gmt 0)


We've had five googlebots deep-crawling our site since last night (they spent a happy couple of hours there) and again this morning (they've been at it for the last 3 hours).

This site is not big by any stretch (maybe 100 pages max).

What's he up to? Following links out, indexing another site and then back to us? Presumably if that's the case, a good sign?

TJ

 

MrSpeed
msg:186961
12:49 pm on Apr 17, 2003 (gmt 0)

I think Google tries to spread out the bandwidth a little bit. Some people were complaining that the spiders can hit sites pretty hard if they aren't throttled.

dmedia
msg:186962
12:55 pm on Apr 17, 2003 (gmt 0)

I'm getting a ton of deepbots crawling my dynamic content (go bot go!). Now, I'm very new to all this G-bot obsessiveness, so maybe my next observation is not unusual, but I'm curious what others can tell me: I'm getting freshbot popping in at the same time.

trillianjedi
msg:186963
1:00 pm on Apr 17, 2003 (gmt 0)

Yeah, we have the same - freshbot is in there alongside deepcrawl bots from 27 different IPs (at last count).

Fortunately, we have big bandwidth, otherwise the site would probably be brought down by all this activity. It's going deep too - down through all layers and links so far. I think it will index every single page at this rate. They've been there for *hours* now.

I assume this is a good thing, but I'm fairly new to this.

Site has only been online for 3 weeks, but thanks to freshbot we got high in the SERPs pretty quickly (No. 1 for one main key search term two days after launch), and I think we've been linked to by quite a lot of decent sites (we have really good content).

But grey pagerank at the moment - hoping next dance we show at least a 4.

Tj

vitaplease
msg:186964
1:06 pm on Apr 17, 2003 (gmt 0)

> I assume this is a good thing, but I'm fairly new to this.

A very good thing - just think, others ask for money to go out spidering!
(And without sending you the vast number of visitors afterwards.)

Anyone seen the spiders in the movie "Minority Report"?

I would have thought Google sponsored that movie :)

trillianjedi
msg:186965
1:33 pm on Apr 17, 2003 (gmt 0)

Looks like my first successful site then.

Brett was right it seems - we concentrated purely on content and let the web do the rest of the work for us.

That's what I'll do in future - just build good content sites, with good page titles and forget about the rest.

One link is all you need to get started; the rest is just patience (although having said that, this has all happened very fast for us!).

TJ

trillianjedi
msg:186966
3:33 pm on Apr 17, 2003 (gmt 0)

Well, he's still in there along with 26 of his buddies (all different Google IPs).

Is this normal? About 7 hours crawling on a site of <300 pages (rough calculation).

He's maxed out my bandwidth (even with all image directories etc. blocked). I don't mind - it's got to be good for us in the long run - but my users are going to wonder why it takes a page 10 seconds to load at the moment!

Any way of getting Google to back off the gas a little?

TJ

dmorison
msg:186967
3:37 pm on Apr 17, 2003 (gmt 0)

Is Googlebot really maxing out your pipe? Sounds a bit odd.

I've never seen deepcrawl go hit/hit/hit/hit; it's always hit - go away and have a cup of tea - hit....

trillianjedi
msg:186968
3:51 pm on Apr 17, 2003 (gmt 0)

Seems to have calmed down now - I was trying to look at throttle stats and his progress at the same time.

I suspect the maxing out may actually have been several people hitting an image intensive page all at the same time - so I think you're right, not google.

I'm quite fascinated by all this - I didn't realise the depth to which our site was going to be crawled (about 16 more pages and he has the entirety of the site). I think I got my internal link strategy right!

Thanks for all the help,

TJ

Jesse_Smith
msg:186969
9:50 pm on Apr 17, 2003 (gmt 0)

> Some people were complaining that the spiders can hit the sites pretty hard if they aren't throttled a bit.

Yah, that was me! It almost crashed my server during the last deepcrawl! It was bombing me, and took the server CPU load average up to over 20! I did way too much SEO on my vBulletin boards!

DarrylParker
msg:186970
10:38 pm on Apr 17, 2003 (gmt 0)

I've got the opposite problem. About 36 hours ago, Deepbot and Freshbot came in and each picked off about 5 pages (just the top-level ones like webpage.asp, cart.asp, login.asp), and I haven't seen them since.

Is there good reason for this creeping sense of dread I feel, or is it just first-timer's jitters?

Darryl.

trillianjedi
msg:186971
3:33 pm on Apr 18, 2003 (gmt 0)

Well, it's still going on......... 60 googlebots currently on site, hitting at a rate of about 1 a second.

Can anyone point me to some resources to stop the deepcrawler being so aggressive on bandwidth? This is ridiculous. We're currently running at about 76% of capacity, and 90% of that is damn googlebot!

I never thought I'd see the day I was annoyed by a robot crawling one of our sites!

TJ

jrobbio
msg:186972
3:39 pm on Apr 18, 2003 (gmt 0)

> Yah, that was me! It almost crashed my server during the last deepcrawl! It was bombing me, and took the server CPU load average up to over 20! I did way too much SEO on my vBulletin boards!

The ironies that webmasters have to go through.

trillianjedi
msg:186973
3:58 pm on Apr 18, 2003 (gmt 0)

We have an optimised forum, but there are hardly any postings on it.

The WebmasterWorld site must take a battering in that respect!

TJ

trillianjedi
msg:186974
6:00 pm on Apr 18, 2003 (gmt 0)


Still got 60 bots and they've been in there for hours now. The damn little guy has downloaded several hundred megabytes (and I didn't think this site was that big?) and created a logfile that's 103mb in size (I'm not kidding).

Is this for real?

TJ

trillianjedi
msg:186975
7:17 pm on Apr 18, 2003 (gmt 0)

Hmmm.... ok, google has now downloaded 430mb of my site..... which consists of, errrr, maybe 10mb?

What's going on here? Someone must know? Googleguy?

The bot is stuck in recursion methinks, and all his mates are joining him.....

TJ

rfgdxm1
msg:186976
7:29 pm on Apr 18, 2003 (gmt 0)

>Hmmm.... ok, google has now downloaded 430mb of my site..... which consists of, errrr, maybe 10mb?

Perhaps you upset someone at Google and they are doing a denial of service attack on you? ;)

jrobbio
msg:186977
7:33 pm on Apr 18, 2003 (gmt 0)

>Hmmm.... ok, google has now downloaded 430mb of my site..... which consists of, errrr, maybe 10mb?

Maybe the bots have fallen in love with your website and are now fighting for ownership. Sorry mods.

trillianjedi
msg:186978
7:48 pm on Apr 18, 2003 (gmt 0)

OK, discovered the problem......... actually we discovered a little flaw in googlebot by mistake.

Sorry to have delayed anyone else's index crawling by preoccupying the little guys!

I telephoned google and they were really helpful (10/10 for that google).

They are working on the problem now - have already phoned me back once. They are naturally very interested in having a copy of my logfiles - now >200mb in size.

Hmmm.... perhaps they're worth money? lol

I suspect right about now we rank about a PR10.

LOL

TJ

BGumble
msg:186979
7:50 pm on Apr 18, 2003 (gmt 0)

So that's where they've been... seriously little deepbot activity here so far this round.

cheater copperpot
msg:186980
9:27 pm on Apr 18, 2003 (gmt 0)

Funny story TrillianJedi! :)

trillianjedi
msg:186981
9:41 pm on Apr 18, 2003 (gmt 0)

Not so funny when Google eats its way through half a gig of paid-for bandwidth in a few hours.... that's my $$$

lol

We've found the googlebot bug and google are being quite helpful though.

Mind you, we're helping them sort their damn robot at the moment!

TJ

Jesse_Smith
msg:186982
9:45 pm on Apr 18, 2003 (gmt 0)

> They are naturally very interested in having a copy of my logfiles - now >200mb in size.

Do you know how to create a Googlebot log in your public_html directory? I don't think you would want to e-mail it to them! Here's what you do: get into telnet and run these commands, after you change the paths to the correct directories (the '|' is the pipe character - the key above return). Then send them the URLs of the Google logs and let them fetch the files themselves. I'll bet their internet connections are much faster!

cat /logs/web.log | grep 216.239.46 > /public_html/deep.log
cat /logs/web.log | grep 64.68.82 > /public_html/fresh.log

> Is there good reason for this creeping sense of dread I feel, or is it just first-timer's jitters?

That's normal. There are seven days left in the deepcrawl. My big sites are being bombed, while my smaller sites haven't got a lot of hits yet. Last month it was in the second half of the deepcrawl that it got to my smaller sites. Small being around 500-700 files, vs. around 10,000-20,000 files.

[edited by: Jesse_Smith at 9:49 pm (utc) on April 18, 2003]

rfgdxm1
msg:186983
9:47 pm on Apr 18, 2003 (gmt 0)

What was the nature of the bug in general terms?

trillianjedi
msg:186984
9:47 pm on Apr 18, 2003 (gmt 0)

It's now nearly 300mb! Couldn't email it anyway - I'll set up an FTP account for them if they want it.

Or print it out and mail it to them.

lmao

TJ

trillianjedi
msg:186985
9:52 pm on Apr 18, 2003 (gmt 0)

@ rfgdxm1

A certain forum script in a certain configuration leads google to believe that we have a site of infinite depth and content.

Would be great if it actually stopped at some point. PR11 maybe?!

The old "recursion: see recursion" style definition....

It's chomping through the full 3 Mbit connection at the moment. I only just managed to squeeze in to get the backup out.

I don't really know enough about the backend stuff to fully understand this. My partner in crime is dealing with it; I get to make the phone calls. I got the bum deal, I think!

TJ
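The "infinite depth" failure TJ describes can be sketched with a toy crawl loop. This is purely a hypothetical Python illustration - not Googlebot's actual code, and the forum-script behaviour and URL scheme are invented: if every page a script generates links to a URL the crawler has never seen, the crawler's de-duplication never helps, and the crawl only ends at whatever fetch cap is imposed.

```python
from collections import deque

# Hypothetical sketch (not Googlebot's actual code; invented URL scheme):
# a script that mints a brand-new URL on every page makes the site look
# infinitely deep, so the crawler's seen-set never saves it.

def fake_forum_page(url):
    """Pretend forum script: each page links to one URL the crawler has
    never seen (here, by appending an ever-growing parameter)."""
    return [url + "&s=" + str(len(url))]

def crawl(start, max_fetches=1000):
    seen, queue, fetched = set(), deque([start]), 0
    while queue and fetched < max_fetches:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        fetched += 1
        queue.extend(fake_forum_page(url))
    return fetched

# De-duplication never kicks in: the crawl only stops at the artificial cap.
print(crawl("http://example.com/forum?topic=1"))
```

With no cap, the seen-set and the queue both grow without bound - which is roughly what a multi-hundred-megabyte crawl of a 10 MB site looks like from the server side.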

fmonk
msg:186986
9:57 pm on Apr 18, 2003 (gmt 0)

I'm seeing the same behavior and have a question about it: is googlebot ignoring robots.txt?

She's been running around my cgi-bin for the longest time, though I have it excluded.
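For reference, the kind of exclusion being described here would look like this in robots.txt (the path is the poster's example; rules for a specific bot would use its user-agent token instead of `*`):

```
User-agent: *
Disallow: /cgi-bin/
```

A compliant crawler fetches /robots.txt before crawling and skips any URL whose path starts with a Disallow prefix for its user-agent.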

trillianjedi
msg:186987
10:00 pm on Apr 18, 2003 (gmt 0)

It did not ignore our robots.txt, but because of the way the bot seems to keep a queue of URLs to check, we can't get rid of it now, even if we blocked Google in robots.txt.

I don't want to pull the server down because we currently have users on the site (no doubt experiencing a slight slow down!).

I ended up telephoning google when the bandwidth started to get into my "now you pay" section...

I would go to google.com and "contacts" and get some numbers handy if you think this is a problem for you.

TJ
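The queue behaviour TJ is guessing at can be illustrated with another small hypothetical sketch (again invented, not Googlebot's real code): if a crawler applies robots.txt rules only when it *discovers* a URL, anything already sitting in the queue still gets fetched after the site adds a Disallow rule.

```python
from collections import deque

# Hypothetical sketch: a crawler that checks robots.txt rules only at URL
# discovery time, not at fetch time. URLs already queued keep getting
# fetched even after a Disallow rule appears, matching the "can't get rid
# of it now" behaviour described above.

def demo():
    disallowed = []                    # prefixes parsed from robots.txt
    queue = deque()

    def discover(url):
        if not any(url.startswith(p) for p in disallowed):
            queue.append(url)          # the rule is checked here only

    discover("http://example.com/a")
    discover("http://example.com/b")
    disallowed.append("http://example.com/")   # site now blocks the bot
    discover("http://example.com/c")           # newly found URL is skipped

    fetched = []
    while queue:
        fetched.append(queue.popleft())        # no re-check at fetch time
    return fetched

print(demo())   # the two already-queued URLs are still fetched
```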

rfgdxm1
msg:186988
10:39 pm on Apr 18, 2003 (gmt 0)

>I ended up telephoning google when the bandwidth started to get into my "now you pay" section...

Send the bill to Google.

uci_bink
msg:186989
10:45 pm on Apr 18, 2003 (gmt 0)

Question:
The deepbot has been hanging around my site for a couple of days now but seems to be crawling extremely slowly. My site has a low PR (it is very new) but has about 5,000 or so pages. This is my second deepcrawl and I was hoping Google would gobble all of it... but judging by the frequency of its hits, it won't. Is this a good way to judge how much the bot likes me - should it be going faster?

Thanks,
Kevin

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved