
Most aggressive crawl I have ever seen.

Gbot pulled 4700 pages so far today; requesting 3 pages a second



9:48 pm on Feb 22, 2005 (gmt 0)

I recently changed my site to reduce the page size drastically and clarified the link structure a bit.

Today gbot has pulled 4700 pages (so far) requesting 3 pages a second.

I verified it's a real bot. Is this normal? My server can easily handle the load, but it just seems a little frightening and exciting at the same time.
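For anyone wondering how a "real bot" check can be done: the post does not say which method was used, so the following is only a minimal sketch of the common reverse-then-forward DNS approach. The function name, the injectable resolver hooks, and the suffix list are assumptions made for illustration, and the default resolvers only handle IPv4.

```python
import socket

# Assumption: hostname suffixes a genuine Googlebot address is expected to resolve to.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google host that resolves back to ip."""
    # Resolver hooks are injectable so the logic can be tried without network access.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        # Step 1: the reverse DNS name must sit under a Google crawler domain.
        if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
            return False
        # Step 2: the forward lookup of that name must include the original IP.
        return ip in forward_lookup(host)
    except OSError:
        # Lookup failures (no PTR record, unknown host) count as "not verified".
        return False
```

Checking the user-agent string alone is not enough, since anyone can send "Googlebot" in a request header; the forward confirmation step is what makes the reverse lookup trustworthy.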


2:04 pm on Feb 24, 2005 (gmt 0)

10+ Year Member

Consider Block Rank

Thanks TJ, will have to have a read up on Block Rank; not subscribed to WebmasterWorld so can't read the thread.

Thinking about the term though I can see where it's leading.

And yes the chunk of stuff it took does need page rank calculated as it's all been renamed etc.

Block rank bed time reading : [dbpubs.stanford.edu:8090...]


2:32 pm on Feb 24, 2005 (gmt 0)

Just checked and I am seeing that a ton of my pages have made it out of the supplemental index.

That has nothing to do with a Google change though.

I solved (as I said before) my page size problem and a duplicate page problem.


3:46 pm on Feb 24, 2005 (gmt 0)

"my page size problem"

what do you mean?


3:52 pm on Feb 24, 2005 (gmt 0)

10+ Year Member

How did you solve the page duplication problem and get your pages out of the supplemental results?


4:00 pm on Feb 24, 2005 (gmt 0)

10+ Year Member

walkman: Maybe he used to have pages bigger than 100K.


4:03 pm on Feb 24, 2005 (gmt 0)

Many of my pages were well over 100k. It's a forum/CMS-based site that was using a heavy template. I edited the template to reduce the amount of redundant code, and almost all of the pages displayed now are under 40k. Which, by no coincidence, is what most of the WebmasterWorld page sizes are, and it's well indexed by Google.

My page duplication problem was a result of mod_rewrite: I had many pages that were available via several different URLs. To fix the issue I added the regular dynamic-looking page URLs to my robots.txt file. I also went through and double-checked my forum/CMS software to make sure that it was outputting the new static-looking URLs.
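As a rough illustration of the robots.txt fix described above, here is a small sketch that checks which URL forms a crawler may fetch. The script names (`viewtopic.php`, `index.php`) and the static-looking path are hypothetical stand-ins, not the poster's actual URLs or rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block the dynamic script URLs and leave the
# rewritten, static-looking URLs crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /viewtopic.php
Disallow: /index.php
"""

def crawlable(url_path, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if the given path may be fetched under the rules above."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url_path)

if __name__ == "__main__":
    for path in ("/topic-123.html", "/viewtopic.php?t=123", "/index.php?page=2"):
        print(path, crawlable(path))
```

The point of the setup is that each page stays reachable under exactly one URL form, so the crawler never sees the same content twice.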


4:04 pm on Feb 24, 2005 (gmt 0)

10+ Year Member

Google ain't crawling like November :)

In November I peaked at 170 pages per second from Googlebot (yes, dynamic database-generated pages, thank goodness for C :) and saw 100+ pages per second for three to four minutes at a time. This crawl I see 20 to 30 pages per second max, without anywhere near the sustain time of November's crawl.
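Peak figures like these can be pulled from an access log. The sketch below assumes Common/Combined Log Format with one-second timestamps in square brackets, and it filters on a user-agent substring (which can be spoofed); the function name and sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches the bracketed timestamp of a Common/Combined Log Format line.
TIMESTAMP = re.compile(r"\[([^\]]+)\]")

def peak_rate(log_lines, agent_marker="Googlebot"):
    """Return the highest number of matching requests seen in any single second."""
    per_second = Counter()
    for line in log_lines:
        if agent_marker not in line:
            continue  # skip requests from other clients
        match = TIMESTAMP.search(line)
        if match:
            per_second[match.group(1)] += 1
    return max(per_second.values(), default=0)

if __name__ == "__main__":
    sample = [
        '192.0.2.1 - - [24/Feb/2005:16:04:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
        '192.0.2.1 - - [24/Feb/2005:16:04:01 +0000] "GET /b HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
        '192.0.2.1 - - [24/Feb/2005:16:04:02 +0000] "GET /c HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    ]
    print(peak_rate(sample))
```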

Yahoo, on the other hand, is crawling more aggressively than I've ever seen, but is still using (boneheaded) Inktomi code. One beautiful thing that Inktomi's (boneheaded) code will do is ask for a non-directory when given a directory link. That is to say, if Inktomi finds a link like "/a/b/c/" it'll try to crawl "/a/b/c" and will generate an extra hit as the server redirects to the proper URI with a 301 redirect. Beautiful.
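To make the extra hit concrete, here is a simplified model of how a server might answer the slash-less request. It is only a sketch: `respond` and the `directories` set are hypothetical, and a real web server decides this from the filesystem and its own configuration.

```python
def respond(path, directories):
    """Return (status, location) for a request path.

    directories holds known directory paths written without the trailing slash.
    """
    if not path.endswith("/") and path in directories:
        # Directory requested without its slash: redirect to the proper URI.
        return 301, path + "/"
    return 200, None

if __name__ == "__main__":
    dirs = {"/a/b/c"}
    print(respond("/a/b/c", dirs))   # first hit: the redirect
    print(respond("/a/b/c/", dirs))  # second hit: the actual page
```

So a crawler that strips the slash pays for two requests per directory link instead of one.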


5:36 pm on Mar 17, 2005 (gmt 0)

Just bringing this thread back for a quick question.

Has anyone seen the pages from this crawl make the index? I have some pages added, but definitely not the amount that were crawled.

Just wondering what others' experiences are on this?


6:18 pm on Mar 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

I'm seeing pages from two days ago in Google's Index -- site is a forum like (but completely unlike, if you see what I mean) this one.


6:21 pm on Mar 17, 2005 (gmt 0)

Yes - I am getting pages added - but this crawl went very deep on some sites that have not been crawled well recently - these pages have not appeared in the index - but more recently crawled pages have.