
Google SEO News and Discussion Forum

This 40 message thread spans 2 pages; this is page 2.
Most aggressive crawl I have ever seen.
Gbot pulled 4700 pages so far today; requesting 3 pages a second
vabtz




msg:717275
 9:48 pm on Feb 22, 2005 (gmt 0)

I recently changed my site to reduce the page size drastically and clarified the link structure a bit.

Today Gbot has pulled 4700 pages (so far), requesting 3 pages a second.

I verified it's a real bot. Is this normal? My server can easily handle the load, but it just seems a little frightening and exciting at the same time.
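The post does not say how the bot was verified. One common approach is a reverse DNS lookup on the requesting IP followed by a forward lookup to confirm the name maps back to the same IP. The sketch below assumes that approach; the function names and the injectable lookup parameters are hypothetical and exist only so the logic can be exercised without network access.

```python
import socket


def is_googlebot_host(hostname):
    """Return True if a reverse-DNS name sits under a Google crawler domain."""
    host = hostname.rstrip(".").lower()
    # The leading dot matters: "fakegooglebot.com" must not match.
    return host.endswith((".googlebot.com", ".google.com"))


def verify_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve the IP, check the domain, then forward-confirm.

    reverse_lookup and forward_lookup are hypothetical hooks; when omitted,
    the standard library resolver is used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]
    try:
        hostname = reverse_lookup(ip)
        if not is_googlebot_host(hostname):
            return False
        # The forward lookup must list the original IP, otherwise the
        # reverse record could have been set by anyone.
        return ip in forward_lookup(hostname)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return False
```

Calling `verify_googlebot(ip)` with only the IP taken from the access log performs live DNS lookups; checking the user-agent string alone proves nothing, since any client can send it.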

 

grail




msg:717305
 2:04 pm on Feb 24, 2005 (gmt 0)

Consider Block Rank

Thanks TJ, I will have to read up on Block Rank; I'm not subscribed to WebmasterWorld so can't read the thread.

Thinking about the term though, I can see where it's leading.

And yes, the chunk of stuff it took does need page rank calculated, as it's all been renamed etc.

Block Rank bedtime reading: [dbpubs.stanford.edu:8090...]

vabtz




msg:717306
 2:32 pm on Feb 24, 2005 (gmt 0)

Just checked, and I am seeing that a ton of my pages have made it out of the supplemental index.

That has nothing to do with a Google change though.

I solved (as I said before) my page size problem and a duplicate page problem.

walkman




msg:717307
 3:46 pm on Feb 24, 2005 (gmt 0)

"my page size problem"

What do you mean?

illusionist




msg:717308
 3:52 pm on Feb 24, 2005 (gmt 0)

How did you solve the page duplication problem and get your pages out of the supplemental results?

taps




msg:717309
 4:00 pm on Feb 24, 2005 (gmt 0)

walkman: Maybe he used to have pages bigger than 100K.

vabtz




msg:717310
 4:03 pm on Feb 24, 2005 (gmt 0)

Many of my pages were well over 100k. It's a forum/CMS-based site that was using a heavy template. I edited the template to reduce the amount of redundant code, and almost all of the pages displayed now are under 40k. That, by no coincidence, is about the size of most WebmasterWorld pages, and it's well indexed by Google.
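A quick way to find which pages are still heavy is to measure the rendered HTML against a size limit. This is a minimal sketch: the 100k default is simply the figure mentioned in this thread, not a documented Google limit, and the function name and the `pages` mapping (URL to raw HTML bytes) are hypothetical.

```python
from typing import Dict, List, Tuple


def oversized_pages(pages: Dict[str, bytes], limit_kb: int = 100) -> List[Tuple[str, float]]:
    """Return (url, size_in_kb) for pages above limit_kb, largest first."""
    limit_bytes = limit_kb * 1024
    heavy = [
        (url, len(body) / 1024)
        for url, body in pages.items()
        if len(body) > limit_bytes
    ]
    # Largest pages first, so the worst templates are looked at first.
    return sorted(heavy, key=lambda item: item[1], reverse=True)
```

Feeding it the HTML as actually served (after the template is applied) matters here, since the template was the source of the extra weight.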

My page duplication problem was a result of mod_rewrite: I had many pages that were available via several different URLs. To fix the issue I added the regular dynamic-looking page URLs to my robots.txt file. I also went through and double-checked my forum/CMS software to make sure that it was outputting the new static-looking URLs.
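The actual robots.txt rules are not shown in the post. As a rough illustration of the idea, the sketch below blocks two hypothetical dynamic script paths (`/viewtopic.php` and `/viewforum.php`, phpBB-style names assumed for the example) and uses Python's standard `urllib.robotparser` to confirm that the dynamic URL is disallowed while the rewritten static-looking one is still crawlable.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the real script names depend on the forum/CMS in use.
ROBOTS_TXT = """\
User-agent: *
Disallow: /viewtopic.php
Disallow: /viewforum.php
"""


def build_parser(robots_text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Dynamic URL: matched by the Disallow prefix, so crawlers should skip it.
print(parser.can_fetch("Googlebot", "http://example.com/viewtopic.php?t=42"))  # False

# Static-looking URL produced by mod_rewrite: not matched, so still crawlable.
print(parser.can_fetch("Googlebot", "http://example.com/topic-42.html"))  # True
```

Running a check like this against a list of both URL styles is a cheap way to make sure the rules do not accidentally block the static-looking pages as well.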

Critter




msg:717311
 4:04 pm on Feb 24, 2005 (gmt 0)

Google ain't crawling like November :)

In November I peaked at 170 pages per second from Googlebot (yes, dynamic database-generated pages, thank goodness for C :) and saw 100+ pages per second for three to four minutes at a time. This crawl I see 20 to 30 pages per second max, without anywhere near the sustain time of November's crawl.
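Figures like these can be pulled from an access log by counting hits per one-second timestamp. The sketch below assumes Common or Combined Log Format lines with the user agent included; the log format and the function name are assumptions, not details given in the post.

```python
import re
from collections import Counter

# Timestamp as written in Common/Combined Log Format,
# e.g. [24/Feb/2005:16:04:07 +0000]
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def peak_requests_per_second(lines, agent_substring="Googlebot"):
    """Return (timestamp, hits) for the busiest second of matching requests.

    Returns (None, 0) when no line matches.
    """
    per_second = Counter()
    for line in lines:
        if agent_substring not in line:
            continue
        match = TIMESTAMP.search(line)
        if match:
            per_second[match.group(1)] += 1
    if not per_second:
        return None, 0
    return per_second.most_common(1)[0]
```

Passing an open log file as `lines` works, since the function only iterates once. The user-agent match is only a filter; it does not prove the requests came from the real crawler.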

Yahoo, on the other hand, is crawling more aggressively than I've ever seen, but is still using (boneheaded) Inktomi code. One beautiful thing that Inktomi's (boneheaded) code will do is ask for a non-directory when given a directory link. That is to say, if Inktomi finds a link like "/a/b/c/" it'll try to crawl "/a/b/c" and will generate an extra hit as the server redirects to the proper URI with a 301 redirect. Beautiful.
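The extra hit can be modelled in a few lines. This is a toy sketch, not the crawler's or any server's real code: `respond` mimics a server that redirects a slashless directory request with a 301, and `hits_needed` counts requests for a crawler that does or does not strip the trailing slash. Both function names are hypothetical.

```python
def respond(path, directories):
    """Toy server: 200 for a directory URI, 301 to add a missing slash."""
    if path in directories:
        return 200, path
    if path + "/" in directories:
        return 301, path + "/"
    return 404, path


def hits_needed(link, directories, strip_slash):
    """Count requests a crawler makes to reach the page behind `link`."""
    path = (link.rstrip("/") or "/") if strip_slash else link
    hits = 1
    status, target = respond(path, directories)
    if status == 301:
        # Following the redirect costs one more request.
        hits += 1
    return hits


dirs = {"/a/b/c/"}
print(hits_needed("/a/b/c/", dirs, strip_slash=False))  # 1
print(hits_needed("/a/b/c/", dirs, strip_slash=True))   # 2
```

Across a deep crawl of directory-style URLs, that doubling applies to every such link.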

Dayo_UK




msg:717312
 5:36 pm on Mar 17, 2005 (gmt 0)

Just bringing this thread back for a quick question.

Has anyone seen the pages from this crawl make the index? I have some pages added, but definitely not the amount that were crawled.

Just wondering what others' experiences are on this?

victor




msg:717313
 6:18 pm on Mar 17, 2005 (gmt 0)

I'm seeing pages from two days ago in Google's Index -- site is a forum like (but completely unlike, if you see what I mean) this one.

Dayo_UK




msg:717314
 6:21 pm on Mar 17, 2005 (gmt 0)

Yes - I am getting pages added - but this crawl went very deep on some sites that have not been crawled well recently - these pages have not appeared in the index - but more recently crawled pages have.
