Forum Moderators: open


Crawling Speed on Virtual Hosted Sites?

How can I get max pages into google?

         

peterdaly

1:25 pm on Feb 19, 2003 (gmt 0)

10+ Year Member



I currently have 5 very large sites (50k+ pages) all virtual hosted on one dedicated server. My traffic to the sites is directly related to how many pages Google gets each crawl. I am on a quest for maximum traffic.

Am I decreasing the number of pages the deepcrawl will get by having them all on one virtual hosted IP, as opposed to each on their own IP/box? Bandwidth is not a major issue.

I plan on having 8-10 similarly sized sites up by Google's March crawl; does that change things?

Thanks,

-Pete

nativenewyorker

2:54 pm on Feb 19, 2003 (gmt 0)

10+ Year Member



It seems like you are building a lot of similar pages if you plan on adding 400k-500k pages within two weeks. Beware of Google penalties.

Ted

peterdaly

3:10 pm on Feb 19, 2003 (gmt 0)

10+ Year Member



I am linking dynamic product pages from an already existing database of hundreds of thousands of products.

I am targeting sites based on product types. There should be minimal content overlap other than the template layout between the sites.

I am well aware of Google's history in this regard.

-Pete

ciml

3:40 pm on Feb 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I can't imagine that the IPs would make a difference. To be crawled deeply you want plenty of PageRank, clean-looking URLs (no ?, &, =, etc.), fewer than 100 links per page, and a quick server response time. If you search for (site:webmasterworld.com webmasterworld.com) (no brackets) you'll see that Google can crawl quite deeply.
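The two on-page factors above (query-string URLs and the roughly-100-links-per-page guideline) are easy to audit yourself. A minimal sketch using only Python's standard library; the `MAX_LINKS` threshold is the rule of thumb from this thread, not a documented Google limit, and the sample page is made up for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

MAX_LINKS = 100  # rule-of-thumb from the thread, not an official Google limit


class LinkAudit(HTMLParser):
    """Collects <a href> targets and flags dynamic-looking URLs."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.dynamic = []  # URLs carrying a query string (?, &, =)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        self.links.append(href)
        if urlparse(href).query:
            self.dynamic.append(href)


# Hypothetical sample page, just to show the audit in action.
page = """<html><body>
<a href="/widgets/red-widget.html">Red widget</a>
<a href="/catalog?cat=widgets&id=42">Dynamic catalog link</a>
</body></html>"""

audit = LinkAudit()
audit.feed(page)
print(f"{len(audit.links)} links, {len(audit.dynamic)} dynamic")
if len(audit.links) > MAX_LINKS:
    print("Over the 100-links-per-page guideline")
```

Running it over each template of a large site quickly shows which page types lean on dynamic URLs or stuff too many links into one page.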

Grumpus

3:52 pm on Feb 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yup - get a root page PR of at least 5 if you want 50K pages in. Actually, I think PR5 tends to get somewhere in the 42K-48K page range, depending upon the month. You also need a nice fluid LINKFLOW: page 1 needs to link to bunches of pages, the level-2 pages need to link to bunches more, and the level-3 pages bunches more again. It'll never work if page 1 links only to page 2, which links to page 3, which links to page 4 and maybe page 5.
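The fan-out point above is simple arithmetic: when every page links to many children, the number of reachable pages grows multiplicatively per level, while a chain adds just one page per level. A toy sketch (the fan-out of 40 is illustrative, not a number from this thread):

```python
def reachable(fanout, levels):
    """Total pages reachable from the root when every page
    links to `fanout` new child pages, down to `levels` deep."""
    total, frontier = 1, 1  # start with the root page itself
    for _ in range(levels):
        frontier *= fanout   # pages added at this level
        total += frontier
    return total


# Wide linkflow: 40 links per page covers 50K+ pages in 3 levels.
print(reachable(40, 3))  # 1 + 40 + 1600 + 64000 = 65641

# A chain (each page links to only one next page) reaches levels + 1 pages.
print(reachable(1, 3))   # 4
```

That is why a flat, fan-out-heavy structure gets a 50K-page site crawled while a deep chain never will.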

G.

vitaplease

3:57 pm on Feb 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As the posters above say: PageRank, static links, carefully diluted.

Check out some of the PDF research papers on crawling; they tend to say the same.

Spiders prefer to start at the pages with the highest PageRank.

Amongst others:

[stanford.edu...]

Handouts 13 to 17. (handout 18 is another topic) ;)