My case, however, is that I have lots of pages that Google doesn't know about. Each page is about a product, and the only way to find it is through our search engine. Therefore I'm thinking of writing a program that submits every URL to Google using the Add URL page.
Does this violate any of the Google rules? How many pages per minute should I submit?
Leif
Google has made it clear they don't want their servers hammered; even using automated rank-checking software is technically a TOS violation. *Any* site can block an IP, and a site submitting 500K pages isn't likely to be cordially received.
I wouldn't do it - I'd just let Googlebot do her thing; there's no way you'll be able to force deep-crawling.
born2drv: Amazon has lots of pages [google.com] indexed
If those pages can't be found by crawling links they won't get PR and can't rank for anything anyway.
My question "do I have to worry?" is still not answered. Does anyone know?
People have gotten into trouble even for using rank-checking software, which probably generates a similar kind of load.
boris, it sounds to me like those pages don't exist until a search is done - that they're dynamically generated. If that's the case, and they can't be reached by a link at all, auto-submitting them is wasted risk.
Yes, they are dynamic - but they exist without a search engine! I have links from other websites to maybe 100 of the pages today.
Creating a sitemap only for Google doesn't make any sense at all. People who find this sitemap (maybe by using Google) will get confused. The sitemap would be enormous and would have to be split lots of times...
Do you want the pages to be indexed by Google?
Then create a sitemap.
Split it down into a directory style navigation.
Chances are, by doing this for a 500k page site, you will increase your traffic too, just from the mapped pages - lots of niche keywords on each page! ;)
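Something along these lines, perhaps - a rough Python sketch of directory-style navigation, where the in-memory product list and its category/url/title fields are stand-ins for however your data is actually stored:

```python
# Rough sketch of directory-style navigation: one index page per
# category, each linking down to its product pages. The product list
# and its category/url/title fields are assumptions for illustration.
from collections import defaultdict

products = [
    {"category": "widgets", "url": "/p/1.html", "title": "Widget A"},
    {"category": "widgets", "url": "/p/2.html", "title": "Widget B"},
    {"category": "gadgets", "url": "/p/3.html", "title": "Gadget C"},
]

by_category = defaultdict(list)
for p in products:
    by_category[p["category"]].append(p)

# One index page per category...
for cat, items in by_category.items():
    links = "\n".join(f'<li><a href="{p["url"]}">{p["title"]}</a></li>' for p in items)
    with open(f"{cat}.html", "w", encoding="utf-8") as out:
        out.write(f"<html><body><h1>{cat}</h1><ul>\n{links}\n</ul></body></html>\n")

# ...plus a top-level sitemap page linking to all the category pages.
top = "\n".join(f'<li><a href="{c}.html">{c}</a></li>' for c in sorted(by_category))
with open("sitemap.html", "w", encoding="utf-8") as out:
    out.write(f"<html><body><ul>\n{top}\n</ul></body></html>\n")
```

The point is that every product page ends up two or three link hops from the home page, so the crawler can reach them all.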
Scott
Maybe someone can advise you better, but from what I remember the number of pages you can get indexed is roughly proportional to the PageRank of your site.
I think this was the basis of Everyman's complaint that PR favors big business, since they're the only ones able to get such large numbers of pages indexed.
If, for example, your site has PR4 on the main page and little or no inbound linking to its other pages, a PR4 home page is nowhere near sufficient to pass link value to 500,000 pages. Most of them will be PR0, so it's not even worth Google's time to index them. But when you have a PR8-9 home page you can get them indexed, because even the extremely deep content will still be PR1-3 or thereabouts.
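To put rough numbers on that, here's a back-of-the-envelope sketch using the published PageRank formula. The damping factor is the standard 0.85; the log base mapping raw PR onto the toolbar scale is purely an assumption, so treat the output as illustrative only:

```python
# Back-of-the-envelope PageRank dilution, from the published formula
# PR(p) = (1 - d) + d * sum(PR(q) / outlinks(q)), with damping d = 0.85.
# Toolbar PR is believed to be roughly logarithmic in raw PR; the base
# used here (8) is an assumption for illustration only.
import math

D = 0.85          # damping factor from the original PageRank paper
TOOLBAR_BASE = 8  # assumed log base mapping raw PR onto the 0-10 toolbar scale

def child_raw_pr(home_toolbar_pr: int, num_links: int) -> float:
    """Raw PR each of num_links pages gets if the home page links to all of them."""
    raw_home = TOOLBAR_BASE ** home_toolbar_pr  # rough raw-PR estimate
    return (1 - D) + D * raw_home / num_links

for home in (4, 8):
    raw = child_raw_pr(home, 500_000)
    toolbar = math.log(raw, TOOLBAR_BASE) if raw > 1 else 0.0
    print(f"home PR{home} -> each of 500,000 pages gets raw {raw:.2f} (toolbar ~{toolbar:.1f})")
```

Under those (admittedly crude) assumptions, a PR4 home page leaves each of the 500,000 pages at effectively PR0, while a PR8 home page leaves each at around toolbar PR1-2 - consistent with the claim above.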
I would hope that it wouldn't have any effect on the site (as the submitter could be a competitor), but your ISP connection might find itself blocked from Google (they don't like being accessed by tools, and you're not going to submit them all by hand!).
> Why does Google have an Add Url page at all then?
I suspect that the only use for it is to stop Google support from being asked many times each day why there isn't one.
Google follows links. IMO, you should consider a robot- and human-friendly site architecture that opens up your content via well organised categories. That requires effort, of course. You may wish to consider an agency/affiliate deal with someone who's good at information architecture.
> I suspect that the only use for it is to stop Google support from being asked many times each day why there isn't one.
I agree, ciml - I think it's a placebo. (Kind of like the "close door" button on the elevator - it gives the Type A individual something to do while the elevator decides when to close the door. ;))
Break them into themes or product sections. Make a 100-200 link site map per section. If there are more than 100-200 links in a section, just do a "<< Previous Next >>".
If you have a site that big, you probably have a datafile that makes it easy to generate these site maps/indexes with something like Webmerge. At 200 links per page, that is only 2,500 site map pages. No big deal.
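For anyone without Webmerge, here is a minimal sketch of the same idea in Python. The products.tsv file name and its url/title columns are assumptions about your datafile layout; adapt to whatever you actually have:

```python
# Minimal sketch of generating paginated site-map pages from a product
# datafile. The file name, the url/title columns, and the 200-links-per-
# page cap (the guideline from this thread) are all assumptions.
import csv

LINKS_PER_PAGE = 200

def write_sitemaps(datafile: str = "products.tsv") -> None:
    with open(datafile, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))  # expects url/title columns

    # Chunk the product list into pages of LINKS_PER_PAGE links each.
    pages = [rows[i:i + LINKS_PER_PAGE] for i in range(0, len(rows), LINKS_PER_PAGE)]
    for n, chunk in enumerate(pages, start=1):
        links = "\n".join(f'<li><a href="{r["url"]}">{r["title"]}</a></li>' for r in chunk)
        prev = f'<a href="sitemap{n - 1}.html">&lt;&lt; Previous</a>' if n > 1 else ""
        nxt = f'<a href="sitemap{n + 1}.html">Next &gt;&gt;</a>' if n < len(pages) else ""
        with open(f"sitemap{n}.html", "w", encoding="utf-8") as out:
            out.write(f"<html><body><ul>\n{links}\n</ul>\n{prev} {nxt}</body></html>\n")

write_sitemaps()
```

At 200 links per page, 500,000 products comes out to exactly the 2,500 site map pages mentioned above.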
I have several sites of 100K+ pages that are basically search results (all driven by a search engine). Google ate every page of all of them. All of the URLs are dynamic:
mysite.com/script.cgi?q=my+search+terms
Just keep it around 100-200 links per page and you should be all set.
Also, if you log the search terms entered by your visitors, keep a link list of the top 50-100 searches on your site and change it every month or so.
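In case it helps, a minimal sketch of that idea, assuming a hypothetical log file searches.log with one query per line and a search script that takes ?q= like the URL above:

```python
# Sketch of building a "top searches" link list from a query log.
# The searches.log name/format and the /script.cgi?q= URL pattern
# are assumptions; swap in your own log and search endpoint.
from collections import Counter
from urllib.parse import quote_plus

def top_search_links(logfile: str = "searches.log", n: int = 50) -> str:
    with open(logfile, encoding="utf-8") as f:
        counts = Counter(line.strip().lower() for line in f if line.strip())
    items = (
        f'<li><a href="/script.cgi?q={quote_plus(term)}">{term}</a></li>'
        for term, _ in counts.most_common(n)
    )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

print(top_search_links())
```

Regenerate the list monthly and you get a fresh crawl path into the most popular result pages.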
Um, Yahoo has 4.5 million pages indexed. [google.com]