Let me tell you a little story...
One of my sites has hundreds of thousands of pages. We sell everything. We represent lots of manufacturers.
Now, when we add a new product, we assign it an ID number through our database, and this ID number is used on our site to access that product's web page, like www.mysite/109765, where 109765 is the product ID#.
So... Every time we added a new product from a supplier, we would just add a new ID#. When a supplier discontinued a product, we'd mark the product as discontinued but let the ID# live. Over time, the ID numbers just kept growing and growing. It occurred to me that ID# 000001 was the oldest URL on our site, and possibly commanded the most respect with Google. Why keep adding more IDs? Why not backfill and reuse the old IDs of products that are no longer available? There's no reason to keep access to those old products; redo them with new products!
So, we changed our database to work in this fashion, and that is how our database/web site works now.
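The backfill scheme described above could be sketched roughly like this in Python. This is an illustrative sketch, not the site's actual database logic: the class and method names (`ProductIdAllocator`, `discontinue`, `allocate`) are hypothetical, and it assumes IDs are plain integers and that the lowest (oldest) freed ID should be reused first, on the theory that older URLs may carry more weight. Freed IDs wait in a min-heap, and a brand-new sequential ID is only minted when nothing is waiting to be reused.

```python
import heapq


class ProductIdAllocator:
    """Hypothetical 'backfill first' ID allocator (illustrative sketch only)."""

    def __init__(self, highest_assigned: int = 0):
        self.highest_assigned = highest_assigned  # largest ID ever handed out
        self._freed: list[int] = []               # min-heap of discontinued IDs
        self._freed_set: set[int] = set()         # guards against freeing an ID twice

    def discontinue(self, product_id: int) -> None:
        """Mark an existing ID as free for reuse by a future product."""
        if product_id > self.highest_assigned:
            raise ValueError(f"ID {product_id} was never assigned")
        if product_id not in self._freed_set:
            heapq.heappush(self._freed, product_id)
            self._freed_set.add(product_id)

    def allocate(self) -> int:
        """Return the lowest freed ID if there is one, else the next new ID."""
        if self._freed:
            product_id = heapq.heappop(self._freed)
            self._freed_set.discard(product_id)
            return product_id
        self.highest_assigned += 1
        return self.highest_assigned


alloc = ProductIdAllocator(highest_assigned=517999)
alloc.discontinue(109765)
alloc.discontinue(1)
print(alloc.allocate())  # 1       (oldest freed ID reused first)
print(alloc.allocate())  # 109765
print(alloc.allocate())  # 518000  (pool empty, so a new ID is minted)
```

In this sketch, an ID past the old high-water mark only appears once every freed ID has been reused.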
OK, so that's the setup, and here comes the scary part. Let's say the last ID# we added was 517999. So ID# 518000 does not exist and never has. Not in our database, and certainly not on our site.
The hair on the back of my neck is starting to rise now. ; )
Curiously, a few months ago, googlebot started asking for ID #518000. In fact, it asked for about 50 IDs in the range 518000-518050. Weird. I watched this go on for a few weeks, wondering whether I should 301 them, why googlebot was doing this, and so on. It was very curious.
Eventually it occurred to me: if googlebot is desperate for that damn ID, why don't I feed it something? Alright. So the next time we had some new products to add, instead of backfilling, we added them to that new range. We had about 80 products to add, so we put them in the batch 518000-518080. OK.
googlebot crawled those IDs within hours, and they were indexed in a couple days. Awesome!
Guess what happened next? Do you have goosebumps yet?
Yup! Within about a week, googlebot started asking for 518080-518130, about 50 more IDs in the next expected range. It's been doing this for a while now, pretty much every day. These URLs don't resolve, so they show up in my WMT ("Network unreachable") and in my proprietary site error reports. I'm not sure how to proceed, because if I resolve them one way or another, I suspect it will, I don't know, just move on to the next batch? It's insane, but from a computer's perspective, quite logical.
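If you want to watch for these probes in your own raw access logs rather than relying on WMT, a rough Python sketch might look like the following. It assumes the Apache/Nginx "combined" log format and product URLs of the form /<ID> as in the example above; `probes_past_max` and the sample log lines are hypothetical. It only checks that the user-agent string contains "Googlebot", which can be spoofed; Google's documented way to confirm a genuine googlebot is a reverse DNS lookup on the IP followed by a forward lookup on the returned hostname.

```python
import re
from collections import Counter
from typing import Iterable

# Assumes the "combined" log format and product URLs of the form /<digits>.
# Captures the numeric ID from the request line and the quoted user-agent.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) /(?P<id>\d+)(?:[/?][^ "]*)? HTTP/[\d.]+"'  # request line
    r' \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'                    # status, size, referer, UA
)


def probes_past_max(lines: Iterable[str], highest_assigned: int) -> Counter:
    """Count requests claiming to be Googlebot for IDs above highest_assigned."""
    hits: Counter = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        # Skip non-product URLs and anything whose UA doesn't claim Googlebot
        # (the UA string alone can be spoofed).
        if not m or "Googlebot" not in m.group("agent"):
            continue
        product_id = int(m.group("id"))
        if product_id > highest_assigned:
            hits[product_id] += 1
    return hits


# Made-up sample lines (192.0.2.x is a documentation-only address range).
UA_BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'192.0.2.1 - - [17/Apr/2010:10:00:00 +0000] "GET /518081 HTTP/1.1" 404 512 "-" "{UA_BOT}"',
    f'192.0.2.1 - - [17/Apr/2010:10:00:05 +0000] "GET /518000 HTTP/1.1" 200 9000 "-" "{UA_BOT}"',
    '192.0.2.9 - - [17/Apr/2010:10:01:00 +0000] "GET /518090 HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
]

print(probes_past_max(sample, highest_assigned=518080))  # Counter({518081: 1})
```

Counting per ID, rather than just flagging, makes it easy to see whether the requested range keeps sliding forward day by day.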
You can draw your own conclusions.
[edited by: tedster at 3:33 pm (utc) on Apr 17, 2010]
[edit reason] moved from another location [/edit]