Let's start a list of the ways sites can link, refer, or point URLs to your pages other than direct hrefs. Mainly, we are after ways that SEs such as Google may discover URLs and then use them:
What else? Wow - blew through 20....
I will update as we go. Thanks to everyone who pitches in...
[edited by: Brett_Tabke at 3:57 pm (utc) on April 19, 2009]
Speaking of which, their contextual ads are also served in Google Groups, which could also be a source of discovery.
Further along the lines of behavioral advertising, they've got plenty of data from DoubleClick and Google Affiliate Network. Every new page/URL that carries an affiliate link phones home.
[edited by: Marcia at 4:58 am (utc) on April 21, 2009]
I think if I were building an SE, and my main focus was therefore on finding and indexing information, I would be casting the net slightly wider and looking for words in pages which don't exist as far as my dictionary is concerned. That might indicate a company name, for example.
If my company name was "Jepstons" and that word was used on an English language page (and therefore I know it is not a word, but it could be a brand) then my natural inclination would be to ascertain if jepstons.top_level_domain exists and if it does I'd be tempted to go index it.
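To make that idea concrete, here is a rough sketch of how it could work. This is only my guess at the approach, not how any real SE does it. The word list, the TLD list, and all the function names are made up for illustration, and the DNS check is just a plain hostname lookup.

```python
import re
import socket

# Hypothetical, tiny stand-in for a real dictionary word list.
KNOWN_WORDS = {"we", "bought", "our", "sofa", "from", "last", "year"}

# Hypothetical set of TLDs to probe; a real crawler would choose its own.
CANDIDATE_TLDS = ("com", "net", "org", "co.uk")


def unknown_words(text, dictionary=KNOWN_WORDS):
    """Return lowercased words not found in the dictionary, in first-seen order."""
    found = []
    for word in re.findall(r"[A-Za-z]+", text):
        lowered = word.lower()
        if lowered not in dictionary and lowered not in found:
            found.append(lowered)
    return found


def candidate_domains(word, tlds=CANDIDATE_TLDS):
    """Build word.tld hostnames for each TLD we are willing to probe."""
    return [f"{word}.{tld}" for tld in tlds]


def resolves(hostname):
    """Return True if the hostname resolves in DNS (needs network access)."""
    try:
        socket.gethostbyname(hostname)
        return True
    except (OSError, UnicodeError):
        return False


def domains_to_crawl(text, resolver=resolves):
    """Return candidate domains for unknown words that the resolver accepts."""
    return [
        domain
        for word in unknown_words(text)
        for domain in candidate_domains(word)
        if resolver(domain)
    ]
```

Calling unknown_words("We bought our sofa from Jepstons last year") picks out "jepstons" as the only non-dictionary word, and domains_to_crawl() would then probe jepstons.com, jepstons.net, and so on. The resolver is passed in as an argument so the lookup can be swapped out or stubbed when there is no network.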