I can think of several possible improvements. One that several people have suggested is using e.g. DMOZ categories to select pages. This would require manual selection, however, and ongoing maintenance as DMOZ updates and extends.
Another idea would be to allow webmasters to add META tags requesting that pages be included in a particular country index. There would obviously be room for abuse here, most likely from United States sites including a non-US tag just for some extra traffic, but as more regional indices (i.e. a "United States" or "North America" search!) are added, that should decline.
Another option would be to start with a core set of pages selected on domain/IP address, but in addition to those, to index pages which have a large share of their incoming links from that core. So if a page in blah.com has more than 25% of its links from pages in (say) the core www.google.ie index, it would also be included in Google's "pages from Ireland". This would add some computation to index building, but not, I think, that much.
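To make that inclusion rule concrete, here is a rough sketch of how it might be computed. The function name, the shape of the inbound-link table and the example URLs are all made up for illustration; only the 25% threshold comes from the suggestion above, and nothing here reflects how Google actually builds an index.

```python
def extend_regional_index(core, inlinks, threshold=0.25):
    """Return the core set plus any page whose share of inbound links
    coming from core pages is greater than `threshold`.

    core     -- set of page identifiers already in the regional index
    inlinks  -- dict mapping a page to the list of pages linking to it
    (Both structures are hypothetical, chosen only for this sketch.)
    """
    extended = set(core)
    for page, sources in inlinks.items():
        if page in core or not sources:
            continue  # already included, or no inbound links to judge by
        from_core = sum(1 for s in sources if s in core)
        if from_core / len(sources) > threshold:
            extended.add(page)
    return extended


# Hypothetical core index for Ireland and two candidate pages.
core_ie = {"a.ie", "b.ie", "c.ie"}
inlinks = {
    # 2 of 4 inbound links come from the core (50%)
    "blah.com/dublin": ["a.ie", "b.ie", "x.com", "y.com"],
    # 1 of 5 inbound links comes from the core (20%)
    "blah.com/other": ["a.ie", "x.com", "y.com", "z.com", "w.com"],
}
result = extend_regional_index(core_ie, inlinks)
```

With these made-up numbers the first page clears the 25% bar and joins the Irish index, while the second does not. The cost is a single pass over the inbound-link table, which fits the guess that the extra computation would be modest.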
Anyone got any other good ideas?
So if a page in blah.com has more than 25% of its links from pages in (say) the core www.google.ie index, it would also be included in Google's "pages from Ireland".
I started off thinking this would be a bad thing as my page has (as far as I know) only one incoming link from another New Zealander. But then I realised that this probably means my page isn't important to other kiwis and therefore it becomes a good thing. If I had really cool local information that kiwis all want then I would probably get lots of NZ links and deserve a place in the local index.
The only other thing I can think of is use of whois information for domain names to pinpoint the location of their owner. But I'm sure this would bring all kinds of privacy issues with it and doesn't take into account shared domains and the like.
This is part way to a geographical 'personalised PageRank'.
Each update needs several iterations to converge on a steady PR graph of the Web. At each iteration of PageRank, the rank source could be the 'core' set of matching IP addresses or TLDs. You could then see the 'importance' of a page from a UK, French or German perspective (and Australia, Danny).
That will be very powerful (if they ever get round to it).
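For anyone who wants to play with the idea, below is a minimal sketch of PageRank with the rank source restricted to a regional core. The function name, the tiny link graph, the damping value of 0.85 and the fixed iteration count are assumptions for illustration, not anything Google has published about regional indices.

```python
def personalised_pagerank(links, core, damping=0.85, iterations=50):
    """Power-iteration PageRank where the rank source is spread evenly
    over the pages in `core` instead of over every page.

    links -- dict mapping a page to the list of pages it links to
    core  -- non-empty set of pages acting as the rank source
    (Names and data layout are hypothetical.)
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    core_pages = [p for p in pages if p in core]
    source = {p: 0.0 for p in pages}
    for p in core_pages:
        source[p] = 1.0 / len(core_pages)

    rank = dict(source)  # start from the rank source itself
    for _ in range(iterations):
        new = {p: (1 - damping) * source[p] for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for t in outs:
                    new[t] += share
            else:
                # Dangling page: hand its rank back to the rank source.
                for q in core_pages:
                    new[q] += damping * rank[p] * source[q]
        rank = new
    return rank


# Hypothetical graph: two .ie pages form the core, blah.com is linked
# from the core, and x.com / y.com only link to each other.
links = {
    "a.ie": ["b.ie", "blah.com"],
    "b.ie": ["a.ie", "blah.com"],
    "blah.com": ["a.ie"],
    "x.com": ["y.com"],
    "y.com": ["x.com"],
}
rank = personalised_pagerank(links, core={"a.ie", "b.ie"})
```

In this toy graph blah.com picks up rank because the Irish core links to it, while x.com and y.com stay at zero since nothing reachable from the core points at them. That is the "importance from an Irish perspective" effect described above.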