Forum Moderators: phranque
Why can't this preloading concept extend to the whole site? And if it extends to the whole site, why can't it extend to every site? I.e., why can't I just leave my PC on and let it surf all the sites I like, all the time, and then surf other sites according to the same algorithm Google uses?
That would create my own personal index and a PageRank according to my own preferences, and I couldn't care less about what the engineers at Google think I want. (Sorry, GoogleGuy.) These absurd "named" Google index updates won't matter. Index updates will happen while I sleep or am on the phone, and they'll pause when I have to finish what I'm doing on my PC.
I know that Google has 56,000 Linux boxes in a cluster or whatever, but when I have 2 terabytes of disk in 5 years, and I have a 10,000 GHz Pentium 7 with a big fat Internet pipe, why can't I just do this on my own? It might not contain every site in the world, but the truth is, I doubt I need that anyway.
Google will still have a place in the world, but the desktop will return as the source of power, and you won't have to rely on somebody else's algorithms to find the pages you want.
I know that Google has 56,000 Linux boxes in a cluster or whatever, but when I have 2 terabytes of disk in 5 years, and I have a 10,000 GHz Pentium 7 with a big fat Internet pipe, why can't I just do this on my own?
Because by then there will be 999.999.999.999.999.999 pages...
Why would you want to do that though? Personally I think it would suck to get a bunch of sites I'm really not interested in.
Also, it would suck to be the owner of a site, since I couldn't possibly tell if it was a real page view or just a dumb download.
I'll avoid seeing lots of sites I don't want by keeping an index and searching it the same way I search Google. I'll have an automated routine that immediately tosses stuff I don't think is relevant.
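A "toss routine" like that could be as simple as a keyword-overlap test against an interest list. This is a hypothetical sketch; the interest keywords, threshold, and sample pages are all invented.

```python
# Hypothetical relevance filter: keep a fetched page only if its text
# shares enough words with my interest list. Keywords and the min_hits
# threshold are illustrative assumptions.

INTERESTS = {"baseball", "pitching", "stats", "league"}

def is_relevant(text, min_hits=2):
    words = set(text.lower().split())
    return len(words & INTERESTS) >= min_hits

pages = [
    "Baseball league stats for the season",
    "Annual report on soybean exports",
]
kept = [p for p in pages if is_relevant(p)]
```

Anything fancier (TF-IDF scoring, a trained classifier) slots into the same `is_relevant` hook.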
The problem is that there are a lot of entrenched experts who think the website should define the user experience, instead of the user.
Like it or not, information is a commodity, and bandwidth gets cheaper by the day. The .htaccess ban lists won't work for very much longer. There are a lot of distributed projects on the web, and it'll be easy to get around almost any kind of ban (IP, user agent, whatever) with massively distributed P2P technology.
Search engines are stuck right now not because of technology, but because of social issues. The big barrier is: do I trust them with my personal info?
So an open index wouldn't really solve it. You'd need to have the desktop app be able to build the index itself to get different kinds of search results.
Somebody could easily build an add-on to mozilla to index sites every day using a custom filter and have a special search box.
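The core of such an add-on would be an inverted index over locally fetched pages. A rough sketch of the idea, in Python rather than a browser extension; the URLs and page texts are invented examples:

```python
# Hypothetical core of a personal page indexer: a tiny inverted index
# (word -> set of page URLs) plus an AND-semantics search function.
# Example URLs and contents are made up.

from collections import defaultdict

index = defaultdict(set)

def add_page(url, text):
    for word in text.lower().split():
        index[word].add(url)

def search(query):
    """Return pages containing every query word."""
    words = query.lower().split()
    if not words:
        return set()
    results = index.get(words[0], set()).copy()
    for word in words[1:]:
        results &= index.get(word, set())
    return results

add_page("example.com/a", "desktop search engine ideas")
add_page("example.com/b", "search engine optimization tips")
hits = search("search engine")
```

The "custom filter" part is just a predicate applied before `add_page`; the "special search box" is UI chrome around `search`.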
[snowtide.com...]
Zap the link if you like, I have no interest in the company other than the fact that I think it may be relevant to the thread.
The concept of a web search spider on every desktop (and even every laptop) is a sure-fire recipe for an internet meltdown; the internet is not infinitely-scalable at no cost. In most cases, my access rules and scripts *can* tell the difference between a robot and a surfer, and since I pay the bandwidth bills, I *will* continue to decide what is abusive and what is not. Even in the case of a perfectly-implemented spider with human behavioural traits, the very least I'm going to do is throttle its resource access rate to something sustainable.
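The throttling Jim describes is commonly implemented as a token bucket per client: requests spend tokens, tokens refill at a sustainable rate, and anything beyond that is denied or delayed. A minimal sketch, with the rate and burst size as arbitrary example values:

```python
# Hypothetical per-client rate limiter using a token bucket.
# rate and capacity values here are illustrative, not a recommendation.

import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                  # over the limit: deny or delay

bucket = TokenBucket(rate=1.0, capacity=5)   # ~1 request/sec, burst of 5
decisions = [bucket.allow() for _ in range(10)]
```

A server would keep one bucket per client IP (or per identified robot) and return 503s, or simply slow responses, once `allow()` starts returning False.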
I am all for an open web. I am not, however, in favor of an open bandwidth-overage-fee pipe to my wallet.
The site where my restrictions are tightest is a non-commercial site, and so is very sensitive to hosting costs.
I think any responsible 'bot programmer or scripter should spend a moment thinking about the effects their creation will have if multiplied a thousandfold or several millionfold. If it still seems responsible to release the code after that thought experiment, then fine, release it. But don't blame me for blocking it if the thinking wasn't quite deep enough, and the 'bot provides no benefit for my visitors or my site in return for the bandwidth it uses.
MHO,
Jim
The Search Engine that Can Read Your Mind
A search engine that is customized to your sex, geographic location, age, interests, hobbies, etc. I heard a presentation by Yahoo! in which this was discussed. Although desktop implementation was not mentioned specifically, a desktop search engine would be the ideal way to provide a highly accurate search experience.
The Google, Yahoo!, AskJeeves Toolbars are but a primitive implementation of desktop search. Naturally, privacy issues would have to be addressed. But if millions of users can be persuaded to use toolbars and MS Passport, then having a personalized Google Search Engine on your desktop may not be that great of a leap for most people.
Desktop Search: Is it really so far-fetched?
Desktop search could be the next wave 5-10 years down the road. For example, Microsoft is already integrating search into its next operating system, codenamed Longhorn. Data storage and file retrieval are a big part of the next OS: seamlessly finding stuff on your hard drive or on the web is a high priority of Longhorn. As far as I know, this isn't a true desktop search engine, but it's a step in that direction.
I don't see bandwidth issues as much of a problem if you limit the spidered portion of the web to the user's interests. For instance, if you are interested in baseball, why would you need a copy of the USDA web site?
Now consider that the tool john316 pointed out is just one of many available right now that provide powerful aggregation and filtering mechanisms for more 'personalized' search.
One of the biggest problems with this, aside from the bandwidth cost potential that has been pointed out several times, is simply a lack of diverse data on the net. Sure, you can get a set of personalized 'widget results', but what if there are only 10K pages with that word string? This happens quite often, even when you do a 'find all words in any document' search instead of a literal-string search. Also, people (Joe Public) still don't even use the advanced search page on most engines, and those are way more powerful than the default options.
It will take, first, a 'new revolution' in user sophistication for people to 'buy in' to these ideas, imho, before there is market demand for them.
You might want to buy from me? Oh, too bad I run a non-profit Web site that has a ton of nifty information on it then, eh?
The idea is good, but it would never work in reality.
What about a freshly installed computer? Should it be shipped with the entire Internet on it? You would initially need a search engine, or else you would have to wait a week after connecting to the net before you can start using it.
Also, why would you want to do this on every single computer when there are centralized services that don't need a few years to come up with something like this, and without you having to pay for it? Part of the problem is technology: it is difficult to make such a search engine efficient enough. But, trust me, it will come!