I am sure I will hear all kinds of responses about the magnitude of the problem given the size of the web, etc. Well, if it took Home Depot 3 days to check out my order at the check stand, I would find another store. I say get with it or get out of the business! What do you think?
Search engines are definitely not the friend of the dynamic website. I would argue, though, that it is not as bad as it seems.
If you run a dynamic site it is likely that all past data and articles are moved to an archive section of the site as new content is added. Generally, due to the time lapse of getting indexed, surfers will hit old data in the archive section. If they like what they see they will hopefully bookmark your site and you have just gained a new user.
As for the analogy of Home Depot and going elsewhere, I totally agree, but as it relates to SEs, where can we go? Your only real opportunity now is to pay for spidering by INK every time you put up a new page. They will hit you within the next couple of days, but that can become real expensive real fast.
In answer to your question, "Is the SE scheme flawed?", my answer would be "Yes, but what can we do about it?"
Well, my view is that we should first agree that there is a problem and then get together to fashion a remedy or suggest remedies (sort of like the current election mess: we now realize that we have differing standards in all states about how to count ballots, and differing standards within states. Election laws will be rewritten, clarified, and somewhat standardized all over the country in the coming months).
On average, my guess is that all information that can be located in search engines is 2 or more months old. Is that fair or right to the consumer? And is that fair or right to the web page provider? And should MSN, for example, be obliged or required to say something like "this is by no means all of the information available on the Internet about the subject you searched for. It is only the information available on the web sites that paid us a lot of money"?
I am not for starting a war over this but if there is not a standards committee on this subject, perhaps there should be.
An engine can only return results according to the data it contains in its index. For comprehensive results, this database needs to contain as many pages as possible from those available.
Many will never be spidered due to poor linkage, others because of high graphical content, Flash, dynamic page delivery, frames, etc.
Until an exceptionally efficient and powerful pattern recognition system is devised, the spiders will remain text oriented.
Now the pages left... millions (billions?) of them. Although improving all the time, the hardware laid across the world will struggle to keep pace with the increase in the number of pages.
Many large sites serve up a huge amount of original content every day. Although it is possible to record all these sites constantly, surely the hardware required would be prohibitively expensive.
People are already casting doubts over Inktomi's ability to re-spider paid submissions every 48 hours if the demand grows large enough. Never mind the 99.99% of other pages out there.
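To put a rough number on that, here is a back-of-envelope sketch of the sustained fetch rate a spider would need to revisit every page within a fixed window. The one-billion page count is purely an illustrative assumption (nobody in this thread has a measured figure), and the function name is made up for the example; only the 48-hour interval comes from the discussion above.

```python
def required_fetch_rate(total_pages: int, recrawl_interval_hours: float) -> float:
    """Return the average pages per second a spider must fetch so that
    every page is revisited once per recrawl interval."""
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if recrawl_interval_hours <= 0:
        raise ValueError("recrawl_interval_hours must be positive")
    interval_seconds = recrawl_interval_hours * 3600
    return total_pages / interval_seconds


if __name__ == "__main__":
    # Assumed index size: one billion pages (illustrative only).
    assumed_pages = 1_000_000_000
    rate = required_fetch_rate(assumed_pages, recrawl_interval_hours=48)
    print(f"{rate:,.0f} pages per second, sustained")
```

Under those assumptions the answer comes out at roughly 5,800 fetches per second, around the clock, before any parsing, indexing or spam filtering is done. Change the assumed page count and the figure scales linearly.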
Then spam's ugly head rears up... this somehow needs to be kept out of the comprehensive system.
Maybe tomorrow, or maybe in ten years, a solution will appear. Perhaps quantum computers coupled with massively boosted bandwidth will perform the whole business as a background screensaver, but with current machines/budgets... realistically, I think not.
Yes, it's flawed. And I'll add another problem... once you get a page listed with a high rank, it's not advised, from an SEO perspective, to change or update the page. So the highest ranked, most frequently clicked pages may indeed be some of the oldest information on the subject.
All in all, I have to go with rpking: I think we're pushing the practical (and financial) limits now. From the SE side, I'd guess that spam control is probably the single largest roadblock to speedy updates.
>And should MSN, for example, be obliged, or required to say something like "this is by no means all of the information available on the Internet about the subject you searched for. It is only the information available on the web sites that paid us a lot of money"?
Full disclosure of commercial interests? Absolutely!! Likely? No. (For more, search on "ethics of a search return page".)
Nope, the major SE databases cannot cost-efficiently provide results of the latest content on a page basis. That is why we are seeing them lowering their sights and concentrating on site home pages, not pages as such. The likelihood, they assume, is that somewhere in that SITE you will find what you are looking for.
However, the Inktomi paid model for 48-hour spidering has good potential for building (very eventually!) a database that is more timely, with the dynamic site owners bearing some of the cost of doing this. The current implementation is at an early stage and has problems, but the basic principle is good.
Secondly, if you are looking for current content, I wouldn't look in a major one-size-fits-all database. I would look for smaller search engines, specialised ones, and even news search engines.
A nice opportunity exists, I think, for anyone who wants to create a very timely index, with costs borne by both the Web site owners and the indexer, that people know provides the best quality, most recent content. My feeling is it will not be one engine but a mass of smaller specialised ventures.
The Internet was never designed for mass indexing anyway. It was meant to allow small dispersed groups with a common specialist interest to share information. We may well be going back to basics.
Our business is created by the SEs, not by ourselves. We are absolutely dependent on what SEs will or won't do, and we have to find our way.