Arachnophobia : when spiders become too thick

         

Brett_Tabke

4:20 pm on Sep 27, 2001 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I think there is a proportional relationship between the drop in referrals from search engines over the years and the amount of spidering they do.

Take for instance right now on this system:

AllTheWeb - full crawl.
WiseNut - full crawl.
Google - partial crawl.
Ink - Sporadic spidering.
Northern Light - significant spidering.
Alta - significant spidering.
Excite - sporadic spidering.
3rd party spiders - much spidering.

That's close to 95k pages in the last 48 hours. (probably 25k already this morning during peak hours)

Q: Is there an answer? It is a massive waste of internet resources and bandwidth.
A: A common, centralized spider system paid for by the search engines. The system spiders the sites, then the SEs just pull from the common db.

It would result in fresher pages and significantly reduced internet load. It is foolish for all these companies to be duplicating the same function.
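To make the load argument concrete, here is a minimal sketch of the idea, not a real system. Everything in it is an assumption for illustration: `SharedCrawlStore`, `fake_fetch`, the example.com URLs, and the placeholder engine names. It only counts how many requests reach the site when several engines read from one common db instead of each fetching for itself; it does not model freshness.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SharedCrawlStore:
    """Hypothetical common page db filled by one central spider."""
    fetch: Callable[[str], str]      # how the central spider downloads a URL
    pages: Dict[str, str] = field(default_factory=dict)
    fetch_count: int = 0             # requests that actually hit the site

    def get(self, url: str) -> str:
        # Fetch each URL from the site only once; later readers reuse the copy.
        if url not in self.pages:
            self.pages[url] = self.fetch(url)
            self.fetch_count += 1
        return self.pages[url]


def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP request, so the sketch runs offline.
    return f"<html>content of {url}</html>"


store = SharedCrawlStore(fetch=fake_fetch)
urls = ["http://example.com/a", "http://example.com/b"]
engines = ["EngineA", "EngineB", "EngineC"]  # placeholder names

for engine in engines:
    for url in urls:
        store.get(url)  # each engine pulls from the common db

# With independent spiders, every engine would request every page itself.
independent_requests = len(engines) * len(urls)
print(store.fetch_count, independent_requests)
```

In this toy run the site serves each page once, however many engines read it, while independent spiders would multiply the requests by the number of engines.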

Travoli

4:26 pm on Sep 27, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



But in a business sense, nobody would be able to brag about the "largest database" and nobody would be striving to better index dynamic content. With nobody trying to out-do the competition, I think that would make things turn a bit stale.

drbill

4:36 pm on Sep 27, 2001 (gmt 0)

10+ Year Member



Brett, good point. Yes, this would be a good way to save on bandwidth. I have had Google push some of my servers to 1 mps while spidering pages. I doubt that we will ever see it, as all these engines like the idea of having their own database.

caine

4:37 pm on Sep 27, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I agree with you, Brett.

Travoli, certain SEs already use centralised results and filter them through their own relevance criteria.

Travoli

4:59 pm on Sep 27, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, many use the same spiders, but just about everyone is missing out on the dynamic content on the web. I believe Google is developing the technology to capture these pages because they would like to keep their title of "largest SE database." If each company put their algorithms into the same database, with the same goal of providing relevant results to searchers, don't you think we would start to see the same pages on every SE as they refine toward the algorithm that most closely matches searchers' needs? Not to mention, we would see "X search results" on every engine. The SEs would lose a way to differentiate themselves from each other, and their "bragging rights."

It seems that wherever we see a dominant service, we see either predatory behavior or slow adoption of new technologies. The search engines would become dependent on the outsourced spidering, and it seems as though the entire industry would be very negatively affected by a disruption of the central spider, or by pricing pressures from the monopolistic spider.

Also, in my industry, a central spidering system would not stop the 3rd party spiders that are specifically set up to spider our site's dynamic content. It takes engineers to develop scripts that will work specifically with our site's content. Because of the resources required, those specialty spiders will always need to remain independent.

Given that many already use the same database (thinking of Ink), and assuming that 3rd party spidering would continue, the gains seem to be worth less than the losses.

Ove

8:12 pm on Sep 27, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Brett,
A few months ago you gave us some very detailed info about how a spider works. I can't find that thread. Could you please post it again?

/Ove