learning curve on spiders

stupid question du jour

sdtex

2:28 pm on Mar 29, 2002 (gmt 0)



I'm a writer with two brand-new sites, and not a full-fledged webmaster. I'm on a steep learning curve.

I've been reading information here and at other sites about spiders and there's one thing I'm not clear on: why are spiders considered objectionable? LNSpiderguy has been spending a lot of time on my site, and this week it's Wget/1.7. I see a lot of discussion on blocking such spiders and I'm not clear why.
I apologize for my ignorance but my research has yet to give me a clear answer.

Air

3:18 pm on Mar 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to wmw sdtex,

Probably the most common reason some spiders are considered objectionable is that they request pages indiscriminately, i.e. without any pause between requests. The web server then spends too much of its time serving that spider's requests, and as its resources are consumed it slows down, which hurts response time for all visitors to every site on the server.

Likewise, these spiders can consume huge amounts of bandwidth, again affecting response time and costing you money, because additional bandwidth has to be provided to make sure legitimate requests can be accommodated. On some large sites, one of these poorly behaved spiders coming through can bring response to a standstill for everyone else.

These costs are weighed against the benefits your site may derive from such spidering. If the benefit is nil or marginal, the spiders are needlessly consuming your resources without giving anything back. Even worse, some of these spiders are harmful: they collect entire sites, or e-mail addresses, for their own selfish reasons.

The larger your site is, the greater the impact of these rogue spiders. On small sites they are less noticeable as resource wasters.
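
To see how much of your traffic and bandwidth a given spider is actually taking, you can tally your access log by user agent. What follows is only a minimal sketch: it assumes your server writes the common "combined" log format, and the function name tally_by_agent and the sample lines are made up for illustration, not taken from any real log.

```python
import re
from collections import defaultdict

# Assumed layout (combined log format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def tally_by_agent(lines):
    """Return {user_agent: [request_count, total_bytes]} for parseable lines."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the assumed format
        size = match.group("bytes")
        entry = totals[match.group("agent")]
        entry[0] += 1
        entry[1] += 0 if size == "-" else int(size)
    return dict(totals)


# Hypothetical sample lines, for illustration only
SAMPLE = [
    '192.0.2.1 - - [29/Mar/2002:14:28:00 +0000] "GET / HTTP/1.0" 200 5120 "-" "Wget/1.7"',
    '192.0.2.1 - - [29/Mar/2002:14:28:00 +0000] "GET /book.html HTTP/1.0" 200 20480 "-" "Wget/1.7"',
    '198.51.100.7 - - [29/Mar/2002:14:30:12 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/4.0"',
]

if __name__ == "__main__":
    for agent, (count, size) in sorted(tally_by_agent(SAMPLE).items()):
        print(f"{agent}: {count} requests, {size} bytes")
```

An agent with many requests inside the same second, or a byte total out of proportion to your real visitors, is the kind of spider described above.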

Macguru

3:24 pm on Mar 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



<added> Air beat me!</added> :)

Hi sdtex,

Welcome to WebmasterWorld, you just can't find better. Don't apologise, we all keep learning every day here. Sharing information is what this board is all about.

Those nasty spiders are, at the very least, eating your bandwidth and will not bring you any visitors. A lot of them harvest e-mail addresses from web sites to sell to spammers, and others gather data for large corporations' market studies, or for other purposes.

Here is a good tool to get rid of all of them. [webmasterworld.com]

Enjoy!

sdtex

3:32 pm on Mar 29, 2002 (gmt 0)



Y'all are great!

So, even though my site is very puny, I should consider spiders a problem?

My book site is only five or six pages and I get an average of only six or so hits a day (unless I do a radio show or something, which gives me a spike in hits). LNSpiderguy hangs out there several times a week. Must be an incredibly slow reader.

If someone sends me an e-mail through my site, can these spiders grab that person's address for spam?

Are there "good" spiders just there to index for search engines? So far I have not made it into any major search engines. (I can't afford Yahoo just now, though I know that's important.)

Macguru

3:50 pm on Mar 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>If someone sends me an e-mail through my site, can these spiders grab that person's address for spam?

If you are using a form to receive messages and store those addresses somewhere on your site, yes. If visitors click on your e-mail address via a MAILTO: link, only your address is exposed to bad spiders.
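
To check which addresses a page of yours hands to a harvester through MAILTO: links, a rough sketch like the one below can help. It uses Python's standard html.parser module. The class name MailtoFinder, the helper exposed_addresses and the sample page are hypothetical, and a real harvester may also pick up addresses written as plain text, which this sketch does not look for.

```python
from html.parser import HTMLParser


class MailtoFinder(HTMLParser):
    """Collect the addresses that appear in mailto: links."""

    def __init__(self):
        super().__init__()
        self.addresses = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("mailto:"):
                # Drop the scheme and any ?subject=... query part
                address = value[len("mailto:"):].split("?", 1)[0]
                self.addresses.append(address)


def exposed_addresses(html):
    """Return every address reachable through a mailto: link in the page."""
    finder = MailtoFinder()
    finder.feed(html)
    return finder.addresses


# Hypothetical page content, for illustration only
PAGE = '<p>Write to <a href="mailto:author@example.com?subject=Hello">the author</a>.</p>'

if __name__ == "__main__":
    print(exposed_addresses(PAGE))
```

Anything this turns up is visible to any spider that reads the page.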

>>Are there "good" spiders just there to index for search engines?

Yes
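
One practical difference: well-behaved spiders read your robots.txt file and stay out of whatever it disallows, while rogue ones may simply ignore it. Here is a small sketch of that check using Python's standard urllib.robotparser module. The rules in ROBOTS_TXT, the helper name allowed and the example.com URLs are assumptions for illustration, not a recommended configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: shut out Wget entirely, keep everyone out of /private/
ROBOTS_TXT = """\
User-agent: Wget
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a spider that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Wget/1.7", "Googlebot"):
        for path in ("/index.html", "/private/notes.html"):
            url = "http://example.com" + path
            print(agent, path, allowed(agent, url))
```

Keep in mind this only shows what a polite spider would do. Nothing in robots.txt forces a spider to obey it.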

>>So far I have not made it into any major search engines.

Well, you came to the right place to learn how to do it.

If you want to learn about it, I recommend you visit the library [webmasterworld.com] (top menu), where all the best threads are stored. You can also use the site search [searchengineworld.com] feature to look up specific topics.