
why are spiders bad?

sorry if this is a rookie question

         

SammiGirl

6:44 pm on Jul 15, 2003 (gmt 0)

10+ Year Member



I may just be naive, but I see all these posts about how to block spiders from crawling a site, so I'm wondering: why is being crawled so bad, or is it really that bad? Again, maybe this is my naivete, but doesn't being crawled get you search engine placement?

Any help would be very much appreciated.

Thanks,
Sammi

killroy

6:45 pm on Jul 15, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, spiders aren't just search engines...

I have a large directory website with over 80,000 pages, and at least once a month it gets extracted by competitors or others who want my database and are taking it illegally.

SN

jeremy goodrich

6:46 pm on Jul 15, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not all spiders are used for search engines - some copy your website for nefarious purposes, some "monitor" your website, which causes unnecessary bandwidth drain, and others harvest your email address(es) from your sites.

Yet others do work for search engines, but those engines either don't or won't send any "significant" traffic, so you may save bandwidth by allowing only 'selective' access to your websites. :)

Too much traffic is, in many ways, a fun (though tiresome) problem to have; it raises all sorts of "new" issues with web design, conversion rates, usability, and bandwidth sucking.

bakedjake

6:47 pm on Jul 15, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Some spiders are evil and ignore robots.txt. Some spiders are evil and harvest email addresses for spam. Some spiders are evil and put unnatural resource drains on the web server.

Not all spiders are as friendly as the major search engines (who tend to be the most respectful). Webmasters are concerned about these rogue spiders, and want to block them.
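
For anyone wondering what "respecting robots.txt" looks like from the spider's side, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the bot names (BadBot, GoodBot) and the example.com URLs are made up for illustration. Keep in mind that this check is entirely voluntary: a rogue spider simply skips it, which is why robots.txt alone does not stop the evil ones.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; not taken from any real site.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("BadBot", "GoodBot"):
        for path in ("/index.html", "/private/data.html"):
            url = "http://example.com" + path
            print(agent, path, may_fetch(agent, url))
```

A polite spider runs a check like this before every request; blocking the impolite ones has to happen on the server instead.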

<added>Didn't type fast enough... :)</added>

wilderness

7:09 pm on Jul 15, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sammi,
Not all bots are beneficial to your site(s).

Here are some links to assist in identifying bots and helping you to decide if the bots are beneficial:

[jafsoft.com...]
[psychedelix.com...]
[bots.internet.com...]
[robotstxt.org...]
[joseluis.pellicer.org...]

SammiGirl

10:58 pm on Jul 15, 2003 (gmt 0)

10+ Year Member



You mentioned that they can try to get e-mails? Why is it that when I view my access logs I don't even see e-mails from my visitors?

Sammi

SinclairUser

11:52 pm on Jul 15, 2003 (gmt 0)

10+ Year Member



They try to harvest email addresses embedded in your web pages, not anything in your access logs. Spammers regularly crawl the web for email addresses just to use them for junk mail.
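
To see which addresses on your own pages are exposed this way, a rough audit can be sketched in a few lines of Python. The regular expression is deliberately simple and will miss unusual addresses, and the sample page and the function name find_exposed_emails are invented for illustration; real harvesters may use different patterns.

```python
import re
from typing import List

# Deliberately simple pattern (an assumption for this sketch); real-world
# addresses can take forms that this expression does not cover.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_exposed_emails(html: str) -> List[str]:
    """Return the unique plain-text email addresses found in html, in order."""
    seen: List[str] = []
    for address in EMAIL_RE.findall(html):
        if address not in seen:
            seen.append(address)
    return seen


if __name__ == "__main__":
    # Hypothetical page fragment: one address as a mailto link, one spelled out.
    page = (
        '<a href="mailto:webmaster@example.com">webmaster@example.com</a> '
        "or sales at example dot com"
    )
    print(find_exposed_emails(page))
```

Only the address written in plain form is picked up; the spelled-out "sales at example dot com" slips past this particular pattern, which is why some webmasters write their addresses that way.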

Hagstrom

2:27 pm on Jul 16, 2003 (gmt 0)

10+ Year Member



The thing that bothers me most is the time it takes to figure out which of my visitors are real humans and which are robots. If only they would say "robot" or "spider" in the UA-field :(
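
A first pass at that sorting can be sketched in Python by looking for telltale words in the UA string. The hint list and the function name looks_like_robot are assumptions chosen for illustration, and the check only catches robots that identify themselves honestly; a spider that sends a browser's UA string sails straight through.

```python
# Substrings that self-identifying robots commonly put in their UA string.
# This list is an assumption for the sketch, not an authoritative registry.
BOT_HINTS = ("bot", "spider", "crawl", "slurp")


def looks_like_robot(user_agent: str) -> bool:
    """Return True if the UA string contains one of the known robot hints."""
    ua = user_agent.lower()
    return any(hint in ua for hint in BOT_HINTS)


if __name__ == "__main__":
    samples = [
        "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    ]
    for ua in samples:
        print(looks_like_robot(ua), ua)
```

Anything this misses still has to be spotted by behavior in the logs, such as request rate or whether robots.txt was fetched.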

SinclairUser

9:40 pm on Jul 16, 2003 (gmt 0)

10+ Year Member



I would prefer a string of "cloak now" in the UA field. It would make life much simpler.

sidyadav

2:12 pm on Jul 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



But, I don't know why people build robots for bad stuff... I built a new robot for my search engine :-)

cyberkat

2:53 pm on Jul 26, 2003 (gmt 0)

10+ Year Member



Besides just banning their IP, I write to their ISP with the violator's log entries and include a link to our website's "terms of use" page. So far most US and UK ISPs do follow through. To get down to it, we accept NO ABUSE. And we have even gotten some spammers' websites nuked :)

Part of our TOS states:
Our UCE/Anti-Spam/Virus/Hacking Policy:
We try our best to protect everyone's privacy from the harassment of spam and to protect against virus content on our site. If you feel that you have received spam from signing our guest book and/or message board, write and tell us. We do report all spam our site receives to the proper ISP. We are handling this for our domain.
Because of the increase in filtering, our site deletes all spam, email of combined text/html content, all email with attachments, and mailing lists we are not members of. UCE (Unsolicited Commercial Email) is reported to the sender's ISP, host, abuse.net, etc.
If you have any questions or comments about our Privacy Policy, please contact us. We always welcome your feedback.
Any and all unauthorized entry, port scans, gathering website statistics, pinging, webpage validations, server probes (search, options, head, etc.), abusive robots, copyright abuse, gathering & crawling the website (for email addresses, images on server, hidden values, regular expressions, etc.), running link checkers, forging of formmail scripts, & hacking will be investigated, including attempts to use the infamous formmail hack to transmit UCE / bulk e-mail using our systems. As a webmaster, we do not allow any of our users to use the Matt Wright formmail script because of the possibility of our servers being used for UCE.
ALL VIOLATIONS are reported to the proper legal agencies and internet service providers with supporting files, access logs including violators' IP addresses, and documentation. This domain only accepts email using our online contact form, in plain text format. Our system automatically deletes incoming email containing: attachments, advertisements, html.
Summary of Legal Basis: United States Code Title 47, Section 227(a)(2)(B) states that a computer, modem, and/or printer meets the definition of a telephone fax machine. Section 227(b)(1)(C) states that it is unlawful to send any unsolicited advertisement to such equipment. Section 227(b)(3)(C) states that a violation of the aforementioned Section is punishable by action to recover actual monetary loss, or $500, whichever is greater, for each violation.

sidyadav

12:33 am on Jul 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Real good technique :-)