Forum Moderators: phranque


Never Ending War Against Bots

         

keyplyr

4:08 am on Dec 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Over the last couple of years I've learned quite a lot about using mod_rewrite, mod_access, SetEnvIf and other techniques to control who/what is allowed to access files on my website. I've developed a rather intimate relationship with my .htaccess file :)

But it seems almost futile with the never-ending wave of new user agents, home bots and download tools becoming increasingly available and now coming from everywhere!

When checking my stats an hour ago, I saw that I had blocked an undesirable's attempts to access files. Then, after spending a short time at a favorite discussion forum, I ran my stats program again. In less than 15 minutes, some home bot (blank UA) with a Norway IP had pulled every single file off my server (10k), including all those that are disallowed (no robots.txt requested), and many files were pulled several times.
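A rough sketch of how the pattern described above (blank UA, no robots.txt request, a large number of hits) could be picked out of an access log. It assumes the server writes Apache's "combined" log format; the function names `summarize` and `suspects` and the `max_requests` threshold are made up for illustration and are not from any stats package.

```python
import re
from collections import defaultdict

# Assumes Apache "combined" log lines; adjust the pattern if your LogFormat differs.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def summarize(lines):
    """Collect per-IP request counts plus blank-UA and robots.txt flags."""
    stats = defaultdict(lambda: {"requests": 0, "blank_ua": False, "robots": False})
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # skip lines that do not fit the assumed format
        entry = stats[m["ip"]]
        entry["requests"] += 1
        if m["agent"] in ("", "-"):  # a blank UA is usually logged as "-"
            entry["blank_ua"] = True
        if m["path"] == "/robots.txt":
            entry["robots"] = True
    return dict(stats)

def suspects(stats, max_requests=500):
    """IPs with a blank UA that never asked for robots.txt and exceeded the threshold."""
    return sorted(
        ip for ip, s in stats.items()
        if s["blank_ua"] and not s["robots"] and s["requests"] > max_requests
    )
```

Typical use would be something like `suspects(summarize(open("access.log")))`, with the threshold tuned to what a normal visitor does on the site. Any IP it reports still deserves a manual look before being blocked.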

I normally cruise along just under the monthly bandwidth allowed by my hosting service, which is very liberal. Mass downloads like this really infringe on my right to exist on the web. I guess what one person in one culture views as a right, another may view as a weakness. I've developed an extremely defensive, guarded attitude toward what was formerly an interesting, alluring wonder.

Key_Master

4:35 am on Dec 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to the club. :)

Matt Probert

1:14 pm on Dec 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Would what you describe be any different to a human using a web browser visiting every page on your site?

Matt

FalseDawn

6:33 pm on Dec 18, 2005 (gmt 0)

10+ Year Member



What files are you referring to? If they are plain HTML with a few images, I don't see how a bot can increase your bandwidth usage that much.
If you are hosting large multimedia files, then just limit access to logged-on users with a valid account and consider obscuring the paths to your files, too.

If all else fails, get a hosting account with more bandwidth and stop worrying about it.

Key_Master

6:43 pm on Dec 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Bots harvest e-mail addresses, collect form URLs to automate post spam, gather trademark violation information, scrape content for MFA sites, click on competitor ads... the list goes on.

I can think of a whole lot of reasons to limit their activity. Don't give up keyplyr. You can beat them and still make your site accessible to your visitors and search engine bots. What you need is a dynamic spider trap.
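For readers who have not seen one, here is a minimal sketch of the general spider trap idea, not the specific script discussed on this forum. It assumes a trap URL (here the made-up path `/trap/`) that is listed under Disallow in robots.txt and linked only where humans will not click it, so anything that fetches it has ignored robots.txt. The class name, the ban length and the allowlist are all hypothetical.

```python
import time

class SpiderTrap:
    """Ban any client that requests the trap path, for a limited time."""

    def __init__(self, trap_path="/trap/", ban_seconds=86400, allowlist=()):
        self.trap_path = trap_path        # assumed to be disallowed in robots.txt
        self.ban_seconds = ban_seconds    # how long a ban lasts
        self.allowlist = set(allowlist)   # IPs that are never banned
        self.banned_until = {}            # ip -> timestamp when the ban expires

    def allow(self, ip, path, now=None):
        """Return True if the request should be served, False if it should be refused."""
        now = time.time() if now is None else now
        if ip in self.allowlist:
            return True
        if self.banned_until.get(ip, 0) > now:
            return False                  # still serving an earlier ban
        if path.startswith(self.trap_path):
            self.banned_until[ip] = now + self.ban_seconds
            return False                  # walked into the trap
        return True
```

Because bans expire, a shared or reassigned IP is not locked out forever, and the allowlist leaves room for bots that skip robots.txt but are still welcome. A real setup would also need to persist the ban list and feed it to the web server, which this sketch leaves out.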

keyplyr

7:18 pm on Dec 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




"Would what you describe be any different to a human using a web browser visiting every page on your site?"

Yes. This event saw no requests for CSS or JavaScript files. No requests for page images until it hit my image directories, where it took 7 thousand images alphabetically, some of them not even linked to. Still think it's a browser?
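The "no CSS or JavaScript requests" signal can be turned into a rough check. The sketch below assumes you already have (ip, url) pairs pulled from the log; the function name `no_asset_clients`, the extension lists and the `min_pages` cutoff are illustrative guesses, not settings from any particular tool.

```python
from collections import Counter
from os.path import splitext
from urllib.parse import urlsplit

PAGE_EXT = {"", ".html", ".htm", ".php"}   # assumed page extensions for this sketch
ASSET_EXT = {".css", ".js"}                # files a normal browser usually fetches too

def no_asset_clients(requests, min_pages=20):
    """Return IPs that fetched at least min_pages pages but never any CSS or JS."""
    pages, assets = Counter(), Counter()
    for ip, url in requests:
        # Ignore any query string and compare on the lowercase file extension.
        ext = splitext(urlsplit(url).path)[1].lower()
        if ext in ASSET_EXT:
            assets[ip] += 1
        elif ext in PAGE_EXT:
            pages[ip] += 1
    return sorted(ip for ip, n in pages.items() if n >= min_pages and assets[ip] == 0)
```

This is only a hint, not proof: a returning visitor with cached stylesheets, or a text-mode browser, may also show up, which is why the page threshold should be set well above a normal visit.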

"What you need is a dynamic spider trap."

I've considered the spider trap being talked about here. Several bots that do not request robots.txt I actually do want to allow. Many of the new Asian, Polish, Czech, etc. bots do not request it either, but these guys buy my products. I have decided to do it all manually, at least for now.

topsites

7:22 am on Dec 19, 2005 (gmt 0)



Some days, violent tendencies come to mind...