WebCopier v2.8a

madmatt69

6:01 pm on Apr 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Anybody seen this one before? If it's copying my pages I guess it should be banned in my robots.txt.

Does anybody have an example of a robots.txt which bans a lot of the bad spiders?

AmericanBulldog

6:06 pm on Apr 2, 2003 (gmt 0)

10+ Year Member



Take a look at the WebmasterWorld Robots.txt for a good example
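
For reference, a minimal robots.txt along those lines might look like this (the User-agent token shown is an assumption; check your logs for the exact string WebCopier sends):

```
# Disallow a known offline-downloader by its User-agent token
User-agent: WebCopier
Disallow: /

# Everyone else is still welcome
User-agent: *
Disallow:
```

Keep in mind robots.txt is purely advisory: well-behaved crawlers honor it, but the downloaders discussed below mostly ignore it.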

wilderness

7:02 pm on Apr 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



madmatt
WebCopier, or any other UA that begins with "Web," will likely not bother reading your robots.txt or honoring "your suggestion" to follow it.

The best place to deny anything that begins with "Web" is in your .htaccess.
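
A minimal .htaccess sketch of that idea, using Apache's mod_setenvif and old-style access control (the `^Web` pattern is an assumption and will also catch any legitimate agent whose name starts with "Web"):

```
# Flag any User-Agent beginning with "Web" (case-insensitive)
SetEnvIfNoCase User-Agent ^Web bad_bot

# Deny flagged requests, allow everyone else
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```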

Don

EliteWeb

7:06 pm on Apr 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Download.com has tons of programs that make it possible to save webpages to your hard disk. As wilderness mentioned, the easiest (and perhaps only) solution to this is your .htaccess files. If you're not proficient with them, now would be a good time to start reading up.

felix

7:33 pm on Apr 2, 2003 (gmt 0)

10+ Year Member



Here is a snip from the WebCopier site

Use this powerful offline browser to record websites and store them locally until you are ready to view them.

· Save complete copies of your favorite sites, magazines, or stock quotes.
· Students can download enormous amounts of information from the Internet for later study.
· Teachers can download whole sites so their students can view them later.
· Developers use this tool for analyzing websites.

This person will most likely be finished doing whatever it is they are doing before you could block his/her IP in your htaccess.

madmatt69

7:48 pm on Apr 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Whoa, that's all kind of disturbing. I don't want people downloading and 'analyzing' my site. How would I ban it in my .htaccess file? Is there a thread that already deals with this?

And thanks everyone for your responses!

wilderness

9:38 pm on Apr 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<snip>This person will most likely be finished doing whatever it is they are doing before you could block his/her IP in your htaccess.</snip>

Felix, this is not sound advice. :(
Granted, if you begin learning htaccess today and a UA visits your site that is not included in your denies, then that UA will be able to grab most anything it desires.
However, if you do some extensive reading in the archives and start with an "established" htaccess, then the chances are rather good that a downloading or otherwise ill-behaved bot will be denied from gathering your data.

However, there are exceptions to all applications, both good and bad.
I'm most overbearing in my denies and will cut an IP just because I stand to gain nothing from that country or region. Most webmasters are much kinder and less restrictive than myself.
Jim is learning :)
However, even Jim uses a bot trap, which is fairly automatic in creating denies.
Others are working on alternatives as well to STOP these ill-behaved bots in their tracks.

Please don't go off the deep end on us :)

felix

10:12 pm on Apr 2, 2003 (gmt 0)

10+ Year Member



Wilderness. I understand your point. If he starts as soon as he detects the rogue UA, he can relatively quickly prevent future intrusions from the same bot. My only point was that whoever it was that he detected is probably finished already.

On the other hand, I don't think there is any way to keep a web-site downloader from disguising itself as a legitimate browser. There are probably programs out there that do, so wouldn't that mean that the real "bad guys" are undetectable?


wilderness

10:39 pm on Apr 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<snip>so wouldn't that mean that the real "bad guys" are undetectable?</snip>

Not in my particular instance.
Although I have one that visits like a thief in the night: sometimes grabbing only a single page on a given day, never more than three pages in a day, and not visiting every day. There is no method to the visits.

Somebody who visits and acts ill-behaved is easy to spot in your logs (that is, if you monitor your logs), provided you're aware of the traffic and visitors you're looking to have and the types of data those visitors are interested in viewing.

It cannot be done on all websites, although I would venture to say that any webmaster who focuses on a particular market can, over time, realize what type of visitor "comes through his doors." If he can't, then he should not be in business.

If the webmaster has a cosmetic site, or a site with no relevance or particular goals, then ill-behaved bots hardly matter.

jdMorgan

8:27 pm on Apr 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmm... heard someone was talkin' about me... :)

Felix:

On the other hand, I don't think there is any way to keep a web-site downloader from disguising itself as a legitimate browser. There are probably programs out there that do, so wouldn't that mean that the real "bad guys" are undetectable?

There is a fairly good way to stop these 'bots before they grab your whole site, and that is to use a bad-bot trap. For more information, see this thread [webmasterworld.com].
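
As a rough sketch of how such a trap works (all paths and filenames here are assumptions, not from the linked thread): a directory like /trap/ is disallowed in robots.txt and linked invisibly from your pages, so no well-behaved crawler ever requests it; any client that does can have a deny rule appended to .htaccess. A minimal Python illustration of the rule-writing step:

```python
# Bad-bot trap sketch (hypothetical paths). A compliant crawler never
# requests the trap URL because robots.txt disallows it, so any IP
# that fetches it anyway is assumed hostile and gets denied.

def deny_rule(ip):
    """Build an old-style Apache access-control line for one IP."""
    return "Deny from %s\n" % ip

def ban_ip(ip, htaccess_path=".htaccess"):
    """Append a deny rule for the offending IP to the .htaccess file."""
    rule = deny_rule(ip)
    with open(htaccess_path, "a") as f:
        f.write(rule)
    return rule
```

In practice you would also whitelist your own IP and any known-good crawlers before appending, so one stray click doesn't lock out a legitimate visitor.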

There are also methods to detect user-agents which access your site too often, and ways to detect even slow 'bots. A routine can be written to create a file or database entry for each visiting IP address and then track it over time to detect too-frequent accesses. You can also record an IP address and check that it requests a non-cacheable image file along with each page that includes one. This method will detect even the slowest "stealth robots," but you must then check to make sure it is not a known-good 'bot.
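
The too-frequent-access idea can be sketched as a sliding-window counter per IP. This is only an illustration of the technique Jim describes; the threshold and window values are arbitrary assumptions, not recommendations:

```python
import time
from collections import defaultdict, deque

class HitTracker:
    """Flag any IP that makes more than max_hits requests within window seconds."""

    def __init__(self, max_hits=30, window=60.0):
        self.max_hits = max_hits
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def record(self, ip, now=None):
        """Record one request; return True if the IP is now over the limit."""
        now = time.time() if now is None else now
        q = self.hits[ip]
        q.append(now)
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        return len(q) > self.max_hits
```

A request handler (or a log-tailing script) would call `record()` per hit and add the IP to its denies when it returns True; detecting the very slow "stealth" crawlers Jim mentions requires the longer-term per-IP file or database approach instead.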

None of these methods works with 100% effectiveness; you just have to decide how good is good enough, and how much effort you wish to expend on the problem.

Jim