Use this powerful offline browser to record websites and store them locally until you are ready to view them.
· Save complete copies of your favorite sites, magazines, or stock quotes.
· Students can download enormous amounts of information from the Internet for later study.
· Teachers can download whole sites so their students can view them later.
· Developers use this tool for analyzing websites.
This person will most likely be finished doing whatever they are doing before you could block their IP in your htaccess.
Felix, this is not sound advice. :(
Granted, if you begin learning htaccess today and a UA visits your site that is not included in your denies, then that UA will be able to grab most anything it desires.
However, if you do some extensive reading in the archives and start with an "established" htaccess, then the chances of a bot, especially a downloading or otherwise ill-behaved one, being denied from gathering your data are rather good.
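To make that concrete, here is a minimal sketch, in Python rather than the htaccess lines themselves, of what such an established deny list amounts to: matching each request's user-agent against a hand-picked set of bad substrings. The substrings, the log path, and the combined log format are assumptions for illustration, not anyone's actual list.

# Minimal sketch: report which logged requests a hand-picked user-agent
# deny list would have caught. UA substrings and log path are examples only.
import re

BAD_UA_SUBSTRINGS = ["offline explorer", "webcopier", "webzip", "httrack"]

def is_denied(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(bad in ua for bad in BAD_UA_SUBSTRINGS)

# In the combined log format the user-agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

with open("access_log") as log:          # path is an assumption
    for line in log:
        match = UA_PATTERN.search(line)
        if match and is_denied(match.group(1)):
            print("would be denied:", match.group(1))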
However, there are exceptions to every application, both good and bad.
I'm most overbearing in my denies and will cut an IP just because I stand to gain nothing from that country or region. Most webmasters are much kinder and less restrictive than I am.
Jim is learning :)
However even Jim uses a bot trap which is fairly automatic in creating denies.
Others are working on alternatives as well to STOP these ill-behaved bots in their tracks.
Please don't go off the deep end on us :)
On the other hand, I don't think there is any way to keep a web-site downloader from disguising itself as a legitimate browser. There are probably programs out there that do, so wouldn't that mean that the real "bad guys" are undetectable?
Not in my particular instance.
Although I have one that visits like a thief in the night: sometimes it grabs only a single page in a day, never more than three, and it doesn't visit every day. There is no method to the visits.
Somebody who visits and acts ill-behaved is easy to spot in the logs (that is, if you monitor your logs), provided you're aware of the traffic and visitors you're looking to have and the types of data those visitors are interested in viewing.
It cannot be done on all websites, although I would venture to say that any webmaster who focuses on a particular market can, over time, recognize what type of visitor "comes through his doors." If he can't, then he should not be in business.
If the webmaster has either a cosmetic site or a site with no relevance or particular goals, then ill-behaved bots hardly matter.
Felix:
On the other hand, I don't think there is any way to keep a web-site downloader from disguising itself as a legitimate browser. There are probably programs out there that do, so wouldn't that mean that the real "bad guys" are undetectable?
There is a fairly good way to stop these 'bots before they grab your whole site, and that is to use a bad-bot trap. For more information, see this thread [webmasterworld.com].
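For readers who haven't followed that thread, here is a minimal sketch of the general idea, not the exact trap discussed there: a CGI script sitting at a URL that is disallowed in robots.txt and linked invisibly from your pages, so only robots that ignore robots.txt ever request it. The paths, file names, and 403 response below are assumptions; turning the recorded IPs into actual htaccess denies is a separate step.

#!/usr/bin/env python3
# Minimal bot-trap sketch. Assumptions: CGI is enabled, the trap URL is
# disallowed in robots.txt and linked invisibly from your pages, and the
# blocklist path is writable by the web server.
import os
import datetime

BLOCKLIST = "/var/www/data/bad_ips.txt"   # path is an assumption

def main():
    ip = os.environ.get("REMOTE_ADDR", "unknown")
    ua = os.environ.get("HTTP_USER_AGENT", "-")
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    # Record the offender; a later step can turn these entries into denies.
    with open(BLOCKLIST, "a") as f:
        f.write(f"{stamp} {ip} {ua}\n")
    # Answer with a 403 so the robot gets nothing useful from the trap page.
    print("Status: 403 Forbidden")
    print("Content-Type: text/plain")
    print()
    print("Forbidden.")

if __name__ == "__main__":
    main()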
There are also methods to detect user-agents which access your site too often, and ways to detect even slow-bots. A routine can be written to create a file or a database entry for each visiting IP address, and then track it over time to detect too-frequent accesses. You can also record an IP address, and check to make sure that it requests a non-cacheable image file along with each page that includes it. This method will detect even the slowest "stealth robots," but you must then check to make sure it is not a known-good 'bot.
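As an illustration of the first routine described above, here is a minimal sketch that keeps a small file of recent request times for each visiting IP and flags addresses that request pages too often. The window, threshold, and state directory are assumptions, the non-cacheable-image check is not shown, and, as noted, known-good robots would still have to be excluded before anything gets denied.

# Minimal sketch of a per-IP frequency check: one small file of recent
# request timestamps per visiting IP. Window, threshold, and state
# directory are assumptions, not anyone's actual settings.
import os
import time

STATE_DIR = "/var/www/data/ip_hits"   # assumption
WINDOW_SECONDS = 300                   # look at the last five minutes
MAX_REQUESTS = 60                      # more than this per window is suspect

def record_and_check(ip: str) -> bool:
    """Record a hit for this IP and return True if it looks too frequent."""
    os.makedirs(STATE_DIR, exist_ok=True)
    path = os.path.join(STATE_DIR, ip.replace(":", "_"))  # IPv6-safe filename
    now = time.time()
    hits = []
    if os.path.exists(path):
        with open(path) as f:
            hits = [float(line) for line in f if line.strip()]
    # Keep only hits inside the window, then add the current one.
    hits = [t for t in hits if now - t <= WINDOW_SECONDS]
    hits.append(now)
    with open(path, "w") as f:
        f.writelines(f"{t}\n" for t in hits)
    return len(hits) > MAX_REQUESTS

# Example: flag the IP, but exclude known-good robots before acting on it.
if record_and_check("203.0.113.50"):
    print("too many requests -- candidate for a deny")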
None of these methods work with 100% effectiveness; you just have to decide how good is good enough, and how much effort you wish to expend on the problem.
Jim