- If you choose to block problem actors -
Methods of Blocking*
• Check Header fields and block if abnormal (missing, malformed, or inconsistent values).
• Block Server Farm IP Ranges**
• Block by behavior: requests arriving too fast; requests for pages but no other file types; requests for supporting files but no pages; requests where the referrer is the same page being requested; redundant requests for the same file more than three times.
• Block by User Agent: block known scrapers & malicious actors. Developers & bot runners can name their agents anything they want, and often use benign or misleading names to gain access to your files. See the Search Engine Spider & User Agent ID Forum for identifying agents.
• Block if no UA
• Block if HTTP/1.0 [webmasterworld.com] - this is an old protocol used mostly by older bots and a few beneficial link/file validators.
• Block if changing UAs more than three times. Sometimes proxy & VPN users (example: schools) will share the same IP address while individual users present different UAs; however, scraping software may change UAs often as a means of access.
• Block by referrer: hot-linking, bad neighborhoods, etc.
• Block if there are redundant requests for the same page more than three times within a set time frame. Some bots request files very fast, far beyond what a browser does.
• Block IPs automatically with a Bad Bot Script [webmasterworld.com]. Warning: this method is limited to those agents that disobey robots.txt by requesting a bait file & may produce false positives, so consistent oversight is needed.
*Blocking methods may be used separately or in combination.
**Blocking server ranges may or may not be an effective defense for unwanted activity at your web site. Hosting companies lease ranges to a wide variety of clients, not all necessarily negative to your site's interests. Some may be extremely helpful.
Note: If you choose to block without prejudice, be prepared to watch your server logs each day with diligence to see exactly who is being blocked. This takes consistent maintenance.
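Several of the behavioral rules above (no UA, HTTP/1.0, request rate, UA changes, redundant requests) can be combined in one per-request check. Here is a minimal sketch in Python; the function name, field names, and all thresholds (window length, rate limit) are assumptions for illustration, not part of any particular server's API, and real referrer values would be full URLs rather than bare paths:

```python
from collections import defaultdict, deque
import time

WINDOW = 10          # seconds of history to keep (assumed threshold)
MAX_RATE = 20        # requests per window before "too fast" (assumed)
MAX_SAME_FILE = 3    # redundant requests for the same file
MAX_UA_CHANGES = 3   # distinct User Agents tolerated from one IP

history = defaultdict(deque)   # ip -> deque of (timestamp, path)
uas_seen = defaultdict(set)    # ip -> set of User Agent strings seen

def should_block(ip, path, ua, protocol, referrer, now=None):
    """Return a reason string if the request trips a rule, else None."""
    now = now if now is not None else time.time()

    if not ua:
        return "no UA"
    if protocol == "HTTP/1.0":
        return "HTTP/1.0 protocol"
    if referrer == path:             # sketch: real logs hold full URLs
        return "requesting same page as referrer"

    uas_seen[ip].add(ua)
    if len(uas_seen[ip]) > MAX_UA_CHANGES:
        return "changing UAs more than %d times" % MAX_UA_CHANGES

    q = history[ip]
    q.append((now, path))
    while q and now - q[0][0] > WINDOW:  # drop entries outside window
        q.popleft()
    if len(q) > MAX_RATE:
        return "requests too fast"
    if sum(1 for _, p in q if p == path) > MAX_SAME_FILE:
        return "redundant requests for same file"
    return None
```

In practice a check like this would feed an automated deny list (firewall rule or server config) rather than return strings, but the string reasons make the individual rules easy to audit, which matters given the warning above about false positives.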
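The Bad Bot Script item above works by baiting agents that ignore robots.txt. A minimal sketch of the idea, with a hypothetical bait path and an in-memory blocklist standing in for whatever a real script would write (an .htaccess deny rule, a firewall entry):

```python
# robots.txt tells well-behaved crawlers to stay out of /trap/,
# so only agents that ignore robots.txt (or curious humans --
# hence the false positives) ever request the bait file.
ROBOTS_TXT = """User-agent: *
Disallow: /trap/
"""

BAIT_PATH = "/trap/bait.html"   # assumed path; must match robots.txt
blocklist = set()               # stand-in for a persistent deny list

def handle_request(ip, path):
    """Return an HTTP status code: 403 once an IP is blocklisted."""
    if ip in blocklist:
        return 403
    if path == BAIT_PATH:
        blocklist.add(ip)       # first bait hit gets the IP banned
        return 403
    return 200
```

Because a single stray click on a bait link can ban a legitimate visitor, the blocklist this produces needs the same consistent oversight the warning above calls for.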