Do you think it might be a good idea to post a message relating to the rogue spider problems and why people cannot view topics unless they are logged in?
I can't post security policies. I've already shared more than I should. Same reason Google can't tell us what is spam: 10 mins after you explain it, they figure out a way around it.
I feel for the bystanders that find it a problem because they just happen to be on an ISP with an abusive user. On the other hand, if they don't care enough about the system to simply log in and participate, then those resources are better spent on active community members.
I've taken many of these up with ISPs. The most responsive has been AOL, and I removed them from the required login earlier this week (Thanks AOL). A few of the other notorious ISPs have never responded to abuse reports.
You have to understand the volume of page views we are doing. The scope of the problem with spiders is far beyond trivial - it's a critical red alert condition. We have somewhere around 250k pages here (I don't know exactly how many because of the thread split system). Just 1 aggressive spider puts the whole system at risk of what amounts to a DDoS attack.
I spend an hour a day dealing with this now. I wish I had that time back.
Had one that came in from three ISPs and pulled 255k views in 1 hr. So btopenworld, mindspring, and ntl.com went on the list of required logins. I will not roll that back any time soon.
WebmasterWorld is fully crawlable by off-the-shelf Windows spiders, where most forums aren't. Joe 'weekend hacker' can scarf the entire forums with just about any generic Windows downloader. That puts us in the crosshairs like few dynamic sites experience.
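For scale, 255k views in an hour works out to roughly 70 page views per second. A rough sketch of how that kind of load could be spotted per ISP rather than per IP is below. This is not the board's actual tooling (which isn't described here); the class name, the window, and the limit are all made up for illustration.

```python
from collections import defaultdict, deque


class IspHitCounter:
    """Hypothetical rolling-window counter of page views per ISP."""

    def __init__(self, window_seconds=3600, limit=100_000):
        # Both defaults are illustrative assumptions, not real thresholds.
        self.window = window_seconds
        self.limit = limit
        self.hits = defaultdict(deque)  # ISP name -> timestamps of recent views

    def record(self, isp, now):
        """Record one page view; return True if the ISP is over the limit."""
        recent = self.hits[isp]
        recent.append(now)
        # Drop views that have fallen out of the rolling window.
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        return len(recent) > self.limit
```

Counting by ISP instead of by IP is the point of the sketch: a spider that hops across addresses still shows up in its ISP's total.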
>Couldn't you just block the spiders and not the entire ISP?
They come in with a random IP and use stock browser agents. You can't touch 'em that way. Had one from Mindspring and one from Cox that used over a hundred IPs in one day. The only way to slow it down (you can't stop it) and protect the board is to require login from the most abusive ISPs. It's nearly the same thing Google did to that particular cable modem company last spring.
As for the scope of users, it's pretty minor, at less than 0.50% getting the required login per day. And yes, there are search engines that are indexing subscriber content on many sites - it's getting fairly common - just break out your checkbook and sign up for an XML feed. Google News has also done it before.
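To make the idea concrete, here is a minimal sketch of an ISP-level login gate, assuming the visitor's reverse-DNS hostname is already known. It is not the actual policy code, which is deliberately not posted. The function names are hypothetical, and the full domain names for btopenworld and mindspring are assumptions; only ntl.com is given in that form in this thread.

```python
# Illustrative list only. The ".com" on the first two entries is an assumption.
LOGIN_REQUIRED_ISPS = {"btopenworld.com", "mindspring.com", "ntl.com"}


def isp_of(hostname):
    """Return the last two labels of a hostname, e.g. 'ntl.com'.

    Naive on purpose: it would mishandle suffixes such as '.co.uk'.
    """
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def requires_login(hostname, logged_in):
    """True if an anonymous visitor from a listed ISP should be sent to login."""
    if logged_in:
        return False
    return isp_of(hostname) in LOGIN_REQUIRED_ISPS
```

Because the check keys on the ISP and not on the address or the user agent, a spider rotating through a hundred IPs on the same ISP hits the same wall each time, while logged-in members from that ISP are unaffected.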
Lastly, a big thank you to WestHost [westhost.com] for working with us on this issue, giving us the tools to deal with it, and putting up with the excessive bandwidth usage by spiders. We are well beyond needing a dedicated server (two or three actually) and WH has been a trooper keeping us up and running. Thanks Chris.