
A third of my content stolen - a scary story

Why it's important to check your server logs every day

         

Nick Jachelson

5:20 pm on Jan 15, 2006 (gmt 0)

10+ Year Member



I was just going through my server logs yesterday and noticed something bizarre. My site barely gets 2000 hits a day, yet here were 20,000 hits from a particularly strange and unrelated referrer. Obviously these hits did not register on the 2 other trackers I use and only showed up in the raw server logs. I also checked out that referrer's website and, not surprisingly, there were no links to my site anywhere.

Then I looked at the detailed data in the logs and realized that somebody has been, over the past 3 days, systematically downloading every single page from my site using some sort of automated program that lets you spoof the referrer and the user-agent. The fact that his IP traces to a DSL account, together with that ridiculous referrer, means that it's certainly not a legitimate search engine bot.

I quickly denied his IP with a .htaccess file, but the damage has already been done. I have no idea how he is going to use my content, but most likely it will end up on a splog or some spammy website that will only dilute my Google rankings.

From now on I am going to be checking my server logs every day, and anyone (except a bot) who downloads an inhuman number of pages without it registering on my other trackers will get his IP denied. Of course, this guy could have just as easily masqueraded as GoogleBot and I would have been none the wiser.
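For anyone who wants to automate that daily check, here is a minimal sketch in Python. It assumes the log is in Apache common or combined format, where the client IP is the first whitespace-separated field on each line. The function name `heavy_hitters` and the threshold are made up for illustration, and it does not try to tell real search engine bots apart from anyone else.

```python
from collections import Counter


def heavy_hitters(lines, threshold):
    """Return {ip: request_count} for every client with more than `threshold` requests.

    Assumes each line starts with the client IP, as in Apache common/combined logs.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(None, 1)  # first whitespace-separated field is the client
        if not parts:
            continue  # skip blank lines
        counts[parts[0]] += 1
    return {ip: n for ip, n in counts.items() if n > threshold}


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python heavy_hitters.py access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for ip, n in sorted(heavy_hitters(log, 1000).items(), key=lambda kv: -kv[1]):
            print(f"{n:8d}  {ip}")
```

Any IP this prints still needs a manual look before it gets denied, since a legitimate crawler can easily exceed a fixed threshold too.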

I also plan to sprinkle my content with hidden JavaScript and images so that it could "phone home" if it ends up somewhere else. Of course anyone determined enough could simply parse those things out.

Any other ideas?

Key_Master

6:06 pm on Jan 15, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Search this site for "spider trap" and learn how to implement one properly. That will cut down on a lot of pests.

Staffa

7:45 pm on Jan 15, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"could have just as easily masqueraded as GoogleBot and I would have been none the wiser."

Once you are in the habit of checking your logs regularly, you will notice that the IP behind a spoofed UA looks nothing like the real thing.

Then check the IP and block it.
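One common way to do that check programmatically is a reverse DNS lookup followed by a forward lookup to confirm the result. The sketch below is only an illustration of that idea, not something described in this thread: the function name `is_real_googlebot` is made up, the hostname suffixes are the ones Google documents for its crawler, and `socket.gethostbyname_ex` only covers IPv4. The two resolver arguments exist so the logic can be exercised without network access.

```python
import socket

# Hostname suffixes that genuine Googlebot addresses reverse-resolve to.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return True if `ip` reverse-resolves to a Google host that resolves back to `ip`."""
    try:
        host = reverse(ip)[0]  # (hostname, aliases, addresses)
    except OSError:
        return False  # no PTR record at all
    if not host.lower().endswith(GOOGLE_SUFFIXES):
        return False  # claims to be Googlebot, but the host is not Google's
    try:
        addresses = forward(host)[2]
    except OSError:
        return False
    # The forward lookup must lead back to the same IP; otherwise the PTR is forged.
    return ip in addresses
```

A visitor whose user-agent says Googlebot but fails this test is a candidate for the deny list.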

Nick Jachelson

8:19 pm on Jan 15, 2006 (gmt 0)

10+ Year Member



Thanks so much Key_Master!

I took the idea one step further. I set up a bunch of spider trap URLs (indistinguishable from my actual URLs) using RewriteRules that all lead to the same spider trap PHP page. Once the same IP falls into the trap 10 times, I send out an email to myself.
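The counting logic behind that trap page can be sketched roughly as follows. This is Python rather than the PHP actually used, it keeps counts in memory instead of a file or database, and the class name `TrapCounter` and the `notify` callback (standing in for the email) are made up for illustration.

```python
from collections import defaultdict

TRAP_THRESHOLD = 10  # trap hits from one IP before it is flagged


class TrapCounter:
    """Counts trap-URL hits per IP and flags an IP once it reaches the threshold."""

    def __init__(self, threshold=TRAP_THRESHOLD, notify=print):
        self.threshold = threshold
        self.notify = notify  # stand-in for sending an email
        self.hits = defaultdict(int)
        self.flagged = set()

    def record(self, ip):
        """Register one trap hit; return True if the IP is flagged after this hit."""
        self.hits[ip] += 1
        if self.hits[ip] >= self.threshold and ip not in self.flagged:
            self.flagged.add(ip)
            # Notify only once per IP, at the moment it crosses the threshold.
            self.notify(f"Spider trap: {ip} has hit {self.hits[ip]} trap URLs")
        return ip in self.flagged
```

The trap page would call `record()` with the visitor's IP on every request, and the rest of the site can consult `flagged` to decide how to treat that visitor.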

Then, instead of doing a deny, I keep everything the way it was, except that I begin replacing all my content with articles about Santa Claus, the Easter Bunny or Chewbacca (chosen randomly) for that IP. This way, by the time he realizes what happened, he will have downloaded hundreds of megabytes of garbage. I have almost unlimited bandwidth, so this is not a problem for me.
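In rough terms, the content swap looks like the sketch below. It is Python rather than PHP, and the function name `page_body`, the `flagged_ips` set and the `decoys` mapping of topic to article text are assumptions made for illustration.

```python
import random


def page_body(ip, real_body, flagged_ips, decoys, rng=random):
    """Return the real page for normal visitors and a random decoy for flagged IPs.

    `decoys` maps a topic name (e.g. "Santa Claus") to the decoy article text.
    """
    if ip in flagged_ips and decoys:
        topic = rng.choice(sorted(decoys))  # pick one decoy topic at random
        return decoys[topic]
    return real_body


decoys = {
    "Santa Claus": "An article about Santa Claus...",
    "the Easter Bunny": "An article about the Easter Bunny...",
    "Chewbacca": "An article about Chewbacca...",
}
print(page_body("203.0.113.5", "The real article.", {"203.0.113.5"}, decoys))
```

Because the URLs and response codes stay the same, the scraper gets no signal that anything has changed.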