Forum Moderators: Robert Charlton & goodroi


targeted VS random content theft

         

bobmark

7:26 pm on Nov 24, 2005 (gmt 0)

10+ Year Member



As I have previously posted, my site was hit badly by content thieves, especially those whose user agent is Java/? (some Java version number).

Having embarked on a protection campaign, I noticed one thing I think is significant: almost all of the thieves moved on after they were blocked. This is pretty much what you would expect, as it's nothing targeted. They are just cherry-picking sites that rank well and stealing content to help with their Google AdSense click-throughs. Block them and they move on to other victims.

However, there is one exception. Someone using a dialup connection from my own country (Canada) has gone to the trouble of altering his/her User Agent several times to defeat my block. As it is a dynamic IP, I can only block by IP address after the fact, which is mostly useless unless I want to ban a very large range or he/she happens to be assigned the same IP address again.

The point is, 99% of thieves are just random parasites. Their robots pick up whatever sites they can, and one of their stolen pages is "Travel to Zambesi," the next "1001 Uses for Cooking Oil." I believe the persistent thief is a competitor deliberately trying to sabotage my site; why else go to a fair amount of trouble to defeat my block? Because it is done through dialup with no link to any site, I have no way of knowing who is behind it.

Anyway, thought some of you might be interested in a pretty strong circumstantial case that there ARE competitors out there deliberately trying to put you in Google hell.

Brett_Tabke

3:54 pm on Nov 26, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



> almost all of the thieves moved on after they were blocked

Not if you have tasty, freshly updated content. It is the biggest issue facing webmasters today.

[webmasterworld.com...]
[webmasterworld.com...]

LunaC

6:11 pm on Nov 26, 2005 (gmt 0)

10+ Year Member



I'm trying to fight the same problem with Java/ user agents as well. I tried blocking, but the string of numbers after Java/ just changes once I block it.

The IP address also seems to change, so I may be misreading the logs and it may actually be many different visitors. Either way, no images or robots.txt are requested, and I can't find the exact .htaccess code to block all Java/ requests.

Can anyone direct me to the code?

EDIT:
Found this:

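# Flag any request whose User-Agent contains Java/1.4.x or Java/1.5.x
SetEnvIfNoCase User-Agent "Java/1\.4\." keep_out
SetEnvIfNoCase User-Agent "Java/1\.5\." keep_out
# Allow everyone, then deny flagged requests (deny wins under "Allow,Deny")
Order Allow,Deny
Allow from all
Deny from env=keep_out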

Hopefully that will block the bad guys without causing problems for real users. Does anyone know if that's right? See any issues with it?
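One way to sanity-check patterns like these before putting them live is to run them against sample User-Agent strings. The sketch below does that in Python rather than in Apache itself, so treat it as an approximation: the two version-specific patterns are modeled on the SetEnvIfNoCase lines above, while the broader `^Java/` pattern and the sample strings are assumptions added for illustration, not something taken from a real log.

```python
import re

# Patterns modeled on the two SetEnvIfNoCase lines (case-insensitive, unanchored).
VERSION_SPECIFIC = [
    re.compile(r"Java/1\.4\.", re.IGNORECASE),
    re.compile(r"Java/1\.5\.", re.IGNORECASE),
]

# Assumption: a broader pattern that catches any User-Agent starting with "Java/".
ANY_JAVA = re.compile(r"^Java/", re.IGNORECASE)


def blocked_by_version_patterns(user_agent):
    """True if the User-Agent matches either version-specific pattern."""
    return any(p.search(user_agent) for p in VERSION_SPECIFIC)


def blocked_by_any_java(user_agent):
    """True if the User-Agent starts with 'Java/' regardless of version."""
    return ANY_JAVA.search(user_agent) is not None


# Hypothetical sample strings, for illustration only.
SAMPLES = [
    "Java/1.4.2_05",
    "Java/1.5.0_04",
    "Java/1.6.0",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
]

for ua in SAMPLES:
    print(ua, blocked_by_version_patterns(ua), blocked_by_any_java(ua))
```

The point of the comparison is that version-specific patterns stop matching as soon as the version number changes, whereas a pattern anchored on the `Java/` prefix does not depend on the digits that follow. Neither helps once the visitor switches to a browser-like User-Agent string.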

kwngian

6:36 pm on Nov 26, 2005 (gmt 0)

10+ Year Member




Just a thought. Most bots will crawl either content only or images only, while most actual surfers will request both the HTML content and the images. If it is possible to detect that behaviour, then it would be possible to block most of them.

Or forcing a cookie on them.

Or combination of both.

Any good PHP coder who can help?
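The "pages but no images" idea can at least be tried offline against an access log before anything is blocked. Here is a minimal sketch in Python (not PHP), assuming the log is in Apache common or combined format. The function name `find_text_only_clients`, the image extension list, and the `min_pages` threshold are all made up for illustration.

```python
import re
from collections import defaultdict

# Matches the start of an Apache common/combined log line:
# client IP, two ignored fields, [timestamp], then the quoted request line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)')

# Assumption: which file extensions count as images on this site.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def find_text_only_clients(log_lines, min_pages=5):
    """Return sorted IPs that fetched at least min_pages non-image URLs and no images."""
    pages = defaultdict(int)
    images = defaultdict(int)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, path = match.groups()
        path = path.split("?", 1)[0].lower()  # drop any query string
        if path.endswith(IMAGE_EXTENSIONS):
            images[ip] += 1
        else:
            pages[ip] += 1
    return sorted(
        ip for ip, count in pages.items()
        if count >= min_pages and images.get(ip, 0) == 0
    )
```

Usage would be something like `find_text_only_clients(open("access.log"))`, where the file name is a placeholder. The result is only a list of candidates to look at by hand: search engine crawlers, text-mode browsers, and visitors with images already cached would show up as text-only too, and a thief on a dynamic IP will be spread across several addresses.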