CJ Spider

DavidT

6:23 am on Apr 2, 2003 (gmt 0)

10+ Year Member



207.71.241.81 - - [01/Apr/2003:20:13:40 -0800] "GET / HTTP/1.1" 200 6805 "-" "CJ Spider/"

Can't find anything out about this one. Anyone else know?

msgraph

3:18 pm on Apr 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Noticed this spider last night too.

Do you have CJ links on your site(s)?

I wonder if it has to do with the new stuff Commission Junction is supposed to be rolling out soon.

DavidT

6:49 pm on Apr 2, 2003 (gmt 0)

10+ Year Member



I'm a member of CJ and joined a new program there the day before the spider came, but as yet none of their links are on the site.

korkus2000

6:51 pm on Apr 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have basic CJ links. I got the spider last night also. I think they are using it to check for invalid links.

EliteWeb

7:05 pm on Apr 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yeah, CJ Bot is a robot that checks for the links on the sites. You know what they said about charging people who don't make x amount per pay period? Maybe they will see whether the URLs in users' profiles appear on those pages and use that for reference, or as a way to disable accounts when it doesn't find the links.

carfac

4:35 pm on Apr 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, I just got hit by this bot, and I can tell you, it will NOT be back on MY sites!

Looks like it was hitting about ten pages per second. VERY bad!

Plus, it is STUPID. I have URLs with mixed upper and lower case, but all the requests were lower case.

I do not know for sure that it follows robots.txt, but I did not catch it cheating. It hit me 120 times.

The IP address checks out to a Santa Barbara location, which makes sense for Commission Junction.

Here is a hit:

207.71.241.81 - - [08/Apr/2003:03:21:24 -0600] "GET /a/lower/case/path/that/is/wrong HTTP/1.1" 404 11948 "-" "CJ Spider/"

dave

nativenewyorker

6:32 pm on Apr 8, 2003 (gmt 0)

10+ Year Member



carfac,

207.71.241.81 is definitely a CJ IP address. I have seen it in previous log entries when CJ support had checked something on my site. If you block that IP address, you will also block all human visitors from CJ. It is probably best to deny it by agent instead.

And yes, I got hit by this one for the first time today.

Ted

carfac

9:52 pm on Apr 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



nativenewyorker:

Thanks... I will do that!

dave

nativenewyorker

8:00 am on Apr 16, 2003 (gmt 0)

10+ Year Member



CJ is now using a different IP for CJ Spider registered under a Cable & Wireless block.

216.34.209.23

Ted

nativenewyorker

9:58 pm on Apr 18, 2003 (gmt 0)

10+ Year Member



CJ responded to an inquiry about their CJ Spider. They described it as "an automated network policeman". It is supposed to look for content to be reviewed by the compliance department.

The dilemma: will banning the spider trigger an automatic review of a website, or just eliminate the unwanted bandwidth usage?

Ted

DavidT

5:24 am on Apr 25, 2003 (gmt 0)

10+ Year Member



No matter what I put in my .htaccess file to block this spider, it just won't work.
Using mod_rewrite, what would be the correct syntax to block 'CJ Spider/'?

wilderness

10:15 am on Apr 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



SetEnvIf User-Agent ^CJ keep_out (or whatever you use instead of keep_out)
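
On its own, the SetEnvIf line only tags matching requests; something still has to act on the variable. A minimal sketch of a complete .htaccess block, assuming mod_setenvif and the older Order/Allow/Deny access directives are available (on Apache 2.4 those come from mod_access_compat), with keep_out as a placeholder variable name:

```apache
# Tag any request whose User-Agent starts with "CJ" (keep_out is an arbitrary name)
SetEnvIf User-Agent ^CJ keep_out

# Allow everyone except requests carrying the keep_out tag
Order Allow,Deny
Allow from all
Deny from env=keep_out
```

Tagged requests should then get a 403 Forbidden response. Note that ^CJ is a broad pattern and will catch any agent whose name begins with those two letters, not only CJ Spider.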

jdMorgan

12:26 am on Apr 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



DavidT,

>> Using mod_rewrite what would be the correct syntax to block 'CJ Spider/'?


Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^CJ\ Spider/
RewriteRule .* - [F]

Ref: Introduction to mod_rewrite [webmasterworld.com]

HTH,
Jim

nativenewyorker

1:35 am on May 14, 2003 (gmt 0)

10+ Year Member



The U/A has changed. It is now CJNetworkQuality; [cj.com...]

Note that their documentation on the above page indicates that their spider "will visit, on a daily basis, all registered sites and all pages that have generated traffic within the past 30 days".

Ted
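
With the agent string changed, a rule anchored on CJ Spider/ will no longer match. A sketch extending the mod_rewrite rule posted earlier in this thread to cover both names, assuming the new User-Agent header contains the literal text CJNetworkQuality (the exact header format is not confirmed here):

```apache
Options +FollowSymLinks
RewriteEngine on
# Old agent name, anchored at the start of the header
RewriteCond %{HTTP_USER_AGENT} ^CJ\ Spider/ [OR]
# New agent name, matched anywhere in the header (assumed substring)
RewriteCond %{HTTP_USER_AGENT} CJNetworkQuality
# Answer any matching request with 403 Forbidden
RewriteRule .* - [F]
```

The [OR] flag is needed because consecutive RewriteCond lines are otherwise combined with AND, and a single request cannot carry both agent names.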

DavidT

12:14 pm on May 17, 2003 (gmt 0)

10+ Year Member



It requested robots.txt today, for the first time I think, so it may obey it.

carfac

4:40 pm on May 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"will visit, on a daily basis, all registered sites and all pages that have generated traffic within the past 30 days"

That is a bit of an overstatement, don't you think? They might TRY, but if they try to visit EVERY page on my site EVERY day, they'd have to pull in 150,000-200,000 pages a DAY (as a guess). And if they try THAT...

dave