Dow Jones Searchbot

dstiles

6:25 pm on Dec 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Came from a Canadian DSL range and hit several sites on two consecutive days at rates of up to two requests a second.

IP: 67.68.234.nnn (Bell Canada HSE DSL)
UA: Mozilla/5.0 (compatible; Dow Jones Searchbot)
Only one header field (HTTP_ACCEPT) present.

Took the home page plus another (iframed) page, THEN robots.txt.
Took images, CSS, JS from home page.
Took subsequent pages but only HEAD for pics - no CSS/JS.
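A rough way to spot the same fetch pattern in your own logs is to summarise each client's requests in order. This is a minimal Python sketch, assuming the log has already been parsed into (method, path) pairs for a single IP; the function name and the extension lists are hypothetical choices, not anything this bot is known to key on.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical extension lists; adjust to what your pages actually embed.
IMAGE_EXTS = (".gif", ".jpg", ".jpeg", ".png", ".ico")
ASSET_EXTS = (".css", ".js")


def fetch_profile(requests: Iterable[Tuple[str, str]]) -> Dict[str, Optional[int]]:
    """Summarise one client's (method, path) requests, given in log order."""
    profile: Dict[str, Optional[int]] = {
        "robots_position": None,  # index of the first robots.txt request
        "pages": 0,               # anything that is not robots.txt, image, CSS or JS
        "image_gets": 0,
        "image_heads": 0,
        "asset_gets": 0,          # CSS/JS fetched with GET
    }
    for i, (method, path) in enumerate(requests):
        p = path.split("?", 1)[0].lower()  # ignore query strings and case
        if p == "/robots.txt":
            if profile["robots_position"] is None:
                profile["robots_position"] = i
        elif p.endswith(IMAGE_EXTS):
            if method == "HEAD":
                profile["image_heads"] += 1
            elif method == "GET":
                profile["image_gets"] += 1
        elif p.endswith(ASSET_EXTS):
            if method == "GET":
                profile["asset_gets"] += 1
        else:
            profile["pages"] += 1
    return profile


# Made-up log resembling the sequence described above.
log = [("GET", "/"), ("GET", "/frame.html"), ("GET", "/robots.txt"),
       ("GET", "/logo.png"), ("GET", "/site.css"), ("GET", "/about.html"),
       ("HEAD", "/photo.jpg")]
print(fetch_profile(log))
# {'robots_position': 2, 'pages': 3, 'image_gets': 1, 'image_heads': 1, 'asset_gets': 1}
```

A robots_position greater than 0 means robots.txt was requested only after other pages; a run of image_heads with no matching asset_gets on later pages matches the HEAD-only behaviour reported here.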

Nothing obvious on Google, not even in bot directories. Looks new this month.

I know the financial crunch is hitting hard but operating Dow Jones from a Canadian broadband IP? I imagine a small shack in the woods surrounded by bears and bull elk... :)

keyplyr

9:56 pm on Dec 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Dow Jones did some scraping from 205.203.134.197 earlier last month, which earned them a ban.

205.203.96.0 - 205.203.159.255
205.203.96.0/19, 205.203.128.0/19
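If your blocklist needs CIDR notation, Python's standard ipaddress module can collapse an inclusive range into the minimal set of blocks. This sketch reproduces the two /19s above from the start and end addresses; the helper name is hypothetical.

```python
import ipaddress
from typing import List


def range_to_cidrs(first: str, last: str) -> List[str]:
    """Collapse an inclusive IPv4 range into the fewest CIDR blocks."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    # summarize_address_range yields IPv4Network objects covering start..end exactly
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


print(range_to_cidrs("205.203.96.0", "205.203.159.255"))
# ['205.203.96.0/19', '205.203.128.0/19']
```

A single /18 cannot be used here because 205.203.96.0 is not aligned on a /18 boundary, which is why the range splits into two /19s.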

dstiles

10:53 pm on Dec 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So why, if it was YOU that banned their IP range, did they hit ME with a dynamic IP? :)

Actually, I do have that range blocked.

I wonder if they are pushing a bot to the public or if my instance was just forged. Who knows? Who cares? It's dumped.

Pfui

2:33 am on Dec 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No clue about Canadian relations. Are you in the US, dstiles? If you're not, perhaps they cover different geo regions?

Also, note cloaked UAs in --

-----
SEPTEMBER-OCTOBER, 2009

205.203.134.19n <== Plainsboro Dow Jones-telerate
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)

robots.txt? Yes BUT... Ignored.

Notes: HEADs for pics as described, plus GETs. See also Project Honey Pot [projecthoneypot.org].

-----
JANUARY, 2009

208.138.254.15n <= Richboro Dow Jones & Company
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)

robots.txt? Yes BUT... Ignored.

Notes: Ditto. See also Project Honey Pot [projecthoneypot.org].
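One way to confirm that a visitor which did fetch robots.txt then ignored it is to replay its requested paths against your rules with Python's standard urllib.robotparser. This is a minimal sketch: the robots.txt content and the trap path are made-up examples, and disallowed_hits is a hypothetical helper name.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def disallowed_hits(robots_txt: str, user_agent: str, paths: Iterable[str]) -> List[str]:
    """Return the requested paths that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text instead of fetching a URL
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Hypothetical robots.txt with a disallowed trap directory.
ROBOTS = """User-agent: *
Disallow: /private/
"""

print(disallowed_hits(ROBOTS, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)",
                      ["/", "/private/trap.html"]))
# ['/private/trap.html']
```

Because a cloaked MSIE 6.0 UA matches no named record, the `User-agent: *` rules are the ones applied, which is the case that matters for the hits above.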

keyplyr

4:27 am on Dec 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yup, I've also seen (and blocked) hits from:

208.138.254.15*
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)

Pfui

6:43 pm on Jan 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Plainsboro Dow Jones-telerate making the rounds again; two hits within two seconds:

205.203.134.197
Mozilla/5.0 (compatible; Dow Jones Searchbot)

robots.txt? NO
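To put numbers on bursts like "two hits within two seconds" (or the two-a-second rate in the first report), a sliding window over one client's request timestamps is enough. This is a Python sketch, assuming the timestamps have already been extracted as seconds in ascending log order; the function name is hypothetical.

```python
from collections import deque
from typing import Deque, Iterable


def max_hits_in_window(timestamps: Iterable[float], window: float) -> int:
    """Largest number of requests within any span of `window` seconds.

    Timestamps must be in ascending order, as they are in an access log.
    """
    recent: Deque[float] = deque()
    best = 0
    for t in timestamps:
        recent.append(t)
        # Drop requests that are more than `window` seconds older than this one.
        while t - recent[0] > window:
            recent.popleft()
        best = max(best, len(recent))
    return best


print(max_hits_in_window([0.0, 1.2], 2.0))  # 2
```

Each timestamp enters and leaves the deque at most once, so the scan is linear in the number of requests.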

KenB

4:00 am on Jan 13, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Been seeing Dow Jones show up in my server logs. I'm blocking by both UA & IP ranges as I catch them.
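Blocking on both UA and IP range, as described above, might look like this in Python. The networks are the ones posted in this thread (205.203.96.0/19, 205.203.128.0/19 and 208.138.254.0/24) and the UA substring is the one from the first report; the function and constant names are hypothetical, and the same check could equally be done in the web server configuration.

```python
import ipaddress

# Ranges and UA string taken from this thread; adjust to your own logs.
BLOCKED_NETS = [ipaddress.ip_network(n) for n in
                ("205.203.96.0/19", "205.203.128.0/19", "208.138.254.0/24")]
BLOCKED_UA_SUBSTRINGS = ("dow jones searchbot",)


def is_blocked(ip: str, user_agent: str) -> bool:
    """True if the client IP is in a blocked range or the UA matches a blocked string."""
    addr = ipaddress.ip_address(ip)
    # Membership test is False (not an error) when IPv4/IPv6 versions differ.
    if any(addr in net for net in BLOCKED_NETS):
        return True
    ua = user_agent.lower()
    return any(s in ua for s in BLOCKED_UA_SUBSTRINGS)
```

The two checks cover each other's gaps: the IP test catches the cloaked MSIE 6.0 UA from the Dow Jones ranges, while only the UA test would catch the hit from the dynamic Bell Canada DSL address.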

Pfui

6:57 pm on Jan 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Nothing new here; just another two hits in two seconds:

205.203.134.197
Mozilla/5.0 (compatible; Dow Jones Searchbot)

robots.txt? NO

However, that hit near a swarm of Twitter fellow travelers... possibly tweet-tracking? If yes, or even if no, why is Dow Jones trawling anything?

dstiles

10:08 pm on Jan 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Could almost be a distributed bot - user starts computer, all local bookmarks/bots go online... But it seems as if only my initial report showed a non-DJ IP.

Dow Jones-Telerate
205.203.96.0 - 205.203.159.255

DOW JONES & COMPANY (under Savvis)
208.138.254.0 - 208.138.254.255

In fact, from a Robtex Class C check, it looks like the 205 range is a server farm, or at least a large virtual shared server. No rDNS for that specific IP. I can't get rDNS for the few IPs I've tried in the 208 range.
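Spot-checking rDNS across a range can be scripted with socket.gethostbyaddr, which raises socket.herror when an address has no usable PTR record. This is a sketch: the function names are hypothetical, the resolver is passed in so it can be tested offline, and bulk lookups go through whatever DNS resolver your machine uses.

```python
import ipaddress
import socket
from typing import Callable, Dict, Optional, Tuple

Resolver = Callable[[str], Tuple[str, list, list]]


def ptr_lookup(ip: str, resolver: Resolver = socket.gethostbyaddr) -> Optional[str]:
    """Return the reverse-DNS hostname for ip, or None if none resolves."""
    try:
        return resolver(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def survey(network: str, limit: int = 5,
           resolver: Resolver = socket.gethostbyaddr) -> Dict[str, Optional[str]]:
    """Reverse-resolve the first `limit` host addresses of a network."""
    hosts = ipaddress.ip_network(network).hosts()
    return {str(addr): ptr_lookup(str(addr), resolver)
            for _, addr in zip(range(limit), hosts)}
```

Calling survey("208.138.254.0/24") with the default resolver would report which of the first five hosts in that /24 have rDNS.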

Don't suppose it could be tracking sites for DJ that show up well on twits?