orange you sorry you asked?

         

lucy24

9:05 pm on Nov 27, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Anyone have any idea who orangeask.com is? They come from a range I've blocked on principle (50.22-23) but closer inspection shows no actual misbehavior from anyone in the area. At least not during the time period I've got saved logs for.

If I'm reading whois right, the domain has only been around since early this year. Front page is a generic search-engine screen with no links that I can see.

Full address 50.23.239.nnn

I think I'll have to unblock the range for a while just to see if they behave. Since they can't get to the main page, I have no way of knowing what they'd do on the rest of the site.

keyplyr

12:08 am on Nov 28, 2011 (gmt 0)

SoftLayer
50.22.0.0 - 50.23.255.255
50.22.0.0/15
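
For anyone who wants to double-check that the dotted range and the CIDR form above really describe the same block, here is a minimal sketch using Python's standard `ipaddress` module. The helper name `in_blocked_range` is made up for illustration; the test address is the one quoted elsewhere in this thread.

```python
import ipaddress

# The SoftLayer block quoted above, in CIDR form.
SOFTLAYER = ipaddress.ip_network("50.22.0.0/15")

def in_blocked_range(addr: str) -> bool:
    """Return True if addr falls inside the 50.22.0.0/15 block."""
    return ipaddress.ip_address(addr) in SOFTLAYER

# First and last address of the block, for comparison with the dotted range.
print(SOFTLAYER[0], "-", SOFTLAYER[-1])

# An address reported in this thread for orangeask.com.
print(in_blocked_range("50.23.239.14"))
```

This only confirms the arithmetic; how the block is actually enforced (server config, firewall, etc.) is a separate matter.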

Plenty of nefarious culprits call SoftLayer ranges their home. Like you, I have the range "blocked on principle" (or lack thereof).

For me to comment on orangeask.com, I need to see the full UA, as my notes are referenced by IP or UA.

Pfui

1:03 am on Nov 28, 2011 (gmt 0)

"unblock the range for a while just to see if they behave."

Unblock at your peril.

orangeask.com referrer log-spams on all hits. Asks for robots.txt and ignores the generic Disallow. Sends zero traffic. Recent IP 50.23.239.14 (hostname below) has Project Honey Pot comments galore (scroll down the linked page). Hails from SoftLayer (block-worthy on its own).

50.23.239.14-static.reverse.softlayer.com [projecthoneypot.org...]
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)

robots.txt? Yes BUT immediately ignored.
Self-REF spam? YES

E.g., from two days ago:

11/2n 07:24:50 /robots.txt
11/2n 07:24:51 /

11/1n 03:29:52 /robots.txt
11/1n 03:29:52 /

'Nuff said.

lucy24

1:50 am on Nov 28, 2011 (gmt 0)

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)


Yup. Down to the last semicolon. See it hanging there in midair after the inner close-parenthesis? Pretty funny-looking UA now that I look at it more closely.
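
A rough sketch of how that oddity could be flagged automatically, in Python. The function name `looks_nested` is hypothetical, and the two checks (a second "Mozilla/" token, or a semicolon dangling after a close-parenthesis) are just heuristics read off this one string. The same pattern may turn up in other traffic, so treat a hit as a reason to look closer, not as proof of anything.

```python
import re

# The UA string quoted in this thread, split only for readability.
UA = ("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; "
      "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; "
      ".NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)")

def looks_nested(ua: str) -> bool:
    """Heuristic: True if one UA appears to be embedded inside another."""
    # A second "Mozilla/" token means a whole UA was pasted inside the first.
    if ua.count("Mozilla/") > 1:
        return True
    # A semicolon floating after a close-parenthesis, as in "SV1) ;".
    return re.search(r"\)\s+;", ua) is not None

print(looks_nested(UA))
```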

I'd forgotten that they put their name in the referer slot. Somehow I magically sensed that this was bogus ;) and therefore misremembered it as part of their UA. You don't get a lot of sites saying "Hey, this place has the best robots.txt! Check it out."
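
That observation turns into a cheap log check: nothing legitimately links to robots.txt, so a robots.txt request that arrives with a referer filled in is worth a second look. A minimal Python sketch follows; the function name and the sample hits are made up for illustration (they are not real log lines), and "-" is assumed to be how the log marks an empty referer.

```python
def referer_on_robots(path: str, referer: str) -> bool:
    """Flag a robots.txt request that arrives with a referer set."""
    return path == "/robots.txt" and referer.strip() not in ("", "-")

# Hypothetical (path, referer) pairs, not taken from any real log.
hits = [
    ("/robots.txt", "http://orangeask.com/"),
    ("/robots.txt", "-"),
    ("/index.html", "http://example.com/links.html"),
]

flagged = [h for h in hits if referer_on_robots(*h)]
print(flagged)
```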

incrediBILL

2:08 am on Nov 28, 2011 (gmt 0)

If you look around, there are some PHP code libraries that enforce robots.txt rules, but I think most are aimed at use in a crawler, not at blocking invalid inbound requests.

If I can't find anything specifically suited for the job, I'm thinking about adapting one of those libraries for local usage as a robots.txt enforcer type application.

Not sure it's been done before, or at least nothing is publicly available, but that's what we need to solve this problem once and for all.

Then it won't matter whether they play by the rules or not: they'll automatically get kicked if they break any of the robots.txt rules, with complete validation of requests on the fly.

:)
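
To make the idea concrete, here is a minimal sketch of an inbound robots.txt enforcer. It is in Python, not PHP, and leans on the standard library's `urllib.robotparser`, which is itself written for crawlers and is repurposed here. Everything named below (`make_enforcer`, the `/private/` rule, the in-memory ban set) is an assumption for illustration, not an existing tool.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; a real site would load its own robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def make_enforcer(robots_txt: str):
    """Return allow(ip, user_agent, path) that bans IPs breaking the rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    banned = set()  # in-memory only; a real setup would persist this

    def allow(ip: str, user_agent: str, path: str) -> bool:
        if ip in banned:
            return False
        # Always let visitors read the rules themselves.
        if path == "/robots.txt":
            return True
        if not parser.can_fetch(user_agent, path):
            banned.add(ip)  # one violation and they are kicked
            return False
        return True

    return allow

allow = make_enforcer(ROBOTS_TXT)
print(allow("50.23.239.14", "Mozilla/4.0", "/"))
print(allow("50.23.239.14", "Mozilla/4.0", "/private/page.html"))
print(allow("50.23.239.14", "Mozilla/4.0", "/"))
```

One caveat with this sketch: it cannot tell a robot from a human who follows a link into a disallowed area, so a real enforcer would need some way to exempt ordinary browsers.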

lucy24

8:06 am on Dec 10, 2011 (gmt 0)

Follow-up:

I unblocked the range and it is a big anticlimax. Orangeask stops by every few days, picks up robots.txt and the main index.html, and then goes away. Yawn.

Most of the time when a robot doesn't go beyond the front page I don't even care whether they look at robots.txt. Unless I have other reasons for deciding I don't like their face. (What on earth is "index-9x.jsp"? Why would someone* ask for that file and nothing else ... especially after getting index.html, which should have told them it's not a .jsp site?)


* I just looked them up. They live at Hetzner. Don't everyone gasp at once. This has nothing to do with orangeask except that they happened to show up on the same day.