X-Fowarded-For

[ sic ]

         

lucy24

10:54 pm on Feb 22, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For those who prefer the shotgun method of blocking:
IP: aa.bb.cc.dd
Connection: close
X-Fowarded-For: aa.bb.cc.dd
User-Agent: compatible;Baiduspider/2.0; +http://www.baidu.com/search/spider.html
Host: example.com
Note the spelling. Unlike "referer", that is not how the word is customarily spelled. Or, if you prefer, spelt.

Both IPs vary-- for a while they were coming from DataShack-- but they always claim to be the Baiduspider. No, I don't know what they are compatible with. Maybe it's an adjective in its own right, like "compatible color".

Cropped up while I was searching logs for the real Baiduspider. Further searching reveals that the real one looks like this:
IP: aa.bb.cc.dd
Accept: */*
Accept-Language: zh-cn,zh-tw
Accept-Encoding: gzip
User-Agent: Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
Connection: close
Host: example.com
but the fake one is currently more common.
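For anyone who wants to flag these automatically, here is a minimal Python sketch of the two tells shown above: the misspelled header name, and a UA that claims Baiduspider without the "Mozilla/5.0 (compatible; Baiduspider/" opening the real one uses. The function name and the dict-of-headers input are my own assumptions, not anything a particular server hands you.

```python
import re

# Header names that only the fake requests in this thread send (compared in lowercase).
MISSPELLED_HEADERS = {"x-fowarded-for"}

# Opening of the genuine Baiduspider UA as quoted above.
REAL_BAIDU_UA = re.compile(r"^Mozilla/5\.0 \(compatible; Baiduspider/")


def looks_like_fake_baiduspider(headers):
    """headers: dict of header name -> value (assumed input shape)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # Tell 1: the misspelled forwarding header is present.
    if any(name in lowered for name in MISSPELLED_HEADERS):
        return True
    # Tell 2: UA mentions Baiduspider but does not start the way the real one does.
    ua = lowered.get("user-agent", "")
    return "baiduspider" in ua.lower() and REAL_BAIDU_UA.match(ua) is None
```

This only checks the header text. It says nothing about whether the IP really belongs to Baidu, so treat a False result as "not obviously fake" rather than "genuine".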
....
Do you know, it has only just occurred to me that + (plus sign) = space in some contexts. Is that why some UAs have double spaces in them?

keyplyr

7:58 am on Feb 23, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Fee...Fi... Fowarded

I see several fake Baiduspiders a day. One comes from Chinanet and has spoofed me long time.

AFAIK the + in UA strings is just used to kill the link. Not aware of it used as a space anywhere. In some programming contexts it connects two statements.

Any other indications the header is a fake?

lucy24

7:01 pm on Feb 23, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You mean, other than the non-APNIC IP?

:: detour for closer look, because there's a heck of a lot of them, and I only keep headers for 6 months or so ::

Weirdly, they seem to favor the art studio's site, which gets next to zero traffic. (It's one of those sites where, well, everyone's got a website so we'll put one there too.) And I only recently thought of logging the requested URI along with headers so I don't have to cross-check against access logs.

They seem to be partial to URLs with "fck" in them somewhere (is this a WP thing?)* which should be an automatic block. Or an automatic manual 404. Definitely means malign robot rather than real search engine, since they're cold-requesting files that (a) don't exist and (b) wouldn't be indexable anyway.

IP (the non-"fowarded" one) is variously China, Japan, and a long run of DataShack. The "fowarded" IP never seems to be the same one twice. Does that mean they were working off infected machines using assorted places as proxies until it got to be too much for even their chosen server farms? If so, the DoD is in trouble, because I found an "X-Fowarded-For" 28.blahblah.

Wait, scratch that:
X-Fowarded-For: 252.26.242.157

:: quick detour to free lookup to confirm that they haven't opened up this final /3 when I wasn't looking ::

What on earth is the sense in sending a fake header with that level of bogusness?!
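If you would rather not do the lookup by hand, Python's standard ipaddress module can answer the same question. A small sketch, assuming the header value arrives as a plain string; the function name is made up for illustration. The "final /3" is 224.0.0.0/3, which covers the multicast range and the reserved 240.0.0.0/4 block.

```python
import ipaddress

# The last /3 of IPv4 space: 224.0.0.0 through 255.255.255.255.
FINAL_SLASH_3 = ipaddress.ip_network("224.0.0.0/3")


def in_final_slash_3(ip_text):
    """True if ip_text is an IPv4 address inside 224.0.0.0/3."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return False  # not parseable as an IP address at all
    return ip.version == 4 and ip in FINAL_SLASH_3


addr = ipaddress.ip_address("252.26.242.157")
print(in_final_slash_3("252.26.242.157"))  # True
print(addr.is_reserved)                    # True: 240.0.0.0/4 is reserved
```

No real client can be sitting on an address in that range, so a forwarded-for value that lands there is bogus whatever the header is called.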

Not aware of it used as a space anywhere.

When a multi-word search string is turned into a ? parameter, all the wordspaces are turned into + signs.
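A quick illustration of that encoding with Python's standard urllib.parse. The search string is just an example. The last part is speculation on the double-space question from earlier in the thread: it shows what would happen if some tool form-decoded a UA string by mistake, not evidence that any tool actually does.

```python
from urllib.parse import parse_qs, unquote_plus, urlencode

# Encoding a multi-word search string: the wordspace becomes "+".
query = urlencode({"q": "fake baiduspider"})
print(query)  # q=fake+baiduspider

# Decoding reverses it: "+" is read back as a space.
decoded = parse_qs(query)
print(decoded)  # {'q': ['fake baiduspider']}

# Hypothetical: form-decoding a UA turns "; +http" into ";  http" (two spaces).
ua = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
ua_decoded = unquote_plus(ua)
print(ua_decoded)
```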


* The first time I saw this string in an URL, it happened to be a non-English-language site and I thought they had just made an unfortunate choice.

keyplyr

10:18 pm on Feb 23, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, that + sign usage :)

blend27

1:26 pm on Feb 24, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They seem to be partial to URLs with "fck" in them somewhere (is this a WP thing?)* which should be an automatic block.

Most likely FCKeditor, which is now CKEditor. Requests for it are a great way to collect scanning IPs, whether you have it installed in a non-standard location or don't have it installed at all.

lucy24

5:58 pm on Feb 24, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



a great way to collect scanning IPs

I serve manual 404s to any request containing the strings "wp" or "admin", and then toddle around later to make sure the originating IP is duly blocked. Been meaning to add "fck" [NC] to the list.

:: memo to self: find crossword-puzzle dictionary and ensure that the letter sequence "wp", as in "strawpatch", does not occur in any word I am ever likely to use in an URL ::
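The strawpatch worry is easy to demonstrate. A rough Python sketch of the substring rule described above, assuming all three strings are matched case-insensitively anywhere in the requested path (the post only says [NC] for "fck", so that part is my assumption, as is the function name):

```python
import re

# Any of these substrings, anywhere in the path, triggers the manual 404.
BLOCK_PATTERN = re.compile(r"wp|admin|fck", re.IGNORECASE)


def should_serve_404(path):
    """True if the requested path contains one of the blocked strings."""
    return BLOCK_PATTERN.search(path) is not None


print(should_serve_404("/wp-login.php"))        # True
print(should_serve_404("/FCKeditor/editor/"))   # True
print(should_serve_404("/strawpatch.html"))     # True: the false positive
print(should_serve_404("/gallery/index.html"))  # False
```

An unanchored substring match is what makes the rule so cheap, and also what makes the crossword-dictionary check necessary.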

keyplyr

9:40 pm on Feb 24, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"edit" is also helpful to block. Covers the above, as well as some utilities used for scraping stuff off your site.

lucy24

11:20 pm on Feb 27, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In a similar vein:
Useragent:
Most of the time it's followed by a correctly spelled User-Agent header-- always specifying a different UA-- such as
Useragent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
User-Agent: Mozilla/4.0 (compatible; Win32; WinHttp.WinHttpRequest.5)
but sometimes it isn't. Which is fine, because then the request reads as "no user-agent" and is blocked forthwith. And most of the duplicates are Chinese robots who are handily blocked on other grounds.

But the mere existence of "Useragent:" like that is probably grounds to suspect hanky-panky.
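A minimal sketch of that check in Python, assuming the logged headers are available as a list of (name, value) pairs so that both spellings can coexist in one request. The function name and the flag wording are invented for illustration.

```python
# Header names that a normal browser or well-behaved robot does not send (lowercase).
NONSTANDARD_NAMES = {"useragent"}


def header_red_flags(header_pairs):
    """header_pairs: list of (name, value) tuples as logged (assumed shape)."""
    names = [name.lower() for name, _ in header_pairs]
    flags = []
    if any(name in NONSTANDARD_NAMES for name in names):
        flags.append("nonstandard Useragent header")
    if "user-agent" not in names:
        flags.append("no User-Agent header")
    return flags
```

With the pair quoted above, only the first flag fires. When the correctly spelled header is missing, both do, which matches the "no user-agent" block already in place.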

I also found a few "Accept-Charset: SO-8859-1" (sic, missing the initial I of ISO-8859-1) but probably not enough to pay attention to. I think it was all the same robot.