/trackback/

         

lucy24

5:42 pm on Jun 18, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can someone point to an explanation of
-- what is /trackback/
and
-- is it ever legitimate?

In logs it comes through like this:
POST /real-directory/real-page.html/trackback/
which gets them an automatic 403 because of the POST. Depending on the robot involved, the referer is either /real-directory/real-page.html or it's an autoreferer ending in /trackback/.

Now, I don't know why the latest one caught my notice, but further search through old logs revealed that some trackback requests are immediately preceded by a robotic request for /real-directory/real-page.html --whatever page is named in the trackback-- except that they never make it to the page, because the UA is always some variation on
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) )
(note final space) where the closing ) is sometimes preceded by assorted "NET CLR" business, retaining the spurious space before the optional bits. This gets them a redirect to the old-browsers page-- which they always follow, although not otherwise humanoid.
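
A minimal sketch of how that UA quirk could be flagged in a script: the quoted string contains a second "Mozilla/" token nested inside the first. The function name is made up for illustration, and a match is only a hint worth weighing, not proof of a robot.

```python
def has_nested_ua(user_agent: str) -> bool:
    """Return True if more than one 'Mozilla/' token appears in the string."""
    return user_agent.count("Mozilla/") > 1


# The UA string quoted above, including the space before the final ")".
quoted = ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; "
          "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) )")

if __name__ == "__main__":
    print(has_nested_ua(quoted))
```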

Sometimes the same /trackback/ request recurs a few days in a row, but beyond that it's random. The IP for the /realpage part may be the same as the /trackback/ request-- or it may be entirely different. As far as I can tell, a /trackback/ has never followed a blocked request-- but thanks to that elderly UA, it has also never followed a successful (not redirected) page request.

So:
-- what are they really doing
and
-- what are they pretending to do?

Hobbs

10:54 pm on Jun 18, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Wiki says: A trackback is one of four types of linkback methods for website authors to request notification when somebody links to one of their documents ... pingbacks

[en.wikipedia.org...]

Seeing some here too, mostly CN & TW IPs, same UA and POST.
My guess is it's a scan, or a prelude to a reaping or log-spam campaign.

aristotle

12:24 am on Jun 19, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Originally it was an automated follow-up that occurred whenever a blogger somewhere put a link to one of your pages in one of his or her blog posts. It checks to see if your page really exists, and then it asks for a reciprocal trackback link from your site back to the original blog. But your site has to have the appropriate trackback scripts in order to create the reciprocal link. I've seen hundreds of these requests in my logs over the years, but my sites don't have the scripts to even recognize what's happening, so they always give a 404 response.

But if someone is seeing these coming from unlikely foreign countries, or other unlikely places, then that could be something similar to referral spam.
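
For reference, a rough sketch of what a trackback ping looks like on the wire, going by the published TrackBack specification rather than anything in this thread: a form-encoded POST whose fields are url (required), title, excerpt and blog_name, answered by a small XML document whose error element is 0 on success. The helper names and the example.com-style addresses are made up for illustration, and nothing is actually sent.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_ping_body(url, title="", excerpt="", blog_name=""):
    """Form-encode the ping fields, leaving out any that are empty."""
    fields = {"url": url, "title": title,
              "excerpt": excerpt, "blog_name": blog_name}
    return urlencode({k: v for k, v in fields.items() if v})


def ping_succeeded(response_xml: str) -> bool:
    """A receiving site reports success with <error>0</error>."""
    root = ET.fromstring(response_xml)
    return root.findtext("error", default="1").strip() == "0"


if __name__ == "__main__":
    # Hypothetical linking post; the body would be POSTed to the trackback URL
    # with Content-Type: application/x-www-form-urlencoded.
    print(build_ping_body("http://blog.example/post-1", title="My post"))
    print(ping_succeeded("<response><error>0</error></response>"))
```

A site without trackback software never produces that XML reply, which fits the 403 and 404 responses described in this thread.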

keyplyr

10:31 am on Jun 19, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It is also used by forum, discussion-board and blog software to keep a tally of outgoing visitor traffic and where those visitors go.

Since I feel the destination (not referrer) of the user is a personal privacy issue, I've blocked trackback, pingback and the several other tracking attributes for years without any negative results that I'm aware of.

aristotle

12:43 pm on Jun 19, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My first post in this thread might not have been fully accurate. What I should have said is that your site has to have some form of special trackback software in order for it to interact with these requests. Many blog platforms have this software as part of their package, but most ordinary websites don't have it and therefore can't interact with it or respond to it. keyplyr may be right that some forums also have it, but I think it was originally developed for bloggers to reciprocate each other's links.

lucy24

6:04 pm on Jun 19, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't have a separate rule for /trackback/ because it's covered by a global block on POST for any URL that doesn't have a POST function (I think only the contact page currently). But you can see where it would seem pretty spurious to be requesting a trackback for a page you have never been allowed to see. I seriously doubt any of them involve actual links, since at least some of those would be mentioned in wmt. (I know from earlier spot-checking that the "links to your site" area includes nofollow links, so the information should be there regardless.) I don't think I've seen any /trackback/ requests for pages that I know are mentioned in other people's blogs.
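
The logic of that global POST block can be sketched in a few lines. This is an illustration in Python, not the actual server configuration, and the contact-page path is a hypothetical stand-in.

```python
# Hypothetical: the only path on the site that is meant to accept POST.
POST_ALLOWED = {"/contact.html"}


def post_is_blocked(method: str, path: str) -> bool:
    """Return True when a POST targets a path that has no POST function."""
    return method.upper() == "POST" and path not in POST_ALLOWED


if __name__ == "__main__":
    # A trackback request is refused without needing a rule of its own.
    print(post_is_blocked("POST", "/real-directory/real-page.html/trackback/"))
```

Because the check keys on the method rather than the URL, any new POST-based probe is covered without adding a rule for it.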

Amusing follow-up: Until I saw this post "live" I didn't realize the UA string that I quoted involved full duplication. It jumped out and hit me in the face because my current window width results in the two halves displaying on consecutive lines, as if I'd accidentally pasted it in twice. In fact I had to go back and re-check to make sure this wasn't what I'd done. (MSIE being MSIE, I would not be surprised to learn there exist legitimate UA strings with just this kind of duplication.)

Question I forgot to ask the first time around: In the case of a legitimate /trackback/ what kind of UA would you expect to see?

keyplyr

7:29 pm on Jun 19, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@lucy24 - URLs containing trackback also use the GET & PUT methods (as well as POST). The string can be in the requested URL or the referring URL.

Again, software adds this to the path when a user follows a normal link at the forum or blog.

lucy24

9:07 pm on Jun 19, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



URLs containing trackback also use the GET & PUT methods (as well as POST).

Yeah, wouldn't a legitimate request start with GET to confirm that the URL exists at all? I kinda think PUT requests are blocked by my host; the ones that don't get a 403 come through in logs as 405 ("Method not allowed").

:: detour to re-check ::

Oh, look, a whole nother robotic behavior I never suspected, complete with a whole nother iffy UA string:
188.143.232.153 - - [02/Aug/2014:05:31:44 -0700] "GET /wp-trackback.php HTTP/1.1" 403 3320 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1); .NET CLR 3.5.30729)" 
188.143.232.153 - - [02/Aug/2014:05:31:44 -0700] "GET /blog/wp-trackback.php HTTP/1.1" 403 3320 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1); .NET CLR 3.5.30729)"
188.143.232.153 - - [02/Aug/2014:05:31:45 -0700] "GET /news/wp-trackback.php HTTP/1.1" 403 3320 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1); .NET CLR 3.5.30729)"
188.143.232.153 - - [02/Aug/2014:05:31:45 -0700] "GET /wp/wp-trackback.php HTTP/1.1" 403 3320 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1); .NET CLR 3.5.30729)"
<snip>
85.25.100.162 - - [15/Aug/2012:14:22:37 -0700] "GET /wp-trackback.php HTTP/1.1" 301 482 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1); .NET CLR 3.5.30729)"
85.25.100.162 - - [15/Aug/2012:14:22:38 -0700] "GET /wp-trackback.php HTTP/1.1" 403 1423 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1); .NET CLR 3.5.30729)"
85.25.100.162 - - [15/Aug/2012:14:22:38 -0700] "GET /blog/wp-trackback.php HTTP/1.1" 301 492 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1); .NET CLR 3.5.30729)"
85.25.100.162 - - [15/Aug/2012:14:22:39 -0700] "GET /blog/wp-trackback.php HTTP/1.1" 403 1423 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1); .NET CLR 3.5.30729)"
<snip>
(I do not care to speculate what I was doing in August 2012 that permitted a whole string of 301-to-403 sequences; whatever it was, I'm no longer doing it.)

188.143.232.153, whoever they are, seem to have been a repeat offender. Further detour to IP records says they're a Russian ISP that's so dirty, I've never bothered to switch them from flat 403 to env=bad_russia.
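
A sketch of how old logs could be searched for this kind of probe and tallied per IP. It assumes the logs are in the "combined" access-log layout that the quoted lines appear to follow. The function names are made up for illustration, and the sample entries are three of the lines quoted above.

```python
import re
from collections import Counter

# One entry in "combined" layout: ip, two ignored fields, [time],
# "request", status, size, "referer", "user-agent".
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def trackback_hits(lines):
    """Yield parsed entries whose request path mentions 'trackback'."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m and "trackback" in m.group("path").lower():
            yield m.groupdict()


def hits_per_ip(lines):
    """Count trackback requests per requesting IP."""
    return Counter(entry["ip"] for entry in trackback_hits(lines))


UA = ('Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; '
      'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1); '
      '.NET CLR 3.5.30729)')
SAMPLE = [
    f'188.143.232.153 - - [02/Aug/2014:05:31:44 -0700] "GET /wp-trackback.php HTTP/1.1" 403 3320 "-" "{UA}"',
    f'188.143.232.153 - - [02/Aug/2014:05:31:44 -0700] "GET /blog/wp-trackback.php HTTP/1.1" 403 3320 "-" "{UA}"',
    f'85.25.100.162 - - [15/Aug/2012:14:22:37 -0700] "GET /wp-trackback.php HTTP/1.1" 301 482 "-" "{UA}"',
]

if __name__ == "__main__":
    for ip, count in hits_per_ip(SAMPLE).most_common():
        print(ip, count)
```

The same filter catches both the /wp-trackback.php probes and the POST requests with /trackback/ appended, since it only looks for the word in the path.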

:: idly thinking that it would be great fun to invent pages that replicate those popular WP URLs, and make the page do something nasty ::

keyplyr

1:41 am on Jun 20, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




:: idly thinking that it would be great fun to invent pages that replicate those popular WP URLs, and make the page do something nasty ::
I used to forward them (and others) to some raunchy gay torture porn site, but the hits started to increase.

keyplyr

9:13 am on Jun 25, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Got hit tonight with 23 POST requests for one of my pages with trackback appended, all with different UAs & from many compromised (I assume) ISPs/servers worldwide, so a small botnet. The purpose of this failed (403s) effort remains a mystery.

lucy24

4:21 pm on Jun 25, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Up above, aristotle suggested that it might be a version of referer spam: they're hoping to insert links to their site from yours. (Uh... I did understand that right didn't I?) And since WP and similar are so ubiquitous, they proceed directly to /trackback/ rather than first investigating to see whether you are, in fact, a WP site. It's much the same as when robots ask for every possible variation of /wp-admin/ without first reading the front page's HTML for solid evidence of WP-ness. (Heck, there exist crawlers whose sole purpose is to see what CMS you're using. I believe I recently blocked one. A single visit to front page is ignored unless there are complicating factors; repeated requests can only cause annoyance.)
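
For comparison, a sketch of the check those robots skip: looking in the front page's HTML for common WordPress markers. The generator meta tag and the /wp-content/ and /wp-includes/ paths are what a default WordPress install typically emits, but a site can remove or rename them, so this is a heuristic under that assumption. The function name is made up for illustration.

```python
import re

# Matches e.g. <meta name="generator" content="WordPress 4.2.2" />
GENERATOR = re.compile(
    r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']WordPress',
    re.IGNORECASE,
)


def looks_like_wordpress(html: str) -> bool:
    """Return True if the page source carries a typical WordPress marker."""
    return (bool(GENERATOR.search(html))
            or "/wp-content/" in html
            or "/wp-includes/" in html)


if __name__ == "__main__":
    page = '<head><meta name="generator" content="WordPress 4.2.2" /></head>'
    print(looks_like_wordpress(page))
```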