9:11 am on Mar 4, 2012 (gmt 0)

Senior Member from US 

lucy24

joined:Apr 9, 2011
votes: 244

This was originally going to go in Foo as a humorous "Neener-neener, you'll never top this!" But then it got weird.

Background: I get strange search strings. Don't everyone yawn at once. Generally 2-3 words. Sometimes more. Sometimes much more. Until recently, my record was
melissa's baby goes missing. melissa finds that an alligator has taken the child. melissa asks the alligator to give the child back. the alligator tells her that he will return the child if she answers a question correctly, but he will eat the child if she answers incorrectly. his question is thus: will i eat your baby? melissa replies, "yes." what will the alligator do?

The seeker must have been scraping the barrel, because the page they landed on has not one word about alligators, and only one occurrence of the word "baby". Lots of Melissas, though. Incidentally, I don't know the answer to the puzzle. But I got a whole flurry of them for a month or two.

At 60-plus words that would seem to be the absolute limit. And then towards the end of February I got this one. (Moderators, this is from a public-domain e-text that exists in many many identical copies all over the web.) Verbatim:
There had been a great deal of moving about in the warehouse during the day, running of trucks, and rolling of casks. Brisk, the liveliest of my brothers, had sat watching in a hole from noon until dusk, and now hurried through our little passage into the shed, where we were all nestling behind some old canvas. He brought us news of a coming feast.
‘A ship has arrived from India,’ said he, ‘and we’ll have a glance at the cargo. They’ve been busy stowing it away next door. There’s rice–’
The brotherhood of rats whisked their tails for joy.
There was a universal squeak of approbation.
‘That’s nothing but a blue dye obtained from a plant,’ observed Furry, an old, blind rat, who in his days had travelled far, and seen much of the world, and had reflected upon what he had viewed far more than is common with a rat. Indeed, he passed amongst us for a philosopher, and I had learned not a little from his experience; for he delighted in talking over his travels, and, but for a little testiness of temper, would have been a very agreeable companion. He very frequently joined our party; indeed, his infirmities obliged him to do so, as he could not have lived without assistance.

Naturally I went off to replicate the search. The original came from hk.search.yahoo.com, but I stuck with g###. Two interesting things came up:
#1 The original search string used en dashes and single quotes-- and a tab at each paragraph break. All copies of the e-text that I could find use em dashes and double quotes, and paragraphs aren't indented. (The book is British but predates the typographic change from double to single quotation marks.) Hardly surprising, since they all come from the same source.
#2 The initial search brought up seven identical texts, along with the google blurb about how they've omitted some very similar results. When I asked them to repeat, including the omitted results, they went up to 28 before giving me the same "omitted results" blurb. But they flatly refused to show me any more, in spite of the "repeat this search" option.

Fast-forward a day or two. Another search from the rarely-seen Hong Kong yahoo, this time asking for (in full)
There had been a great deal of moving about in the warehouse during the day, running of trucks

At this point I begin to smell a rat, and go back over the month. About two weeks earlier, Warehouse Guy-- identical IP and UA-- searched for
the liveliest of my brothers had sat watching in a hole

Couple days before that, someone else asked google to find
there had been a great deal of moving about in the warehouse during the day

But don't be fooled by the dot com. This one, too, turned out to be in Hong Kong.

Aside: What's with Hong Kong anyway? I could swear I read that it was due to be sucked back into China somewhere around 1997, but here it is, large as life. Wonder if the Chung King Mansions are still standing?

Couple days later, a fourth person hits yahoo for
Brisk, the liveliest of my brothers, had sat watching in a hole from noon until dusk

Are we getting the pattern here? To date I've had nine different people, all from Hong Kong, most using yahoo. For a text that normally gets one or two legitimate human visitors a month. Any one of them in isolation would be unequivocally human: Assorted up-to-the-minute browsers, all subsidiary files loaded (css, images, favicon), duly logged in piwik. Two are repeaters who seem to have saved the text to their HD, because I'll get piwik hits without a text download.
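
For anyone who wants to check a pile of search strings against a passage without eyeballing them, here is a rough Python sketch. It assumes the only differences are case and punctuation (one of the searches above dropped the commas), and the names `normalize` and `from_passage` are made up for illustration. The passage is cut short to the sentences the searches touch.

```python
import re

def normalize(text):
    """Lowercase and keep only letter/digit runs, joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def from_passage(query, passage):
    """True if the query's words appear, in order and contiguously, in the passage."""
    # Pad with spaces so a match cannot start or end in the middle of a word.
    return f" {normalize(query)} " in f" {normalize(passage)} "

# Opening of the e-text, shortened to the part the searches draw on.
passage = (
    "There had been a great deal of moving about in the warehouse during the day, "
    "running of trucks, and rolling of casks. Brisk, the liveliest of my brothers, "
    "had sat watching in a hole from noon until dusk, and now hurried through our "
    "little passage into the shed"
)

# Search strings quoted in this thread.
queries = [
    "There had been a great deal of moving about in the warehouse during the day, running of trucks",
    "the liveliest of my brothers had sat watching in a hole",
    "there had been a great deal of moving about in the warehouse during the day",
    "Brisk, the liveliest of my brothers, had sat watching in a hole from noon until dusk",
]

matches = [from_passage(q, passage) for q in queries]
print(matches)
```

All four come back as matches, including the one typed without commas. A search with different spelling or reordered words would not match; that would need fuzzier comparison than this.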

I got exasperated and blocked access to this file for anyone with "hk" in the referer. This led to:

219.77.30.nn - - [01/Mar/2012:09:46:20 -0800] "GET /ebooks/rambles/Rambles.html HTTP/1.1" 403 1044 "http://hk.search.yahoo.com/ {snip, snip} There had been a great deal of moving about in the warehouse during the day, {snip}" "Mozilla/5.0 (Windows NT 6.1; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
219.77.30.nn - - [01/Mar/2012:09:46:21 -0800] "GET /boilerplate/errorstyles.css HTTP/1.1" 200 1176 "http://www.example.com/ebooks/rambles/Rambles.html" {UA snipped}
219.77.30.nn - - [01/Mar/2012:09:46:21 -0800] "GET /favicon.ico HTTP/1.1" 200 597 "-" {UA snipped}

In other words: this is a human getting served a 403. Custom 403 page with style sheet and favicon.
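
The actual rule isn't shown here, so take this only as a sketch of the logic in Python rather than what is really running on the server. Assumptions: the test is a plain substring match on the whole referer, it ignores case, and it applies to this one file only. `PROTECTED_PATH`, `BLOCKED_MARKER` and `status_for` are made-up names.

```python
# Path taken from the log lines; everything else here is an assumption.
PROTECTED_PATH = "/ebooks/rambles/Rambles.html"
BLOCKED_MARKER = "hk"

def status_for(path, referer):
    """Return 403 for the protected file when the referer contains the marker, else 200."""
    # Assumption: substring match anywhere in the referer, case-insensitive.
    if path == PROTECTED_PATH and BLOCKED_MARKER in referer.lower():
        return 403
    return 200

print(status_for(PROTECTED_PATH, "http://hk.search.yahoo.com/"))     # 403
print(status_for(PROTECTED_PATH, "http://www.example.com/ebooks/"))  # 200
print(status_for("/rats/", "http://hk.search.yahoo.com/"))           # 200
```

A substring test this loose would also catch any referer that happens to contain "hk" somewhere in its path or query string, not only Hong Kong hostnames.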

Ten seconds to assimilate the "Nothing to see here" and to look over the links on the 403 page before taking the likeliest guess:
219.77.30.nn - - [01/Mar/2012:09:46:31 -0800] "GET /rats/ HTTP/1.1" 200 1251 "http://www.example.com/ebooks/rambles/Rambles.html" {snip}
{plus all subsidiary files}

A further eight seconds to decide that this isn't the right place after all, so back to yahoo we go:
219.77.30.nn - - [01/Mar/2012:09:46:39 -0800] "GET /ebooks/rambles/Rambles.html HTTP/1.1" 403 1044 "http://hk.search.yahoo.com/ {et cetera using the identical search string}

OK, let's try somewhere else:
219.77.30.nn - - [01/Mar/2012:09:47:55 -0800] "GET /ebooks/ HTTP/1.1" 200 6481 "http://www.example.com/ebooks/rambles/Rambles.html" {snip}
{plus all subsidiary files}

Now that we no longer have the lethal "hk" in the referer, this leads us safely to:
219.77.30.nn - - [01/Mar/2012:09:48:03 -0800] "GET /ebooks/rambles/Rambles.html HTTP/1.1" 200 77449 "http://www.example.com/ebooks/" {snip}
{plus all subsidiary files}

In other words, exactly the behavior you would expect of a human who was trying to get to an e-book but was getting a bum steer from the search engine.

... Except for those verbatim search strings, all drawing on the same passage. And the fact that every single one of them is from Hong Kong.

If I were on speaking terms with the webmaster of {one of those 28 "very similar" sites} I would ask if he's getting any unusual traffic from Hong Kong lately.
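
In case anyone wants to pull the same kind of timeline out of their own logs, a minimal Python sketch follows. It assumes the usual combined log format and English month abbreviations. The sample lines are the ones quoted above with the referers cut down to the host and the UA replaced by "-". `LINE_RE`, `parse` and `gaps` are names invented for this example.

```python
import re
from datetime import datetime

# Combined log format: ip ident user [time] "request" status bytes "referer" ...
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)"'
)

def parse(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    return rec

def gaps(lines):
    """Yield (seconds since previous parsed line, status, path) in input order."""
    prev = None
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue
        delta = 0.0 if prev is None else (rec["time"] - prev).total_seconds()
        prev = rec["time"]
        yield delta, rec["status"], rec["path"]

# Page requests only; subsidiary files left out, referers and UA trimmed.
sample = [
    '219.77.30.nn - - [01/Mar/2012:09:46:20 -0800] "GET /ebooks/rambles/Rambles.html HTTP/1.1" 403 1044 "http://hk.search.yahoo.com/" "-"',
    '219.77.30.nn - - [01/Mar/2012:09:46:31 -0800] "GET /rats/ HTTP/1.1" 200 1251 "http://www.example.com/ebooks/rambles/Rambles.html" "-"',
    '219.77.30.nn - - [01/Mar/2012:09:46:39 -0800] "GET /ebooks/rambles/Rambles.html HTTP/1.1" 403 1044 "http://hk.search.yahoo.com/" "-"',
]

for delta, status, path in gaps(sample):
    print(f"+{delta:.0f}s {status} {path}")
```

On these three lines the gaps come out as 0, 11 and 8 seconds. The 11 is measured from the first 403 itself; counting from the favicon request a second later gives the ten seconds mentioned above.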
2:18 pm on Mar 4, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 24, 2002
votes: 0

I blocked the whole HK range quite a while ago, also because of bot-like rather than human-like behaviour.

PS : I haven't seen them since
4:12 pm on Mar 4, 2012 (gmt 0)

Senior Member from FR 

leosghost

joined:Feb 15, 2004
votes: 230

At this point I begin to smell a rat,

;)) I saw what you did there ..