
Search Engine Spider and User Agent Identification Forum

    
another for the profilers
lucy24




msg:4506178
 10:09 pm on Oct 9, 2012 (gmt 0)

Nothing special about the IP: 177.134.201.nn Brazilian range that I haven't met before. Don't get much traffic from Brazil, whether robotic or human.

Nothing special about the UA: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0

It only jumped out at me because it racked up a solid string of 404s-- or rather three sets of equal size, all on the same calendar day:

07:34:51 ... /Disclaimer.aspx
07:34:52 ... /m-paco-rabanne-parfum-118,autres-2788.html
07:34:52 ... /g-promotions-2,task-essential-416.html
07:34:52 ... /p-1-million-vaporisateur-200-ml-paco-rabanne-parfum-2216-68.html
07:34:52 ... /g-promotions-2,10-a-20-1799.html
07:34:52 ... /g-promotions-2,70-et-plus-1805.html
07:34:52 ... /p-ultrared-man-vaporisateur-50-ml-paco-rabanne-parfum-2243-68.html
07:34:53 ... /blog

12:58:45 ... /letrat.htm
12:58:45 ... /peignoir-personnalise.html
12:58:45 ... /activiteiten
12:58:46 ... /m-paco-rabanne-parfum-118,non-1779.html
12:58:46 ... /frais-de-port.html
12:58:46 ... /g-promotions-2,vitaman-424.html
12:58:46 ... /g-nouveautes-1,lancaster-3667.html
12:58:46 ... /scheidsrechters

13:48:49 ... /p-xs-pour-homme-vaporisateur--100-ml-paco-rabanne-parfum-2233-68.html
13:48:49 ... /g-promotions-2,40-a-50-1802.html
13:48:49 ... /letrao.htm
13:48:50 ... /Competitie
13:48:50 ... /letrap.htm
13:48:50 ... /p-deodorant-stick-ultraviolet-man-75-ml-paco-rabanne-parfum-2249-8.html
13:48:50 ... /provincies
13:48:50 ... /beker-van-vlaanderen
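To pick out bursts like these mechanically, you can split the hits wherever the gap between two consecutive requests exceeds some threshold. Here is a minimal Python sketch that uses only the timestamps copied from the log above. The function name `group_bursts` and the 10-second `max_gap` are illustrative choices, not part of any real log tool. The sketch also assumes the timestamps are sorted and all fall on the same day.

```python
from datetime import datetime, timedelta

def group_bursts(timestamps, max_gap=timedelta(seconds=10)):
    """Split sorted same-day HH:MM:SS stamps into bursts separated by gaps > max_gap."""
    bursts = []
    previous = None
    for stamp in timestamps:
        current = datetime.strptime(stamp, "%H:%M:%S")
        # Start a new burst on the first hit or after a long enough pause
        if previous is None or current - previous > max_gap:
            bursts.append([])
        bursts[-1].append(stamp)
        previous = current
    return bursts

# Timestamps of the 24 requests listed above
times = (
    ["07:34:51"] + ["07:34:52"] * 6 + ["07:34:53"]
    + ["12:58:45"] * 3 + ["12:58:46"] * 5
    + ["13:48:49"] * 3 + ["13:48:50"] * 5
)

print([len(b) for b in group_bursts(times)])  # [8, 8, 8]
```

With these timestamps the split gives three bursts of eight hits each, matching the "three sets of equal size" above.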


Isn't that weird? "Disclaimer.aspx" and "blog" are the kinds of things you would expect a robot to ask for. The ones that use the shotgun method, coming in with a long list of possible vulnerabilities.

The "letrat, letrap, letrao" otoh makes me wonder if it will be back next week to ask for letra[a-nqrsu-z].
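For anyone checking the arithmetic: the character class `[a-nqrsu-z]` is meant to cover every lowercase letter except the three already requested (o, p and t). Here is a quick, purely illustrative check in Python:

```python
import re
import string

# The class covers a-n, then q, r, s, then u-z: everything except o, p and t
pattern = re.compile(r"[a-nqrsu-z]")
matched = [c for c in string.ascii_lowercase if pattern.fullmatch(c)]
skipped = [c for c in string.ascii_lowercase if c not in matched]

print(len(matched))  # 23
print(skipped)       # ['o', 'p', 't']
```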

That leaves 19 pages that could perfectly well exist-- on some site in Belgium. They're hardly generic names. But it isn't referer spam, because there was no referer.

What on earth do you suppose they were looking for?

 

incrediBILL




msg:4506193
 10:39 pm on Oct 9, 2012 (gmt 0)

Maybe your site was the target of an SEO hacking and spam bot that got a false positive and the bot came back to see if any of it actually stuck.

OK, now that we've had that not-so-far-fetched theory, perhaps it was simply a bug in a crappy crawler penned in kiddie script that associated the wrong domain name with the wrong pages.

lucy24




msg:4506333
 7:25 am on Oct 10, 2012 (gmt 0)

a crappy crawler penned in kiddie script that associated the wrong domain name with the wrong pages

You may have thought you were kidding but I caved in and looked up some names.

The site exists. In France, darn it, not Belgium. But its name is exactly the same as mine, except that the first letter is different, and the second letter is different, and it's got a different number of syllables, and the overall length (exclusive of www. and .com) is different. Oh, and every single digit of the server IP is a mismatch. So it's a mistake any robot could have made ;)

If they watch their logs as closely as I do, someone in the men's toiletries business is going to be very baffled at getting requests for pages apparently written in Atahualpa.

Wonder what they were looking for? Online credit-card loopholes?

incrediBILL




msg:4506340
 7:56 am on Oct 10, 2012 (gmt 0)

You may have thought you were kidding but I caved in and looked up some names.


Nope. I was deadly serious. I never kid... about crappy code. It simply looked like a mismatched domain and pages. Hope that's really all it is too because figuring it out otherwise could put gray stubble on my bald head.

lucy24




msg:4506688
 12:46 am on Oct 11, 2012 (gmt 0)

And the punchline is...

I, on the other hand, really was kidding about the "letra[a-nqrsu-z]". But in approved Sesame Street fashion, they have since returned for l, y and g. 6 down, 20 to go. Oh, and they picked up a fresh copy of robots.txt. (I snooped. They do not appear to have visited any disallowed directories.)

Wait, it gets better. After a break, they changed IPs-- keeping the same UA-- and did two more sets of eight. You won't fully appreciate this unless you have snooped:

14:13:32 /g-nouveautes-1,anthony-logistics-306.html
14:13:32 /pb/pellicules.html
14:13:33 /category-anmoyugang/
14:13:33 /federaties
14:13:33 /qui-sommes-nous-artex.html
14:13:33 /fun/entretenimientos.htm
14:13:33 /g-promotions-2,20-a-30-1800.html
14:13:33 /fun/agropecuaria.htm


And, when next seen:

14:40:12 /fonts/legacy.html
14:40:13 /fonts/custom_greek_it.html
14:40:13 /hovercraft/april_blues.html
14:40:14 /hovercraft/hovercraft.html
14:40:14 /silence/
14:40:14 /hovercraft/duct_tape.html
14:40:14 /hovercraft/hover_redux.html
14:40:15 /fonts/aujaq.html


Whew. Guess the script got sorted out :)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved