
Forum Moderators: Ocean10000 & incrediBILL & phranque


/3 and /5 requests, strange

     
10:22 am on Apr 8, 2014 (gmt 0)

Junior Member

joined:Jan 23, 2014
posts:131
votes: 0



Has anyone come across strange web requests like this that don't seem to bear any resemblance to what's on the site?

Requests for <url>/3 and <url>/5

The requests come from Princeton and Google. Obviously they 404, because those pages have never existed, but does it ring any bells with anyone why this might happen?

Thx


* Wasn't sure if this is the right forum; if not, please move it.
6:41 pm on Apr 8, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
posts:12702
votes: 244


Hm. Computer-science class with "Create a robot" assignment?

The request from google is a little bit worrying because it implies they've actually seen the link somewhere. Google does ask for nonexistent files, but always in the form "23lk4jdf9o8tu5.html", where they deliberately ask for a garbage name. It seems to be triggered by an unusual rate of redirects on your site.

Option B is that somebody's robot has got their shopping list mixed up with a different hostname. But Princeton, hm. I assume that's based on the IP, not just some claim in the UA string.

SSID forum maybe?
6:42 pm on Apr 8, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:2683
votes: 94


Can you show an example of what this looks like in your logs?
6:59 pm on Apr 8, 2014 (gmt 0)

Junior Member

joined:Jan 23, 2014
posts:131
votes: 0


66.249.68.59 - - [08/Apr/2014:03:39:31 +0100] "GET /(normalpage)/3/ HTTP/1.0" 301 207 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.68.59 - - [08/Apr/2014:03:39:32 +0100] "GET /(normalpage) HTTP/1.0" 301 208 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


66.249.68.91 - - [08/Apr/2014:04:30:58 +0100] "GET /(differentnormalpage)/3/ HTTP/1.0" 301 203 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.68.59 - - [08/Apr/2014:04:30:58 +0100] "GET /(differentnormalpage) HTTP/1.0" 301 204 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
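For anyone wanting to pull these fields out of a log programmatically, here is a minimal Python sketch that parses one line in Apache's "combined" log format. That the lines above use this format is an assumption based on their shape; `parse_line` and `Hit` are illustrative names, not part of any library.

```python
import re
from typing import NamedTuple, Optional

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    ip: str
    time: str
    path: str
    status: int
    size: Optional[int]  # response size in bytes as logged (%b); None when logged as "-"
    agent: str


def parse_line(line: str) -> Optional[Hit]:
    """Return a Hit for one combined-format line, or None if the line doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    size = None if m["size"] == "-" else int(m["size"])
    return Hit(m["ip"], m["time"], m["path"], int(m["status"]), size, m["agent"])
```

Feeding it the first Googlebot line above should give ip 66.249.68.59, path /(normalpage)/3/, status 301 and size 207.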


I googled the IP:

IP Address: 66.249.68.59
whatismyipaddress.com/ip/66.249.68.59
Location: Mountain View, United States - 66.249.68.59 is a static assigned Corporate IP address allocated to Googlebot.
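A geolocation lookup is suggestive but not proof, since anyone can put "Googlebot" in a UA string. The check Google documents is a reverse DNS lookup on the IP, a test that the hostname ends in googlebot.com or google.com, and then a forward lookup confirming that hostname resolves back to the same IP. Below is a minimal Python sketch of that check, assuming IPv4 and that those two domains are the ones to accept (Google's list may change over time). `is_verified_googlebot` is a hypothetical helper; the resolver parameters exist only so it can be exercised without network access.

```python
import socket
from typing import Callable, List, Tuple

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def is_verified_googlebot(
    ip: str,
    reverse: Resolver = socket.gethostbyaddr,     # IP -> (hostname, aliases, addresses)
    forward: Resolver = socket.gethostbyname_ex,  # hostname -> (hostname, aliases, IPv4 addresses)
) -> bool:
    """Reverse-then-forward DNS check for a claimed Googlebot IP (IPv4 only)."""
    try:
        host = reverse(ip)[0]
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    # Assumed accepted domains; check Google's current documentation.
    if not host.rstrip(".").endswith((".googlebot.com", ".google.com")):
        return False
    try:
        addresses = forward(host)[2]
    except OSError:
        return False
    # The forward lookup must point back at the original IP.
    return ip in addresses
```

Called with just an IP, it uses the system resolver, so the result depends on live DNS.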

I think the /5/ might be something to do with this:

173.252.112.117 - - [08/Apr/2014:06:06:37 +0100] "GET /5/que-es-el-cine-de-genero/?utm_source=dlvr.it&utm_medium=twitterrce%3Dother_multiline&action_object_map=%257B%2522752450914769280%2522%253A547777391969572%257D&action_type_map=%257B%2522752450914769280%2522%253A%2522og.recommends%2522%257D&action_ref_map=%255B%255Dber_level_req%3D1&fb_locale=de_DE HTTP/1.0" 404 5753 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
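The query string on that Facebook request is hard to read because several values are percent-encoded twice (`%25` is an encoded `%`, so `%257B` is really `%7B`, i.e. `{`). A small Python sketch using only `urllib.parse` can split it and undo the extra layer. `decode_query` is an illustrative name, and the second decoding pass is an assumption that only makes sense for values known to be double-encoded like these.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def decode_query(url: str) -> dict:
    """Split a logged request path's query string and undo one extra layer of
    percent-encoding. parse_qs performs the first decoding pass; unquote does
    the second. Caution: the second pass would also decode a literal '%'
    sequence in a value that was only encoded once."""
    params = parse_qs(urlsplit(url).query)
    return {key: [unquote(v) for v in values] for key, values in params.items()}
```

Run on the logged path, `action_type_map` comes out as `{"752450914769280":"og.recommends"}` and `fb_locale` as `de_DE`, which looks like Facebook share metadata attached to a link to /5/que-es-el-cine-de-genero/.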


FWIW, I've been seeing a weird (looks hacked) WordPress cricket site make REALLY weird links to pages on my site that don't exist; maybe Google got them from there. I disavowed it as it has nothing to do with me... but weird stuff.

Lucy, you say it's triggered by an unusual rate of redirects; maybe it's that? I redirect to add a trailing slash, to make directory/index.php just directory/, and to canonicalize www to non-www. I've taken them all off just in case, except the last one. Kind of annoying, as it's my personal site and it's slightly in the "webby" spotlight as of today.
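The three canonicalization rules described above can be modelled as a single function that returns the URL a request should be 301'd to, or None if it is already canonical. This is a Python sketch of the logic only, not the actual .htaccess rules: it assumes a "page ending" means a dot in the last path segment, and `canonical_redirect` and example.com are placeholders.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url: str) -> Optional[str]:
    """Return the 301 target for url, or None if it is already canonical.
    Hypothetical rules modelled on the ones described in the post:
      1. www.example.com -> example.com
      2. /dir/index.php  -> /dir/
      3. a path whose last segment has no file extension gets a trailing slash
    """
    parts = urlsplit(url)
    host = parts.netloc
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    if path.endswith("/index.php"):
        path = path[: -len("index.php")]  # keep the trailing slash
    last = path.rsplit("/", 1)[-1]
    if last and "." not in last:  # no "page ending" -> add a slash
        path += "/"
    target = urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
    return None if target == url else target
```

Computing the final target in one pass means a request that breaks several rules gets a single 301 rather than a chain of them.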
8:56 pm on Apr 8, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
posts:12702
votes: 244


The google requests are definitely legitimate. What about the Princeton ones?

A routine redirect like trailing-slash or "index.html" shouldn't trigger any unusual googlebot requests, because everyone has those. I notice the garbage requests when I've made changes resulting in an unusually high proportion of redirects. They deliberately ask for something that's extremely unlikely to exist, just to verify that the site is still returning 404s. I have to assume that this is fully automated, and that it's triggered by proportions rather than absolute numbers.
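If the trigger really is the proportion of redirects rather than the absolute number, it can be worth seeing what that proportion looks like in your own logs before and after a change. A rough Python sketch, assuming combined-format log lines and a plain substring match on the user-agent (`redirect_share` is a hypothetical helper). Whatever threshold Google uses, if any, isn't public, so this only gives you a number to compare over time.

```python
import re
from collections import Counter
from typing import Iterable

# The status code follows the quoted request line in a combined-format entry.
STATUS = re.compile(r'"[^"]*" (\d{3}) ')


def redirect_share(lines: Iterable[str], agent_substring: str = "Googlebot") -> float:
    """Fraction of requests from a given agent that were answered with a 3xx.
    Only lines containing agent_substring are counted; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        if agent_substring not in line:
            continue
        m = STATUS.search(line)
        if m is None:
            continue
        counts["redirect" if m.group(1).startswith("3") else "other"] += 1
    total = sum(counts.values())
    return counts["redirect"] / total if total else 0.0
```

Usage would be something like `redirect_share(open("access.log"))`, run once per day's log.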

Unlike some problems, this one is probably safe to dump in the "ignore them and they'll go away" bin. If you're getting a whole lot of bum requests from the same IP, you might look them up and see if they deserve a general block on grounds of underlying robotitude. But really, a 404 is as effective as anything. Some robots get all excited over 403s because they think it means you're hiding something from them.
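To spot an IP sending a whole lot of bum requests, a quick tally of 404s per IP is enough. A minimal Python sketch, assuming combined-format log lines; `top_404_sources` is an illustrative name.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# IP is the first field; the status code follows the quoted request line.
LINE = re.compile(r'^(\S+) .*?"[^"]*" (\d{3}) ')


def top_404_sources(lines: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return up to n (ip, count) pairs for the IPs that received the most 404s."""
    hits = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and m.group(2) == "404":
            hits[m.group(1)] += 1
    return hits.most_common(n)
```

The IPs at the top of the list are the ones worth looking up before deciding whether a block is warranted.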
10:43 pm on Apr 8, 2014 (gmt 0)

Junior Member

joined:Jan 23, 2014
posts:131
votes: 0


Thanks Lucy... again!

Maybe it was 403-related: I had some .htaccess rules sending "badPeople" to 403 land. I took those off a couple of days ago. Argh.
1:09 am on Apr 9, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:6142
votes: 281


404? The paired examples were 301s... what are they redirecting to? (The sizes of the redirects are 207, 208, 203 and 204 bytes, all very small.)
2:20 am on Apr 9, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
posts:12702
votes: 244


what are they redirecting to? (the filesize of the redirects are 207, 208, 203, 204, all very small sizes.

But that's just the size of the 301 response itself (a short "moved" notice), not of the page it points to. The exact size apparently depends on your server; the target URL itself will only vary by a few bytes.

If your ErrorDocument directive is incorrectly worded (for example, pointing at a full http:// URL instead of a local path), every error turns into a 302. But not a 301; those only happen on purpose.
6:33 am on Apr 9, 2014 (gmt 0)

Junior Member

joined:Jan 23, 2014
posts:131
votes: 0


Yes, they were 301s because I redirect every URL that doesn't end in a page name to the same URL with a trailing slash.