
Home / Forums Index / Code, Content, and Presentation / Apache Web Server

Apache Web Server Forum

    
/3 and /5 requests, strange
roshaoar



 
Msg#: 4661349 posted 10:22 am on Apr 8, 2014 (gmt 0)


Has anyone come across strange web requests like this that don't seem to bear any resemblance to what's on the site?

Requests for <url>/3 and <url>/5

They come from Princeton and Google. Obviously they return 404 because those pages have never existed, but does anyone know why this might happen?

Thx


* I wasn't sure if this is the right forum; if not, please move it.

 

lucy24




 
Msg#: 4661349 posted 6:41 pm on Apr 8, 2014 (gmt 0)

Hm. Computer-science class with "Create a robot" assignment?

The request from Google is a little worrying because it implies they've actually seen the link somewhere. Google does ask for nonexistent files, but always in the form "23lk4jdf9o8tu5.html", where they deliberately request a garbage name. That seems to be triggered by an unusual rate of redirects on your site.

Option B is that somebody's robot has got their shopping list mixed up with a different hostname. But Princeton, hm. I assume that's based on the IP, not just some claim in the UA string.
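Checking "based on the IP" can be done more firmly than with a whois-style lookup: Google documents a reverse-then-forward DNS test for Googlebot (the IP's reverse DNS name should end in googlebot.com or google.com, and that name should resolve back to the same IP). Here is a minimal Python sketch of that test, assuming the standard socket resolver. The name verify_crawler_ip and the suffix list are illustrative, and a university range like Princeton's would need its own list. The resolver functions are parameters so the logic can be tried without network access.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# Reverse-DNS suffixes Google documents for Googlebot hosts.
# Other crawlers would need their own suffix list.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def default_reverse(ip: str) -> str:
    """Return the PTR hostname for an IP (raises OSError if there is none)."""
    return socket.gethostbyaddr(ip)[0]


def default_forward(host: str) -> List[str]:
    """Return the IPv4 addresses a hostname resolves to."""
    return socket.gethostbyname_ex(host)[2]


def verify_crawler_ip(
    ip: str,
    suffixes: Iterable[str] = GOOGLE_SUFFIXES,
    reverse: Callable[[str], str] = default_reverse,
    forward: Callable[[str], List[str]] = default_forward,
) -> Tuple[bool, str]:
    """Reverse-then-forward DNS check. Returns (verified, hostname)."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False, ""  # no reverse DNS at all
    if not host.endswith(tuple(suffixes)):
        return False, host  # name is not in the expected domain
    try:
        addresses = forward(host)
    except OSError:
        return False, host
    # The forward lookup must point back at the original IP; otherwise
    # anyone who controls their own PTR record could fake the name.
    return ip in addresses, host
```

On a machine with working DNS, verify_crawler_ip("66.249.68.59") runs the check against the Googlebot address in roshaoar's log; the answer depends on live DNS, so no result is shown here.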

Maybe the SSID (Search Engine Spider and User Agent Identification) forum?

aristotle




 
Msg#: 4661349 posted 6:42 pm on Apr 8, 2014 (gmt 0)

Can you show an example of what this looks like in your logs?

roshaoar



 
Msg#: 4661349 posted 6:59 pm on Apr 8, 2014 (gmt 0)

66.249.68.59 - - [08/Apr/2014:03:39:31 +0100] "GET /(normalpage)/3/ HTTP/1.0" 301 207 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.68.59 - - [08/Apr/2014:03:39:32 +0100] "GET /(normalpage) HTTP/1.0" 301 208 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


66.249.68.91 - - [08/Apr/2014:04:30:58 +0100] "GET /(differentnormalpage)/3/ HTTP/1.0" 301 203 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.68.59 - - [08/Apr/2014:04:30:58 +0100] "GET /(differentnormalpage) HTTP/1.0" 301 204 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
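These lines are in Apache's "combined" log format (client IP, identity, user, timestamp, request line, status, response size, referer, user agent). A small parser makes it easier to pull out every request whose path contains a bare numeric segment like /3/. This is a minimal Python sketch: the regex assumes the standard combined format with no escaped quotes inside fields, and parse_line and numeric_segment_requests are illustrative names.

```python
import re
from typing import Dict, List, Optional

# Apache "combined" format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# A path segment made only of digits, e.g. the "/3/" in "/page/3/".
NUMERIC_SEGMENT = re.compile(r"/\d+(/|$)")


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of one log line, or None if it doesn't match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def numeric_segment_requests(lines: List[str]) -> List[Dict[str, str]]:
    """Keep only parsed requests whose path (minus any query string)
    contains an all-digit segment."""
    hits = []
    for line in lines:
        fields = parse_line(line)
        if fields and NUMERIC_SEGMENT.search(fields["path"].split("?")[0]):
            hits.append(fields)
    return hits
```

Fed the four Googlebot lines above, numeric_segment_requests returns only the two /3/ requests.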


I googled the IP:

IP Address: 66.249.68.59
whatismyipaddress.com/ip/66.249.68.59?
Location: Mountain View, United States - 66.249.68.59 is a static assigned Corporate IP address allocated to Googlebot.

I think the /5/ requests might have something to do with this:

173.252.112.117 - - [08/Apr/2014:06:06:37 +0100] "GET /5/que-es-el-cine-de-genero/?utm_source=dlvr.it&utm_medium=twitterrce%3Dother_multiline&action_object_map=%257B%2522752450914769280%2522%253A547777391969572%257D&action_type_map=%257B%2522752450914769280%2522%253A%2522og.recommends%2522%257D&action_ref_map=%255B%255Dber_level_req%3D1&fb_locale=de_DE HTTP/1.0" 404 5753 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
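Parts of that query string are URL-encoded twice (%257B is an encoded %7B, which is itself an encoded "{"). Decoding it makes it easier to see what got glued onto the URL. This is a minimal Python sketch using urllib.parse; decode_query is an illustrative name.

```python
from typing import List, Tuple
from urllib.parse import parse_qsl, unquote, urlsplit


def decode_query(url: str, rounds: int = 2) -> List[Tuple[str, str]]:
    """Split a URL's query into (name, value) pairs. parse_qsl decodes
    once; each value is then percent-decoded again, up to `rounds` more
    times, stopping early once a pass changes nothing."""
    pairs = []
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        for _ in range(rounds):
            decoded = unquote(value)
            if decoded == value:
                break
            value = decoded
        pairs.append((name, value))
    return pairs
```

Run on the path from that log line, utm_medium decodes to "twitterrce=other_multiline" and action_ref_map to "[]ber_level_req=1", which look like fragments of other parameters spliced together. That suggests (but doesn't prove) the link was mangled somewhere upstream before Facebook's fetcher requested it.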


FWIW, I've been seeing a weird (apparently hacked) WordPress cricket site make REALLY weird links to pages on my site that don't exist; maybe Google got them from there. I disavowed it, as it's nothing to do with me... but weird stuff.

Lucy, you say this is triggered by an unusual rate of redirects; maybe it's that? I redirect to add a trailing slash, to turn directory/index.php into just directory/, and to canonicalize www to non-www. I've taken those off just in case, except the last one. Kind of annoying, as it's my personal site, which is slightly in the "webby" spotlight as of today.
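The three redirects described above can be modeled as one function from a requested URL to its canonical form, with a 301 issued only when the two differ. This is a minimal Python sketch, not Apache configuration: canonical_url, redirect_target, the example.com host, and reading "no page ending" as "last path segment has no file extension" are all assumptions for illustration, and the real .htaccess rules may behave differently.

```python
import posixpath
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Apply three rules in one pass: www -> non-www, strip a trailing
    index.php, and add a trailing slash to extensionless paths."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    if path.endswith("/index.php"):
        path = path[: -len("index.php")]
    # Assumed reading of "no page ending": last segment has no extension.
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment and not posixpath.splitext(last_segment)[1]:
        path += "/"
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


def redirect_target(url: str) -> Optional[str]:
    """Return where a 301 would point, or None if already canonical."""
    target = canonical_url(url)
    return None if target == url else target
```

Computing the final target in one step means a URL that breaks several rules at once (say, www plus index.php) gets a single 301 instead of a chain of them.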

lucy24




 
Msg#: 4661349 posted 8:56 pm on Apr 8, 2014 (gmt 0)

The Google requests are definitely legitimate. What about the Princeton ones?

A routine redirect like trailing-slash or "index.html" shouldn't trigger any unusual Googlebot requests, because everyone has those. I notice the garbage requests when I've made changes resulting in an unusually high proportion of redirects. They deliberately ask for something that's extremely unlikely to exist, just to verify that the site is still returning 404s. I have to assume that this is fully automated, and that it's triggered by proportions rather than absolute numbers.
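If the trigger really is the share of redirects rather than the raw count, the thing to watch is that share over time. Google doesn't publish any such threshold, so this minimal Python sketch only measures the share per day from (day, status) pairs already pulled out of the logs; redirect_share and the set of redirect codes are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_share(records: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Map each day to the fraction of its responses that were redirects.

    `records` holds (day, status) pairs such as ("08/Apr/2014", 301).
    Days come back in the order first seen, which for a log file is
    chronological.
    """
    totals: Dict[str, int] = defaultdict(int)
    redirects: Dict[str, int] = defaultdict(int)
    for day, status in records:
        totals[day] += 1
        if status in REDIRECT_CODES:
            redirects[day] += 1
    return {day: redirects[day] / totals[day] for day in totals}
```

A day where the share jumps (for example, right after a batch of URL changes) is where you'd expect the random-name 404 probes to show up, if the pattern holds.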

Unlike some problems, this one is probably safe to dump in the "ignore them and they'll go away" bin. If you're getting a whole lot of bum requests from the same IP, you might look them up and see if they deserve a general block on grounds of underlying robotitude. But really, a 404 is as effective as anything. Some robots get all excited over 403s because they think it means you're hiding something from them.
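To see whether the bogus requests cluster on a few IPs (the case where it's worth looking them up before deciding on a block), you can tally 404s per client address. This is a minimal Python sketch over (ip, status) pairs from a parsed log; top_404_sources and the min_hits cutoff are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_404_sources(
    records: Iterable[Tuple[str, int]], min_hits: int = 10
) -> List[Tuple[str, int]]:
    """Return (ip, count) for clients with at least `min_hits` 404s,
    most frequent first."""
    counts = Counter(ip for ip, status in records if status == 404)
    return [(ip, n) for ip, n in counts.most_common() if n >= min_hits]
```

A high count is a reason to look the IP up, not an automatic reason to block it; as noted above, a plain 404 usually does the job.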

roshaoar



 
Msg#: 4661349 posted 10:43 pm on Apr 8, 2014 (gmt 0)

Thanks Lucy... again!

Maybe it was 403-related: I had some .htaccess rules sending "badPeople" to 403 land. I took that off a couple of days ago. Argh.

tangor




 
Msg#: 4661349 posted 1:09 am on Apr 9, 2014 (gmt 0)

404? The paired examples were 301s... what are they redirecting to? (The file sizes of the redirects are 207, 208, 203 and 204, all very small.)

lucy24




 
Msg#: 4661349 posted 2:20 am on Apr 9, 2014 (gmt 0)

tangor wrote: "What are they redirecting to? (The file sizes of the redirects are 207, 208, 203 and 204, all very small.)"

But that's just the size of the 301 response itself (the short "moved" notice the server sends back), not the size of the page it redirects to. The exact size apparently depends on your server; the part that names the new location only varies by a few bytes.

If your ErrorDocument directive is worded incorrectly (for example, pointing at a full URL instead of a local path), every error turns into a 302 redirect. But not into a 301; those only happen on purpose.
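Apache's documentation notes that when ErrorDocument points at a full URL (anything starting with a scheme such as http://), the server sends the client a redirect to that URL instead of serving the error page with the original status; a local path such as /404.html keeps the 404. This minimal Python sketch flags such lines in an .htaccess file. The name risky_error_documents is illustrative, and the parsing is deliberately simple: it only looks at the first token of the target and ignores sections like <If> blocks.

```python
import re
from typing import List, Tuple

# "ErrorDocument <code> <target>", capturing the first token of the target.
# Directive names are case-insensitive in Apache, so match either case.
ERROR_DOCUMENT = re.compile(r"^\s*ErrorDocument\s+(\d{3})\s+(\S+)", re.IGNORECASE)
# A target with a URL scheme (http://, https://, ...) makes Apache redirect.
HAS_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def risky_error_documents(config_text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, status code, target) for every ErrorDocument
    whose target is a full URL rather than a local path or message.
    Commented-out lines (starting with #) never match."""
    findings = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        match = ERROR_DOCUMENT.match(line)
        if match and HAS_SCHEME.match(match.group(2)):
            findings.append((lineno, match.group(1), match.group(2)))
    return findings
```

Any line this reports is a candidate for the stray 302s; changing the target to a local path such as /errors/404.html keeps the original status code.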

roshaoar



 
Msg#: 4661349 posted 6:33 am on Apr 9, 2014 (gmt 0)

Yes, they were 301s, because I redirect every URL that doesn't end in a file extension to the same URL with a trailing slash.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved