Forum Moderators: DixonJones


Requests for URLs with spaces

GET /urls/with/ spaces/in/the/middle/

         

ams_david

7:28 am on Oct 3, 2003 (gmt 0)

10+ Year Member



I've noticed something new lately: we're getting a few hits a day for URLs with spaces in them. These all end up as 404s, since the real URLs for the pages contain no spaces.

For example, the real location might be:
/urls/with/spaces/in/the/middle/

But the request is always like:
/urls/with/ spaces/in/the/middle/
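To illustrate the mismatch, here is a minimal Python sketch that strips the stray whitespace from a requested path and checks whether the result is a page that actually exists. The function names and the `known_paths` set are made up for the example, and it assumes (as on this site) that no real URL contains a space.

```python
def remove_url_spaces(path):
    """Drop all whitespace from a requested path.

    Assumption: no real URL on the site contains whitespace.
    """
    return "".join(path.split())


def match_real_url(requested, known_paths):
    """Return the real path a space-broken request probably meant, or None."""
    candidate = remove_url_spaces(requested)
    return candidate if candidate in known_paths else None


# Hypothetical set of real pages on the site.
known_paths = {"/urls/with/spaces/in/the/middle/"}
print(match_real_url("/urls/with/ spaces/in/the/middle/", known_paths))
```

This only handles literal whitespace; a request that arrives with an encoded space would need to be decoded first.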

The requests always use the site IP, not hostname, and include no referrer or browser strings.

Since Google recently changed their listings to add spaces to long display URLs, I'm thinking it's related to that.

Anyone else seeing this?

pageoneresults

7:39 pm on Oct 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Since Google recently changed their listings to add spaces to long display URLs, I'm thinking it's related to that.

Hmmm, Google has always added spaces to break up longer URIs so I'm not too certain that would be the place to look.

ams_david

8:03 pm on Oct 3, 2003 (gmt 0)

10+ Year Member



Google has always added spaces to break up longer URIs

Hm... I remember a short while ago there were cases where you'd get a big empty spot in Google results, caused by a long URL forcing content to go under the AdWords block.

SBAmerica

3:58 pm on Oct 4, 2003 (gmt 0)

10+ Year Member



Do they look anything like this?

2003-10-04 00:06:10 68.123.124.178 - W3SVC383 GUAN ourdomainiphere 80 GET / ~/ 404 123 4203 42 0 HTTP/1.1 ourdomainiphere - - -

We've been getting hundreds of hits a day with these 404s, which then get 'redirected' to a 200 by the server. No referrer, but Google lists many of our pages with a space after the .com. For example: www.ourdomain.com /our-page-here.html. Do you see a similarity here?

ams_david

11:05 am on Oct 10, 2003 (gmt 0)

10+ Year Member



Something similar. White space breaks after .com, and more so after a / character, like:

x.our.server.ip - 128.242.client.ip - - [10/Oct/2003:03:54:23 +0200] "GET /cgi-bin/ widget.cgi?widget_one=widgetry&widget_two=fun+with+widgets HTTP/1.0" 400 370 "-" "-"

The fun part is we have a virtual hosting environment with one IP for several web sites. The requests generating errors not only contain white space, but are also made against the bare IP address rather than the hostname.

We get a steady trickle of these types of requests...
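For anyone wanting to count these in their own logs, below is a rough Python sketch that picks out requests with white space inside the URL and no referrer or browser string. It assumes each log line ends with the quoted request, status, size, referrer and user agent, as in the Apache-style line above; the regex and function names are my own and may need adjusting for other log formats.

```python
import re

# Tail of a log line: "request" status size "referrer" "user-agent"
# Assumption: the log uses this layout, as in the sample line above.
LOG_TAIL = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"\s*$'
)


def has_space_in_url(request):
    """True if the request line has white space inside the URL part."""
    parts = request.split()
    # A well-formed request line has three tokens: METHOD URL PROTOCOL.
    # Extra tokens before the protocol mean the URL itself was split.
    return len(parts) > 3 and parts[-1].startswith("HTTP/")


def suspicious_requests(lines):
    """Yield (request, status) for space-broken requests with no referrer or agent."""
    for line in lines:
        match = LOG_TAIL.search(line)
        if match is None:
            continue  # line does not follow the assumed layout
        if (
            has_space_in_url(match["request"])
            and match["referrer"] == "-"
            and match["agent"] == "-"
        ):
            yield match["request"], int(match["status"])


if __name__ == "__main__":
    import sys

    # Usage: python find_space_requests.py < access_log
    for request, status in suspicious_requests(sys.stdin):
        print(status, request)
```

It does not check whether the request came in by IP or by hostname, since that depends on how the virtual host is written into each log line.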