I posted this in Website Technology Issues a few days ago and got no takers. Looking today, it appears that about 65% of all Google URL requests in the past week were rejected. Our techs are reasonably confident it is not an IIS problem. Has anyone seen anything like this?
While looking at the URLScan logs on some of the web servers where it is deployed, we are finding that URLs requested by Google's crawler are getting blocked because they appear to be in an incorrect format. The URL requested looks like this:
/content/article/2/1700_50030,http://mydomain.com/content/article/2/1700_50030'
We are trying to determine whether the problem is on our side, why Google is requesting URLs in this format, and where it is getting these URLs from. No specific help from Google as of yet.
A sample from the URLScan log is given below. It shows that the raw URL is a relative URL followed by a comma and the complete URL [domain substituted]. A correct request would have put only the relative URL in the Raw URL field of the URLScan log.
[04-30-2003 - 00:05:24] Client at 64.68.80.155: URL contains extension '.com', which is disallowed. Request will be rejected. Site Instance='3', Raw URL='/content/article/2/1700_50030,http://mydomain.com/content/article/2/1700_50030'
[04-30-2003 - 00:05:37] Client at 64.68.80.149: URL contains extension '.com', which is disallowed. Request will be rejected. Site Instance='3', Raw URL='/content/article/12/1687_50188,http://mydomain.com/content/article/12/1687_50188'
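To gauge how widespread the pattern is, the rejected entries can be pulled out of the log with a short script. The following is only a rough sketch in Python: it assumes every entry has the same layout as the two entries above, and the names ENTRY and find_doubled_urls are made up for illustration. It flags any raw URL that consists of a relative path, a comma, and an absolute URL ending in that same path.

```python
import re
from collections import Counter

# Matches one URLScan entry of the shape shown in the sample above.
# \s+ between "Raw" and "URL" tolerates the entry being wrapped onto two lines.
ENTRY = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+Client at (?P<ip>[\d.]+):.*?Raw\s+URL='(?P<raw>[^']*)'",
    re.DOTALL,
)


def find_doubled_urls(log_text):
    """Return entries whose raw URL is '<path>,<absolute URL ending in path>'."""
    hits = []
    for m in ENTRY.finditer(log_text):
        path, comma, rest = m.group("raw").partition(",")
        if comma and rest.startswith(("http://", "https://")) and rest.endswith(path):
            hits.append(
                {
                    "time": m.group("ts"),
                    "client": m.group("ip"),
                    "path": path,
                    "absolute": rest,
                }
            )
    return hits


# The two entries quoted in this post, used here as sample input.
SAMPLE = (
    "[04-30-2003 - 00:05:24] Client at 64.68.80.155: URL contains extension "
    "'.com', which is disallowed. Request will be rejected. Site Instance='3', "
    "Raw URL='/content/article/2/1700_50030,"
    "http://mydomain.com/content/article/2/1700_50030'\n"
    "[04-30-2003 - 00:05:37] Client at 64.68.80.149: URL contains extension "
    "'.com', which is disallowed. Request will be rejected. Site Instance='3', "
    "Raw URL='/content/article/12/1687_50188,"
    "http://mydomain.com/content/article/12/1687_50188'\n"
)

hits = find_doubled_urls(SAMPLE)
for h in hits:
    print(h["time"], h["client"], h["path"])

# Count affected requests per client address.
per_client = Counter(h["client"] for h in hits)
print(per_client)
```

Running the same function over a full day's log and grouping by path would show whether every article URL is affected or only some of them.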
Thanks for the help!
As far as I've seen, 99.9% of the time that Google requests a bogus URL, the cause is something on your own side: config, server, scripting, etc.
Have you tried spidering your site yourself with a freeware crawler? Specifically, one that reports the HTTP status code for each request, e.g. 200, 301, 302, 403, 500, etc.
And if you have tried that, how did it work out? Did you get errors on the same URLs, or no errors at all?
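If no crawler is at hand, a status check along these lines can be approximated with a few lines of Python. This is a minimal sketch, not a real spider: it does not follow links, it only requests a list of URLs you supply and records the status code and any redirect target without following the redirect. The function names and the User-Agent string are invented for the example.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Request one URL and return (status code, Location header or None).

    http.client does not follow redirects, so 301/302 responses are
    reported as such instead of being resolved silently.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # Hypothetical User-Agent, chosen only for this sketch.
        conn.request("GET", path, headers={"User-Agent": "status-check-sketch"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def check_urls(urls, fetch=fetch_status):
    """Return a list of (url, status, location) tuples, one per URL."""
    report = []
    for url in urls:
        try:
            status, location = fetch(url)
        except (OSError, http.client.HTTPException) as exc:
            # Record the failure instead of aborting the whole run.
            status, location = None, f"error: {exc}"
        report.append((url, status, location))
    return report
```

Calling check_urls with a handful of the article URLs from the log (for example http://mydomain.com/content/article/2/1700_50030, with the real domain substituted back in) and printing the result would show whether any of them answer with a redirect, and where that redirect points.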