Some of my sites verify OK in Google Sitemaps, but one of them always returns this error:
"We've detected that your 404 (file not found) error page returns a status of 200 (OK) in the header."
or
"Temporarily problem".
When I browse the URL with Firefox:
[mydomain.com...]
it works fine.
Also, when I read my web server logs, I notice that the IP 64.233.172.37 tries to GET the file, but with NO USER_AGENT!
The same IP then tries to GET noexist_440e5c4f7d12c2fa.html, and gets a 302 from IIS.
Now, a few questions:
1) What kind of behavior is this? A GET request should always have a valid user agent... shouldn't it?
2) How can Google expect to find noexist_440e5c4f7d12c2fa.html? Did it try to "write" it to my web server?
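In case it helps, the probe is easy to reproduce. Here's a rough Python sketch (the hostname and filename are placeholders for mine); note that http.client sends no User-Agent header by default, just like the requests in my logs:

import http.client

# Placeholders -- substitute the real host and the noexist_... filename.
HOST = "example.com"
PATH = "/noexist_440e5c4f7d12c2fa.html"

# http.client adds no User-Agent header unless you set one yourself,
# so this request looks like the UA-less GETs in the server logs.
conn = http.client.HTTPConnection(HOST, 80, timeout=10)
conn.request("GET", PATH)
resp = conn.getresponse()

# IIS answers 302 here where a 404 is expected.
print(resp.status, resp.reason)
for name, value in resp.getheaders():
    print(f"{name}: {value}")
conn.close()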
Any help much appreciated.
Best regards.
Since you're already using Firefox, you can download the "Live HTTP Headers" extension and view the HTTP headers yourself while you work on getting this issue sorted out.
The sitemap program did you a big favor here -- one of the biggest sources of Google indexing problems I see when people bring me problem sites stems from "custom error pages" that do not return a 404 header. This means that ANY garbage URL will be indexed as the content that your 302 points to. As the bad page requests pile up, so does your "duplicate content". At some point, you just break Google's willingness to index your URLs any more. The good ones are overwhelmed by the bad ones.
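If you want to see whether a site has this problem, you can do roughly what the verifier does: request a few guaranteed-nonexistent URLs and check what comes back. A quick Python sketch (the hostname is a placeholder):

import http.client
import uuid

HOST = "example.com"  # placeholder -- the site you want to test

# Like the verifier's random noexist_... filename, these paths are
# guaranteed not to exist, so anything other than a 404 is a "soft 404".
for _ in range(3):
    path = f"/noexist_{uuid.uuid4().hex}.html"
    conn = http.client.HTTPConnection(HOST, 80, timeout=10)
    conn.request("GET", path)
    resp = conn.getresponse()
    verdict = "ok" if resp.status == 404 else "soft 404 -- needs fixing"
    print(f"{path}: {resp.status} {resp.reason} ({verdict})")
    conn.close()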
And on your second question, there's no requirement for a "valid" user agent string to be sent. In my opinion, this is a wise thing the Sitemaps team is doing -- because some sites will serve a different result to the conventional googlebot user agents or IP addresses (cloaked content) than they serve to a recognized browser.
Again, if you are going to be in the Google Sitemaps program, you would only be asking for trouble if you did something like that. I am not currently using the program, but this actually impresses me.
Since you're already using Firefox, you can download the "Live HTTP Headers" extension and view the HTTP headers yourself while you work on getting this issue sorted out.
I tried it...cool!
My custom 404 page simply redirects to my home page... I didn't think this was so bad... :-((
Now I've written an HTML advice page for 404s.
And Google has verified the page.
one of the biggest sources of Google indexing problems I see when people bring me problem sites stems from "custom error pages" that do not return a 404 header. This means that ANY garbage URL will be indexed as the content that your 302 points to.
I understand, thank you... do you think that fixing this behavior will improve my ranking or anything else?
And on your second question, there's no requirement for a "valid" user agent string to be sent. In my opinion, this is a wise thing the Sitemaps team is doing -- because some sites will serve a different result to the conventional googlebot user agents or IP addresses (cloaked content) than they serve to a recognized browser.
I agree with you about cloaking, but it would be so simple for Google to impersonate IE/Opera/Firefox and so on to compare the results... that's not my case... I just want to serve some content on a 404... :)
So, what is the best behavior for a 404?
Regards.
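For reference, the behavior described above -- a helpful error page that still sends a genuine 404 status, rather than a 200 or a redirect -- looks something like this minimal Python sketch. It only illustrates the status handling; on IIS you would configure the custom error page to return 404 instead:

import http.server

ERROR_PAGE = b"""<html><body>
<h1>Page not found</h1>
<p>Sorry, that page does not exist. Try the <a href="/">home page</a>.</p>
</body></html>"""

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/":
            # Normal page: 200 with the real content.
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<html><body><h1>Home</h1></body></html>")
        else:
            # Missing page: helpful content, but a genuine 404 status --
            # not a 200, and not a 302 redirect to the home page.
            self.send_response(404)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(ERROR_PAGE)

if __name__ == "__main__":
    http.server.HTTPServer(("", 8000), Handler).serve_forever()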