- Google
-- Google News Archive
---- Meta Refresh leads to ...


Robert_Charlton - 8:58 pm on Mar 20, 2004 (gmt 0)


where do you write to GoogleGuy?

I tried whatever method he was suggesting at the time to reach him with spam reports, flagging the report with "GoogleGuy" and "WebmasterWorld" and my WW username. For various feedback now, I think he's suggesting doing this via webmaster@google.com. Probably the best time to reach him is really late Friday night. ;)

I'm not even sure the message actually reached him. This is one of the dilemmas about this problem: there's no confirmation about anything. The helpful but obviously beleaguered Google search engineer at SES gave me his card and email address, but also had to suggest I flag the message to refer to our conversation at SES so it wouldn't get filtered out. Who knows whether it got through?

I think Google is trying very hard and is much better than most about responding to feedback, but this topic has fallen through the cracks.

I had a thought from the other side of this. I use redirect scripts on my site for banner adverts at the top of pages. These use a PHP redirect from a script in a robots.txt-denied folder.

Something similar was suggested in one of the threads I mentioned above. Whether this works may also depend on whether a meta refresh redirect is used as a "backup." Some redirect scripts do this in case the script itself doesn't suffice. I'm definitely not an expert in this area.

Also, from what I understand, robots.txt won't prevent Google from indexing a URL it hasn't crawled. It will index the link but not the page content, as in the threads below:

Problem with Googlebot and robots.txt?
Google indexing links to blocked urls even though it's not following them
[webmasterworld.com...]

Comment from GoogleGuy:
If we have evidence that a page is good, we can return that reference even though we haven't crawled the page.

And check out the thread Jim Morgan references (and read his excellent msg#12) now moved to:

Question about simple robots.txt file
[webmasterworld.com...]

I'm not sure how the robots.txt and Google indexing of links relates to our problem at hand, but the above threads suggest that robots.txt alone might not suffice.
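
To make the robots.txt point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The `/go/` folder, the `redirect.php` script name and the example.com URLs are hypothetical stand-ins for a "robots.txt-denied folder"; they are not taken from any of the sites discussed. The check only answers "may a compliant robot fetch this URL?" and, as the threads above suggest, says nothing about whether the bare URL can still show up in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that keeps its redirect script in /go/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /go/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The redirect script is off limits to crawling...
blocked = may_crawl(ROBOTS_TXT, "Googlebot",
                    "http://www.example.com/go/redirect.php?id=7")
# ...while ordinary pages are not.
allowed = may_crawl(ROBOTS_TXT, "Googlebot",
                    "http://www.example.com/index.html")

print("redirect script crawlable:", blocked)
print("home page crawlable:", allowed)
```

A "not crawlable" answer here is a statement about fetching only; a link to the blocked URL found elsewhere can still be recorded.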

In my conversation with the Google engineer, I'd mentioned that I'd asked the redirecting directory to put the noindex,nofollow robots meta tag on the redirect page in an attempt to get it out of the Google index. The engineer said he thought that this should take care of it. It didn't, or it took quite a few months, and the problem came back again before it finally disappeared.

In retrospect, I'm not sure why things eventually returned to normal, whether through Google's intervention, or because of the noindex,nofollow tag on the redirecting page. Anyway, I'd do both. This may not fix the problem, though... and the big problem is that we're vulnerable to how other domains link to us.
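
If you want to check whether a page that redirects to you actually carries the tag, a rough sketch along these lines may help. It uses Python's standard `html.parser`; the `MetaScanner` class name and the sample page are made up for illustration, and the parsing of the refresh `content` value is a simplification, not a description of how any search engine reads it.

```python
from html.parser import HTMLParser


class MetaScanner(HTMLParser):
    """Collect the meta refresh target and robots directives from a page."""

    def __init__(self):
        super().__init__()
        self.refresh_target = None   # URL named in a meta refresh, if any
        self.robots = set()          # e.g. {"noindex", "nofollow"}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        content = attr.get("content", "")
        if attr.get("http-equiv", "").lower() == "refresh":
            # Typical form: content="0; url=http://www.example.com/"
            _, _, rest = content.partition(";")
            rest = rest.strip()
            if rest.lower().startswith("url="):
                self.refresh_target = rest[4:].strip().strip("'\"")
        elif attr.get("name", "").lower() == "robots":
            self.robots |= {d.strip().lower()
                            for d in content.split(",") if d.strip()}


# Hypothetical redirect page of the kind a directory might serve.
SAMPLE_PAGE = """
<html><head>
<meta name="robots" content="noindex,nofollow">
<meta http-equiv="refresh" content="0; url=http://www.example.com/">
</head><body>Redirecting...</body></html>
"""

scanner = MetaScanner()
scanner.feed(SAMPLE_PAGE)
print("refresh target:", scanner.refresh_target)
print("robots directives:", sorted(scanner.robots))
```

This only confirms that the tag is present on the redirecting page; whether and how quickly the engines act on it is a separate question.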

why doesn't Google (and other robots) simply ignore pages with meta refreshes - just ignore the url where the meta refresh is hosted ... dividing content and url and REPLACING the target url is - in my eyes - a serious bug!

At first glance this makes sense, but I don't think it's that simple. I think there might be some altruistic motives in Google's approach... i.e., a lot of sites are built badly, with splash pages or landing pages that redirect, and, by handling meta refreshes the way it does, Google keeps a lot of these sites in the index. My guess is that ignoring meta refresh pages would create a more widespread upset than Florida, but I don't have the statistics... just a hunch.

It might suffice if Google kept things the way they were as long as it was the same domain, but ignored redirects that jumped domains. Even here, a lot of unwitting sites would get hurt. I can't tell you how many sites I've had to fix that used meta refreshes to redirect .net, .com, etc, as well as various ppc landing pages to a main "home" page that was not the default index page of the domain.
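
The "same domain versus jumped domains" rule could be sketched roughly as below. This is only an illustration of the idea, not anything Google is known to do; the `crosses_domain` helper and all the example URLs are hypothetical, and the hostname comparison is deliberately naive.

```python
from urllib.parse import urljoin, urlsplit


def _host(url: str) -> str:
    """Lower-cased hostname with a leading 'www.' dropped."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def crosses_domain(page_url: str, refresh_target: str) -> bool:
    """True if a meta refresh on page_url leads to a different host."""
    # Resolve relative targets such as "/home.html" against the page URL.
    target = urljoin(page_url, refresh_target)
    return _host(page_url) != _host(target)


# Splash page redirecting within its own site: would be kept.
same_site = crosses_domain("http://www.example.com/splash.html", "/home.html")

# Directory counter page redirecting to someone else's site: would be ignored.
counter = crosses_domain("http://directory.example.org/count.php?id=1",
                         "http://www.example.com/")

# A .net domain redirecting to the owner's own .com: also flagged.
alias = crosses_domain("http://www.example.net/",
                       "http://www.example.com/home.html")

print(same_site, counter, alias)
```

The last case is the catch: a simple host comparison cannot tell an owner's own .net-to-.com redirect from a third-party directory redirect, which is why unwitting sites would get hurt.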

It may be that, while this is a stupid arrangement, it is no less legit than the meta refreshes on the directory counter pages and that the engines have a hard choice to make. I don't know. It would be nice to get some evidence of attention.


Thread source: http://www.webmasterworld.com/google_archive/22754.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com