claus - 3:30 pm on Mar 9, 2005 (gmt 0)
:)

You can't ban 302 referrers as such
Why? Because your server will never know that a 302 is used for reaching it. This information is never passed to your server, so you can't instruct your server to react to it.

You can't ban a "go.php?someurl" redirect script
Why? Because your server will never know that a "go.php?someurl" redirect script is used for reaching it. This information is never passed to your server, so you can't instruct your server to react to it.

Even if you could, it would have no effect with Google
Why? Because Googlebot does not carry a referrer with it when it spiders, so you don't know where it's been before it visited you. As already mentioned, Googlebot could have seen a link to your page in a lot of places, so it can't "just pick one". Visits by Googlebot have no referrers, so you can't tell Googlebot that one link that points to your site is good while another is bad.

You CAN ban clickthrough from the page holding the 302 script - but it's no good
Yes you can - but this will only hit legitimate traffic, meaning that surfers clicking from the redirect URL will not be able to view your page. It also means that you will have to maintain an ever-increasing list of individual pages linking to your site. For Googlebot (and any other SE spider) those links will still work, as they pass on no referrer.

This is what really happens when Gbot meets 302:
Here's the full lowdown. First time I post it all. It's extremely simplified to benefit the non-tech readers among us, and hence not 100% accurate in the finer details, but even though I really have tried to keep it simple you may want to read it twice:
The full story of Google and 302s
Fine print: I may want to republish this on my own site later on (usually when I say this I don't even bother), but otherwise it's one of those "you saw it on WebmasterWorld first" posts, so it's not intended for republishing all across the web. Yes, it means: Please don't republish if you didn't write it, which you didn't.
...just clearing up a few misunderstandings first, then you'll get the full lowdown on this stuff.
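To make the referrer point in this post concrete, here is a minimal Python sketch of a referrer-based ban. Everything in it is an assumption for illustration: the host `redirector.example`, the `should_block` helper, and the sample header dicts are hypothetical, and a real server would do this in its own config rather than in Python.

```python
from urllib.parse import urlparse

# Hypothetical blocklist: hosts whose redirect pages link to your site.
BLOCKED_REFERRER_HOSTS = {"redirector.example"}


def should_block(headers):
    """Return True if the request's Referer host is on the blocklist."""
    referer = headers.get("Referer", "")
    # With no Referer header there is nothing to match on.
    host = urlparse(referer).hostname or ""
    return host in BLOCKED_REFERRER_HOSTS


# A surfer clicking through from the redirect page sends a Referer header.
surfer = {
    "User-Agent": "Mozilla/5.0",
    "Referer": "http://redirector.example/go.php?someurl",
}

# A spider, as described in this post, arrives with no Referer at all.
spider = {"User-Agent": "Googlebot/2.1"}

print(should_block(surfer))
print(should_block(spider))
```

The only request the rule can catch is the human clickthrough. The spider request carries nothing to match against, and neither request says whether a 302 was involved.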
So, essentially, by doing the right thing (interpreting a 302 as per the RFC [w3.org]) Google allows another webmaster to convince its bot that your website is nothing but a temporary holding place for content.
Further, this leads to creation of pages in the index that are not real pages. And, you can do nothing about it.
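A rough sketch of that reading of the RFC, in Python. This is a deliberately simplified model of the behaviour described here, not Google's actual logic; the function name `indexed_url` and both example URLs are made up for illustration.

```python
def indexed_url(requested_url, status, location=None):
    """Pick the URL a simplified crawler would file the fetched content under.

    Assumption: a 301 (permanent) means the target replaces the requested
    URL, while a 302 (temporary) means the requested URL stays the one
    to remember, because the redirect target may change later.
    """
    if status == 301 and location:
        return location
    return requested_url


# Hypothetical URLs: a redirect script on another site pointing at your page.
redirect_script = "http://other.example/go.php?someurl"
your_page = "http://yoursite.example/page.html"

print(indexed_url(redirect_script, 302, your_page))
print(indexed_url(redirect_script, 301, your_page))
```

Under this model the 302 case keeps the other site's URL as the address of record while the content comes from your page, which is the "temporary holding place" effect described above.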
[edited by: claus at 3:45 pm (utc) on Mar. 9, 2005]