then upload to the removal tools
If you are going to use the removal tool and not just wait for Google to sort things out, then it looks like you need to be sure that the URL resolves to a 404.
...use our automatic URL removal system. We'll accept your removal request only if the page returns a true 404 error via the http headers. Please ensure that you return a true 404 error even if you choose to display a more user-friendly body of the HTML page for your visitors. It won't help to return a page that says "File Not Found" if the http headers still return a status code of 200, or normal. [google.com...]
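In PHP, for instance, that means sending the 404 status line before any other output; a minimal sketch (the wording of the friendly body is just a placeholder):

<?php
// Send a true 404 in the HTTP headers before any output,
// so Google's removal tool sees the real status code.
header('HTTP/1.0 404 Not Found');
?>
<html>
<head><title>File Not Found</title></head>
<body>
<p>Sorry, that page no longer exists. Try the <a href="/">home page</a>.</p>
</body>
</html>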
URLs cannot have wild cards in them (e.g. "*"). The following line contains a wild card:
DISALLOW /*?full=0$
But in Google's own guidelines, it's written exactly like this: DISALLOW /*?full=0$
It's a real pain; it's not even a real page. I have NO idea how they got that link with the ?full=0 at the end. Why are all these weird problems ALWAYS on Google?
I found the safest way to delete more than a dozen URLs was to generate and FTP dummy pages with just a meta noindex tag in the head section. In some cases this is much easier than using Google's URL-removal form.
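For what it's worth, such a dummy page can be as bare as this sketch; upload it under the exact filename of the URL you want removed:

<html>
<head><meta name="robots" content="noindex"></head>
<body></body>
</html>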
I am not really sure, but I have the suspicion that if you forbid such a page via robots.txt, the spider simply doesn't visit it, but the page is still kept in the cache.
As I mentioned elsewhere, the same holds true for 404 pages: these are kept in the index and cache far too long.
I suppose they're trying to protect people from making really egregious errors with the wildcards ;)
I ended up writing out a whole lot of Disallow directives in my robots.txt, and then after Google had successfully removed the pages I added the wildcards back in to keep the Big "G" out for good (roughly as sketched below).
It works :)
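To illustrate the two steps with the ?full=0 example from this thread (the .php paths are invented, purely for illustration):

# Step 1 - plain directives without wildcards, which the removal tool accepts
User-agent: Googlebot
Disallow: /page1.php?full=0
Disallow: /page2.php?full=0

# Step 2 - once Google has dropped the pages, switch to the wildcard line
# from Google's guidelines (wildcards are a Googlebot-only extension)
User-agent: Googlebot
Disallow: /*?full=0$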
Damn, I think it's incredible: they have no space on the server and a lot of sites have lost pages, but they do index pages that in a way don't exist. Well, that's another topic.
I really don't know what to do now; I see about 500 pages with this extra URL showing as supplemental.
You guys crack me up :)
As far as the links go, I'd spend some time figuring out how to stop your site from creating those other URLs in the first place, or how to use .htaccess to redirect them all to the correct URLs.
If you can do a redirect via .htaccess, you should then be able to do the removal.
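If mod_rewrite is available, a rule along these lines might do it; this is only a sketch, assuming the stray URLs all carry full=0 in the query string:

RewriteEngine On
# Match requests whose query string is exactly full=0
RewriteCond %{QUERY_STRING} ^full=0$
# Redirect to the same path; the trailing "?" strips the query string
RewriteRule ^(.*)$ /$1? [R=301,L]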
Good luck!
About server space: oh, that's no joke, they have even said so themselves. And when you think about their problems, the omitted results, the supplemental results, and the fact that you cannot see all of your own pages once you're over 1,000, that's all to save space.
Sounds like a major pain in the butt...
I've heard the rumor that Google doesn't have enough server space, but I personally think it's the silliest thing I've ever heard.
Anyway, if your browser can still reach those URLs, I think you're still in trouble.
Yes, maybe it is difficult to design a dummy page for such GET-variable pages. However, what first comes to one's mind is, of course, that somewhere in your scripts you in fact DO generate links of that kind.
If you are 100% sure the links come from outside, you might add some PHP code at the top of your pages, provided they run through the parser anyway:
if (isset($_GET['full'])) {
    // Serve a bare page whose only job is to carry the noindex tag
    echo '<html><head><meta name="robots" content="noindex,noarchive"></head><body></body></html>';
}
else {
    // ... your normal page ...
}
But what strikes me is that, in the example given, the page itself has no extension like .html or .php.
If you're writing pure HTML, you might even leave off the last two closing tags and add your original code there, but then the page would literally be ill-formed according to W3C standards, because I assume you'd end up with a second head section. But since you're only aiming at getting the page out of the index, that shouldn't really matter.
However, I would never experiment on my own website with other people's code that I did not fully understand, and I guess the same holds true for you. If you know a trustworthy person with some knowledge of PHP, the whole thing shouldn't be too complicated.
Your first choice, of course, should be Google's URL-removal tool. Only if you have hundreds of pages to remove from the index might it be convenient to do some scripting.