| 12:56 am on Jan 3, 2007 (gmt 0)|
If you have duplicate page issues, you'd do much better to deal with the problem; remove the pages.
Neither robots.txt nor the removal tool is a reliable and safe way to deal with the problem, particularly if links exist to the pages in question.
| 2:36 pm on Jan 4, 2007 (gmt 0)|
If you are using the page removal tool, you can only remove pages that return a 404 error.
If you want to use the robots removal option, then you instead feed Google the URL of a valid robots.txt file for it to process.
In any case, these processes do not properly deal with results that are already tagged as Supplemental in Google's SERP.
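If you want to double-check that a page really returns a 404 (and not a "soft" error page that answers 200), a quick header check will show it -- curl is just one way to do this, and the URL below is only a placeholder:

    curl -I http://dev.example.com/some-old-page.html
    # the first response line should read "HTTP/1.1 404 Not Found"
    # if it says 200, the page removal tool will not treat the page as gone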
| 3:54 pm on Jan 4, 2007 (gmt 0)|
Bummer. We use the second domain as a development and test site for the live one.
I don't understand why G just doesn't obey the robots.txt and sitemaps file. It would certainly make their life easier and keep the SE up-to-date if they did.
They've been restricted in robots.txt for almost a year, yet Google refuses to omit those pages.
I'll submit a sitemap to them with nothing in it and see what happens.
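For the record, the robots.txt on the dev box is just the standard blanket block, more or less like this:

    # robots.txt at the root of the dev/test host
    # keep all compliant crawlers out of the entire site
    User-agent: *
    Disallow: /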
| 4:35 pm on Jan 4, 2007 (gmt 0)|
There's another option in the url removal tool -- one that asks googlebot to re-spider your robots.txt and remove what is disallowed by the file. I've used it twice since November and it was trouble-free for me.
| 5:01 pm on Jan 4, 2007 (gmt 0)|
Okay, I submitted the robots.txt to be re-spidered by G.
I was just at G Sitemaps and registered a sitemap for that URL with zero urls.
Let's see if these work.
My guess is that they won't remove pages already indexed or cached or in the supplementals. Why? In the past, I had a Sitemap omitting old pages, and months later they still existed somewhere in the indexes.
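In case anyone wants to picture it, the "empty" sitemap I registered is just the bare container, something like:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- no <url> entries at all -->
    </urlset>

Whether Google accepts a urlset with zero entries or flags it as an error is part of the experiment.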
| 5:23 pm on Jan 4, 2007 (gmt 0)|
As you supposed, the sitemap will not remove urls that are missing from it. But if you put up a new robots.txt AND use the part of Google's url removal tool that says "Remove pages, subdirectories or images using a robots.txt file," then you are on your way.
This option also allows you to create a dedicated robots.txt file at some other address than the standard root robots.txt -- note "Your robots.txt file need not be in the root directory".
I've never tried that, but I can imagine how someone might want it in some situations. However, in your situation you never want bots to spider your dev server, so you probably do want to use the standard robots.txt file to do the url removal.
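For completeness, if you did go the targeted route on a live site, the removal-oriented robots.txt is nothing special -- just ordinary Disallow lines covering whatever you want purged (the paths below are made up for illustration):

    # rules the url removal tool will read and act on
    User-agent: Googlebot
    Disallow: /old-copies/
    Disallow: /print-versions/
    Disallow: /test-page.html

The removal tool then drops the URLs matching those Disallow rules from the index.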