So, how do I get this directory and all URLs below it out of Google? Surely the robots.txt solution won't work, because I want Google to see this directory (or rather, see that it doesn't exist) so that the 404s are returned.
In case you're wondering, the directory was a reproduction of the DMOZ directory, personalised with one of the many scripts out there, which I got rid of over a year ago because of possible duplicate content. But Google still shows more than 22,000 pages of my site that no longer exist for that reason. So, any ideas how to get them out of the index?
Two choices here:
a) Return a 410 (Gone) status code and keep your fingers crossed that Googlebot might find the time to look at those ancient URLs again.
b) Use a "Disallow: /outdated_stuff/" line in your robots.txt and again keep your fingers crossed ... (rough sketches of both below)
Either way, stay away from the Removal Tool!
IMHO G has a real problem here. There doesn't seem to be any practical way to effectively tell them to delete outdated / unwanted stuff from their database. Once they've stored something, they'll keep it - until THEY decide to delete it.
And I'd rather have them out of the index entirely, because I'm not sure whether Google is treating them as duplicate content in some way, and whether that's affecting the rest of the site.
I've decided to recreate just the directory and use Petrocelli's option b), and hope for the best. We'll see.
You are correct: once Google grabs hold of something, it never seems to let go. I have a couple of directories on our site with inflated page counts in Google. The directories contain 200 files, yet Google reports that they contain 5,000. I have decided to move the directories to a new name and 301-redirect the old name to the new one. This seems to be helping in getting the inflated counts back in line.
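In case it helps anyone doing the same, on Apache that move can be a single .htaccess line; the directory names and domain here are made-up placeholders:

    # Permanently (301) redirect the old directory to its new name
    Redirect 301 /old_dir/ http://www.example.com/new_dir/

mod_alias keeps the rest of the path intact, so /old_dir/page.html ends up at /new_dir/page.html automatically.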
All thoughts would be appreciated.