
How to get a non-existent directory out of Google

More than 22000 pages showing 404

     
6:14 pm on Jul 25, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 30, 2002
posts:741
votes: 0


In Google's removal information [google.com] they talk about how to remove a particular directory using the robots.txt file. That's fine if you have a directory that actually exists and you don't want it crawled (although my understanding is that the directory will still appear, but as URL only). My problem is different: I have a site with a directory that no longer exists, and its internal files still appear in Google even though they have been returning a 404 error for over a year now.

So, how do I get this directory and all URLs below it out of Google? Surely the robots.txt solution won't work, because I want Google to see this directory (or rather, see that it doesn't exist) so that the 404s are returned.

In case you're wondering, the directory was a reproduction of the DMOZ directory, personalised with one of the many scripts out there, which I got rid of over a year ago because of possible duplicate content. But Google still shows more than 22,000 pages of my site that no longer exist. So, any ideas how to get them out of the index?
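For what it's worth, here's how I confirmed the server really does return a 404 for those URLs, using a raw HEAD request (hostname and path changed to a made-up example):

    telnet www.example.com 80
    HEAD /directory/Recreation/index.html HTTP/1.0
    Host: www.example.com

    HTTP/1.1 404 Not Found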

11:37 am on July 26, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:May 21, 2003
posts:41
votes: 0


DO NOT use the Removal Tool, because it will not remove your pages from their database. The only effect would be that your pages won't show up in the SERPs for 180 days (or maybe just 90 days - G still seems to be uncertain about the exact time span). After that period all the unwanted stuff will happily reappear.

Two choices here:

a) Return a 410 (Gone) status code and keep your fingers crossed that Googlebot might find the time to look at those ancient URLs again.

b) Use a "Disallow: /outdated_stuff/" line in your robots.txt and again keep your fingers crossed ... (untested sketches of both options below).

Either way, stay away from the Removal Tool!
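Roughly, assuming Apache with mod_rewrite and a made-up /outdated_stuff/ path:

    # a) .htaccess -- answer 410 Gone for anything under the old directory
    RewriteEngine On
    RewriteRule ^outdated_stuff/ - [G]

    # b) robots.txt -- stop the old directory being crawled
    User-agent: *
    Disallow: /outdated_stuff/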

IMHO G has a real problem here. There doesn't seem to be any practical way to tell them to delete outdated or unwanted stuff from their database. Once they've stored something, they'll keep it - until THEY decide to delete it.

Peter

11:40 am on July 26, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Apr 13, 2003
posts:602
votes: 0


Why not use a custom 404 error page to capture all that traffic and do something with it?
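Something like this in .htaccess would do it, assuming Apache and a made-up /missing.html page (use a local path, not a full URL - a full URL makes Apache redirect instead of returning the 404 status):

    # serve a friendly page while still returning 404
    ErrorDocument 404 /missing.html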
1:32 pm on July 26, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 30, 2002
posts:741
votes: 0


I don't think the pages receive much traffic - they're all in the Supplemental Results.

And I'd rather have them out of the index entirely, because I'm not sure if Google is seeing them as duplicate content in some way and that's affecting the rest of the site.

I've decided to recreate just the directory and use Petrocelli's b) option, and hope for the best. We'll see.

2:36 pm on July 26, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 8, 2005
posts:146
votes: 0


I have often wondered whether, after using the removal tool, the items you removed still have an effect on your ranking.

You are correct: once Google grabs hold of something, it never seems to let go. I have a couple of directories on our site with inflated page counts in Google. The directories contain 200 files, yet Google reports that they contain 5,000. I have decided to move the directories to a new name and 301-redirect the old name to the new one. This seems to be helping get the inflated page counts back in line.
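For anyone wanting to try the same, roughly (Apache mod_rewrite, made-up directory names):

    # .htaccess -- permanently redirect the old directory to the new one
    RewriteEngine On
    RewriteRule ^old_dir/(.*)$ /new_dir/$1 [R=301,L]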

2:48 pm on July 26, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 8, 2005
posts:146
votes: 0


Curious: does anyone have any good ideas on how to get the page count back in line when using the "site:" command?
3:02 pm on July 26, 2005 (gmt 0)

Administrator

WebmasterWorld Administrator bakedjake is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 8, 2003
posts:3783
votes: 2


And I'd rather have them out of the index entirely

Return a 403 when Google requests those pages.
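Roughly, assuming Apache mod_rewrite and a made-up /old_dir/ path (matching on the user-agent string, which is only as reliable as the string itself):

    # .htaccess -- 403 Forbidden when Googlebot requests the old directory
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
    RewriteRule ^old_dir/ - [F]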

3:08 pm on July 26, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 30, 2002
posts:741
votes: 0


Return a 403 when Google requests those pages

Yes? Do you think that will help more than a 404? Could you explain why? I think part of the problem is that Google is simply not requesting the pages at all - the cache dates are from last year.

3:30 pm on July 26, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 8, 2005
posts:146
votes: 0


I am curious about why Google no longer requests pages. When using the "site:" command, it says that approximately 80,000 pages were found for our site. Our site has no more than 20,000 pages. Where are the other 60,000 pages coming from? I tried to ask Google about this but did not get a clear response, only that index and page counts can change at any given time.
11:09 pm on July 26, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 8, 2005
posts:146
votes: 0


GoogleGuy,
I noticed you are answering questions in other threads. Can you shed a little light on why reported page counts are so far off from the actual page counts on a given site? Also, what is the best way to get pages out of the Google database? It seems that once Google grabs hold, it never lets go.

All thoughts would be appreciated.

Thanks.

 
