
Google SEO News and Discussion Forum

    
How to get a non-existent directory out of Google
More than 22000 pages showing 404
WebWalla

WebmasterWorld Senior Member, 10+ Year Member

Msg#: 30535 posted 6:14 pm on Jul 25, 2005 (gmt 0)

In Google's removal information [google.com], they explain how to remove a particular directory using the robots.txt file. That's fine if you have a directory that actually exists and you don't want it crawled (although my understanding is that the directory will still appear, but as a URL-only listing). My problem is different: I have a site with a directory that no longer exists, and its internal files still appear in Google even though they have been returning a 404 error for over a year now.

So, how do I get this directory and all the URLs below it out of Google? Surely the robots.txt solution won't work, because I want Google to see this directory (or rather, see that it doesn't exist) so that the 404s are returned.

In case you're wondering, the directory was a reproduction of the DMOZ directory, personalised with one of the many scripts out there, which I got rid of over a year ago because of possible duplicate content. But Google still shows more than 22,000 pages of my site that no longer exist. So, any ideas how to get them out of the index?

 

Petrocelli

10+ Year Member

Msg#: 30535 posted 11:37 am on Jul 26, 2005 (gmt 0)

DO NOT use the Removal Tool, because it will not remove your pages from their database. The only effect would be that your pages won't show up in the SERPs for 180 days (or maybe just 90 days; G still seems to be uncertain about the exact time span). After that period, all the unwanted stuff will happily reappear.

Two choices here (sketches of both follow below):

a) Return a 410 (Gone) status code and keep your fingers crossed that Googlebot might find the time to look at those ancient URLs again.

b) Use a "DISALLOW /outdated_stuff/" in your robots.txt and again keep your fingers crossed ...

Either way, stay away from the Removal Tool!

IMHO G has a real problem here. There doesn't seem to be any practical way to effectively tell them to delete outdated / unwanted stuff from their database. Once they've stored something, they'll keep it - until THEY decide to delete it.

Peter

birdstuff

10+ Year Member

Msg#: 30535 posted 11:40 am on Jul 26, 2005 (gmt 0)

Why not use a custom 404 error page to capture all that traffic and do something with it?
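For what it's worth, a minimal Apache sketch of that idea, with /not-found.html as a hypothetical custom page:

    # .htaccess: serve a custom page for missing URLs;
    # Apache still sends the 404 status code along with it
    ErrorDocument 404 /not-found.html

Just make sure the custom page really comes back with a 404 status and not a 200, or Google will treat it as a live page and keep it indexed.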

WebWalla

WebmasterWorld Senior Member, 10+ Year Member

Msg#: 30535 posted 1:32 pm on Jul 26, 2005 (gmt 0)

I don't think the pages receive much traffic - they're all in the Supplemental Results.

And I'd rather have them out of the index entirely, because I'm not sure if Google is seeing them as duplicate content in some way and that's affecting the rest of the site.

I've decided to recreate just the directory and use Petrocelli's option b), and hope for the best. We'll see.

wiseapple

5+ Year Member

Msg#: 30535 posted 2:36 pm on Jul 26, 2005 (gmt 0)

I have often wondered whether, after using the removal tool, the items you removed still have an effect on your ranking.

You are correct: once Google grabs hold of something, it never seems to let go. I have a couple of directories on our site with inflated page counts in Google. The directories contain 200 files each, yet Google reports that they contain 5,000. I have decided to move the directories to a new name and 301-redirect the old URLs to the new ones. This seems to be helping get the inflated page counts back in line.
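A minimal .htaccess sketch of that approach, with /old_dir/ and /new_dir/ as hypothetical names:

    # mod_alias: permanently (301) redirect every URL under the old
    # directory to the same path under the new one
    RedirectMatch 301 ^/old_dir/(.*)$ /new_dir/$1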

wiseapple

5+ Year Member

Msg#: 30535 posted 2:48 pm on Jul 26, 2005 (gmt 0)

Curious: does anyone have any good ideas on how to get the page count back in line when using the "site:" command?

bakedjake

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member

Msg#: 30535 posted 3:02 pm on Jul 26, 2005 (gmt 0)

And I'd rather have them out of the index entirely

Return a 403 when Google requests those pages.
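A minimal .htaccess sketch of that suggestion, again with /outdated_stuff/ standing in for the real directory:

    # mod_rewrite's [F] flag answers 403 Forbidden for any
    # request under the old directory
    RewriteEngine On
    RewriteRule ^outdated_stuff/ - [F]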

WebWalla

WebmasterWorld Senior Member, 10+ Year Member

Msg#: 30535 posted 3:08 pm on Jul 26, 2005 (gmt 0)

Return a 403 when Google requests those pages

Yes? Do you think that will help more than a 404? Could you explain why? I think part of the problem is that Google is simply not requesting the pages at all, because the cache dates are from last year.

wiseapple

5+ Year Member

Msg#: 30535 posted 3:30 pm on Jul 26, 2005 (gmt 0)

I am curious about why Google no longer requests pages... When using the "site:" command, it says that approximately 80,000 pages were found for our site. Our site has no more than 20,000 pages. Where are the other 60,000 pages coming from? I tried to ask Google about this but did not get a clear response, only that index and page counts can change at any given time.

wiseapple

5+ Year Member

Msg#: 30535 posted 11:09 pm on Jul 26, 2005 (gmt 0)

Googleguy,
I noticed you are answering questions on other threads... Can you shed a little light on why reported page counts are so far off from the actual page counts on a given site? Also, what is the best way to get pages out of the Google database? It seems that once Google grabs hold, it never lets go.

All thoughts would be appreciated.

Thanks.
