removing a specific subdirectory from google index...

         

hugo_guzman

7:12 pm on Oct 18, 2004 (gmt 0)

10+ Year Member



Here's my issue:

My company is trying to remove an entire subdirectory's worth of pages from a section of our site (at the request of an advertiser). My first idea was to simply use a robots.txt rule to exclude this subdirectory. The problem with that course of action is twofold (in my opinion):

1) Googlebot does not always obey the robots.txt command

2) Even if the command works, Google will still have those pages in its cache, so the pages may remain indexed and accessible in Google.
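For reference, here is a minimal sketch of the robots.txt exclusion being discussed, with a quick check that the rule covers the subdirectory. The directory name `/advertiser-section/`, the example.com URLs, and the helper `is_blocked` are placeholders invented for illustration. The check uses Python's standard `urllib.robotparser`, which only approximates how a real crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the directory name is a placeholder.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /advertiser-section/
"""


def is_blocked(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text disallows `agent` from `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/advertiser-section/page1.html",
        "http://www.example.com/other/page.html",
    ):
        print(url, "blocked" if is_blocked(ROBOTS_TXT, "Googlebot", url) else "allowed")
```

This only confirms that the rule is written the way you intend. It says nothing about what is already sitting in the index or cache.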

Can anyone think of an unobtrusive means of getting these pages out of the Google index (don't tell me to remove the pages from our site... that is not an option at this point... don't ask me why)?

I was thinking of using a 301 but I'm not sure that is the right course of action.

Basically, the goal is to remove these pages from the google index so that our users cannot find them when they perform a domain search using our (powered by google) search tool.

Any suggestions would be appreciated.

Hugo

Brett_Tabke

7:16 pm on Oct 18, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



1- Google *always* obeys the command, in Google's interpretation of the standard. That command is to "not index". That does *not* mean that they cannot spider the content.

2- nope. i disagree.

If you are worried about robots.txt working, then use robots noindex meta tags instead.
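A rough sketch of what that tag looks like and how one might verify that a page carries it. The tag in question is `<meta name="robots" content="noindex">` in the page's head. The class `RobotsMetaFinder` and the function `has_noindex` are names made up for this example; the parsing relies on Python's standard `html.parser`.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # content may hold several comma-separated directives
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(page))
```

Keep in mind a spider has to be able to fetch the page to see the tag at all.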

>I was thinking of using a 301 but I'm not
>sure that is the right course of action.

Ya, simple htaccess command to redirect that directory for gbot to a 404.

hugo_guzman

7:37 pm on Oct 18, 2004 (gmt 0)

10+ Year Member



Brett,
Thanks for the quick feedback. The robots meta tag option may be a bit difficult to implement because the subdirectory in question has 1000s of individual pages.
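If the pages are static HTML files on disk (an assumption; nothing in this thread says how they are generated), adding the tag in bulk could be scripted. A minimal sketch follows, with the hypothetical helpers `add_noindex` and `tag_directory`. It rewrites files in place, so it should be tried on a copy first.

```python
import re
from pathlib import Path

NOINDEX_TAG = '<meta name="robots" content="noindex">'
# Matches <head> or <head ...> but not <header>
HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)
HAS_ROBOTS_META = re.compile(r'<meta[^>]+name=["\']robots["\']', re.IGNORECASE)


def add_noindex(html: str) -> str:
    """Insert the noindex tag right after the opening <head> tag."""
    if HAS_ROBOTS_META.search(html):
        return html  # page already has a robots meta tag; leave it alone
    return HEAD_OPEN.sub(lambda m: m.group(0) + "\n" + NOINDEX_TAG, html, count=1)


def tag_directory(root: str) -> int:
    """Add the tag to every .html file under root; return how many files changed."""
    changed = 0
    for path in Path(root).rglob("*.html"):
        original = path.read_text(encoding="utf-8")
        updated = add_noindex(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Something like `tag_directory("/path/to/the/subdirectory")` would then cover the whole tree in one pass. If the pages come out of a template or CMS, editing the shared template would be the simpler route.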

So you're fairly sure that if I use the robots.txt exclusion, the pages in the subdirectory will no longer be in Google's index?

I just want to confirm before moving forward with a course of action.

Hugo