phranque

msg:4177175 | 8:31 am on Jul 27, 2010 (gmt 0) |
rather than excluding those urls from being crawled, you should probably be responding with a 404 Not Found or 410 Gone status code or a meta robots noindex until those urls and associated keywords are dropped from the index.
|
goodroi

msg:4177246 | 12:29 pm on Jul 27, 2010 (gmt 0) |
it can take several weeks for google to update their crawling & indexing of those specific urls especially if they are not popular pages.
|
ZydoSEO

msg:4177646 | 2:54 am on Jul 28, 2010 (gmt 0) |
Once the Disallow: directive is in place, I would use the URL Removal Request in WMT to have those DEV URLs removed from their index.
|
edgeportals

msg:4177661 | 4:06 am on Jul 28, 2010 (gmt 0) |
Thanks guys, all good answers. I reviewed the Google URL Removal tool and it says to first use the other two recs: 404/410 and meta noindex. The problem is that doing either of these would also impact the use of the dev site. If I add a meta tag to the code, it's too easy to accidentally post that to their live site and cause major issues. Same with triggering a 404/410 from code-behind. Besides securing access from the server end, any thoughts on how to accomplish both of these from outside the page?
|
edgeportals

msg:4177664 | 4:11 am on Jul 28, 2010 (gmt 0) |
Never mind, looks like I can remove an entire directory using only robots.txt. Thanks, ZydoSEO!
|
phranque

msg:4178264 | 1:07 am on Jul 29, 2010 (gmt 0) |
using robots.txt will exclude the directory from being recrawled but won't prevent the url(s) from being indexed, and won't necessarily get them removed from the index. you could however use X-Robots-Tag [webmasterworld.com], which wouldn't get migrated with the content unless you also move the .htaccess file.
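
To illustrate the crawl-versus-index distinction, here is a minimal sketch using Python's standard `urllib.robotparser`. The `/dev/` path, the `example.com` URLs, and the helper name `crawler_may_fetch` are assumptions for illustration, not details from this thread. It only shows that a Disallow rule stops a compliant crawler from fetching the page; a page that is never fetched cannot have its noindex meta tag or X-Robots-Tag header read.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site whose dev copy lives under /dev/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dev/
"""

def crawler_may_fetch(url, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if a crawler honoring robots_txt is allowed to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    for url in ("http://example.com/dev/page.aspx",
                "http://example.com/products/page.aspx"):
        print(url, crawler_may_fetch(url))
```

The check says nothing about whether a URL is already in the index; it only answers whether the crawler is permitted to request it again.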
|
edgeportals

msg:4182035 | 7:35 pm on Aug 4, 2010 (gmt 0) |
On IIS, can't use .htaccess. I've added X-Robots-Tag: no-index to the http header of the virtual directory, but I'm told that's not especially effective. GWT still shows inappropriate keywords. Although, I'm seeing appropriate ones creep up the list.
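
One thing worth checking is the spelling of the header value: the documented directive is `noindex`, with no hyphen, and a hyphenated `no-index` is not among the documented values. A minimal sketch of such a check follows. The function name `blocks_indexing` is hypothetical, and the parsing is simplified: it handles only a comma-separated directive list and ignores the optional user-agent prefix form.

```python
# Directives that ask search engines not to index a page.
# "none" is documented as equivalent to "noindex, nofollow".
NOINDEX_DIRECTIVES = {"noindex", "none"}

def blocks_indexing(header_value):
    """Return True if an X-Robots-Tag value contains a recognized noindex directive."""
    directives = {part.strip().lower() for part in header_value.split(",")}
    return bool(directives & NOINDEX_DIRECTIVES)

if __name__ == "__main__":
    for value in ("no-index", "noindex", "noindex, nofollow"):
        print(repr(value), blocks_indexing(value))
```

Even a correctly spelled header is only seen when the URL is recrawled, so it has no effect while a robots.txt Disallow still covers the same directory.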
|