GWT, Robots.txt Update Timing

How long might it typically take for Google Webmaster Tools to update?

     

edgeportals

5:16 pm on Jul 26, 2010 (gmt 0)

10+ Year Member



Looking for some expert opinions on the following:

I recently updated the website for my services-based business and have been trying to actively market it using AdWords. I've been having difficulty with the ad quality (not my question here). After checking Google Webmaster Tools for that domain, I realized that Google was indexing a subdirectory on my domain that contained a dev version of a client's website. Those keywords were completely dominating Google's view of my domain's new site.

I've since updated the robots.txt file to exclude this directory (which GWT shows, and the robots.txt tester confirms).
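For reference, the exclusion itself is just a couple of lines in robots.txt; a minimal sketch, assuming the dev site lives under a directory like /clientdev/ (hypothetical name):

    User-agent: *
    Disallow: /clientdev/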

The source of the problem was a few links that the client didn't update when they published their website on their own domain (so they still pointed back to my domain and that dev directory). Those links have since been removed. They no longer appear in Google's cache, but a site: search of the client's domain for my domain name still turns up those pages in the results (despite the fact that neither the cache nor the live site shows those links any more).

So my problem, I think, is that Google is still associating the keywords from that subdirectory/client website with my domain's website, which I believe is skewing my AdWords quality valuation significantly.

GWT now shows crawl errors for pages under that directory. The new sitemap has been accessed, but the keywords list still shows the bad keywords.

Any idea how long it will take for Google to update its "analysis" of my website to exclude those keywords? Is there something else I should be doing to get G to disassociate the subdirectory from my domain's website?

All help greatly appreciated. Thanks!

phranque

8:31 am on Jul 27, 2010 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



rather than excluding those urls from being crawled, you should probably be responding with a 404 Not Found or 410 Gone status code, or a meta robots noindex, until those urls and their associated keywords are dropped from the index.
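for reference, the meta robots version is the standard tag, one per dev page, inside the <head>:

    <meta name="robots" content="noindex">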

goodroi

12:29 pm on Jul 27, 2010 (gmt 0)

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



it can take several weeks for google to update their crawling & indexing of those specific urls, especially if they are not popular pages.

ZydoSEO

2:54 am on Jul 28, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Once the Disallow: directive is in place, I would use the URL Removal Request in WMT to have those DEV URLs removed from their index.

edgeportals

4:06 am on Jul 28, 2010 (gmt 0)

10+ Year Member



Thanks guys, all good answers. I reviewed the Google URL Removal tool and it says to first use one of the other two recommendations: a 404/410 status code or meta robots noindex.

The problem is that doing either of these would also impact the use of the dev site. If I add a meta tag to the code, it's too easy to accidentally post that to their live site and cause major issues. Same with triggering a 404/410 from code-behind.

Besides securing access from the server end, any thoughts on how to accomplish both of these from outside the page?

edgeportals

4:11 am on Jul 28, 2010 (gmt 0)

10+ Year Member



Never mind, it looks like I can remove an entire directory using just robots.txt.

Thanks ZydoSEO!

phranque

1:07 am on Jul 29, 2010 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



using robots.txt will exclude the directory from being recrawled, but it won't prevent the url(s) from being indexed or necessarily get them removed from the index.
you could however use an X-Robots-Tag header [webmasterworld.com], which wouldn't get migrated with the content unless you also moved the .htaccess file.
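a minimal sketch, assuming Apache with mod_headers enabled and an .htaccess file placed inside the dev directory itself:

    # .htaccess in the dev directory (requires mod_headers)
    # adds the header to every response served from this directory
    Header set X-Robots-Tag "noindex, nofollow"

since the header lives in server config rather than in the pages, publishing the site files to the client's live server won't carry the noindex along with them.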

edgeportals

7:35 pm on Aug 4, 2010 (gmt 0)

10+ Year Member



On IIS I can't use .htaccess. I've added an X-Robots-Tag: noindex header to the HTTP headers of the virtual directory, but I'm told that's not especially effective. GWT still shows the inappropriate keywords, although I'm seeing appropriate ones creep up the list.
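For reference, the IIS 7+ equivalent of the .htaccess approach is a web.config placed in the virtual directory itself; a minimal sketch (directory placement and the nofollow value are assumptions, not tested on this setup):

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <system.webServer>
        <httpProtocol>
          <customHeaders>
            <!-- sends X-Robots-Tag on every response served from this directory -->
            <add name="X-Robots-Tag" value="noindex, nofollow" />
          </customHeaders>
        </httpProtocol>
      </system.webServer>
    </configuration>

One caveat worth noting: Googlebot only sees this header on urls it actually fetches, so a robots.txt Disallow on the same directory would keep it from ever seeing the noindex.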
 
