
Forum Moderators: goodroi


GWT, Robots.txt Update Timing

How long might it typically take for Google Webmaster Tools to update?

     
5:16 pm on Jul 26, 2010 (gmt 0)

New User

10+ Year Member

joined:June 24, 2005
posts:25
votes: 0


Looking for some expert opinions on the following:

I recently updated my website for my services-based business and have been actively trying to market it using AdWords. I've been having difficulty with the ad quality score (not my question here). After checking Google Webmaster Tools for that domain, I realized that Google was indexing a subdirectory on my domain that contained a dev version of a client's website. As a result, that client's keywords were completely dominating Google's view of my domain's new website.

I've since updated the robots.txt file to exclude this directory (which GWT shows, and the robots.txt tester confirms).
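
For reference, the rule I added is just a Disallow for that directory, roughly along these lines (the directory name here is only a placeholder, not the real path):

    User-agent: *
    Disallow: /client-dev/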

The source of the problem was a few links the client didn't update when they published their website on their own domain (so they still pointed back to my domain and that dev directory). Those links have since been removed. They no longer appear in Google's cache, but a site: search of the client's domain for my domain still returns those pages in the results (even though neither the cache nor the live site shows those links any more).

So my problem, I think, is that Google is still associating the keywords from that subdirectory/client website with my domain's website, which I believe is skewing my AdWords quality valuation significantly.

GWT shows Crawl errors for pages under that directory now. The new sitemap has been accessed. But the keywords list still shows the bad keywords.

Any idea how long it will take for Google to update its "analysis" of my website to exclude those keywords? Is there something else I should be doing to get G to disassociate that subdirectory from my domain's website?

All help greatly appreciated. Thanks!
8:31 am on July 27, 2010 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10542
votes: 8


rather than excluding those urls from being crawled, you should probably be responding with a 404 Not Found or 410 Gone status code or a meta robots noindex until those urls and associated keywords are dropped from the index.
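
for example, a meta robots noindex placed in the head of each dev page is just:

    <meta name="robots" content="noindex">

or, to answer with a 410 Gone at the server level instead of touching the pages themselves (a rough sketch assuming apache with mod_alias, directory name again a placeholder):

    Redirect gone /client-dev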
12:29 pm on July 27, 2010 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 21, 2004
posts:3080
votes: 67


it can take several weeks for google to update their crawling & indexing of those specific urls especially if they are not popular pages.
2:54 am on July 28, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Nov 11, 2007
posts:769
votes: 1


Once the Disallow: directive is in place, I would use the URL Removal Request in WMT to have those DEV URLs removed from their index.
4:06 am on July 28, 2010 (gmt 0)

New User

10+ Year Member

joined:June 24, 2005
posts:25
votes: 0


Thanks guys, all good answers. I reviewed the Google URL Removal tool, and it says to first use one of the other two recommendations: 404/410 or meta robots noindex.

The problem is that doing either of these would also impact the use of the dev site. If I add a meta tag to the code, it's too easy to accidentally post that to their live site and cause major issues. Same with triggering a 404/410 from code-behind.

Besides securing access from the server end, any thoughts on how to accomplish both of these from outside the page?
4:11 am on July 28, 2010 (gmt 0)

New User

10+ Year Member

joined:June 24, 2005
posts:25
votes: 0


Never mind, it looks like I can remove an entire directory using only robots.txt.

Thanks ZydoSEO!
1:07 am on July 29, 2010 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10542
votes: 8


using robots.txt will exclude the directory from being recrawled, but it won't prevent the url(s) from being indexed or guarantee they get removed from the index.
you could however use X-Robots-Tag [webmasterworld.com] which wouldn't get migrated with the content unless you also move the .htaccess file.
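
a sketch of what that might look like in an .htaccess file dropped into the dev directory (assuming apache with mod_headers enabled):

    <IfModule mod_headers.c>
        # ask search engines not to index anything served from this directory
        Header set X-Robots-Tag "noindex"
    </IfModule>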
7:35 pm on Aug 4, 2010 (gmt 0)

New User

10+ Year Member

joined:June 24, 2005
posts:25
votes: 0


On IIS, I can't use .htaccess. I've added X-Robots-Tag: noindex to the HTTP headers for the virtual directory, but I'm told that's not especially effective. GWT still shows inappropriate keywords, although I'm seeing appropriate ones creep up the list.
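
For anyone else on IIS: assuming IIS7 or later, the same header can apparently be set with a web.config dropped into the virtual directory, roughly like this (a sketch only, not the exact config I'm running):

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <system.webServer>
        <httpProtocol>
          <customHeaders>
            <!-- ask search engines not to index anything under this virtual directory -->
            <add name="X-Robots-Tag" value="noindex" />
          </customHeaders>
        </httpProtocol>
      </system.webServer>
    </configuration>

One caveat as I understand it: Googlebot has to be able to fetch a URL to see the response header at all, so a robots.txt Disallow on the same directory would keep the header from ever being seen, which may be why it's slow to take effect.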