Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
GWT, Robots.txt Update Timing
How long might it typically take for Google Webmaster Tools to update?
edgeportals
msg:4176743 - 5:16 pm on Jul 26, 2010 (gmt 0)

Looking for some expert opinions on the following:

I recently updated my website for my services-based business and have been trying to actively market it using AdWords. I've been having difficulty with the Ad Quality (not my question). After checking Google Webmaster Tools for that domain, I realized that Google was indexing a subdirectory on my domain that contained a dev version of a client's website. Those keywords were completely dominating Google's view of my domain's new website.

I've since updated the robots.txt file to exclude this directory (which GWT shows, and the robots.txt tester confirms).
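For reference, the robots.txt change described above would look something like this; "/dev-client/" is a placeholder, since the post doesn't give the actual subdirectory name:

```
# robots.txt at the domain root
# "/dev-client/" stands in for the real dev subdirectory
User-agent: *
Disallow: /dev-client/
```

Note that Disallow only blocks crawling; as discussed below in the thread, it does not by itself remove already-indexed URLs.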

The source of the problem was a few links that the client didn't update when they published their website on their own domain, so they still pointed back to my domain and that dev directory. Those links have since been removed and no longer appear in Google's cache, but a "site:" search of the client's domain still returns those pages in the results (even though neither the cache nor the live site shows those links any more).

So, my problem, I think, is that Google is still associating the keywords from that subdirectory/client website with my domain's website, which, I believe, is skewing my AdWords quality valuation significantly.

GWT shows Crawl errors for pages under that directory now. The new sitemap has been accessed. But the keywords list still shows the bad keywords.

Any idea how long it will take for Google to update its "analysis" of my website to exclude those keywords? Is there something else I should be doing to get Google to disassociate the subdirectory from my domain's website?

All help greatly appreciated. Thanks!

 

phranque
msg:4177175 - 8:31 am on Jul 27, 2010 (gmt 0)

rather than excluding those urls from being crawled, you should probably be responding with a 404 Not Found or 410 Gone status code or a meta robots noindex until those urls and associated keywords are dropped from the index.
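The page-level form of this suggestion is a `<meta name="robots" content="noindex">` tag in each dev page's head. The server-level form, sketched here for an Apache host with a placeholder directory name (the original poster later notes the server is actually IIS, so this applies only to Apache setups), answers 410 Gone for the whole directory:

```apache
# .htaccess sketch: serve 410 Gone for everything under the dev
# directory ("/dev-client/" is a placeholder); uses mod_alias
RedirectMatch gone ^/dev-client/
```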

goodroi
msg:4177246 - 12:29 pm on Jul 27, 2010 (gmt 0)

it can take several weeks for google to update their crawling & indexing of those specific urls especially if they are not popular pages.

ZydoSEO
msg:4177646 - 2:54 am on Jul 28, 2010 (gmt 0)

Once the Disallow: directive is in place, I would use the URL Removal Request in WMT to have those DEV URLs removed from their index.

edgeportals
msg:4177661 - 4:06 am on Jul 28, 2010 (gmt 0)

Thanks guys, all good answers. I reviewed the Google URL Removal tool and it states to first use the other two recommendations: 404/410 or meta noindex.

The problem is that doing either of these would also impact the use of the dev site. If I add a meta tag to the code, it's too easy to accidentally post that to their live site and cause major issues. Same with triggering a 404/410 from code-behind.

Besides securing access from the server end, any thoughts on how to accomplish both of these from outside the page?


edgeportals
msg:4177664 - 4:11 am on Jul 28, 2010 (gmt 0)

Nevermind, looks like I can remove an entire directory by only using robots.txt

Thanks ZydoSEO!

phranque
msg:4178264 - 1:07 am on Jul 29, 2010 (gmt 0)

using robots.txt will exclude the directory from being recrawled but won't prevent the url(s) from being indexed or necessarily get them removed from the index.
you could however use X-Robots-Tag [webmasterworld.com], which wouldn't get migrated with the content unless you also move the .htaccess file.
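On an Apache host, the X-Robots-Tag approach could be sketched like this, with the .htaccess file placed inside the dev directory itself so the header applies only there and stays behind when the page content is copied elsewhere:

```apache
# .htaccess inside the dev directory: attach the header to every
# response from this directory without editing any page markup
# (requires mod_headers)
Header set X-Robots-Tag "noindex, nofollow"
```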

edgeportals
msg:4182035 - 7:35 pm on Aug 4, 2010 (gmt 0)

On IIS I can't use .htaccess. I've added X-Robots-Tag: no-index to the HTTP headers of the virtual directory, but I'm told that's not especially effective. GWT still shows inappropriate keywords, although I'm seeing appropriate ones creep up the list.
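One likely culprit: Google documents the directive as "noindex", with no hyphen, so a "no-index" header value would not be recognized. A web.config sketch for the IIS virtual directory (the structure below is standard IIS 7+ configuration; the header value is the assumption being corrected):

```xml
<!-- web.config in the dev virtual directory: send X-Robots-Tag on
     every response; the directive must be spelled "noindex" -->
<configuration>
  <system.webServer>
    <httpProtocol>
      <customHeaders>
        <add name="X-Robots-Tag" value="noindex, nofollow" />
      </customHeaders>
    </httpProtocol>
  </system.webServer>
</configuration>
```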

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved