
Sitemaps, Meta Data, and robots.txt Forum

    
GWT, Robots.txt Update Timing
How long might it typically take for Google Webmaster Tools to update?
edgeportals (5+ Year Member)
Msg#: 4176741 posted 5:16 pm on Jul 26, 2010 (gmt 0)

Looking for some expert opinions on the following:

I recently updated my website for my services-based business and have been trying to actively market it using AdWords. I've been having difficulty with the Ad Quality (not my question). After checking Google Webmaster Tools for that domain, I realized that Google was indexing a subdirectory on my domain that contained a dev version of a client's website. Those keywords were completely dominating Google's view of my domain's new website.

I've since updated the robots.txt file to exclude this directory (which GWT shows, and the robots.txt tester confirms).
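
For anyone following along, the relevant robots.txt lines look roughly like this (/devclient/ is a hypothetical name standing in for the actual dev directory):

    User-agent: *
    Disallow: /devclient/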

The source of the problem was a few links that the client didn't update when they published their website on their own domain (so they still pointed back to my domain and that dev directory). Those links have since been removed. They no longer appear in Google's cache, but a "site:" search of the client's domain for my domain still returns those pages in the results, even though neither the cache nor the live site shows those links anymore.

So my problem, I think, is that Google is still associating the keywords from that subdirectory/client website with my domain's website, which I believe is significantly skewing my AdWords quality valuation.

GWT now shows crawl errors for pages under that directory. The new sitemap has been accessed, but the keywords list still shows the bad keywords.

Any idea how long it will take for Google to update its "analysis" of my website to exclude those keywords? Is there something else I should be doing to get G to dissociate that subdirectory from my domain's website?

All help greatly appreciated. Thanks!

 

phranque (WebmasterWorld Administrator, 10+ Year Member, Top Contributor of All Time)
Msg#: 4176741 posted 8:31 am on Jul 27, 2010 (gmt 0)

rather than excluding those urls from being crawled, you should probably be responding with a 404 Not Found or 410 Gone status code or a meta robots noindex until those urls and associated keywords are dropped from the index.
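
for example, a minimal sketch of each option (assuming apache for the 410, and a hypothetical /devclient/ path):

    # .htaccess at the site root: answer 410 Gone for everything under the dev directory
    Redirect gone /devclient

    <!-- or, in the <head> of each dev page -->
    <meta name="robots" content="noindex">

note that the noindex only works if googlebot is still allowed to crawl those urls.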

goodroi (WebmasterWorld Administrator, 10+ Year Member, Top Contributor of All Time)
Msg#: 4176741 posted 12:29 pm on Jul 27, 2010 (gmt 0)

it can take several weeks for google to update their crawling & indexing of those specific urls, especially if they are not popular pages.

ZydoSEO (WebmasterWorld Senior Member, 5+ Year Member)
Msg#: 4176741 posted 2:54 am on Jul 28, 2010 (gmt 0)

Once the Disallow: directive is in place, I would use the URL Removal Request in WMT to have those DEV URLs removed from their index.

edgeportals (5+ Year Member)
Msg#: 4176741 posted 4:06 am on Jul 28, 2010 (gmt 0)

Thanks guys, all good answers. I reviewed the Google URL Removal tool, and it says to first use one of the other two recommendations: 404/410 or meta noindex.

The problem is that doing either of these would also impact the use of the dev site. If I add a meta tag to the code, it's too easy to accidentally post that to their live site and cause major issues. Same with triggering a 404/410 from code-behind.

Besides securing access from the server end, any thoughts on how to accomplish either of these from outside the page?


edgeportals (5+ Year Member)
Msg#: 4176741 posted 4:11 am on Jul 28, 2010 (gmt 0)

Never mind, it looks like I can remove an entire directory using only robots.txt.

Thanks ZydoSEO!

phranque (WebmasterWorld Administrator, 10+ Year Member, Top Contributor of All Time)
Msg#: 4176741 posted 1:07 am on Jul 29, 2010 (gmt 0)

using robots.txt will exclude the directory from being recrawled, but it won't prevent the url(s) from being indexed, nor will it necessarily get them removed from the index.
you could however use X-Robots-Tag [webmasterworld.com] which wouldn't get migrated with the content unless you also move the .htaccess file.
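
something like this, placed in an .htaccess inside the dev directory itself (a sketch, assuming apache with mod_headers enabled):

    # sends X-Robots-Tag on every response served from this directory
    <IfModule mod_headers.c>
        Header set X-Robots-Tag "noindex, nofollow"
    </IfModule>

since the file lives in the dev directory rather than in the page templates, publishing the site content to the client's domain wouldn't carry it along.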

edgeportals (5+ Year Member)
Msg#: 4176741 posted 7:35 pm on Aug 4, 2010 (gmt 0)

On IIS, I can't use .htaccess. I've added an X-Robots-Tag: noindex header to the HTTP responses for the virtual directory, but I'm told that's not especially effective. GWT still shows the inappropriate keywords, although I'm seeing appropriate ones creep up the list.
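
For reference, this is roughly how I set it, via a web.config in the virtual directory (IIS 7+ syntax; on IIS 6 the equivalent would be a custom header in the IIS Manager HTTP Headers tab):

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <system.webServer>
        <httpProtocol>
          <customHeaders>
            <!-- adds the header to every response served from this directory -->
            <add name="X-Robots-Tag" value="noindex" />
          </customHeaders>
        </httpProtocol>
      </system.webServer>
    </configuration>

One caveat I've since run across: Googlebot only sees this header on URLs it's actually allowed to fetch, so it can't take effect on URLs that the robots.txt Disallow is still blocking.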
