
Google SEO News and Discussion Forum

    
Removing entire website via robots.txt
Consequences of removing an entire website
doc_z
WebmasterWorld Senior Member 10+ Year Member
Msg#: 30379 posted 8:41 pm on Jul 14, 2005 (gmt 0)

At the beginning of this year one of our websites had problems caused by duplicate content. To fix this, we decided to remove the entire website by changing the robots.txt to "User-agent: * Disallow: /". We used Google's automatic URL removal system to speed up the process. After a few days all pages were removed and we changed the robots.txt back to "User-agent: * Allow: /".
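For reference, robots.txt directives go on separate lines, so the blocking file described above would have looked like this:

User-agent: *
Disallow: /

and the file we changed back to:

User-agent: *
Allow: /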

This action led to the following behaviour:

- After five months there are still no pages in the index. When we removed the website, Google stated that this "will cause a temporary, 90 day removal". Some time later, Google increased this period to "180 days". One would expect that only websites removed after that change would be affected; however, our website is affected as well.

- In the past, websites that excluded Googlebot from crawling were still in the index: their pages appeared as URL-only entries and could still be found through incoming links and their anchor text. Now the site no longer exists in the index at all - even searching for the domain name doesn't bring it up. Also, PR used to be passed to pages of such websites, while now these pages have PR0.

- In the past there was no effect on the directory. Now, however, the domain has been removed from Google's directory.

Not only were the consequences of removing the entire website different than expected; this behaviour could also be used to harm other websites (given access, e.g. if you wanted to hurt a client's site). Just change the robots.txt, use the automatic URL removal system, and change the robots.txt back after one or two days. The website will be removed for (at least) half a year, and the reason will be hard to find. Even more time will pass until the original situation (all pages indexed and with PR) is restored.

To avoid such problems, I would suggest that Google change their policy and re-include a website once its robots.txt is changed back. I would also prefer that excluding Googlebot not lead to removal of the directory entry.

 

Brett_Tabke
WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 30379 posted 12:42 pm on Jul 15, 2005 (gmt 0)

> Allow:

Is not something recommended by the robots.txt [google.com] standard [robotstxt.org]. We have some ancillary evidence that it may be confusing some search engines and causing indexing problems. Its use is not recommended.

[robotstxt.org...]
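The standards-compliant way to allow everything is an empty Disallow value, not an Allow line:

User-agent: *
# an empty Disallow matches nothing, i.e. the whole site may be crawled
Disallow: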

doc_z
WebmasterWorld Senior Member 10+ Year Member
Msg#: 30379 posted 12:58 pm on Jul 15, 2005 (gmt 0)

I just checked the robots.txt again. We used "User-agent: * Disallow: " (and not "User-agent: * Allow: /" as written in my first post), i.e. the behaviour isn't caused by wrong syntax.

g1smd
WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 30379 posted 10:41 pm on Jul 15, 2005 (gmt 0)

I am seeing stuff being re-included (as URL-only entries) after 90 days, even though the robots.txt still has the Disallow: /cgi-bin in place. Do I really have to resubmit it all again? Aaarrgghhh!

Brett_Tabke
WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 30379 posted 12:07 pm on Jul 19, 2005 (gmt 0)

I would be surprised if the URL-only entries did not pop back up in the index. Robots exclusion usually doesn't stop that. You will also see full crawls on the banned site. The only way to stop that is with an .htaccess ban.
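A minimal sketch of such a ban, assuming Apache with mod_rewrite enabled (it matches on the User-Agent header, which a client can of course fake):

# return 403 Forbidden to any client identifying itself as Googlebot
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule .* - [F]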

g1smd
WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 30379 posted 12:21 pm on Jul 19, 2005 (gmt 0)


I changed all the links pointing to that stuff to rel="nofollow" for all the pages that are behind a password, and put a "meta noindex" on all the others that Google could otherwise index, instead of the entries in robots.txt. Maybe that will work.
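Roughly like this - a noindex meta tag in the head of each page that should stay out, and nofollow on the links into the protected area (the path here is just an example):

<!-- in the <head> of each page Google should not index -->
<meta name="robots" content="noindex">

<!-- on links into the password-protected area -->
<a href="/members/" rel="nofollow">Members area</a>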

doc_z
WebmasterWorld Senior Member 10+ Year Member
Msg#: 30379 posted 3:14 pm on Jul 19, 2005 (gmt 0)

> I would be surprised if the URL-only entries did not pop back up in the index.

That was what I expected. In the past one could find URL-only entries for sites that banned Googlebot. (I would be thankful for such behaviour - I never wanted a complete removal.) However, all information about this domain has been removed - even the directory entry.
