|Which is the best way to remove a directory?|
Hi there guys,
I have a question: which is the best way to remove a directory from Google's index? Noindex all the links from that directory and simply let the search engine see the noindex at the next crawl? Or use noindex together with the URL removal tool in WMT? Which method is safer?
I am not sure what you are trying to accomplish.
Is the directory on YOUR site?
If so, then you want to noindex the PAGES (not the links) of that directory.
You could nofollow the outbound links on your directory as well if you wanted.
After that, I think you can use the URL removal tool.
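A minimal sketch of the noindex step for every page under one directory, assuming the pages are served through PHP. The `/directory/` prefix and the function name are made up for illustration. It sends the `X-Robots-Tag: noindex` response header, which Google treats like a robots meta tag with `noindex`. The pages must stay crawlable (not blocked in robots.txt), otherwise the crawler never sees the directive.

```php
<?php
// Hypothetical directory to drop from the index; not taken from the thread.
const NOINDEX_PREFIX = '/directory/';

/**
 * Return the extra response headers for a request path.
 * Only paths under NOINDEX_PREFIX get the noindex directive.
 */
function robots_headers_for(string $path): array
{
    if (strncmp($path, NOINDEX_PREFIX, strlen(NOINDEX_PREFIX)) === 0) {
        return ['X-Robots-Tag: noindex'];
    }
    return [];
}

// In a real page this must run before any output is sent.
$path = parse_url($_SERVER['REQUEST_URI'] ?? '/', PHP_URL_PATH) ?: '/';
foreach (robots_headers_for($path) as $line) {
    header($line);
}
```

For static HTML pages the equivalent would be a robots meta tag with `noindex` in the head of each page.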
Yes Planet13, the directory is on my site. I wanted to do a quick removal of the directory. But the question is: is it better to use noindex plus the WMT removal tool, or just noindex and let Google remove the directory when it crawls the website?
Noindex may take a little time to take effect.
Google's WMT URL removal tool will quickly remove pages from Google's search results. It was created more for accidentally indexed material that should never have been indexed. Under normal usage, the removed content is never reposted, or that URL remains a 404 forever.
So if you are never going to repost the material that got indexed, and will not reuse that URL in the next week for content you do want indexed ... the WMT URL removal tool was created for this exact need.
Normal usage of the tool is: first remove the material so that the URL returns a 404 status code, then use the tool. Remove the directory, use the tool, and that URL will vanish from the SERPs very quickly.
If you don't want to remove the content
The 404 http header can be created using php ...
header("HTTP/1.0 404 Not Found");
... and custom content can be delivered for the URL. In other words, you can still have the URL functional for visitors. You should, however, be careful about delivering content via 404 pages! It is a method used by black hats to get around Google's spam and bad-neighborhood filtering algos; hence noindex is better if speed is not a concern and the content is not taken down. Captcha codes are the best white hat practice for content that should only be shown to humans.
A re-indexing problem
If links point to the page, they should be removed; Google will attempt to reindex the URL because of those links and will treat links to 404 pages as errors on the site. One does not want a site with links to 404 pages; I've not researched when, and how much, links to 404 pages from a good URL affect that URL in SEO. With the noindex method you do not need to worry about these links.
After removal, noindex can be used. Noindex should prevent the page from being indexed, but when Google updates (they are not perfect), it may from time to time remember that the URL existed and re-include it, although I would expect that to be brief. If a URL that was OK to index and was then changed to noindex gets re-indexed, it will also get re-noindexed.
An alternative, if the URL will continue to have links, is to have that URL require a login, which could be as simple as a captcha code that allows only humans to view the material. To my knowledge black hats are not using this method. There are numerous reasons that content needs to be available to individual humans but not robots, including simply the website resources spent looking up data, where captcha codes are the best practice. Certainly a site that serves downloadable data which never changes does not need Google or any other robot downloading terabytes of data, and Google does not want to download data it cannot index either.
My directory contains thin content and the website might have a penalty. Now I want to get rid of that content. I've heard that noindex is enough, but it takes a while.
If the site has a penalty because of the content, I would handle it by removing the content, fixing it, and re-posting good content, normally at a different location so that it does not carry a tainted history.
If the penalty has not resulted in a loss of traffic to those pages, then I would fix them at their current locations. Obviously, if the pages have traffic, the content is not as bad as Google has decided, or people would not continue to use it ... A penalty from, say, a link to a bad site does not continue to taint the URL after it is fixed. I've dealt with false malware flags caused by subdomains on shared domains and have had them fixed in days without changing any content. I assume the same time frame would apply regardless of the type of domain.
If the problem were a user directory, I would want to take steps to prevent the activity that caused the problem. Noindex does not mean Googlebot does not look at what exists on those pages; it only means that those pages are not going to appear in Google search. Thus noindex does not vaccinate a site against penalties, which is why black hats deliver content via a 404 error status. However, Google has not run out of black and white animals; a 404 status does not prevent Googlebot from looking at what is delivered.
A directory that users can place content in will require regular maintenance of some kind. Having a user be able to become a moderator may provide enough maintenance.
My recommendation for best practice with user directories is user subdomains; then, if there are users who do not want to be PG13, those pages go behind a captcha which warns visitors that content may not be PG13. The captcha can also set a cookie so it is not annoying. Maintenance is still required, because users must not be allowed to link to malware, etc.
WMT URL removal
I checked to see if anything had changed. To use it, the page must return a 404 or a 410 status. 410 (Gone) means the URL will "never" be found again; 404 (Not Found) means it is not found right now. The tool will not do anything if only the meta tag is changed to noindex.
For dealing with a penalty, 404 is safest.
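Before submitting a URL to the tool, it can help to confirm what status it really returns. The sketch below only encodes the rule as stated in this post (404 or 410); the function names and the example URL are made up, and the live check is left as a comment because it needs network access.

```php
<?php
/**
 * Extract the numeric status code from a status line such as
 * "HTTP/1.1 410 Gone". Returns null if the line does not look like one.
 */
function status_from_line(string $line): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $line, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

/** True when the status is one the post above says the tool accepts. */
function ready_for_removal(?int $status): bool
{
    return $status === 404 || $status === 410;
}

// Hypothetical usage against a live URL (needs network access).
// get_headers() puts the first response's status line at index 0:
// $headers = get_headers('https://example.com/directory/page.html');
// var_dump(ready_for_removal(status_from_line($headers[0] ?? '')));
```

A page that still answers 200 with a noindex tag would come back as not ready under this rule.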
Here are some excerpts on advice about using the removal tool given by John Mueller of Google...
Bulk Content Removal...
|Perhaps some clarifications can help ... |
- The URL removal tool is not meant to be used for normal site maintenance like this. This is part of the reason why we have a limit there.
- The URL removal tool does not remove URLs from the index, it removes them from our search results. The difference is subtle, but it's a part of the reason why you don't see those submissions affect the indexed URL count....
For large-scale site changes like this, I'd recommend:
- don't use the robots.txt
- use a 301 redirect for content that moved
- use a 410 (or 404 if you need to) for URLs that were removed
- make sure that the crawl rate setting is set to "let Google decide" (automatic), so that you don't limit crawling
- use the URL removal tool only for urgent or highly-visible issues.
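The 301 and 410 items in that list could be sketched roughly as follows. This is only an illustration: the function name, the moved-page map, and all paths are invented, and a real site would more likely do this in server config or a front controller.

```php
<?php
/**
 * Decide how to answer a request during a large site cleanup.
 * $moved maps old paths to new paths; $removedPrefixes lists directories
 * whose content is gone for good. Both are hypothetical inputs.
 * Returns [status code, redirect target or null].
 */
function cleanup_response(string $path, array $moved, array $removedPrefixes): array
{
    // Content that moved: permanent redirect to the new location.
    if (isset($moved[$path])) {
        return [301, $moved[$path]];
    }
    // Content that was removed: tell crawlers it is gone.
    foreach ($removedPrefixes as $prefix) {
        if (strncmp($path, $prefix, strlen($prefix)) === 0) {
            return [410, null];
        }
    }
    // Everything else is served normally.
    return [200, null];
}

// Example with made-up paths.
[$status, $target] = cleanup_response(
    '/directory/old-page.html',
    ['/directory/keep.html' => '/articles/keep.html'],
    ['/directory/']
);
// On a live page, before any output:
// http_response_code($status);
// if ($target !== null) { header('Location: ' . $target); }
```

The moved-page check runs first so that a page worth keeping is redirected even when its old directory is otherwise marked as removed.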
I recommend reading all of John's comments... and enough of the Google Support thread to get the context of the problem he's discussing.
Also, see this thread here from last year. It has some comments on complications of using the removal tool.
How long for Google to remove NOINDEX pages from its index?
|Noindex all the links from that directory |
What did you mean by this? "Nofollow all the links to that directory"? (This isn't good advice, but is the nearest I can come to making sense of the line.)
Thank you for all your help, guys! I now understand everything I need to do!