Forum Moderators: phranque
I realized my website has been penalized for duplicate content. I had a few subdomains with the same content, but I only just found out about this duplicate-content issue. I removed the subdomains a while back, yet Google still shows cached copies in the search results, e.g. www.test.example.com/filename.php.
How do I remove the Google cache for www.test.example.com/filename.php, or for any other subdomain listed in Google's cache, if I already removed the files/subdomains some time ago?
I have been looking into the robots.txt file, but it doesn't help if the files or subdomains have already been removed. Unless there's something I'm missing?
Please help, many thanks in advance
CHEERS :)
This will remove the non-canonical URLs from the search results, eliminate the duplicate-content problem, and 'recover' any PageRank or link popularity conferred by inbound links to those non-canonical URLs.
You may also wish to address other canonicalization issues at the same time -- for example, redirecting "/index.php" to "/".
Any page or resource on your site should be directly accessible using one and only one unique canonical URL; all non-canonical URLs should be redirected to the canonical URL with a single 301 redirect. For example, a request for "eXample.Com.:80/index.php" should be redirected to "www.example.com/".
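As a rough sketch, assuming an Apache server with mod_rewrite enabled, rules along these lines in the site-root .htaccess could handle both canonicalizations ("example.com" is a placeholder for your own domain):

```apache
RewriteEngine On

# Redirect any non-canonical hostname (bare domain, odd capitalization,
# trailing-dot FQDN, etc.) to the canonical www hostname, keeping the path
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Redirect direct client requests for /index.php back to the root.
# Matching on THE_REQUEST (the raw request line) avoids a redirect loop
# when Apache internally serves index.php as the DirectoryIndex.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php[\ ?]
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]
```

Exact rules depend on your hosting setup, so test with a browser and a header-checking tool before relying on them.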
[added] Other than adding the redirects as suggested here, all you can do is wait -- and it may take many, many months. [/added]
Jim
These old URLs will drop out of the index (slowly) over time, so it's up to you to decide whether it's worth the bother of reinstating the DNS and control-panel mappings so you can speed up their removal and recover their link popularity/PageRank with a redirect.
Jim
There's the visual "cache" and there's the elephant's memory cache, which they check from time to time...
Now I know what to do for my new domain names and websites. I'll have to do my testing on XAMPP from now on.
Hmm... does anyone know how long it takes before Google indexes a web page? If I create a subdomain to test out my form and then delete the subdomain afterwards, will Google index or cache that page? I was thinking 'yes', but I'd prefer to hear from someone who knows for sure.
Thanks in advance
If this were my site and I really felt compelled to eliminate the 'ghost' subdomains, I'd set up the 301 redirects on my live server and test them, then I'd go back into DNS and re-define the subdomains, pointing them to the live server's IP address so that the redirects could be invoked. Then I'd make sure that there was at least one link to each subdomain somewhere on the Web (only one is needed, and no need to add any more, but more would not be a problem unless they were links on/from your own main site).
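Assuming the re-defined subdomains point at the same Apache docroot as the live site, a host-based rule like this (a sketch; the hostnames are placeholders) would fold any request for a ghost subdomain into the canonical site:

```apache
RewriteEngine On
# Catch requests arriving on the old test subdomain and 301 them,
# path intact, to the canonical www hostname
RewriteCond %{HTTP_HOST} ^test\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

One such condition/rule pair per ghost subdomain (or a single pattern covering them all) is enough for the crawler to follow the redirect once it re-requests those URLs.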
Unless you've got more than two duplicates of these pages, I wouldn't even worry about a 'penalty' -- true duplicate-content penalties are reserved for folks who set up a dozen copies of a site and link to all of them aggressively -- in other words, when the duplicates were clearly created and promoted intentionally, with the intent to fool search engines and search-engine users. There are many sites on the Web with four (or more) URLs for their home pages, and they do just fine... Even Google itself has (or had) a canonicalization problem: you could access either "www.google.com" or "www.google.com." to get to any page, and that FQDN would persist as you navigated the site.
Jim
Password protection combined with robots.txt, or robots.txt combined with an .htaccess deny, are best. In both cases, the robots.txt file should/must always remain freely accessible to any user-agent, good or bad.
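For the robots.txt-plus-deny combination, a sketch might look like the following (Apache 2.2 `Order`/`Deny` syntax; the allowed IP address is a placeholder). The robots.txt on the test host blocks all compliant crawlers:

```
User-agent: *
Disallow: /
```

while the .htaccess denies everyone else -- with an explicit exception so robots.txt itself stays readable by every user-agent, as noted above:

```apache
# Deny everything to everyone except your own IP (placeholder address)
Order Deny,Allow
Deny from all
Allow from 192.0.2.1

# ...but keep robots.txt freely accessible to all user-agents
<Files "robots.txt">
  Order Allow,Deny
  Allow from all
</Files>
```

Without that `<Files>` exception, well-behaved bots could never fetch the robots.txt that tells them to stay out.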
Jim
What's an example of how you would do it? Was that only for your main site or your test site, e.g. [test.example.com...] or http://www.example.com? In the link [webmasterworld.com...], g1smd recommended password-protecting only the directory via .htaccess.
This is what I did:
-------------------------------------------
I added password protection via .htaccess for my directory [test.example.com,...], so every time I try to access my test link I have to enter the user name and password.
-------------------------------------------
I hope the .htaccess blocks all bots from crawling my test site [test.example.com....] It's an exact copy of my http://www.example.com site.
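For reference, that kind of password protection is usually HTTP Basic authentication in the test directory's .htaccess -- a sketch, assuming Apache, with a placeholder path for the password file (which should live outside the web root):

```apache
AuthType Basic
AuthName "Test area - authorized users only"
AuthUserFile /home/example/.htpasswd
Require valid-user
```

The password file is created with Apache's htpasswd utility, e.g. `htpasswd -c /home/example/.htpasswd yourname`. Any visitor (or bot) that can't authenticate receives a 401 response, so nothing behind it should get crawled or cached.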
It's complex enough that I wouldn't bother with it unless you actually run into problems caused by the robots.txt block on your test domain.
Jim