Deleted subdomain: Help remove Google Cache

         

buckmajor

11:52 am on Jul 12, 2009 (gmt 0)

10+ Year Member



Hi there

I've realized my website has been penalized for duplicate content. I had a few subdomains with the same content as the main site, but I only just found out about this duplicate-content issue. I removed the subdomains a while back, yet Google still shows cached copies in the search results, e.g. www.test.example.com/filename.php.

How do I remove the Google cache for www.test.example.com/filename.php, or for any other subdomain still listed in the cache, when the files/subdomains were removed some time ago?

I have been looking into the robots.txt file, but it doesn't seem to help once the files or subdomains have already been removed. Unless there's something I'm missing?

Please help, many thanks in advance
CHEERS :)

jdMorgan

6:36 pm on Jul 12, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Redirect all non-canonical domains and sub-domains to your canonical (single, correct) hostname using a 301 (Moved Permanently) redirect.

This will remove the non-canonical URLs from the search results, eliminate the duplicate-content problem, and 'recover' any PageRank or link popularity conferred by inbound links to those non-canonical URLs.

You may also wish to address other canonicalization issues at the same time -- for example, redirecting "/index.php" to "/".

Any page or resource on your site should be directly accessible at one and only one unique canonical URL; all non-canonical URLs should be redirected to the canonical URL with a single 301 redirect. For example, a request for "eXample.Com.:80/index.php" should be redirected to "www.example.com/".
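As a minimal .htaccess sketch of that kind of canonicalization -- assuming Apache with mod_rewrite enabled, "www.example.com" standing in for your canonical hostname, and the index.php rule kept to the root directory for brevity (substitute your own domain and test before relying on it):

RewriteEngine On
# Redirect any hostname other than the canonical one -- this catches
# old subdomains, the bare domain, and trailing-dot variants
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
# Redirect direct client requests for /index.php to "/"; testing
# THE_REQUEST avoids a loop when DirectoryIndex serves index.php
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php[?\ ]
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]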

[added] Other than adding the redirects as suggested here, all you can do is wait -- and it may take many, many months. [/added]

Jim

buckmajor

10:19 am on Jul 14, 2009 (gmt 0)

10+ Year Member



Thanks jd. So how do I redirect the subdomains if they no longer exist on the server or in my control panel?

I've got the "index.php to /" redirect going, but not the subdomains that are still listed in Google.

jdMorgan

5:19 pm on Jul 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, you can't -- unless you re-define them so that you can redirect them.

These old URLs will drop out of the index (slowly) over time, so it's up to you to determine whether it is worth the bother of re-instating the DNS and control panel mappings so you can speed up their removal and recover their link-popularity/PageRank with a redirect.

Jim

buckmajor

3:49 am on Jul 18, 2009 (gmt 0)

10+ Year Member



True! Are you saying the old cached URLs will drop out of the index on their own? So this is nothing to worry about then, eh?

Cool, so exactly how long will it take to see these links removed from Google?

tangor

5:59 am on Jul 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sad thing is it will take as long as Google decides... and months, even years later, they will check them all again. The problem with the Google Monster is they have all this data, and they continue to test it time and again...

There's the visual "cache" and there's the elephant's memory cache, which they check from time to time...

buckmajor

11:04 am on Jul 18, 2009 (gmt 0)

10+ Year Member



OK, so far I have learnt not to create a subdomain with the same content as the main domain. I only put it up to test the site online first, but Google penalizes you for duplicate content :(.

Now I know what to do for my new domain names and websites. I will have to do my testing on XAMPP from now on.

Hmm... does anyone know how long it takes before Google indexes a web page? Say I create a subdomain to test out my form and delete the subdomain afterwards -- will Google index or cache that page in the meantime? I was thinking 'yes', but would prefer to hear from someone who knows for sure.

Thanks in advance

jdMorgan

3:29 pm on Jul 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Three common methods are to password-protect your 'test' sites/subdomains, to use robots.txt to deny all robots access to all test pages, or to use mod_access or mod_rewrite to make them inaccessible to all but your own IP address (assuming that your IP address is static). A sketch of the latter two follows below.
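As a rough illustration of those last two options (Apache 2.2-era syntax assumed; 192.0.2.1 is a placeholder for your own static IP), the test subdomain's robots.txt would deny everything:

User-agent: *
Disallow: /

...and an .htaccess file in the test subdomain's document root would shut out everyone but you:

# Allow only my own static IP; all other visitors (and robots) get 403
Order Deny,Allow
Deny from all
Allow from 192.0.2.1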

If this were my site and I really felt compelled to eliminate the 'ghost' subdomains, I'd set up the 301 redirects on my live server and test them, then I'd go back into DNS and re-define the subdomains, pointing them to the live server's IP address so that the redirects could be invoked. Then I'd make sure that there was at least one link to each subdomain somewhere on the Web (only one is needed, and no need to add any more, but more would not be a problem unless they were links on/from your own main site).
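For the DNS step, the zone entry can be a one-liner -- a sketch in BIND zone-file syntax, with 203.0.113.10 as a placeholder for the live server's IP:

; Re-define the dead subdomain so requests reach the live server,
; where the 301 rules above can redirect them to www.example.com
test    IN    A    203.0.113.10

A wildcard entry (*    IN    A    203.0.113.10) would catch every old subdomain at once, provided the web server is configured (e.g. via a ServerAlias) to answer for those hostnames.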

Unless you've got more than two duplicates of these pages, I wouldn't even worry about a 'penalty' -- true duplicate-content penalties are reserved for folks who set up a dozen copies of a site and link to all of them aggressively -- in other words, when the duplicates were clearly created intentionally and were then promoted intentionally, with the intent to fool search engines and search-engine users. There are many sites on the Web with four (or more) URLs for their home pages, and they do just fine... Even Google itself has (or had) a canonicalization problem: you could access either "www.google.com" or "www.google.com." to get to any page, and that FQDN would persist as you navigated the site.

Jim

buckmajor

8:42 am on Jul 19, 2009 (gmt 0)

10+ Year Member



True! Thanks for the breakdown Jim, that makes a lot more sense. So I can use robots.txt to deny all robots access to the test pages, then.

buckmajor

4:20 am on Jul 29, 2009 (gmt 0)

10+ Year Member



Hey, I found this link [webmasterworld.com...] and I think this is the way to go to block all bots from the test site.

jdMorgan

2:48 pm on Jul 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The conclusion of that thread was rather ambiguous. So which approach are you planning to use?

Password protection combined with robots.txt, or robots.txt combined with an .htaccess deny, works best. In either case, the robots.txt file itself must always remain freely accessible to any user-agent, good or bad.

Jim

buckmajor

4:14 pm on Jul 31, 2009 (gmt 0)

10+ Year Member



Hey Jim

My understanding from that link was to password-protect the root folder of www.example.com using .htaccess.

Or was I supposed to use the robots.txt file too?

jdMorgan

4:37 pm on Jul 31, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just make sure you don't put a password on robots.txt...

Jim

buckmajor

1:26 am on Aug 2, 2009 (gmt 0)

10+ Year Member



Hey Jim

What's an example of how you would do it? Was that only for the main site or the test site, e.g. [test.example.com...] or http://www.example.com? In the link [webmasterworld.com...], g1smd recommended password-protecting only the directory via .htaccess.

This is what I did:
-------------------------------------------
I added a password via .htaccess for my directory [test.example.com...] so that every time I try to access my test link, I have to enter the user and password.
-------------------------------------------

I hope .htaccess blocks all bots from crawling my test site [test.example.com...] -- it's the exact same content as my http://www.example.com site.

jdMorgan

2:47 pm on Aug 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



mod_access's "Allow from env=", mod_setenvif, and the "Satisfy Any" directive in the Apache core can be used together to allow requests for robots.txt to bypass the login requirement.
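A rough sketch of that combination for an .htaccess file, assuming Apache 2.2-era modules (mod_auth, mod_setenvif, mod_access) and a placeholder path for the password file -- adjust names and paths to your own setup:

AuthType Basic
AuthName "Test site"
AuthUserFile /path/to/.htpasswd
Require valid-user

# Flag requests for robots.txt with an environment variable
SetEnvIf Request_URI ^/robots\.txt$ allow_robots_txt

Order Deny,Allow
Deny from all
Allow from env=allow_robots_txt

# "Satisfy Any": grant access if EITHER the Allow rule matches
# (robots.txt requests) OR the Basic-auth login succeeds
Satisfy Any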

It's complex enough that I wouldn't bother with it unless you actually have problems due to blocking robots.txt on your test domain.

Jim