Duplicate content and server url

         

mexico14

3:38 pm on Aug 16, 2005 (gmt 0)

10+ Year Member



I have a website that was on page 1 for our most popular search for a number of years. It is now on page 10. I have found that the site is duplicated: it is reachable at mysite.com and also, on the same server, under a different URL, servername.com/site/welcome.htm. Both are in the Google cache, and Copyscape reports them as duplicate content when I search for mysite.com. Does Google treat this as duplicate content?

experienced

1:34 pm on Aug 17, 2005 (gmt 0)

10+ Year Member



If both sites are listed in Google and both are in the cache, I believe this is duplication. Do you own both sites?

Exp...

mexico14

4:36 pm on Aug 17, 2005 (gmt 0)

10+ Year Member



I don't own both sites, but both sites are on the same server. The duplicate site is owned by the company that hosts my website. The hosting company claims that Googlebot doesn't see it or index it as a duplicate site.

lammert

5:02 pm on Aug 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have had exactly the same problem. My site was visible under both www.hostingcompany.com/account/ and www.mydomain.com/. Unfortunately the hosting company didn't allow me to use .htaccess under www.hostingcompany.com/account/ to do a 301 redirect, so I had to find another solution. This is what I did.

All my pages were created with SSI and SSI only worked on www.mydomain.com. I used a conditional header in each .shtml file with a meta refresh statement. This is the header for file /example.shtml:

<!--#IF EXPR="$SERVER_NAME!=www.mydomain.com" -->
<html>
<head>
<meta name="robots" content="noindex,follow">
<meta HTTP-EQUIV="refresh" content="10; URL=http://www.mydomain.com/example.shtml">
</head>
</html>
<!--#ENDIF -->
[ here the rest of the page ]

This trick also works when SSI is recognized at www.hostingcompany.com/account/. How it works:

The IF statement is parsed by the SSI parser. If the request is for www.mydomain.com, the header is skipped. Otherwise a header is output with "noindex,follow", followed by a meta refresh that redirects to your actual page at www.mydomain.com. The 10 in the content attribute is the number of seconds before the redirect takes place. I tried 0 seconds, but that didn't work. In that case Googlebot went directly to www.mydomain.com, ignored the "noindex,follow" robots tag and indexed the page under the first URL. This is the same behaviour as a 302 redirect. By waiting a few seconds, the bot picks up the noindex and the pages are removed from the index after some time.

If SSI doesn't work at www.hostingcompany.com/account/, as in my case, it works almost the same way. The IF statement is not parsed but is seen as an HTML comment, and the head block is sent to the output unchanged.
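Since the header has to be repeated in every .shtml file with that file's own URL, it can be generated rather than typed by hand. The following is only a sketch in Python, not part of the setup described above; the function name build_ssi_header and its parameters are made up for illustration. It reproduces the header shown earlier for a given host name and page path.

```python
def build_ssi_header(canonical_host: str, page_path: str, delay_seconds: int = 10) -> str:
    """Return the conditional SSI header block for one page.

    canonical_host: the host name the page should be indexed under.
    page_path:      path of the page on that host, e.g. "/example.shtml".
    delay_seconds:  meta refresh delay; the post reports that 0 did not work.
    """
    # Normalise the path so the target URL is always well formed.
    if not page_path.startswith("/"):
        page_path = "/" + page_path
    target = f"http://{canonical_host}{page_path}"

    lines = [
        f'<!--#IF EXPR="$SERVER_NAME!={canonical_host}" -->',
        "<html>",
        "<head>",
        '<meta name="robots" content="noindex,follow">',
        f'<meta HTTP-EQUIV="refresh" content="{delay_seconds}; URL={target}">',
        "</head>",
        "</html>",
        "<!--#ENDIF -->",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Prints the header for /example.shtml, matching the block in the post.
    print(build_ssi_header("www.mydomain.com", "/example.shtml"), end="")
```

The output would be pasted (or written by a build script) at the top of each page, before the rest of the page content.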

If your site uses another server-side scripting language like ASP or PHP, you could do the same trick by sending a redirect header whenever the request is for a host name other than www.mydomain.com.
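As a rough illustration of that host check, here is a sketch in Python rather than ASP or PHP. It is not code from my site: the function name canonical_redirect is made up, the /account prefix is taken from the example hosting URL above, and it assumes a 301 is wanted, as with the .htaccess approach. The script would call it with the Host header and request path and send the returned status and headers if it gets any.

```python
from typing import List, Optional, Tuple

CANONICAL_HOST = "www.mydomain.com"  # the host the pages should be indexed under
ACCOUNT_PREFIX = "/account"          # path prefix on the hosting company's URL (assumed)


def canonical_redirect(host: str, path: str) -> Optional[Tuple[str, List[Tuple[str, str]]]]:
    """Return (status, headers) for a redirect, or None if no redirect is needed."""
    # The Host header may include a port and is case-insensitive.
    hostname = host.split(":", 1)[0].lower()
    if hostname == CANONICAL_HOST:
        return None  # already on the right host, serve the page normally

    # Strip the hosting-account prefix so /account/x.shtml maps to /x.shtml.
    if path == ACCOUNT_PREFIX or path.startswith(ACCOUNT_PREFIX + "/"):
        path = path[len(ACCOUNT_PREFIX):] or "/"
    if not path.startswith("/"):
        path = "/" + path

    location = f"http://{CANONICAL_HOST}{path}"
    return "301 Moved Permanently", [("Location", location)]


if __name__ == "__main__":
    print(canonical_redirect("www.hostingcompany.com", "/account/example.shtml"))
    print(canonical_redirect("www.mydomain.com", "/example.shtml"))
```

Unlike the meta refresh, a real redirect header needs no delay, because the bot never receives a page body to index under the wrong URL.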