Client mirroring its site content on co-branded newspaper sites

How best to avoid complications?

         

Robert Charlton

3:01 am on Apr 7, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I was recently surprised by mirrors of a client's site showing up in serps on both Google and Teoma, in some cases replacing original search results... in others appearing right next to them.

The urls point to essentially co-branded mirrors of the client's main site that have begun to appear in various newspaper sites. The urls are in the form www.newspapername.clientdomain/pagename.

The client... bless them... didn't mention this to me, even though we'd just spent a bunch of time removing mirror sites they'd had up on the .com, .net, and .org variants of their domain and straightening out the confused linking situation. They are a non-profit organization with an online education directory, and they want maximum exposure for their material. The newspaper co-branding brings in lots of visitors.

So, they're not going to take these mirror sites down. The question is, how best to handle the search engine aspect of this? One approach that's been discussed would be to block the spidering of these subdomains. This seems clean and straightforward. My tendency would be to block spidering access and let the listings that Teoma, Google, etc. already hold for the subdomains die off naturally. Does anyone see any complications from this or reasons not to block spidering access?

Brett_Tabke

9:37 pm on Apr 8, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I read this post several times this weekend Robert, and I didn't quite understand it at the time.

Upon closer inspection, yes, block those subdomains with a robots.txt. If you can, that is.

NFFC

10:26 pm on Apr 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What the Boss said, assuming that there are no links from the main domain to the subs.

Robert Charlton

11:43 pm on Apr 8, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Thanks. To clarify the question, the client is offering its content to partner sites, where it's co-branded and put on branded subdomains... essentially creating mirror sites.

Was planning to use robots.txt. Is there a universal exclusion for all robots? This would simplify maintenance if there is.

WebGuerrilla

6:53 am on Apr 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Robert, I think a standard Disallow all would work.

User-agent: *
Disallow: /

Although for Google, I think I might do something a bit more creative with Mod Rewrite. (Of course, I'm assuming you're on Unix/Apache)
As the number of co-branded subdomains grows, Googlebot will stumble across them on a regular basis. Instead of just disallowing Googlebot, you could send him back to the main site. It could help you get crawled a bit more, and it would be more reliable.

chris_f

7:38 am on Apr 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One of the sites I am monitoring does cobrands. Basically, it is a classified advertising medium. They use cobrands whereby the template is generally identical (except for some colour changes) and the search results are generated from the main site's database. They are hosted on the main site's server with the same IP address. Is this a problem with search engines? If so, why? All they are doing is spreading their content to other portals.

Robert Charlton

7:14 am on Apr 10, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



WG - Thanks. The situation may be more complicated than I thought...

I'm seeing on the new Google results (which are still a little unstable) that an important interior page on the main domain no longer shows any links to it... they've all been credited to one of the subdomained co-branded sites.

As I start exploring the links, I see that some of them are to this page on the main domain, and some of them are to the page on other subdomains... so I think Google is looking at the links to all the subdomains and assigning them to the one that it deems to be most important. If we ban spidering these subdomains, what happens to the links to the subdomains? There are a lot of them.

And, in general, in trying to keep just the main domain indexed by using robots.txt, will we be throwing away links to the subdomains?

>>Is this a problem with search engines? If so, why?<<

Chris - Engines generally don't like dupe content and will drop you because of it. I'm concerned about that with this site, but I'm also concerned about the kind of linking confusion I've just described.

WebGuerrilla

8:24 pm on Apr 10, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google can be very unpredictable when it comes to dup content. They can also be a bit unpredictable when it comes to honoring the robots.txt.

At the moment, I'm working on a project very similar to the one Chris mentioned. In those circumstances, we absolutely do not want the cobranded versions showing up in search engines.

Rather than use robots.txt, we are setting up rewrite rules that will check the UA and referring URLs. If Googlebot shows up requesting subdomain.domain.com, access will be denied, and Googlebot will be given domain.com instead.

You could do something similar with the referral, so anyone requesting subdomain.domain.com would end up getting just domain.com
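The decision those rewrite rules would make can be sketched in a few lines of Python. This is only an illustration of the logic, not the actual rules: the host name `www.example.org`, the `googlebot` UA substring, and the list of search engine referrer domains are all assumptions, and treating "the referral" as "the visitor arrived from a search engine results page" is one reading of the suggestion above.

```python
import re
from urllib.parse import urlsplit

# All of these values are hypothetical placeholders.
MAIN_HOST = "www.example.org"
CRAWLER_UA = re.compile(r"googlebot", re.IGNORECASE)
SEARCH_REFERRER_DOMAINS = ("google.com", "teoma.com")


def redirect_target(host, path, user_agent="", referrer=""):
    """Return the main-site URL to redirect to, or None to serve the page as-is."""
    host = host.lower().split(":")[0]  # drop any port
    if host == MAIN_HOST:
        return None  # requests for the main site are never redirected

    # Host name of the referring page, or "" when no referrer was sent.
    ref_host = (urlsplit(referrer).hostname or "").lower()
    from_search = any(
        ref_host == domain or ref_host.endswith("." + domain)
        for domain in SEARCH_REFERRER_DOMAINS
    )

    # A crawler, or a visitor coming from a search result, gets the main domain.
    if CRAWLER_UA.search(user_agent) or from_search:
        return "http://" + MAIN_HOST + path
    return None
```

Ordinary visitors who follow a link from the newspaper's own pages fall through to `None` and still see the co-branded version.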

You can find some great Rewrite examples here [webmasterworld.com].

Robert Charlton

9:05 pm on Apr 10, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



WG - The setup on this site is similar to Chris's and yours... everything on the main site's database... similar template with graphic variation for co-branding.

Word from the system administrator on this site is that there's a potentially big performance hit with Mod Rewrite, because the server would be checking every request for a page to see if it's Googlebot. How big a factor is this?

Consensus for now is to disallow everything in the robots.txt for each subdomain... and then I'd submit the new urls that have already been indexed to Google's 'remove sites,' etc. We'd then see how the main site fares.
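Because every subdomain is served from the same server and database, the robots.txt response has to vary by host name. A rough Python sketch of that idea, with a check through the standard library's robots.txt parser, might look like the following. The host names are made up, and how the file is actually served on this site is not something I know.

```python
from urllib.robotparser import RobotFileParser

MAIN_HOST = "www.example.org"  # hypothetical main domain

BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # the rule posted earlier in the thread
ALLOW_ALL = "User-agent: *\nDisallow:\n"    # an empty Disallow permits everything


def robots_txt_for(host):
    """Pick the robots.txt body for the requested host name."""
    host = host.lower().split(":")[0]  # drop any port
    return ALLOW_ALL if host == MAIN_HOST else BLOCK_ALL


def may_crawl(host, path, user_agent="Googlebot"):
    """Ask the parser whether a compliant robot may fetch this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt_for(host).splitlines())
    return parser.can_fetch(user_agent, "http://" + host + path)
```

With this in place, `may_crawl("www.dailynews.example.org", "/schools")` comes back false while the same path on the main host stays crawlable, so a new partner subdomain is blocked without anyone editing a file for it.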

We would be losing a few very high page rank links (like from a major newspaper's home page) to the co-branded site... but I think in the long run that keeping the mirrored sites off the engines is important. Any way to do this and confer the link boost to the main domain?