bogus duplicate site - how to warn Google? - Google Search and SEO forum at WebmasterWorld

Forum Moderators: Robert Charlton & goodroi



An intern developed a version of the site; it's still online.

Fearless

2:53 pm on Aug 23, 2006 (gmt 0)

10+ Year Member

I come into this from a very different angle from most people here.

I develop sites for not-for-profits and political campaigns and organizations.

I was brought in to take over a site for a statewide caucus.

Several years ago, the caucus had a college student develop the site as a technology demonstration for his schoolwork. He's graduated and moved on. (It's difficult to describe the problem without being able to use the actual terms or URLs.)

I have created an entirely new site; it has an excellent Google rank, and we are doing just fine, thank you.

However, this young man has kept a complete copy of his version of the site live on his server space as a demonstration of his skills as a web designer. He has promised to use "nofollow" tags and such, but has failed to do so.

So if you type in the obvious search terms, the real version of the site is the first result, but his bogus copy shows up next.

To the uninformed web surfer, it's confusing.

Is there anything that I can do to get Google to quit returning his (bogus) copy of the site in its SERPs?

(It's outdated, inaccurate, etc.)

Adam_Lasnik

8:32 pm on Aug 23, 2006 (gmt 0)

10+ Year Member

nofollow isn't the right tool in this context.
Instead, you want to respectfully insist that he add the area of his site at issue into his robots.txt file so that neither Google nor any other major engine will visit those pages.
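A minimal sketch of what this suggestion looks like in practice, assuming the copied site lives under a single directory on his server. The directory name /old-campaign-site/ and the host example.org are hypothetical placeholders. The check uses Python's standard urllib.robotparser to show how a robots.txt-compliant crawler would treat URLs under such a rule. It only models crawling: a blocked URL can still be listed as a bare URL if other pages link to it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the intern's server; "/old-campaign-site/"
# stands in for whatever directory actually holds the copied site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-campaign-site/
"""


def may_crawl(url, user_agent="Googlebot"):
    """Return True if a robots.txt-compliant crawler may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.org/old-campaign-site/index.html",
                "http://example.org/portfolio/"):
        print(url, "->", "allowed" if may_crawl(url) else "blocked")
```

Running it reports the first URL as blocked and the second as allowed, so the rest of his portfolio stays crawlable while the old campaign copy does not.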

There are legal issues here which I cannot comment upon, both because I don't know the full details of the situation and because I'm not a lawyer :-). But with that said, you may wish to determine whether the DMCA applies here:
[google.com...]

twebdonny

9:07 pm on Aug 23, 2006 (gmt 0)

>>>add the area of his site at issue into his robots.txt file<<<<

Google, in more instances than not, totally ignores robots.txt as well as .htaccess files, spiders the duped content anyway, and typically penalizes the original content developer.

g1smd

9:45 pm on Aug 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

Google does not usually totally ignore "robots.txt"; what happens is that blocked URLs that exist can still be shown as URL-only entries in the SERPs. The content is not spidered and is not indexed, but the fact that the page exists does show in the SERPs. To get the URL out of the SERPs, the page needs a meta robots noindex tag instead of a matching entry in the robots.txt file (a page blocked in robots.txt is never fetched, so a noindex tag on it would never be seen).
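The distinction drawn above can be written down as a simplified model. This is an assumption-laden sketch, not Google's actual logic: a URL blocked by robots.txt is never fetched, so any noindex tag on it goes unseen and the bare URL may still be listed; a crawlable page carrying a robots noindex tag gets dropped. The function names, the example URL and the three outcome labels are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """True if the page carries a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


def likely_outcome(url, html, robots_lines, user_agent="Googlebot"):
    """Simplified model (an assumption, not Google's real pipeline)."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    if not robots.can_fetch(user_agent, url):
        # Never fetched: the noindex tag is invisible, but the bare URL
        # can still be listed if other pages link to it.
        return "url-only"
    if has_noindex(html):
        return "dropped"
    return "indexed"
```

With the same noindexed page, adding a Disallow rule for its directory changes the modelled result from "dropped" to "url-only", which is why robots.txt and noindex should not be combined when the goal is removal from the results.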

Google cannot ignore what is in the .htaccess file. The server applies those rules before it responds, so that file controls exactly what Google actually "sees". It cannot be overridden.

Fearless

5:01 am on Aug 24, 2006 (gmt 0)

10+ Year Member

It's been a long tough day. Thanks for your help.

How can I explain this without resorting to URLs or the actual search terms?

OK, imagine this. Fearless (myself) runs for Congress, and an unpaid volunteer creates a website called "fearlessforcongress.com". Several years later, he still has a complete copy of the site posted to his server space. And if anybody does a Google search for the terms "fearless congress", my current site is number one in the SERPs, but his old, outdated and incorrect copy of the site is listed second.

Of course, I have absolutely no idea how much traffic he is getting as a result.

A couple of quick notes...

1. I very much doubt that Google is penalizing us. Our PageRank is too high. Also, it is not a duplicate copy of the site. It's the site as it was a few years ago, when the student in question was an unpaid intern.

2. I tried to file a complaint with his current ISP, and THAT got me an email from the young man saying that the content of the site is "his" and that I "have no right to ask him to take it down."

Now he is blocking my emails to him. How mature...

Even though he was an unpaid volunteer, the site was created for us. The problem all started way back then, when he was allowed to post a "testing" version of the site on his server space.

However, at this point, he has taken a number of measures, all of which are positive. I can't locate a robots.txt file for the site, but he has added noindex meta tags and apparently has used the Google automatic exclusion tool to stop listing the content... which is great as long as NOBODY has an indexed site that is linked to his version of the site.

There are only 76 days until the election and my only hope is that Google will respond ASAP.

I don't even want to THINK about the other search engines...

Again, thanks for your help.

Quadrille

9:20 am on Aug 24, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

The solution to the problem is for his site to be taken down. If he was an intern, then he does not 'own' the work he was involved with - the company does.

If, on the other hand, he was given permission to place the stuff on his site, there's nothing you can do.

You could ask him to use robots.txt - but I guess he wants his site to be seen (or there's not much point in him making it!)

However, if yours is a campaign site, and he is promoting the campaign, that could be good - right?

pmkpmk

9:24 am on Aug 24, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Use the lawyerpult...

topr8

9:59 am on Aug 24, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

ok, here's an off the wall suggestion from another angle...

He was an unpaid intern at the time and isn't being entirely cooperative; this smells to me like he has bad feelings about his time there. Why don't you just try to resolve that issue? I think you'll find he happily removes the site altogether at your request... at least during the election period.

leadegroot

10:03 am on Aug 24, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

he has added noindex meta tags

That is probably your best bet, actually. It's more powerful than the robots.txt file, IME.

KenB

11:17 am on Aug 24, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Google, in more instances than not, totally ignores robots.txt as well as .htaccess files, spiders the duped content anyway, and typically penalizes the original content developer.

Google is very good about obeying the robots.txt file. I have some pretty draconian bad-bot traps on my site for robots that disobey my robots.txt file, and Google NEVER falls into my traps.
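KenB doesn't describe his setup, but one common way to build such a trap can be sketched as hypothetical Python WSGI middleware: robots.txt disallows a trap directory, and any client that requests it anyway is banned by IP address. The path /bot-trap/, the in-memory ban set and the 403 response are illustrative assumptions; a real site would link to the trap only in ways human visitors won't follow, and would persist bans somewhere durable.

```python
# Hypothetical trap path; it is listed under Disallow, so compliant
# crawlers such as Googlebot should never request it.
TRAP_PREFIX = "/bot-trap/"
ROBOTS_TXT = "User-agent: *\nDisallow: " + TRAP_PREFIX + "\n"

# In-memory ban list (an assumption; it is lost when the process restarts).
banned_ips = set()


def with_bot_trap(app):
    """Wrap a WSGI app so clients that ignore robots.txt get banned."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO") or "/"
        ip = environ.get("REMOTE_ADDR", "")
        if path == "/robots.txt":
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [ROBOTS_TXT.encode("ascii")]
        if path.startswith(TRAP_PREFIX):
            # Only a crawler that disregarded the Disallow rule gets here.
            banned_ips.add(ip)
        if ip in banned_ips:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return wrapped
```

The design is deliberately harsh: one request into the trap bans the address for every later request, which is what makes a misbehaving crawler easy to spot.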

Fearless

10:35 pm on Aug 24, 2006 (gmt 0)

10+ Year Member

topr8,

That was my first approach. Didn't work.

He did put a TON of "heavy lifting" into this, and without going into too many details, it was one of those "a miss is as good as a mile" situations. He seems upset that we just aren't using his project anymore. As a technology tour de force, it was great. I'm certain that he got an "A+".

As a real-world website: OK, not great. Plus, since he created everything from scratch, I have no hope of updating or maintaining it.

Plus, what he doesn't realize is that if his intention in leaving it online was to land future clients, I could have provided some AWESOME references...

[edited by: Fearless at 10:36 pm (utc) on Aug. 24, 2006]

topr8

11:46 am on Aug 25, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

>>That was my first approach. Didn't work.

OK, makes sense that you'd try it first... some people, eh? Sounds like you've got a complicated situation on your hands that could so easily have worked out better for everybody! Good luck.

Fearless

9:55 pm on Aug 25, 2006 (gmt 0)

10+ Year Member

Yep.

He has made an impassioned defense of the copyright issue...

which, of course, as everyone has pointed out, is utterly specious.

He's not running for office, it's not "his" site.

Sometimes, very, very smart people are their own worst enemies.

I know because I married into a family of (literally) mad rocket scientists and nutty professors, so this all has a very familiar feel to it.

Sometimes, there is no substitute for a little (un)common sense.

goubarev

10:14 pm on Aug 25, 2006 (gmt 0)

10+ Year Member

Hold on - your site comes up first, his site comes up second - you two get the top of the SERPs - he has all the links pointing to you - and you're complaining?

Gee, many people here would be happy to get as many sites as possible into the first 10...

He put "noindex" tags - that pretty much covers his part... I see it as somewhat unfair to him - he is helping you (with SERPs, free traffic, and tags) and you are bashing him... yep, doesn't sound right to me at all...

tedster

10:22 pm on Aug 25, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

He put "noindex" tags - that pretty much covers his part

I can't locate a robots.txt file

has used the Google automatic exclusion tool to stop listing the content

as long as NOBODY has an indexed site that is linked to his version of the site.

The exclusion tool will not work without a robots.txt in place as far as I know. But if there is a robots.txt and he (or you) then submits a removal request, the URLs will be removed for 6 months, and that removal happens pretty fast; backlinks will not enter into it.

lammert

10:36 pm on Aug 25, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

The exclusion tool will not work without a robots.txt in place as far as I know.

If you use the "noindex" meta tag, the URL removal tool does work. You do, however, have to enter every URL in the removal tool, one by one, which can take some time with a larger site.

Some things still don't add up.

My experience is that when "noindex" is used, URLs are removed from the index on the next crawl. Depending on the site, that takes anywhere from a few weeks to a month. Also, Fearless mentioned that the person used the exclusion tool. Two questions about this: how does Fearless know that, if he is no longer on speaking terms with the site owner (who is blocking all of Fearless' emails), and why are the URLs still in the SERPs?

IMO, part of the story is still missing. Or maybe the "exclusion tool" is not the "URL removal tool", as I and others assume?

goubarev

10:55 pm on Aug 25, 2006 (gmt 0)

10+ Year Member

:c)
Dude, I've got a solution for you!

Just give him twenty bucks or something... and turn a bad thing into a good one: tell him to forward all the traffic to your site with some redirect. This may actually boost your PR too...
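One way the suggested redirect could work, sketched as a minimal Python WSGI app that answers every request on the old copy with a 301 (permanent) redirect. NEW_HOME and PATH_MAP are hypothetical placeholders. Because the old copy's page structure differs from the current site, unknown paths fall back to the new home page rather than being mapped path-for-path. Whether a redirect like this passes any ranking value is not something the sketch can show.

```python
# Hypothetical destination site and page mapping; replace with real URLs.
NEW_HOME = "https://www.example.org/"
PATH_MAP = {
    "/issues.html": "https://www.example.org/issues/",
    "/contact.html": "https://www.example.org/contact/",
}


def redirect_app(environ, start_response):
    """WSGI app that permanently redirects every request on the old copy."""
    path = environ.get("PATH_INFO") or "/"
    # Known old pages go to their nearest equivalent; everything else goes home.
    location = PATH_MAP.get(path, NEW_HOME)
    body = f"Moved permanently to {location}\n".encode("utf-8")
    start_response(
        "301 Moved Permanently",
        [
            ("Location", location),
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

To try it locally, any WSGI server can host it, for example `wsgiref.simple_server.make_server("", 8000, redirect_app).serve_forever()`. A 301 rather than a 302 tells crawlers the move is permanent, so the old URLs should eventually be replaced by the targets.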

And if he wants it so badly, he could recreate his old site somewhere else with "noindex" tags... (or, as suggested above, in some directory that is restricted for SEs by robots.txt)

Lawyers... shmoyers... often it's easier just to pay a few bucks to get it over with... saves time and everybody's happy...