

Google News Archive Forum

    
Google penalty questions.
Mostly concerning duplicate, dynamic, and php pages
Finger




msg:60920
 6:10 pm on Sep 19, 2003 (gmt 0)

Hi,

I'm pretty new to the SEO game and I have some questions I've been having some trouble finding the answers to.

1) If you have two completely different domains with identical index.html's that do not link to each other, and googlebot finds both of them through links from other domains, does Google have some way of detecting identical pages like this?

2) Does the .php extension carry any penalty? I've heard it doesn't, but I changed my www.domain/index.html to index.php after it was already indexed and now my page seems to have vanished from Google (by vanished I mean that I can't find my site even when I search for a string unique to my site). However, besides changing the extension (and adding a simple counter and tracking system) I did change the site quite a bit - could this have caused that change in my Google listing (perhaps by being considered a dynamic web page)?

3) How different does a site have to be so that it's not considered duplicate? For example, if another common name for widgets is thingamajigs, and you have one page optimized for "widgets" keywords, and then you create another page optimized for "thingamajigs" keywords by simply replacing "widgets" with "thingamajigs", will Google determine they are duplicate pages? If so, would changing the format of the pages counteract this, or does Google rely mainly on content text in determining this?

4) Do sites get de-listed from Google often? If so, for what kinds of offenses?

Thanks!

 

naturalinstinct




msg:60921
 12:51 am on Sep 20, 2003 (gmt 0)

I'd also be very interested in this, if anyone has any answers.

Arnett




msg:60922
 1:45 am on Sep 20, 2003 (gmt 0)

I changed my www.domain/index.html to index.php after it was already indexed and now my page seems to have vanished from Google (by vanished I mean that I can't find my site even when I search for a string unique to my site).

OK

1. What url was indexed before? [widgets.com,...] [widgets.com,...] [widgets.com...] and [widgets.com...] are all different urls.

Put all of your original files back and resubmit them. Add the line "RedirectPermanent /index.html [widgets.com...]" to your .htaccess file. This is a permanent redirect and should cause Google to index the new url. Add a line for each page you have replaced with a php file. Some search engines will continue to ask for a dead url for months after it is gone. Leave the original files in place until all engines have indexed the new files. You will have two listings for each file, but the permanent redirect should cause the original url to be deleted from the index. Even if they are not deleted right away, or at all in some engines, the content of the pages is not duplicate and shouldn't draw a penalty.
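For anyone with more than a couple of renamed pages, here is a minimal Python sketch of the "one line per page" step above. The domain http://www.widgets.com and the file names are placeholders assumed for illustration (the target url in the post is truncated), and redirect_lines is a hypothetical helper, not part of Apache.

```python
def redirect_lines(renamed_pages, base_url):
    """Build one RedirectPermanent directive per renamed page."""
    lines = []
    for old_path, new_path in sorted(renamed_pages.items()):
        # Directive form: RedirectPermanent <old URL-path> <new absolute URL>
        lines.append(f"RedirectPermanent {old_path} {base_url}{new_path}")
    return lines


# Hypothetical pages and domain, for illustration only.
pages = {"/index.html": "/index.php", "/about.html": "/about.php"}
htaccess_lines = redirect_lines(pages, "http://www.widgets.com")
print("\n".join(htaccess_lines))
```

The printed lines can be pasted into .htaccess. Keeping the old-to-new mapping in one place makes it harder to forget a page.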

2. Do you have a 404 handler in place?

You want to add the line "ErrorDocument 404 /404.html" to your .htaccess file (no space between the slash and the file name). This will cause all not found errors to go to the specified file and should tell the search engines to delete the url automatically.

Arnett




msg:60923
 1:48 am on Sep 20, 2003 (gmt 0)

How different does a site have to be so that it's not considered duplicate?

A duplicate file is an exact copy of another file.

soccer_star




msg:60924
 2:00 am on Sep 20, 2003 (gmt 0)

Going by my own experiences with Google, I would answer as follows:

1) If you have two completely different domains with identical index.html's that do not link to each other, and googlebot finds both of them through links from other domains, does Google have some way of detecting identical pages like this?


Yes. Don't do it, Googlebot will spot it eventually and penalise you. I have a similar page on each of my sites and disallow Googlebot from all except one of them to eliminate the risk of a duplicate penalty.
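A quick way to sanity-check that kind of robots.txt exclusion before uploading it is Python's standard urllib.robotparser module. This is only a sketch: the rules, the page name similar-page.html and the widgets.com urls are assumptions for illustration, not the actual setup described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for one of the sites carrying the similar page.
robots_txt = """\
User-agent: Googlebot
Disallow: /similar-page.html
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The similar page should be off limits to Googlebot on this site...
blocked = not parser.can_fetch(
    "Googlebot", "http://www.widgets.com/similar-page.html")
# ...while the rest of the site stays crawlable.
rest_allowed = parser.can_fetch(
    "Googlebot", "http://www.widgets.com/index.html")

print("similar page blocked:", blocked)
print("other pages allowed:", rest_allowed)
```

The site that is meant to keep the page in the index simply leaves this rule out of its robots.txt.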

2) Does the .php extension carry any penalty? I've heard it doesn't, but I changed my www.domain/index.html to index.php after it was already indexed and now my page seems to have vanished from Google (by vanished I mean that I can't find my site even when I search for a string unique to my site). However, besides changing the extension (and adding a simple counter and tracking system) I did change the site quite a bit - could this have caused that change in my Google listing (perhaps by being considered a dynamic web page)?

Although I'm no expert on php pages, I don't think they carry any penalty at all. From what I've read it's sometimes harder for Googlebot to follow links from them than from a static html page, but I doubt that's the reason your page has disappeared. If it's a fairly new site (less than 8 weeks old) it's quite common for it to keep disappearing and reappearing again.

3) How different does a site have to be so that it's not considered duplicate?

Ahhh, the 64 million dollar question. :) Nobody really knows. I've read figures of at least a 20% difference and for the linking structure to be slightly different etc. Basically, you are usually safe if there is a logical reason for your pages to be similar, as opposed to them being similar just to target similar keyphrases.
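Nobody outside Google knows how such a percentage would be measured, but to get a rough feel for how far apart two of your own pages are, a word-level comparison with Python's difflib is one option. This is a sketch under that assumption only: percent_different is a hypothetical helper, the sample sentences are made up, and the result is not Google's metric.

```python
from difflib import SequenceMatcher


def percent_different(text_a, text_b):
    """Rough word-level difference between two texts, as a percentage."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    # ratio() is 1.0 for identical word sequences and 0.0 for no overlap.
    ratio = SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()
    return round((1.0 - ratio) * 100, 1)


# Made-up page text: the second page is the first with one keyword swapped.
widgets_page = "Blue widgets are cheap and our widgets ship fast"
thingamajigs_page = widgets_page.replace("widgets", "thingamajigs")

diff = percent_different(widgets_page, thingamajigs_page)
print(f"{diff}% different")
```

On a real page the swapped keyword is a much smaller share of the words, so a simple find-and-replace leaves the two versions far closer together than this tiny sample suggests.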

For example, if you are selling 'apples' on apples.htm you can say how good fruit is for you, how cheap an apple is, how tasty apples are etc. Then if you are selling 'pears' on pears.htm it is still valid to talk about how good fruit is for you, how cheap a pear is, how tasty pears are etc.

But if you create another page for 'juicy apples' on juicy-apples.htm just to target the keyphrase 'juicy apples' and end up with a very similar page to 'apples', you will be in trouble because there is no real need for the 'juicy apples' page as you've already covered the virtues of apples on your 'apples' page. That's when you start getting into dangerous territory because even though Googlebot may not pick up on it, a competitor may report you and a Google employee may decide to penalise you.

If another common name for widgets is thingamajigs, and you have one page optimized for "widgets" keywords, and then you create another page optimized for "thingamajigs" keywords by simply replacing "widgets" with "thingamajigs", will Google determine they are duplicate pages?

I did exactly that with two different sites. Even though I rearranged the pages so they were cosmetically different with different colors, fonts and altered the link structure slightly, it was essentially the same site with the same content and I must confess I did it purely to harvest extra keyphrases.

Google kicked the duplicate site out of the index within two months.

Lesson learnt. I now realise it is more prudent to play by the rules and err on the side of caution if you're in this for the long haul.

4) Do sites get de-listed from Google often? If so, for what kinds of offenses?

All the time - this post is already too long to even scratch the surface of what types of penalties there are. There are loads of techniques people use to try and hoodwink Google (search this forum for cloaking, crosslinking, duplicate sites/pages, hidden text and redirects for starters).

The golden rule is to question whether you are doing something for the good of the surfer or for the good of your rankings. It's possible to achieve both (heck, that's what good SEO is!) but if you're ignoring the surfer's needs and doing something just to boost your PR or ranking, you run the risk of a penalty.

Arnett




msg:60925
 2:03 am on Sep 20, 2003 (gmt 0)

Do sites get de-listed from Google often? If so, for what kinds of offenses?

People report search engine spammers to the search engines, and the engines also build software to catch spammers and ban them. Check Google's webmaster guidelines at [g**gle.com...]

Don't spam Google. They have some of the most sophisticated spam filters in existence. You get caught, you get banned. You fix the problem and email webmaster@google.com with "Reinclusion Request" as the subject of your email message. If you're ok they reinclude your site.

I can't find my site even when I search for a string unique to my site

You can find out how many pages are listed in Google by searching site:widgets.com widgets. All the neat Google commands are listed at [g**gle.com...]

You want to learn about Google? Start here -> [g**gle.com...]

(Don't post actual urls here. It's frowned upon. Use "widgets.com" for your example urls and search examples like "blue widgets").

SEO is just like driving a car. Learn how to do it and then obey the rules of the road and you'll be ok. "The most dangerous mile is the mile ahead."

willybfriendly




msg:60926
 2:45 am on Sep 20, 2003 (gmt 0)

How different does a site have to be so that it's not considered duplicate?

Wish I knew...

Easy enough to set a filter that checks the total size of a page as a first step towards identifying duplicate content. A checksum routine would add some robustness to a quick check like this. (I have no doubt that Google is much more sophisticated.)
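The two-step quick check described above might look something like this in Python. It is a sketch of the naive filter only, not of whatever Google runs: fingerprint and is_exact_duplicate are hypothetical names, SHA-256 stands in for the "checksum routine", and the sample pages are made up.

```python
import hashlib


def fingerprint(page_bytes):
    """Return (size, checksum) for a page's raw bytes."""
    return len(page_bytes), hashlib.sha256(page_bytes).hexdigest()


def is_exact_duplicate(page_a, page_b):
    """Cheap size comparison first, then the checksum."""
    size_a, sum_a = fingerprint(page_a)
    size_b, sum_b = fingerprint(page_b)
    if size_a != size_b:
        return False  # different sizes cannot be byte-for-byte copies
    return sum_a == sum_b


# Made-up pages: an original, a copy, and a variant with one word changed.
original = b"<html><body>Blue widgets in Springfield</body></html>"
copy = bytes(original)
variant = original.replace(b"Springfield", b"Shelbyville")

print(is_exact_duplicate(original, copy))     # exact copy
print(is_exact_duplicate(original, variant))  # one word differs
```

A filter like this only catches byte-for-byte copies, so any change to the page, even a single word, gets past it.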

I do have a number of dynamic pages that vary primarily by the geographic location named. In order to avoid the above easy checks I have made sure to include the geographic location numerous times in the pages. This strategy has given me dozens of top ten (many #1) SERPS on two and three word search terms in my little niche.

WBF
