Google News Archive Forum

New Kind of Penalty?
SlyOldDog




msg:78146
 1:20 pm on Feb 23, 2004 (gmt 0)

We've been hit by some weird new penalty. Or to put it more accurately - a leveling of the playing field.

We always maintained at least 2 domains for each subject matter we were interested in to guard against hard times when Google might remove a site from good SERPs, accidentally or on purpose. We never considered this spamming - just insurance.

Last night I noticed that Google seems to have identified our whole network of sites. We don't have a ban, but on most searches now only one site will show up in the top 50.

The sites aren't cross linked in any identifiable way. So I think they have some way of aggregating all links between the sites and determining which sites are most likely to be connected.

Anyone else seen this?
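
To make that guess concrete: if Google kept an undirected graph of which sites link to which, even sparse cross-links would let it cluster a whole network with a simple connected-components pass. A toy Python sketch of that idea - pure speculation about the mechanism, with made-up site names:

from collections import defaultdict

# Hypothetical inter-site links observed during a crawl.
links = [("site-a.example", "site-b.example"),
         ("site-b.example", "site-c.example"),
         ("site-x.example", "site-y.example")]

# Build an undirected adjacency list.
graph = defaultdict(set)
for a, b in links:
    graph[a].add(b)
    graph[b].add(a)

def connected_component(start):
    """Every site reachable from `start` - one candidate 'network'."""
    seen, stack = set(), [start]
    while stack:
        site = stack.pop()
        if site not in seen:
            seen.add(site)
            stack.extend(graph[site] - seen)
    return seen

print(connected_component("site-a.example"))
# {'site-a.example', 'site-b.example', 'site-c.example'}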

 

martinibuster




msg:78147
 7:36 am on Feb 24, 2004 (gmt 0)

Two domains pointing to one site, or two domains pointing to two sites with different content?

Pricey




msg:78148
 10:36 am on Feb 24, 2004 (gmt 0)

My company name has an (&) in it, which is why I created 2 domain names: one with the & replaced with a hyphen and one with the word "and". I found this helped clients find my site more easily by remembering the name. I've never had any penalties though...

SlyOldDog




msg:78149
 10:47 am on Feb 24, 2004 (gmt 0)

That would be 2 domains with different content on the same topic ;)

Seriously - it's all unique but it's about the same thing. So there is no way an algorithm picked up similarity in the content.

It is either a human check (unlikely) or some way-out algorithm for analysing link patterns which must have made a probability guess that the sites were connected.

Bear in mind the different topics are partially linked to each other too, so Google may have hypothesised the whole network was connected.

kaled




msg:78150
 10:54 am on Feb 24, 2004 (gmt 0)

What about a WHOIS search - would that have identified the sites as linked?

If not that, perhaps your insurance has just paid off.

Kaled.

mcavill




msg:78151
 10:56 am on Feb 24, 2004 (gmt 0)

>>Anyone else seen this?

I have a similar situation: one large site covering all types of widgeting, and one smaller site on just one type of widget - like you, different and more specific content, but on the same theme.

Up until about 2 days ago the large site's sub-page would be at about #9 for my keyword phrase; then 2 days ago the smaller site popped up at #9 and the larger site's sub-page is nowhere to be seen... the sub-page did link to the smaller site.

Not sure if it's coincidence or something new that Google's checking.

caveman




msg:78152
 1:24 pm on Feb 24, 2004 (gmt 0)

There have been threads speculating on various points of concern, including:

Common components in WHOIS (same owner, phone number, address, etc)

Hosted at same host on same C block with essentially the same content

Code fingerprinting
-- same authoring tool will repeat certain code patterns;
-- author has a unique style;
-- some 'cutting and pasting' between sites to save time

Google Toolbar info

GG going through your trashcan

If you do a site search here and type in 'penalty kw' - e.g., 'penalty WHOIS' - you'll probably find some of them.

P.S. I did see GG walking down my street on "trash night" not long ago, but I didn't actually see him in the can. ;-)
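
For what one of these checks might look like mechanically, here is a minimal Python sketch of the same-C-block test from the list above (hypothetical hostnames; this is a guess at the signal, not Google's actual code):

import socket

def class_c_block(hostname):
    """Resolve a hostname and return its class C block (first three octets)."""
    ip = socket.gethostbyname(hostname)
    return ip.rsplit(".", 1)[0]  # e.g. "192.0.2.57" -> "192.0.2"

def same_c_block(host_a, host_b):
    """True if both hosts sit in the same /24 block - one of the speculated signals."""
    return class_c_block(host_a) == class_c_block(host_b)

# e.g. same_c_block("blue-widgets.example", "green-widgets.example")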

SlyOldDog




msg:78153
 2:47 pm on Feb 24, 2004 (gmt 0)

Of the above, the only one it could be is the Toolbar.

How about the Related Sites feature?

That one is frightening.

caveman




msg:78154
 3:57 pm on Feb 24, 2004 (gmt 0)

I'm not saying that they do use the toolbar for that purpose, but certainly G has the ability to attach your IP to every site you visit, and to know how often you visit each site/page, etc.

If you use the Web to access your own sites' pages often, as many do...well...it wouldn't be hard to figure out what sites you may have a relationship with, especially if you kept landing on pages that had not been spidered yet.

I uninstalled all toolbars several months ago. I don't think they'd see anything with our sites that I would be concerned about. I just don't like that level of, um, information sharing...

Put another way, when I unzip my pants, I like to have some control over who's watching. ;-)

More Traffic Please




msg:78155
 4:23 pm on Feb 24, 2004 (gmt 0)

I'm not saying that they do use the toolbar for that purpose, but certainly G has the ability to attach your IP to every site you visit, and to know how often you visit each site/page, etc.
If you use the Web to access your own sites' pages often, as many do...well...it wouldn't be hard to figure out what sites you may have a relationship with, especially if you kept landing on pages that had not been spidered yet.

I guess it's possible, but I visit this site more times a day than my own site, and I'm sure that's true for tons of site owners. Google would have to monitor the visiting patterns of every Toolbar user, look for patterns that seemed suspicious, and then verify a connection between your browser and the site in question. It just seems a bit over the top. In addition, if you are on dial-up, your IP changes all the time. If there is anything to this, I'm guessing it has more to do with WHOIS info.

SlyOldDog




msg:78156
 4:28 pm on Feb 24, 2004 (gmt 0)

In our case it isn't whois. Each site is in a different name and on a different ISP with different DNS.

It might be a coincidence that 2 of our sites tripped the OOP filter (or as GoogleGuy would say, "the new algo") on the same day, but since our 2 sites now hardly ever appear in the same place at the same time (like Clark Kent and Superman), a Related Sites penalty seems more likely.

jbgilbert




msg:78157
 4:38 pm on Feb 24, 2004 (gmt 0)

slyolddog,

Yes... seeing similar issues under almost the identical scenario you have mentioned. In my case I believe it to be worse because the 2 sites are on the same subject and very similar in every aspect, but offer different products.

The "related" thing bothers me too. It's very inaccurate and I sincerely hope Google does not give it much weight.

glengara




msg:78158
 4:41 pm on Feb 24, 2004 (gmt 0)

Must have missed that class, what's a Related Site Penalty?

caveman




msg:78159
 4:55 pm on Feb 24, 2004 (gmt 0)

Google would have to monitor all the visiting patterns of every ToolBar user and look for patterns that seemed suspicious and then verify a connection between your browser and the site in question etc.

Nope. All they'd need to do is look for apparent connections between frequently visited sites from a given IP.

As an example, while G would not see connections between your blue widgets site and WW, they might see connections between your blue widgets site and your green widgets site. ;-)

Again, not saying that they do this, only that they easily could. If they did, I'd think that they would need at least one other verifiable connection between two sites, like WHOIS. Otherwise, someone who frequently checked his site and his competitor's site could get the competitor in a heap o' trouble. :-o
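
A back-of-the-envelope Python sketch of that kind of check, using a made-up toolbar log format (again, speculation about what they *could* do, not what they *do*):

from collections import defaultdict
from itertools import combinations

# Hypothetical toolbar log: (visitor_ip, site) pairs.
log = [
    ("198.51.100.7", "blue-widgets.example"),
    ("198.51.100.7", "green-widgets.example"),
    ("198.51.100.7", "blue-widgets.example"),
    ("198.51.100.7", "green-widgets.example"),
    ("203.0.113.9", "blue-widgets.example"),
]

# Count how often each IP visits each site.
visits = defaultdict(lambda: defaultdict(int))
for ip, site in log:
    visits[ip][site] += 1

# Flag site pairs that are both visited frequently from the same IP.
MIN_VISITS = 2  # arbitrary threshold for "frequent"
suspect_pairs = defaultdict(int)
for ip, sites in visits.items():
    frequent = sorted(s for s, n in sites.items() if n >= MIN_VISITS)
    for pair in combinations(frequent, 2):
        suspect_pairs[pair] += 1  # number of IPs frequenting both sites

print(dict(suspect_pairs))
# {('blue-widgets.example', 'green-widgets.example'): 1}

And per the point above, a hit here would at most justify looking for a second, verifiable connection like WHOIS before acting on it.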

rfgdxm1




msg:78160
 5:12 pm on Feb 24, 2004 (gmt 0)

>Otherwise, someone who frequently checked his site and his competitor's site could get the competitor in a heap o' trouble.

Or say someone who regularly read the WebmasterWorld forums, and the forums on another site. I can see this particularly being a concern in cases of sites with a topic where there are only 2 sites with forums. I can also see bloggers visiting their own blog, and the blogs of a couple other people they find interesting.

There are also some good reasons for having more than one site on a related topic. I have 2 amateur sites. One is about widgets and widget safety in general, and the other is devoted to one very specific brand that contains widgets. This brand has the specific property that it is only available in one country on Earth: the US. Needless to say, people in every other country would have little or no interest in content about that specific widget brand, so it makes sense that it be on a different site.

cabbie




msg:78161
 6:20 pm on Feb 24, 2004 (gmt 0)

I have seen this too.
I suspected that it was "code fingerprinting" or using the similar pages feature.

caveman




msg:78162
 6:30 pm on Feb 24, 2004 (gmt 0)

so it makes sense that it be on a different site

Right.

My opinion, FWIW, is that this toolbar thing *might* be happening. I've heard enough anecdotal evidence to suggest at least the possibility - like the comments above.

Certainly G has the ability. Whether it's practical or not is way beyond my expertise. Clearly, they'd have to be careful, per the blogger examples above.

But what if they did see lots of activity between two blogger sites from one IP, and the sites were closely related if not near identical in the sorts of links they offer? And what if there were snippets of code that were near identical, strongly suggesting a single author? It's a touchy area. 12 months ago I would have laughed at anyone saying what I'm saying now. Since then, I've seen too many sites appear in G that are not live yet (toolbar most likely); too many sites disappear as in the examples above; too many comments re WHOIS, etc.

Plus it's clear to me that they've been trying to tighten up on dup content for the past 6-12 months.

I don't know, and in my case it's not much of an issue, but for some it could be a huge issue, i.e., those that are running lots of dup/spammy sites to catch different kw's with the same content.

Me, I just don't like them seeing pages before I launch sites. And I don't like companies watching behavior that in my view is personal.

rfgdxm1




msg:78163
 6:42 pm on Feb 24, 2004 (gmt 0)

>My opinion, FWIW, is that this toolbar thing *might* be happening.

I'd suspect more sites on similar IPs, and/or common WHOIS data. We know Google has the WHOIS data, because GG said they did, and uses it to spot when domain names expire accidentally and the owner quickly pays up. By necessity they have to know the IPs. The false-hit rate on toolbar data would be very high; there is a natural tendency for surfers to frequent sites on similar topics. However, they could use the toolbar data to trigger an algo element that says "Hmm... looks suspicious. Time to dig deeper."

caveman




msg:78164
 6:50 pm on Feb 24, 2004 (gmt 0)

However, they could use the toolbar data to trigger an algo element that says "Hmm...looks suspicious. Time to dig deeper."

Exactly. If it's happening, that would probably be the extent of it. Still, it bothers me a bit even if it's just a trigger...

SlyOldDog, you sure there's no other similarity between sites/records that could be causing this?

Perhaps with the presence of the toolbar info, the hurdle of proof might get lowered a bit too?

SlyOldDog




msg:78165
 6:57 pm on Feb 24, 2004 (gmt 0)

Of course a human check would confirm it. The other thing associating the sites is the links between them. We don't link much between them, but what links there are might be enough to form an association because they are on higher-level pages. Other outgoing links (to sites not belonging to us) are at lower levels, and Google might just exclude them from their sample.

Interesting thing is that the same old tricks restore the sites to their original positions:

keyword1 keyword2 +www

MedCenter




msg:78166
 7:08 pm on Feb 24, 2004 (gmt 0)

Seems like most people haven't read any of the Google papers, or the LSI information.

It is quite easy to find a pair of mirrored hosts on the web. IP addresses, WHOIS, and such are just a small fraction of how it can be done.

The more advanced method is defining a host by the uniqueness of its outbound links and its hierarchical directory structure.

Basically, if two hosts on the web have the exact same outbound links, or at least 80% in common, and their on-site directory structure is 80% similar, then it is most probably a mirrored set.

This is just ONE of thousands of factors.

Remember that LSI removes all common and "cheap" English words, such as "and", "&", "therefore", "so", "for", etc.

With that in mind, rearrange your text and design all you want, insert billions of pronouns and interjections, and the G technology will still just as easily know your site is identical.

how to get around this?

1. different IP address
2. different LSI terms / keywords [throw in a good amount of different unique words on each site/page, in the Title, Meta, and page content]
3. vary your site structure, especially directories and filenames. maybe even alter the sitemap.
4. have at least a 40% difference in your outbound links for each site.

do the above, and your site will be back. [provided all other things remain equal]

:)
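
A rough Python illustration of the outbound-link / directory-overlap test described above - the 80% figure is from the post, and the mechanism is a guess, not a confirmed implementation:

def overlap(set_a, set_b):
    """Fraction of items the smaller set shares with the larger one."""
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))

def looks_like_mirror(outlinks_a, outlinks_b, dirs_a, dirs_b, threshold=0.8):
    """Apply the '80% in common' heuristic to outbound links and directory structure."""
    return (overlap(outlinks_a, outlinks_b) >= threshold
            and overlap(dirs_a, dirs_b) >= threshold)

# Hypothetical example:
links_a = {"dmoz.org", "supplier.example", "widget-news.example"}
links_b = {"dmoz.org", "supplier.example", "widget-news.example", "extra.example"}
dirs_a = {"/products/", "/articles/", "/links/"}
dirs_b = {"/products/", "/articles/", "/links/"}
print(looks_like_mirror(links_a, links_b, dirs_a, dirs_b))  # True

Point 4 in the list above follows directly: pushing the outbound-link overlap below the threshold (a 40% difference) would defeat this particular test.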

Krapulator




msg:78167
 7:56 pm on Feb 24, 2004 (gmt 0)

This could also be an application of LocalRank.

CCowboy




msg:78168
 8:32 pm on Feb 24, 2004 (gmt 0)

SlyOldDog,

My guess is it's not a penalty. I think it has something to do with a "less than deep crawl". With all these new pages added, it just takes more or less of "whatever" to rise to the top.

I have noticed that my sites that have deep drops in position rise again after a fresh crawl showing a fresh date.

just my 2 cents

a_chameleon




msg:78169
 8:41 pm on Feb 24, 2004 (gmt 0)

I'm in the same boat, SlyoldDog.

Two sites, one far more sophisticated than the other - bigger, more/different everything... except:
The common denominators of the two, and a prerequisite (from the CEO), are:

1) Same images used on menu, same color scheme.
2) Same product names/part no.'s.

Granted, I've not re-named the images, nor the product ID's themselves, as I'm an officer of the firm and have a "day job" selling their products to their marketplace.

From what I'm reading, you don't have that denominator... do you?

More Traffic Please




msg:78170
 8:47 pm on Feb 24, 2004 (gmt 0)

The more advanced method is defining a host by the uniqueness of its outbound links and its hierarchical directory structure.

Basically, if two hosts on the web have the exact same outbound links, or at least 80% in common, and their on-site directory structure is 80% similar, then it is most probably a mirrored set.

This is just ONE of thousands of factors.

Remember that LSI removes all common and "cheap" English words, such as "and", "&", "therefore", "so", "for", etc.

With that in mind, rearrange your text and design all you want, insert billions of pronouns and interjections, and the G technology will still just as easily know your site is identical.

Here is an example of why Google would have to be very careful with this approach.

There is a large web design firm that makes template sites for Realtors and hosts the sites themselves on the same class C IP block. I'm not sure how many sites you can host on one class C IP with virtual hosting etc., but you get the picture. They have created over 24,000 sites since the mid 1990's. The site structures are nearly identical. Many agents target the same communities, so from an LSI standpoint the sites would seem very similar. The index pages of many of these sites come with identical external links to information on taxes, decorating, etc. The reciprocal linking structure among real estate sites tends to be very tight.

Does Google have a way of identifying this scenario and separating these sites from true mirror sites? Or, could this be why so many real estate sites have been hammered over the last few months? I would guess that at least 90% of Realtors sites are of a template nature.

MedCenter




msg:78171
 8:53 pm on Feb 24, 2004 (gmt 0)


Does Google have a way of identifying this scenario and separating these sites from true mirror sites? Or, could this be why so many real estate sites have been hammered over the last few months? I would guess that at least 90% of Realtors sites are of a template nature.

I think you have nailed something there, MTP.

FYI, there is no technical limit to the number of domains on one IP address - unless there are ICANN regulations in place that I don't know about.

However with LSI, most of these sites would have different brand names interspersed within the sites. That would allow them to be more unique within the LSI factor.

Otherwise though, regarding the similarity in outbound and inbound links, there is no workaround IMHO.

Think of it this way. If all these sites link the same place, and the same places link to them, then I as the searcher will be rewarded the same no matter where I land. Google is right in thinking this, because I can get to all the same places, from any of these entry points.

It sucks sometimes, but it's just a good reason to break away from template design and vary your linking scheme and hierarchy.

rfgdxm1




msg:78172
 9:00 pm on Feb 24, 2004 (gmt 0)

>It sucks sometimes, but it's just a good reason to break away from template design and vary your linking scheme and hierarchy.

And if this is a large web design firm that has created *tens of thousands* of Realtor sites, perhaps they should have hired some good SEOs who could point out the potential problems here? Basically, the content is very similar, which SEs are well known not to like.

SlyOldDog




msg:78173
 12:12 am on Feb 25, 2004 (gmt 0)

a_chameleon,

>>selling their ptoducts to their marketplace.

>>From what I'm reading, you don't have that denominator... do you?

Yes we do. We sell products which we give unique names, for example "red 52". These names appear on both sites. This might be the problem if LSI is the culprit. Our internal pages on both sites would contain a high percentage of rarely coinciding words, and this would easily allow Google to tie the sites together (using an LSI vector), and perhaps even drop one simply because the two are too similar to be shown together at the top of the SERPs.

MedCenter - I'm not sure if your remedy would help. Many sites ranked alongside us use very similar terminology to us (except for the product names), so I doubt changing our core keywords would make a big difference.

We also have a links directory on our sites, and whilst it is different on each site, the structure is the same and the usual suspects end up adding themselves to it, so we may have a large percentage of identical outgoing links on many of the sites.

Finally a question for LSI experts. Does LSI apply to a page or a site? Because our home pages do not contain any duplicate content. I am sure of that.
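
For what it's worth, the "rarely coinciding words" idea can be illustrated with a simple term-vector comparison. Real LSI involves a singular value decomposition over a large term-document matrix, but the flavour of the similarity test is roughly this (a toy Python sketch with an invented stopword list, not Google's algorithm):

import math
from collections import Counter

STOPWORDS = {"and", "the", "so", "for", "therefore", "a", "of", "to", "is", "our"}

def term_vector(text):
    """Count non-stopword terms - a crude stand-in for a document vector."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)

def cosine_similarity(vec_a, vec_b):
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a)
    norm_a = math.sqrt(sum(n * n for n in vec_a.values()))
    norm_b = math.sqrt(sum(n * n for n in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Two pages sharing rare product names like "red 52" score high even if
# the surrounding prose is completely different.
page_a = "buy red 52 widgets and red 52 accessories for the garden"
page_b = "the red 52 is our best widget so order red 52 accessories today"
print(cosine_similarity(term_vector(page_a), term_vector(page_b)))

As to the page-vs-site question: LSI as published operates on documents (pages), but presumably nothing stops an engine from aggregating page vectors up to a per-site comparison.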

div01




msg:78174
 12:33 am on Feb 25, 2004 (gmt 0)

WHOIS can get tricky because a lot of registrars are offering proxy registering as a value added service. This would mean a whole bunch of domains registered to the same entity.

kaled




msg:78175
 1:44 am on Feb 25, 2004 (gmt 0)

With all this discussion about Google recognising (or imagining) duplicate sites, I decided to do a quick search that used to place three identical sites in the top 5 SERPs. They are now at #3, 19 and 20. (The sites seem to be generated dynamically from a single database. Colors and graphics vary but the text is mostly identical. There were small differences on one site.)

The WHOIS info is almost identical too.

If Google are filtering out duplicate sites, the filter hasn't caught up with these yet.

Kaled.

PS
SlyOldDog, What happens if you repeat the search with similar pages included (assuming the option is given)?
