doorway variety

         

rayjam

3:10 pm on Dec 11, 2000 (gmt 0)

10+ Year Member



I have to create several dozen pages, all quite similar. Does anyone know what percentage of file size will differentiate one page from another in the eyes of a spider? How closely will spiders look at the text contained in similar pages? Do they compare text body to text body to decide if a page is spam, or do they look at the position of kw's or phrases?
Thanks

Air

4:22 am on Dec 12, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>How closely will spiders look at the text contained in similar pages.

Very closely, maybe not immediately, but sooner or later one or both of the equal pages will get dumped.

If you have many pages, especially from the same domain, with similar wording and structure and keyword placement they will likely get flagged as machine generated (templates). Mostly it is AltaVista you need to be concerned about in this regard, but most of the majors are sensitive to exact or near exact duplicates.

Vary the page length, and page titles, don't repeat big chunks of text from one page to another, and you should be fine on all the engines.

rayjam

11:42 am on Dec 12, 2000 (gmt 0)

10+ Year Member



Thanks AIR

msgraph

1:45 pm on Dec 12, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've had duplicate pages picked up on all the crawling search engines. Some were listed right next to each other and some were listed one month on one domain and another month on another domain.

Like Air said, Altavista is very very good at tracking down duplicates. I think they just won a patent on the process they use, so it is sure to work if they can patent it. But they don't seem to notice it right away. It usually takes a few weeks after you have been listed.

I would suggest re-wording your titles, descriptions, and link text. Again, like Air recommended, change some of your body text as well. That way you don't have a block of pages all within a range of X kb.

Machiavelli

2:38 pm on Dec 12, 2000 (gmt 0)



msgraph - have you any more information on this altavista patent - I can't find any information anywhere.

msgraph

2:48 pm on Dec 12, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here you go Mach, I pulled it out of their press release section.

Altavista Wins Patents [doc.altavista.com]

Machiavelli

2:53 pm on Dec 12, 2000 (gmt 0)



>I pulled it out of their press release section

<blush>I didn't do a very good job of looking</blush>

PeteU

12:32 am on Dec 13, 2000 (gmt 0)

10+ Year Member



It's all here [164.195.100.11]
It is pretty interesting that they don't look at the page source, as it would be too resource heavy. Instead, the sole/major factor for finding duplicate pages is the outgoing links on them.
Have fun with those patents guys ;)

pete

8:09 am on Dec 13, 2000 (gmt 0)

10+ Year Member



Thanks MS and Pete

It's an obvious mistake I've been making. I have been creating pages which are completely different in regard to content, page size, densities, etc.

Often, probably the only constant between these pages is the outgoing links. Really good to know, and it provides a possible explanation for why some of my pages on Alta have been dropped after being in the results for about 2 - 4 weeks.

tedster

11:57 am on Dec 13, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Why on earth does this deserve a patent? And how will they know if someone violates it? To me this seems nuttier than Amazon's patent on "one click" technology. If Newton was alive today he could probably patent the calculus!

But seriously -- I'm assuming that "outgoing links" must mean links to another domain, right? Otherwise, it seems to me, that pages from one website with standard menu navigation would all look like duplicates to this method.

Machiavelli

2:09 pm on Dec 13, 2000 (gmt 0)



I agree with you tedster about the menu navigation - that is, I would agree with you if I thought AV sane. It is a rather queer method of calling pages similar altogether. Pages could have paragraphs & paragraphs of different text, but both link to, say, the four most important documents in their field, and, SMACK, they are seen as similar.

msgraph

2:20 pm on Dec 13, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good points Ted and Mach.

I wonder if this will conflict with link popularity. I guess I now have to go through the process of changing doorway pages and various root index pages with links to my main sites.

msgraph

2:29 pm on Dec 13, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"...divided by the total number of outgoing links is larger than a predetermined threshold."

Does this mean that they have a predetermined allowance for duplicate links? Like there can be no more than x amount of duplicate links?

or

Does this mean that if there are more duplicate links than non-duplicate links then they will consider it a duplicate page?

Machiavelli

4:58 pm on Dec 13, 2000 (gmt 0)



More like the second answer of yours, msgraph. If page A has 20 links on it, and page B has 15 links on it, and 6 of the links on page B appear on page A, then the union of the links is (20 + 15 - 6) = 29, so the ratio is 6/29, or slightly over 0.2.

On the other hand, if page C has 5 links and page D has 6 links, and all of the links on C also appear on D, then the union is 6, and the ratio is 5/6, or about 0.83.

The value of this ratio is always between 0 and 1 inclusive; what we need to figure out is this: what is the critical value? I would guess about 0.7.
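
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the ratio as described in this post (shared links divided by the union of links on both pages); the function names, the example URLs, and the 0.7 cutoff are assumptions for illustration, and the cutoff in particular is the guess from this post, not a value taken from the patent.

```python
def link_overlap_ratio(links_a, links_b):
    """Shared links divided by the union of links found on both pages."""
    a, b = set(links_a), set(links_b)
    union = a | b
    if not union:
        return 0.0  # two pages with no links at all: treat as no overlap
    return len(a & b) / len(union)


def make_links(prefix, count):
    """Build `count` made-up URLs; purely hypothetical example data."""
    return {f"http://{prefix}.example/{i}" for i in range(count)}


# Pages A and B: 20 and 15 links, 6 of them in common.
shared = make_links("shared", 6)
page_a = shared | make_links("only-a", 14)
page_b = shared | make_links("only-b", 9)
ratio_ab = link_overlap_ratio(page_a, page_b)  # 6 shared / 29 in the union

# Pages C and D: every one of C's 5 links also appears on D, which has 6.
page_c = make_links("cd", 5)
page_d = page_c | {"http://extra.example/"}
ratio_cd = link_overlap_ratio(page_c, page_d)  # 5 shared / 6 in the union

# Assumed cutoff: the 0.7 guessed in the thread, not a documented value.
GUESSED_THRESHOLD = 0.7


def looks_duplicate(ratio, threshold=GUESSED_THRESHOLD):
    """True when the overlap ratio exceeds the (guessed) threshold."""
    return ratio > threshold


print(f"A vs B: {ratio_ab:.3f}, flagged: {looks_duplicate(ratio_ab)}")
print(f"C vs D: {ratio_cd:.3f}, flagged: {looks_duplicate(ratio_cd)}")
```

Under that guessed cutoff the A/B pair would pass and the C/D pair would be flagged, which matches the reasoning in the post.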

tedster

10:54 pm on Dec 13, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There's also the issue of how perceived duplicates are treated. I think the answer is "nuke 'em all".

I know that this method was created especially to eliminate doorway pages on different domains. But it also hurts when different domain names point to the same set of pages. This I know from experience. Got to keep duplicate paths to the same page away from AV, or they all may get dropped.

However, doesn't this patented method also open the door to an unethical form of competition? Seems to me that a low-life competitor doesn't need to plagiarize, just duplicate your links on a very different page. Submit to AV, and your production page would be penalized along with his scam page.

Nah, they must have thought about that, right?

2_much

11:17 pm on Dec 13, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm probably missing something but this doesn't make sense to me. If a site is copied over and "pasted" on a new domain name, all of the outgoing links to the "new" site's tour pages will be different. This can be done by both the webmaster and a "page jacker". I can see how this would be effective when dealing with doorpages, but what about anything that's copied over that has several links to pages within the same domain????
Maybe this is where HTML structure kicks in?
So, to be safe, it would be ideal to alter both a little bit...If the outgoing links are not the same because they are on a different domain, then the HTML code should be changed a little bit (title, text, file size, meta tags, etc) to ensure they're not "nuked".

tedster

3:15 am on Dec 14, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My (somewhat bitter) experience with Alta Vista duplicate screening is that they do nail similar pages on different domains -- in fact this is their essential purpose.

The documentation for their patent makes it clear that they are not looking at the page copy, only at the links themselves. I also think it's only the destinations that get compared, not the link text, and it's "outbound" links that matter, not in-site navigation.
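
To make that reading concrete, here is a rough Python sketch of collecting only the outbound link destinations from a page, ignoring link text and in-site navigation. It is not AltaVista's code; the class and function names are made up, and treating "outbound" as "different hostname" is an assumption based on the interpretation in this thread.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag; link text is ignored."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def outbound_links(html, page_url):
    """Return the set of link destinations that point to another host.

    Assumption: a link counts as outbound when its hostname differs from
    the hostname of the page it sits on.
    """
    collector = LinkCollector()
    collector.feed(html)
    page_host = urlparse(page_url).netloc.lower()
    destinations = set()
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if parsed.netloc.lower() != page_host:
            destinations.add(absolute)
    return destinations


sample_html = (
    '<a href="/about.html">About us</a> '
    '<a href="http://other.example/page">Other site</a> '
    '<a href="http://other.example/page">Same target, different text</a>'
)
found = outbound_links(sample_html, "http://mysite.example/index.html")
print(found)
```

Because only destinations are kept in a set, two links with different text but the same target collapse into one entry, and the in-site "About us" link is dropped. Note that this sketch would treat `www.mysite.example` and `mysite.example` as different hosts; whether a search engine does the same is not something this thread establishes.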

Whatever the exact method, the bottom line effect I have seen is that all duplicates or near-duplicates get dropped, especially if they are on different domains. I can understand not wanting to index all of the duplicate pages. I just wish that at least one of the set would stick. This method has hit several of my clients who mirror their site on more than one domain name, even though the dropped pages were in no way doorways.

My solution has been to re-submit the pages, being careful to use just one domain. Then I addressed the incoming links, asking webmasters to link only to that one domain. Most are very cooperative, and they appreciate the information about AV as well.

My client's dropped pages are now back in the index and beginning what I hope is a climb through the ranks.

2_much

5:15 pm on Dec 14, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That makes a lot of sense Tedster. So links within the same domain aren't considered outbound, right?
One more question - what if the site has no outbound links? Do the "dupes" then go unnoticed because they would probably rank very lowly anyways? Or is AV still able to detect them?
Wow..this is pretty staggering...I finally discovered why we have never ranked well in AV! That was driving me nuts...it was the one engine we hadn't been able to break into...Our linkage plan included adding the same set of relevant links to each site that belongs to the same category...dooming ourselves with AV!
Thank you everyone for sharing this information, really appreciate it! It's back to the drawing board for me :)

Bates

8:44 am on Dec 20, 2000 (gmt 0)

10+ Year Member



This is gonna be a major problem..

As Machiavelli said, "Pages could have paragraphs & paragraphs of different text, but both link to, say, the four most important documents in their field, and, SMACK, they are seen as similar".

There goes our #1 and #2 positions!

JamesR

5:22 pm on Dec 20, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>Why on earth does this deserve a patent? And how will they know if someone violates it?

Kind of ironic since one of the filers for the patent, Monika Henzinger, now works for Google and wrote of very similar technology there (although doesn't seem to be in use yet)

wayne_c

4:16 pm on Jan 7, 2001 (gmt 0)



Hello, question... In regard to what AV sees as an outgoing link, is there a difference between linking just the domain, like www.mydomain.com, and having links to the different pages on mydomain.com? Do you think that this would be an effective way to make sure the doorways do not get dropped?

thanks

wayne

Air

7:45 pm on Jan 7, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello Wayne,

Generally speaking pages that have no links pointing at them (internal or external) tend to get flagged as doorway pages. So it is a good idea to have links on optimized pages and have other pages point at the optimized pages from your site.

The thing that seems to be an issue with AV is that if two pages have the exact same outgoing links (from what I can see these are links to external sites) then they get big points as potential duplicate pages.

Garyh

8:26 pm on Jan 10, 2001 (gmt 0)



O.K., new at this and a bit confused. I was talking to a friend about having a "friends" page (links). This page would of course have external links to other sites that would reciprocate. The conversation had more to do with categories than duplicates, BUT, if you have 5 separate domains each linking to each other, will that result in "duplicate" pages?

One site is going to have at least 4 links to the same page, most likely the index page.

Then add hyperlinking to internal pages to this mix and I feel there is going to be a real mess here. Then there are the doorway pages with additional links!

Am I off the track here or what? Can anyone clearly state the rules as you understand them?

Thanks Gary

2_much

11:36 pm on Jan 10, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Let's see if I'm understanding you correctly Gary...Let's say you have 5 sites that are all going to have a "friends" page...
Site A, Site B, Site C, Site D, Site E
So A would have links to B, C, D and E on its friends page, as well as links to pages within Site A.
B would have links to A, C, D and E, with links to pages within B...etc.
If you own site A...a suggestion would be for you to add a few outgoing links that B, C, D and E did not include...maybe to directory pages where A is listed?? or just a few other random links that your "friends" don't have.
I haven't tried this, but this is going to be one of my solutions to the issues discussed in this thread.
Does anyone know if this would be an effective solution??
Also, something else that we are going to start testing, is to create 3 page doorpages. We're going to create many ontopic doorpages, with the same keywords, with their own domains. Then we are going to make the 3 page sites by using each one as a tour page for another site. Then we are going to add links to the tour pages on the index pages with hidden links.
We are only in the process of establishing this system so I'm not sure how effective this is. So, just a suggestion. This would also help to vary the outgoing and internal links.
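
As a rough check on the "friends" page suggestion earlier in this post, the sketch below computes the shared-links-over-union ratio described earlier in the thread for two of the five friends pages, before and after each site adds a few links of its own. The hostnames, the directory URLs, and the choice of three extra links per site are all made up for illustration, and only links to other sites are counted, following the thread's reading that in-site links are ignored.

```python
def overlap(links_a, links_b):
    """Shared links divided by the union of links on both pages."""
    a, b = set(links_a), set(links_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Five hypothetical sites, each with a friends page linking to the other four.
sites = ["a.example", "b.example", "c.example", "d.example", "e.example"]
friends = {
    site: {f"http://{other}/" for other in sites if other != site}
    for site in sites
}

# A links to B, C, D, E and B links to A, C, D, E: 3 shared out of 5 total.
before = overlap(friends["a.example"], friends["b.example"])

# Each site adds three links that none of the others carry (made-up URLs).
extras = {
    site: {f"http://directory-{i}.example/{site}" for i in range(3)}
    for site in sites
}
varied = {site: friends[site] | extras[site] for site in sites}

# Still 3 shared links, but the union has grown to 11.
after = overlap(varied["a.example"], varied["b.example"])

print(f"before: {before:.2f}, after: {after:.2f}")
```

In this toy setup the ratio falls from 3/5 to 3/11, so adding unique links does reduce the measured overlap. Whether that is enough depends on the actual cutoff, which nobody in the thread knows.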

Garyh

2:49 am on Jan 11, 2001 (gmt 0)



Thanks 2_much,

< So A would have links to B, C, D and E on its friends page, as well as links to pages within Site A. B would have links to A, C, D and E, with links to pages within B...etc. >

Yes, that is correct. Even if you add additional links that the other sites have you still have the "double" page problem I have surmised. Still don't know if that is a problem and I agree that each site having additional different URL's listed is a good idea.

Here is something that I just now wondered about! Do the SE's look at the IP addresses? All of our "friends" are on the same server, therefore the same IP address. If these outgoing links were *really* friends on the same server, or even with close proximity IP addresses, that could cause a problem if the SE's look for this. I can get around it, as I am sure most of you could too, * BUT *, is this a problem?

How about "contact us" similar addresses and phone numbers? Maybe I am a bit paranoid, can anyone expound on this?

Gary