>>Re-ranking component 122 begins by identifying the documents in the initial set that have a hyperlink to document x. (Act 301). The set of documents that have such hyperlinks are denoted as B(y). Documents from the same host as document x tend to be similar to document x but often do not provide significant new information to the user. Accordingly, re-ranking component 124 removes documents from B(y) that have the same host as document x. (Act 302). More specifically, let IP3(x) denote the first three octets of the IP (Internet Protocol) address of document x (i.e., the IP subnet). If IP3(x)=IP3(y), document y is removed from B(y).
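A minimal sketch of how Acts 301 and 302 could be read, assuming documents are plain string IDs, links are given as a dict of outgoing links, and every document has a known IPv4 address. The function and variable names, the hostnames, and the addresses (RFC 5737 documentation ranges) are hypothetical illustrations, not anything taken from the patent or from Google:

```python
import ipaddress


def ip3(ip: str) -> tuple[int, ...]:
    """Return the first three octets of an IPv4 address (its /24 subnet)."""
    return tuple(ipaddress.IPv4Address(ip).packed[:3])


def filtered_backlinks(x, initial_set, links_to, ip_of):
    """Hypothetical reading of Acts 301-302 from the quoted patent text.

    x: the document being re-ranked
    initial_set: documents returned for the query
    links_to: dict mapping a document to the set of documents it links to
    ip_of: dict mapping a document to its IPv4 address string
    """
    # Act 301: documents in the initial set that link to x
    # (the set the quoted text calls B(y)).
    linkers = {y for y in initial_set if x in links_to.get(y, set())}
    # Act 302: drop linkers whose IP3 equals IP3(x), i.e. same /24 subnet.
    return {y for y in linkers if ip3(ip_of[y]) != ip3(ip_of[x])}


# Hypothetical documents on RFC 5737 documentation addresses.
ip_of = {
    "the_page": "192.0.2.10",
    "forum_thread": "192.0.2.77",  # same server/subnet as the_page
    "other_site": "198.51.100.5",
}
links_to = {"forum_thread": {"the_page"}, "other_site": {"the_page"}}

print(sorted(filtered_backlinks("the_page", ip_of.keys(), links_to, ip_of)))
# ['other_site']
```

In this reading, the filter only changes which linking documents count when document x is re-ranked; x itself is not removed from the result set.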
What I'm concerned with here is:
>>re-ranking component 124 removes documents from B(y) that have the same host as document x.
Here's the scenario. Up until recently, I had a top ten spot for a so-so competitive keyword phrase.
The page in question has hundreds of inbound links, including several PR6 and a couple of PR7 links. The page itself has shown good PR for a while now.
Then, a couple weeks ago, the page was the subject of some discussion on a message board. That thread was quickly picked up by Google, and ever since, that thread (with no inbound links except from the forum itself) has occupied the spot previously occupied by my page.
The original page still shows PR and backlinks, but is no longer indexed. A search for the URL itself shows:
>>Sorry, no information is available for the URL www.the-site.com/the_page.php
Attempt to get the Google cache returns:
>>Your search - cache:http://www.the-site.com/the_page.php - did not match any documents.
But a search for backlinks to that specific page still returns all the indexed backlinks.
The page has been dropped entirely and replaced with a crummy forum thread discussing my page. The crummy forum is on my server, along with 100 or so other sites I host, the vast majority of which I do not own; they belong to hosting clients.
The only reason I can think of for this page being dropped is the "re-ranking component 124 removes documents from B(y) that have the same host as document x."
The forum thread does not quote any part of the original page, it just discusses it. So I can't imagine a duplicate content penalty here.
Frankly, I'm rather baffled by it. Google losing a page with a ton of inbound links? It's odd.
I'm wondering if anybody might have some insight into this issue.
Prominent sites and companies are discussed all the time on forums, often on forums OWNED AND MANAGED by those very companies.
Many web hosting companies, for instance, host discussion boards to offer peer-to-peer support. I can't imagine "MegaHostingCorp" losing its front page PR7 to a page deep within one of its forums that (like every other page there) happens to discuss MegaHostingCorp.
Similarly, could you imagine Google penalizing Microsoft (okay, humor me here :D) because its KB support documents also mention various Microsoft products? I mean, incompatibility notices about MS Flight Simulator, for instance, would then rank higher than MS' main Flight Simulator page. That just doesn't seem logical.
I CAN imagine Google aggressively penalizing highly duplicative content, especially when posted within an IP range. Or even de-emphasizing PR benefits from intra-linking within an IP range. But penalizing mentions? It really seems too outlandish.
>>Documents from the same host as document x tend to be similar to document x but often do not provide significant new information to the user. [...] More specifically, let IP3(x) denote the first three octets of the IP (Internet Protocol) address of document x (i.e., the IP subnet). If IP3(x)=IP3(y), document y is removed from B(y).
Are you sure that patent is for real? Seems like, as ThatAdamGuy said, they would be dropping millions of sites if it were!
Man, I sure hope it isn't real, or that my puny brain has failed to grasp it or something!
Jordan
No lost PR, just not in the SERPs. The PR is intact, even though Google has dropped the page from its index.
New development: I have two hosting clients, both dealing in the same product. If I do a search for Domain1, it is not there! The index page of one of my hosting clients is gone! In its place is the index page of hosting client #2, who offers the same name-brand items. No duplicate content, just the same IP.
The index page of hosting client #1 is still showing PR, and still shows backlinks, but there is no cache of the page anymore, and a search for his domain returns his competitor's site.
LOL. Good to hear Google has a sense of humour.
It's choosing just one document from the IP range.
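To make the "one document per IP range" guess concrete, here is a hypothetical sketch of a result-collapsing step that keeps only the highest-ranked URL from each /24 subnet. Nothing in this thread confirms that Google does this; the function names, hostnames (reserved `.example` TLD) and addresses (RFC 5737 ranges) are made up for illustration:

```python
import ipaddress


def subnet24(ip: str) -> ipaddress.IPv4Network:
    """Return the /24 network containing an IPv4 address."""
    return ipaddress.IPv4Network(f"{ip}/24", strict=False)


def one_per_subnet(ranked, ip_of):
    """Hypothetical filter: keep only the top-ranked result per /24 subnet.

    ranked: result URLs, best first
    ip_of: dict mapping a URL to its IPv4 address string
    """
    seen = set()
    kept = []
    for url in ranked:
        net = subnet24(ip_of[url])
        if net not in seen:  # first (best-ranked) URL on this subnet wins
            seen.add(net)
            kept.append(url)
    return kept


# Two hosting clients on the same server, plus an unrelated site.
ip_of = {
    "client1.example": "192.0.2.20",
    "client2.example": "192.0.2.21",
    "unrelated.example": "203.0.113.9",
}
ranked = ["client2.example", "client1.example", "unrelated.example"]

print(one_per_subnet(ranked, ip_of))
# ['client2.example', 'unrelated.example']
```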
The original page still shows PR and backlinks, but is no longer indexed. A search for the URL itself shows:
>>Sorry, no information is available for the URL www.the-site.com/the_page.php
Attempt to get the Google cache returns:
>>Your search - cache:http://www.the-site.com/the_page.php - did not match any documents.
I think this clearly shows that the problem is not caused by any new "re-ranking component that removes documents". If that were the reason, the problem would arise only for that keyword phrase. Perhaps there is a simpler explanation, e.g. the page was down when Google tried to spider it.
I have a site with a number of similar pages. Google has indexed 46 different pages. However, a search for 'site:www.domain.com keyword' yields 'Results 1 - 33 of about 46', even with 'filter=0'. A search for 'site:www.domain.com keyword pageXY.htm' brings up each page individually, including those not shown by the first search.
By the way, recently there was a discussion about a new Google patent [webmasterworld.com].