IP filter?

         

mosley700

7:41 pm on Sep 2, 2003 (gmt 0)

10+ Year Member



From a recent Google patent:

>>Re-ranking component 122 begins by identifying the documents in the initial set that have a hyperlink to document x. (Act 301). The set of documents that have such hyperlinks are denoted as B(y). Documents from the same host as document x tend to be similar to document x but often do not provide significant new information to the user. Accordingly, re-ranking component 124 removes documents from B(y) that have the same host as document x. (Act 302). More specifically, let IP3(x) denote the first three octets of the IP (Internet Protocol) address of document x (i.e., the IP subnet). If IP3(x)=IP3(y), document y is removed from B(y).

What I'm concerned with here is:

>>re-ranking component 124 removes documents from B(y) that have the same host as document x.

Here's the scenario. Up until recently, I had a top ten spot for a so-so competitive keyword phrase.

The page in question has hundreds of inbound links. This includes several PR6 and a couple of PR7 inbound links. The page itself has shown good PR for a while now.

Then, a couple weeks ago, the page was the subject of some discussion on a message board. That thread was quickly picked up by Google, and ever since, that thread (with no inbound links except from the forum itself) has occupied the spot previously occupied by my page.

The original page still shows PR and backlinks, but is no longer indexed. A search for the URL itself shows:

>>Sorry, no information is available for the URL www.the-site.com/the_page.php

Attempt to get the Google cache returns:

>>Your search - cache:http://www.the-site.com/the_page.php - did not match any documents.

But a search for backlinks to that specific page still returns all the indexed backlinks.

The page has been dropped entirely, and replaced with a crummy forum thread discussing my page. The crummy forum is on my server (along with 100 or so other sites I host, the vast majority of which I do not own; they are hosting clients).

The only reason I can think of for this page being dropped is the "re-ranking component 124 removes documents from B(y) that have the same host as document x."

The forum thread does not quote any part of the original page, it just discusses it. So I can't imagine a duplicate content penalty here.

Frankly, I'm rather baffled by it. Google losing a page with a ton of inbound links? It's odd.

I'm wondering if anybody might have some insight into this issue.

ThatAdamGuy

6:23 am on Sep 3, 2003 (gmt 0)

10+ Year Member



I can understand your pain, but cannot imagine that this 'penalty' you're seeing is based upon any de-dup'ing algorithm, especially because -- as you point out -- there aren't even any quotes or chunks of text in common!

Prominent sites and companies are discussed all the time on forums, often on forums OWNED AND MANAGED by those very companies.

Many web hosting companies, for instance, host discussion boards to offer peer-to-peer support. I can't imagine "MegaHostingCorp" losing its front page PR7 to a page deep within one of its forums that (like every other page there) happens to discuss MegaHostingCorp.

Similarly, could you imagine Google penalizing Microsoft (okay, humor me here :D) because its KB support documents also mention various Microsoft products? I mean, incompatibility notices about MS Flight Simulator, for instance, would then rank higher than MS' main Flight Simulator page. That just doesn't seem logical.

I CAN imagine Google aggressively penalizing highly duplicative content, especially when posted within an IP range. Or even de-emphasizing PR benefits from intra-linking within an IP range. But penalizing mentions? It really seems too outlandish.

MonkeeSage

6:35 am on Sep 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Great google-y-moogly... if I am reading that patent rightly, they aren't even checking for dups if they find that the "yyy.yyy.yyy" in yyy.yyy.yyy.xxx matches the "yyy.yyy.yyy" of another document already indexed!

Documents from the same host as document x tend to be similar to document x but often do not provide significant new information to the user. [...] More specifically, let IP3(x) denote the first three octets of the IP (Internet Protocol) address of document x (i.e., the IP subnet). If IP3(x)=IP3(y), document y is removed from B(y).

Are you sure that patent is for real? Seems like, as ThatAdamGuy said, they would be dropping millions of sites if it were!

Man I sure hope it isn't real or that my puny brain has failed to grasp it or something!

Jordan

mosley700

6:59 am on Sep 3, 2003 (gmt 0)

10+ Year Member



>>Many web hosting companies, for instance, host discussion boards to offer peer-to-peer support. I can't imagine "MegaHostingCorp" losing its front page PR7 to a page deep within one of its forums that (like every other page there) happens to discuss MegaHostingCorp.

No lost PR, just not in the SERPs. The PR is intact, even as Google dropped the page from their index.

New development: I have two hosting clients. Both deal with the same product. If I do a search for Domain1, it is not there! The index page of one of my hosting clients is gone! In its place is the index page of hosting client #2, who offers the same name-brand items. No duplicate content, just the same IP.

The index page of hosting client #1 is still showing PR, and still shows backlinks, but there is no cache of the page anymore, and a search for his domain returns his competitor's site.

LOL. Good to hear Google has a sense of humour.

BlueSky

7:00 am on Sep 3, 2003 (gmt 0)

10+ Year Member



It sounds like your situation doesn't fall under that particular scenario. The patent says the documents doing the linking (i.e. the crummy forum) are the ones to be dropped, which leaves document x (i.e. your page) in place. Unless they programmed it differently, yours should have stayed in according to that.
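
To make that reading concrete, here is a rough Python sketch of Acts 301 and 302 as the quoted passage describes them. It is only an illustration of the wording, not Google's code: the `Doc` class, the `ip3` and `backlink_set` names, the example URLs, and the IP addresses are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a document in the initial result set."""
    url: str
    ip: str               # dotted-quad IPv4 address of the serving host
    links_to: frozenset   # URLs this document links to


def ip3(ip: str) -> str:
    """First three octets of an IPv4 address (what the patent calls IP3)."""
    return ".".join(ip.split(".")[:3])


def backlink_set(x: Doc, initial_set: list) -> list:
    """Act 301: collect documents linking to x.
    Act 302: drop those whose IP3 equals x's IP3."""
    linking = [y for y in initial_set if x.url in y.links_to]
    return [y for y in linking if ip3(y.ip) != ip3(x.ip)]


# Made-up data: the page, a forum thread on the same subnet, and an outside linker.
page = Doc("http://host-a.example/page", "192.0.2.10", frozenset())
forum = Doc("http://host-b.example/thread", "192.0.2.77", frozenset({page.url}))
outside = Doc("http://elsewhere.example/review", "198.51.100.5", frozenset({page.url}))
unrelated = Doc("http://other.example/", "203.0.113.9", frozenset())

kept = backlink_set(page, [forum, outside, unrelated])
print([d.url for d in kept])  # ['http://elsewhere.example/review']
```

Under this reading the forum thread is discarded from the set of linking documents because it shares the first three octets with the page, and the page itself (document x) is never a candidate for removal in this step.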

mosley700

7:07 am on Sep 3, 2003 (gmt 0)

10+ Year Member



I host several web designer sites. For the last two days I've been getting a bunch of traffic for a regional web design search term I don't target. A web designer's site I host was #9 for that particular term. Now I'm #9, and that web designer's site is nowhere to be found in those SERPs.

It's choosing just one document from the IP range.

ThatAdamGuy

7:18 am on Sep 3, 2003 (gmt 0)

10+ Year Member



And for my deep thought for the night:

ACK! :-(

I do hope this is just a coincidence or a temporary GF (Google Fart).

mosley700

7:24 am on Sep 3, 2003 (gmt 0)

10+ Year Member



I'm sure Google is on top of it. If not, oh well - I lose a few customers to another IP range.

doc_z

9:25 am on Sep 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The original page still shows PR and backlinks, but is no longer indexed. A search for the URL itself shows:

>>Sorry, no information is available for the URL www.the-site.com/the_page.php

Attempt to get the Google cache returns:

>>Your search - cache:http://www.the-site.com/the_page.php - did not match any documents.

I think this clearly shows that the problem is not caused by any new "re-ranking component that removes documents". If that were the reason, the problem would arise only for the keyword phrase, not for a search on the URL itself. Perhaps there is a simpler explanation, e.g. the page was down when Google tried to spider it.

I have a site with a number of similar pages. Google has indexed 46 different pages. However, a search for 'site:www.domain.com keyword' yields 'Results 1 - 33 of about 46' even with 'filter=0'. A search for 'site:www.domain.de keyword pageXY.htm' brings up each page (even those which were not shown for the first search).

By the way, recently there was a discussion about a new Google patent [webmasterworld.com].