Forum Moderators: open
What I'm thinking is: if this is the case, will reciprocal links be affected, or will incoming links from a page named links.htm be affected?
One reason I suspect this isn't an issue is how easy it would be to rename the page - so why bother penalising?!
Findings like this indicate to me that Google's algorithms are very unsophisticated
hmmm.... most people at WW, regrettably, assert as truth a lot of things about G's algo that do not stand up to close scrutiny. There was a debate about whether or not anchor text mattered, and even after the whole "miserable failure" thing surfaced, some still insisted that anchor text does not matter.
We have very little idea how simple or complex their algo is. And judging from what little we know, it's probably conceptually simple. But that is precisely what good computer scientists look for - simple yet powerful. The only reason it works is because it is so elegant. A program with a great many variables might seem sophisticated, but it is the antithesis of what a good programmer seeks. And I have no doubt the folks at Google are top-notch programmers.
G can quite probably analyze a page to determine the extent to which different factors (such as text density, repetition of words, or the percentage of text that is not in links or in image alt tags) influence the amount of time people spend on a page (which they can get from their toolbar).
We are inherently at a disadvantage when trying to reverse-engineer their algo because we do not have the same sample - ours is tiny in comparison. Our PR reading is on a log scale, so our measure could be off by a factor of five (and it could also be stale). As long as we are using even one less variable than they are, our results will be off. Finally, they can change the weights they assign to each variable without telling us.
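The log-scale point is worth a toy illustration. Assuming (purely hypothetically - nobody outside Google knows the real base or the real mapping) that toolbar PR is roughly the floor of a logarithm of some underlying link score, two pages with the same visible PR can differ by nearly the full base factor:

```python
import math

def toolbar_pr(score, base=5.0):
    """Hypothetical mapping from an underlying link score to a 0-10
    toolbar value, assuming (for illustration only) a log scale."""
    return min(10, max(0, int(math.log(score, base))))

# Underlying scores of 130 and 600 differ by roughly a factor of five,
# yet both round down to the same toolbar value under this assumption:
print(toolbar_pr(130))  # 3
print(toolbar_pr(600))  # 3
```

This is why two sites showing identical toolbar PR tell you almost nothing precise about their relative strength.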
In short, reverse engineering Google's algo is impossible. You might as well try to square the circle.
But if we want to make rough, educated guesses, we have to control our experiments. Right now there isn't enough data to make even so much as a tentative conclusion about it, let alone determine that it is not sophisticated.
</rant>
While there are lots of pages named links.* that seem to be blocked, there are also many examples where they are not. That proves that they are not looking *only* at the name of the file.
While it is almost impossible to prove something, it is real easy to disprove, and the idea that Google blocks links pages based solely on the name has been disproved.
They might be using the name as an indicator for further testing, but they are not blocking only on the name.
Our links.htm, which does not appear as a backlink, has virtually no text other than the anchor text and a title (kw kw links) at the top of the page. There are 12 external and two internal links, all regular text links: <a href="xx.com">xx</a>.
Perhaps /links.htm combined with no real content would do it. Just speculating...
Many webmasters (link farmers, users of known recip linking tools) already changed the filenames of their farms. Many of them don't even use real words anymore but just alphanumeric strings like grbzaxy.htm, 123.htm or index1234.htm ...
Pro recip guys and gals organize their farms by topic and even spend a few seconds posting a sentence or two to describe what the partner site is about, and to avoid having too much anchor text on their pages.
Pro link masters don't even have single links pages anymore. They have mini sites - with just a handful of links per site.
This leads me to believe that Google is working on (or already uses) far more sophisticated algos for detecting link games - not just looking for filenames.
- themes
- links within real text content
- text / anchor text ratio
- non-reciprocal links
- number of outbound links per page
- maps
...
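One of the signals in that list, the text / anchor text ratio, is easy enough to compute that it's plausible as an automated filter. Here's a minimal sketch of what such a metric might look like, using only the standard library; the threshold and the metric itself are pure speculation, not anything Google has confirmed:

```python
from html.parser import HTMLParser

class AnchorRatio(HTMLParser):
    """Tally characters inside <a> tags vs. all visible text, to
    estimate the anchor-text ratio a filter might look at
    (a speculative metric, not a documented Google signal)."""
    def __init__(self):
        super().__init__()
        self.in_anchor = 0
        self.anchor_chars = 0
        self.total_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.in_anchor:
            self.in_anchor -= 1

    def handle_data(self, data):
        n = len(data.strip())
        self.total_chars += n
        if self.in_anchor:
            self.anchor_chars += n

# A page that is almost nothing but links scores close to 1.0;
# a page with real surrounding content scores much lower.
html = '<p>Short intro.</p><a href="http://example.com">kw kw links</a>'
p = AnchorRatio()
p.feed(html)
print(p.anchor_chars / p.total_chars)
```

A page like the links.htm described above (a title plus a dozen bare links) would score near 1.0 on this metric, which is consistent with the advice to surround links with descriptive text.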
[edited by: Yidaki at 6:39 pm (utc) on Jan. 13, 2004]
read [webmasterworld.com...] msg#46
My links page is divided up into 5 sections in a table, with a name for each section in an h1 or h2 or something. Names like…
Other Quality Links
News Groups/Forums
.org, .gov, etc. Links
etc.
- text / anchor text ratio
Yep, that would be my links page.
Jim, I don't think I'm hurting the sites that are linked to from that page (not that I can see), but it doesn't seem to be helping them. The important ones have links to them elsewhere in the site that show in their backlinks, so I'm passing PR from other pages at any rate. I might do something about the links page though... rename it anyway. If I start adding text, it will mess up the design, such as it is. The thing does get hit by visitors to the site and presumably used... they're all good pertinent links.
caveman- that's not a controlled experiment.
True enough. We made a few hypotheses based on analysis, combined a few theories ("links pages are in trouble"; "more content on pages is good, especially post Florida"), and then made some low risk changes. They worked. Or like I said, maybe it was a coincidence.
I've been in business for 2-, ah, nevermind, a long time. ;-) Very little in business is precise, and if you wait for precision, you'll be very accurate, and very poor. Educated guesses, balanced by risk assessment, are what make the world go 'round.
I wouldn't waste much time trying to reverse engineer G's algo...but when it looks like a duck, walks like a duck and sounds like a duck, we usually treat it as a duck. Works for us.
But if we want to make rough, educated guesses, we have to control our experiments. Right now there isn't enough data to make even so much as a tentative conclusion about it, let alone determine that it is not sophisticated.
This one is not rocket science. Brett looked into this years ago in the context of poison words as I recall (see the thread I reference in msg#3 of this thread) and seemed to conclude from searches he did that G was dampening PR of these pages. Since then, it appears that G's handling of these pages has gotten harsher still. Read steveb's posts here; looking around seems to confirm a problem with "links.ext" pages, whatever that problem is (and even though like almost always, there are exceptions)...
Also, it became pretty apparent to us that links pages got hit worse again sometime before or during Florida.
So in this case, why in the world would anyone ever name a links page links.ext? The work around is simply too easy and too low risk. Of course, like I said before, if you are having no problems with your links.ext page, don't fix it...just keep an eye on it with this thread in mind....
On the other hand there IS another link, from a page titled "cool_links.html," which does NOT show up in my backlinks, though it does show up in the backlinks of other URLs it links to earlier on the page. Perhaps there's some kind of glitch floating around causing Google to randomly stop following links partway down a page? That would certainly be affecting links pages disproportionately.
The "links.html" title filter has gotta be a superstition, though. There are seven counterexamples staring me right in the face.
That links further down the page are not showing up is a very interesting observation.
A fairly simple (in concept) change to the PageRank algo would be to weigh the importance of a link by the likelihood that a person would click on it - dealing with the unrealistic assumption in the original algo that a person would click on any link on the page with equal probability.
And we know that a very good predictor of a link getting a click on SERPs is its position. Not sure what exact formula Google would use, but it seems likely to me that they could apply this to PR calculations.
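That idea can be sketched in a few lines. This is only a toy model - the decay factor and the assumption that click likelihood falls off geometrically by position are mine, not anything known about Google's formula - but it shows how position-weighted links would redistribute the PR a page passes, compared with the original model's uniform 1/n split:

```python
def positional_weights(n_links, decay=0.85):
    """Hypothetical click-likelihood weights by link position: each
    successive link down the page is assumed `decay` times as likely
    to be clicked. The original PageRank model gives every one of the
    n links an equal weight of 1/n."""
    raw = [decay ** i for i in range(n_links)]
    total = sum(raw)
    return [w / total for w in raw]

# With 5 links, the top link carries well over the uniform 1/5 share,
# and the bottom link well under it:
w = positional_weights(5)
print(w[0], w[-1])
```

Under a model like this, a link buried at the bottom of a long links page could pass so little weight that it effectively drops out - which would fit the observation above about links further down the page not showing up.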
Perhaps you could ask the author of "cool_links.html" to bump up your link further up the page? :)
There is nothing wrong with Google if you name your links page links.htm. If your intention is to get reciprocal link partners, then that is where there might be a problem - not with Google, of course, but with those webmasters who believe that "links.htm" is a problem. It just makes your task more difficult.
It's one of those things: the situation in some people's minds is a grey area. So given that it is grey, and not black and white, don't use links.htm. You have nothing to lose!
/1000th post.