reli - 3:34 pm on Jul 31, 2005 (gmt 0)
Are we talking about sites that have thousands of html/asp/etc. pages, or sites that have thousands of virtual pages fed in real time by scripts, e.g. www.sitename.com/cgi/odp/?cat=business+finance? Or a blend that uses url-rewrites to eliminate the "?" from dynamic pages? [Edited for clarity of initial sentence, and ODP usage]
Google is likely caught in the situation described in an old book (either "The Mythical Man-Month" or "The Soul of a New Machine"; I can't remember which). At some point, the book describes an operating system riddled with potential side effects, and the team's management having to decide which side effects to live with. Why? Because if you fix one bug in the OS, you get new side effects, and fixing those creates a different set of problems, worse than the one originally being fixed.
Same with any SE algorithm, or with a project to change the algo. With more time and engineering, you get a chance to stop more side effects. But if you've run out of time for engineering... It depends on the risk you want to take: turn the "Quality" knob or turn the "Time" knob, but they are linked, and turning one moves the other in the opposite direction. How they interplay isn't known until after you turn them. The other knob is the "Scope" (or "Complexity") knob for a project. The running logic is that you can turn one or two in a favorable direction, but not all three.
So, in the case of eliminating scrapers, if you turn the "Scope" knob too much you impact some other area negatively ("Quality" goes down); if you turn it too little, you get too little quality as well. It seems people are reporting both that they turned the knob "too much" and that they turned it "too little" in eliminating scrapers. A delicate balancing game.
As far as telling you, me, or the man in the moon what they are doing, banning, or thinking... Google is a publicity-shy company.
Any publicity-shy company will communicate less than some people would want, and in the case of their algorithm, it is also a trade secret. And as you may be aware, Google and MSN are battling just over the issue of whether an ex-MSN employee can work at Google.
Back to the issue at hand - I've had sites drop out of the index and come back in; it seems far more frequent in 2005. And when searching for just the name of my domain, "Yowzawidges", I sometimes find that I'm closer to #100 than to the rightful (!) #1 slot that I held the month before. People - my users - use SEs as a shortcut to find the site again, or to refer others to me, and when the search instead turns up a high-PR site that simply has a sentence using that invented-by-me Yowzawidges word, it does no one any good.
[Contractor - You mean block it from being crawled by bots from SEs that would penalize you for it, yes?]
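For anyone wondering what "block it for some bots only" could look like, here is a rough sketch using Python's standard urllib.robotparser to check a robots.txt before deploying it. The robots.txt content, the /cgi/odp/ path, and the choice of Googlebot as the crawler to keep out are all assumptions for illustration; nothing in this thread establishes which SE, if any, penalizes directories.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep one crawler (Googlebot, chosen only for
# illustration) out of the script-fed directory, leave all others unrestricted.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /cgi/odp/

User-agent: *
Disallow:
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "/cgi/odp/?cat=business+finance"
    for bot in ("Googlebot", "Slurp"):
        print(bot, "may crawl" if can_crawl(bot, page) else "is blocked from", page)
```

Keep in mind that robots.txt only asks well-behaved crawlers to stay away; it does not remove pages that are already in an index.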
Some report that directories are not an issue on Yahoo or elsewhere, where having a directory is not a sin. (Not that we know if it is a sin on Google, and I assume it will not be).
My ODP-fed sites, served via a "?" script, are all still listed. I added the script a long time ago to try it out, and found that people used it, so I've left it.
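On the url-rewrite question from the top of this post, this is roughly the mapping such a rewrite performs, sketched in Python. The /odp/<category>/ layout and both function names are made up for illustration; a real site would normally do this in the web server's rewrite rules, and I'm not claiming this is how any particular site does it.

```python
from urllib.parse import parse_qs, quote_plus, unquote_plus, urlsplit


def to_static_path(dynamic_url: str) -> str:
    """Turn a "?" URL into a "?"-free path (hypothetical /odp/<cat>/ layout)."""
    query = parse_qs(urlsplit(dynamic_url).query)
    category = query.get("cat", [""])[0]  # parse_qs decodes "+" to a space
    if not category:
        return "/odp/"
    return "/odp/" + quote_plus(category) + "/"


def to_dynamic_url(static_path: str) -> str:
    """Reverse mapping: what the rewrite would hand to the script internally."""
    # Drop the leading "odp" segment and keep the category segment.
    segment = static_path.strip("/").partition("/")[2]
    return "/cgi/odp/?cat=" + quote_plus(unquote_plus(segment))


if __name__ == "__main__":
    clean = to_static_path("www.sitename.com/cgi/odp/?cat=business+finance")
    print(clean)
    print(to_dynamic_url(clean))
```

The visitor and the crawler only ever see the first form; the script keeps receiving the second.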