|Is Google's Algo Out of Control?|
What do the programmers in the audience think?
Every time Google makes a change, opinions seem to fall into two camps: "The system is broken" or "It's a deliberate plot to [extort more money, promote greater relevance, achieve world domination, whatever]."
I could be persuaded of either (or both). On the one hand (arguing the latter), Google is a huge company with deep pockets; they can afford the best and brightest programming talent, who surely must know what they're doing. On the other hand, I could easily believe the system is so complex that it's not possible to make a change without unforeseen consequences.
I'm not a programmer, and I'd be interested in the opinions of current or former professional programmers (only) as to which option they favor, and why.
|two camps: "The system is broken" or "It's a deliberate plot to [extort more money, promote greater relevance, achieve world domination, whatever]." |
You forgot the third camp: the changes by Google are carefully thought out and, for the most part, have a bad effect on spammers and black hats. Yes, there is occasional damage to innocent bystanders, but that is unavoidable given the huge size of the index and the complexity of the algorithms.
This falls into the second category--i.e., deliberate vs. inadvertent. What I'm interested in is which alternative programmers think is more likely, based on their experiences.
My opinion on motive is neutral: my problem is the rollout of non-production-quality systems which have potential multi-million-dollar effects when they fail.
Saying to a customer, "Let's change how you do business with us at 3:00 PM on a Tuesday afternoon. Oh, and we're not really going to tell you much about it beforehand," really sucks.
Making your system practically inaccessible for 48 hours before this change and having ineffective customer service reps is even worse.
The worst part, however, is that there is no real alternative for most of Google's technical search traffic. A large majority of Google's users are zealots who will only use Google, and unfortunately, Google is the only way to reach those zealots. The competition from Microsoft, Yahoo, and AJ is indeed good, but most technical users aren't switching yet. I hope that Yahoo's latest efforts w/r/t the community, RSS, and bloggers in general will help convert some of those zealots.
Reminds me a bit of how Microsoft used to act before Linux came onto the scene seriously. MS has changed their tune and really started to listen to their customers and developers; will Google do the same?
I spend nearly 80 nights per year at the Hilton, not including rooms I book for employees, family, etc. When I have a problem at the Hilton, they give me a number to call that puts me in touch with a team that actually has the power to solve any problem I could possibly have.
Those same people take responsibility for any mistake that anyone in the company, no matter where in the world, may make. Often I'll forget that I had a problem, and will be reminded by the fine folks there that constantly call me until they are happy that I am satisfied.
But, the Hilton is forced to provide this level of service. They have several large competitors I could run to in a heartbeat.
We will never see this level of service from Google until they have a serious competitor for technical users and traffic. They will not empower their customer service reps to do anything but give us the same bull#*$! runaround that they are famous for.
One final example: I tried to use the site targeting system yesterday to target a large site I know is running premium Google advertising. I phoned in to inquire why I couldn't target that specific site, as my ad is already appearing on that site in a keyword-targeting campaign.
A simple question, really. Here's what I got back:
|I have consulted with a Product Specialist and as we suspected, there is no way for you to target <sitename> if the system does not confirm this website is an option. |
That's the most unresponsive answer I've ever seen. Run through the BS translator, I get:
Yes, we know it doesn't work. I just verified it doesn't work.
Thanks for the service, guys.
A simple case of the arrogance that billions in the bank can cause. Google is probably due for a comeuppance, a humiliation, but it may be years before it arrives--they have another multi-billion dollar stock offering soon.
On the other hand G's stranglehold on its market is less entrenched than Microsoft's. No matter how loyal or fanatical your customer base is, a hint of true betrayal or weakness can sour it quickly. I can't think of what that may be, but we've seen the mighty tumble before... Changing your search engine is a whole lot easier than changing your operating system, business apps or hardware.
This is why G is constantly trying to widen its stance with new products.
But their Achilles' heel is their reliance on a single AdSense/AdWords cash cow. If that cow dies, they will be in about the same position they left me in when they banned my site on July 28.
Thanks, but what I'm trying to ascertain, from those with the experience to understand the various elements involved, is not so much a question of motive. What I'm curious about is simply the technical question of whether what's happening is all according to plan or whether the system is so complex it's gotten away from Google and there are consequences to any change that even their engineers don't foresee.
|...what's happening is all according to plan or whether the system is so complex it's gotten away from Google and there are consequences to any change that even their engineers don't foresee. |
Undoubtedly some of both, and the balance shifts daily. They have a lot of room for error with their customer loyalty and healthy bank balance but the situation you describe in the passage I've quoted could apply to almost any company or dynamic system.
You've asked the right question, I doubt any human alive could answer it with great certainty.
To answer your question directly: Google is deliberately trying to eliminate spam from their listings. They must do this or they will die. It's a very simple equation. If the spammy listings continue, Google's search engine will become irrelevant.
Google recognizes that your average web surfer doesn't give a hoot if a site or two is missing from the index, as long as quality sites are at the top. But having a couple of quality sites buried in spam junk will cause a surfer to look elsewhere for information. So an innocent website caught in the crosshairs once in a while is simply not important. In fact, much the same as when a missile is taking out an Al Qaeda hellhole in the middle of Afghanistan, a few civilian casualties are expected. Not desired, but expected.
From a systems point of view, and I do this for a living, I believe their changes are extremely well considered, targeted very specifically, and tested well, albeit on a relatively small database. They are fighting what is probably a losing battle against the spammers, a battle that many have fought in the past and all have lost miserably. The spammers win in the end, because you can get around any robotic algorithm.
Google knows it has a quality problem: too much spam. Way too much spam. In fact, some keywords and search phrases are simply unusable now in Google. They are searching for answers and trying to come up with algorithmic ways to eliminate the problem. Their algorithms are carefully thought out. They are very well tested. They work very well (sometimes too well). But they won't solve the basic problem.
My opinion is that Google needs to stop pussyfooting around. In other words, they are not even coming close to being harsh enough. The Google index is becoming filled with spam to the point where, as a web surfer, for many keywords it is absolutely unusable.
They are being cautious, testing things to death to make sure the collateral damage is at a minimum, in my opinion. I believe they are caught between two conflicting goals: aid the searcher or aid the webmaster? They need to come to grips with who their real audience is, the web surfers, and satisfy them with the results they need.
And when I put in a drug name and get back 10,000,000 spammy doorways, affiliates, missing pages and other misc junk, then I'll go to some other search engine or directory. I didn't want to buy the drug (adsense listings were great for that), I wanted information about it.
They have to solve this problem, or they will go the way of Altavista and a zillion other companies drowned in spam.
I agree with richlowe, and acknowledge that my bitterness over my personal losses in the 7-28 massacre shades my opinions too much.
That said, I think Google is too reliant on automation and should put more human editor brains on the job to improve search, and another layer of staff to improve relations with people like myself who really defended Google for years but now root for Yahoo et al.
They certainly can afford better public relations with webmasters. Perhaps they underestimate the damage a handful of motivated and articulate detractors can do. Perhaps I overestimate it, but then I'm still angry.
richlowe, looks like you're referring primarily to the SERPs. What I had in mind was the Adwords algo that so many are anguishing about in other threads (although I suppose it applies to the SERPs too).
|I believe their changes are extremely well considered, targeted very specifically and tested well, albeit on a relatively small database. |
So you think the seemingly inexplicable occurrences people are reporting for their Adwords accounts are an intentional part of the new algo, not bugs that G is working out during live testing? (e.g., the apparently random relation between relevant ads/keywords and the "Quality Score" or minimum bid)?
|then I'll go to some other search engine or directory |
Where would you go that's any better?
|you're referring primarily to the SERPs |
You are correct, plus adsense. I have not been keeping up on adwords issues since I don't use it.
|Where would you go that's any better? |
I've been finding that MSN and Yahoo seem fresher and somewhat less spammy on some phrases, but I haven't been using them that long.
|all according to plan or whether the system is so complex it's gotten away |
As a former professional programmer, and a current basement programmer, I can say this...
I've worked with some complex systems where a breakdown with 'A' will either ripple thru the entire system, or a breakdown with 'B' could shut the whole thing down, or a problem with 'C' might go unnoticed for a long time.
I've heard it expressed several times over the last few days in these fora that it looks and feels like G is experimenting with a live system! IMO (and that of many others) that is a huge blunder. Anything can happen! Moreover, it shows a certain level of arrogance in the development staff. You can't play with customer data, and your arrogance will bite you in the arse one day. Sure, it's a real pain (sometimes) to develop, maintain, and test with a controlled set of data, but you limit your liabilities and the occasional pie in the face.

I once spent about six weeks overhauling one program. Long story short, it had grown until it was no longer maintainable. All of my testing was done with sample data. On the big day, the new program crashed right out of the gate: there was something in the live data I could not test for in my sample set. Humiliating, but not life-threatening. The point is that you don't roll out something new without extensive testing. Even then there may be a few problems, but most of them will have been mitigated.
Frankly, I'm not all that pleased with some of the changes I've seen with AdWords. There's a whole thread dedicated to that so I won't expand on it now. It is beginning to look more and more like a bureaucratic system where managers are making the calls, egos are getting bruised, programmers (and other staff) are getting buried, and customers are getting short shrift. It was bound to happen.
Anybody know of any open source search engine project(s)?
Anybody wanna start one...?
What I believe I see happening is the de facto privatization of the Net. If all access is effectively owned and controlled, strangled, squeezed and extorted by Google, Yahoo!, MSN or all of the above, it really doesn't matter which master you're beholden to; you're still *-ed (insert expletive here, starts with F). The only thing which could halt or challenge this trend is some kind of creative commons/copyLeft Wiki SE.
Open source search engine?
Well, there's nutch at lucene.apache.org/nutch/
But that "Search the site with Google" box doesn't inspire confidence.
I agree with richlowe. If you try the Google engineers' shoes on, I'm sure you will see that they have a challenging job. There are so many flavors of undesirable spam that a single solution isn't possible. I suspect they run many filters on their data all the time, sometimes with unanticipated results. All of us have seen them back out of an algo update to a previous algo.
Overall, I give them good marks. I don't believe it is possible to make everybody happy all the time.
Yes, Google in reality cannot do all testing "offline"; the Internet will not hold still for a day or two while Google "stages" their latest release.
I'm sure G does do huge amounts of testing before bringing new things on line, but there are so many sites, so many possible "emergent" possibilities from all the interacting components, and so many ethically warped webmasters screaming that G owes them some sort of living and damn the users, that collateral damage is bound to happen.
Remember: if the scammers and SPAMmers weren't there trying to defraud us (end users, SEs, Web masters) one way or another, then there would probably be fewer upsets. Case in point: I had to put up my own filter IN FRONT OF my mailserver to turn away all but about 10 of the up to 40,000 scurrilous and dishonest attempts to SPAM me personally each day, and I'm now having to apply the same to some of my Web sites. What a waste of time and effort, and all the increased complexity brings bugs and errors.
Blame the scammers, not G.
All IMHO of course.
As a programmer myself, I have the utmost respect for the Google engineers. If they are having issues with the algo, it's only due to the enormously complex nature of the system. I have to admit that I have been a bit stressed about some of the changes this week, but they really have only affected our bottom line a tiny bit, and I believe this will correct itself over the next few weeks.
Regarding the two theories mentioned above, I'd say that if the system is broken, it's still better than the alternative. The level of functionality, responsiveness, service, and quality provided by Google, while certainly not perfect, is leaps and bounds beyond the complete trainwreck that is their closest competitor (no names mentioned here). I'm not buying that this is just a big money grab either. If Google wants to squeeze us for a bit more cash, I can't say I blame them, and it's up to us, the advertisers, to outsmart the competition and make the new system work. But I don't think it's in G's long-term interest or strategy to ignore quality so they can shake down their customers for more money.
|I've worked with some complex systems where a breakdown with 'A' will either ripple thru the entire system, or a breakdown with 'B' could shut the whole thing down, or a problem with 'C' might go unnoticed for a long time. |
... I once spent about 6 weeks overhauling one program. Long story short, it had grown until it was no longer maintainable. All of my testing was done with sample data. On the big day, the new program crashed right out of the gate. There was something I could not test for in the data set.
Thanks, grandpa. I had a feeling a large-scale disaster was certainly a possibility, even for Google.
Actually, G can easily test new search engine and AdWords code against the full production database and index. When new SERP algo or even AdWords code is being tested, it is primarily the logic that is being tested, not the data. With the exception of advertiser bid data and other account-specific stats, which live in much smaller, more easily replicated and manageable databases separate from the indexes, the same production index database can be referenced simultaneously by a site operated by development programmers, a separate site used by a development QA team, and a pre-implementation staging site for the final QA team, even as it is being hit by millions of production users. What is much more difficult to test is how new logic might affect performance under a full production hit load.
|Anybody know of any open source search engine project(s)? |
Yes, it's called ht://Dig. How much have you contributed to this project lately?
|I'm not a programmer, and I'd be interested in the opinions of current or former professional programmers |
I think the problems with the AdWords algos are not so much technical problems. Whenever Google changes the system, the market responds in unpredictable ways. Advertisers change their campaigns. Publishers in the content network develop new sites. It's like the real economy: a matter of opinion and debate rather than science.
|I think the problems with the AdWords algos are not so much technical problems. Whenever Google changes the system, the market responds in unpredictable ways. Advertisers change their campaigns. Publishers in the content network develop new sites. It's like the real economy: a matter of opinion and debate rather than science. |
I don't think the response is necessarily unpredictable, but the unpredictability is certainly exacerbated when the algo doesn't do what they said it was going to do. In that sense, I think it is a technical problem.
|I don't think the response is necessarily unpredictable, but the unpredictability is certainly exacerbated when the algo doesn't do what they said it was going to do. In that sense, I think it is a technical problem. |
The algorithm does what they told us it would: it calculates a minimum bid value based on quality factors. They just didn't tell us which quality factors are taken into account. If these are floating parameters, like current CTR or the current bid values of others on the same keywords, this can cause ripples in the advertisers' ocean which need time to die down.
FYI, as a result of the recent algo change, I have changed the bid values for almost 50% of all my keywords; some increased, others decreased. I deleted some unprofitable keywords from the list and added others. I wouldn't be surprised if many other AdWords buyers did the same. This totally changes the advertising field, with some dropping completely out of the race and others seeing opportunities to make money in ways never possible before.
As a programmer, I don't think the algo itself is out of control, because the calculation of a quality score for a keyword is based on some mathematical calculations. Even if many factors are taken into account in this calculation, the result would be different than in the past, but still stable. But the process behind it, of advertisers shuffling around their keywords and bid values, is certainly unstable and unpredictable and needs some more time to stabilize.
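To illustrate the distinction, here is a toy sketch in Python. Every factor name and weight below is invented, since Google has never published its actual quality factors; the point is only that a perfectly stable formula can still produce an unstable market when one of its inputs is set by the advertisers themselves.

```python
# Hypothetical quality-score formula: all weights and inputs (CTR,
# relevance, competing bid) are invented for illustration.

def minimum_bid(ctr, relevance, competitor_bid, base=0.05):
    """Return a minimum CPC in dollars, or None if the keyword is
    effectively disabled. Higher CTR/relevance lower the minimum;
    higher competing bids raise it."""
    quality = 0.7 * ctr + 0.3 * relevance  # invented weights
    if quality <= 0:
        return None  # zero quality: keyword goes "Inactive"
    return round(base * (1 + competitor_bid) / quality, 2)

# The formula itself is deterministic and stable. But competitor_bid
# is set by other advertisers reacting to *their* minimum bids, so the
# whole system feeds back on itself: the "ripples" that take time to
# die down.
```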
From seeing my own campaigns, it appears that they have a list of words that they are now commanding more money for. It does not make any difference if the word is being used in a phrase that puts that word in a different context than its more valuable meaning.
I don't believe it's all that sophisticated, this quality score. They're simply cross-checking against a list of words.
For instance, I have a site devoted to a medical condition that has never attracted big money advertisers. All I do is sell 10 books on the subject. The ad reads "Most under $20.00". Yet, this condition must be on a list of medical terms that require higher bids now.
For months, I had an average CPC of .06 and a position of 2-5. Now, virtually every word in the AdGroup wants $1.00. The site was barely profitable as it was. Since it's a condition in the family, my wife and I joked it was our public service campaign.
Since I have many hundreds of AdGroups, the loss of this one doesn't really affect me, but now there are no ads showing for searchers interested in this condition.
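The word-list theory above is easy to sketch. This is pure speculation about Google's internals, and the premium list and bid amounts are made up, but it shows why a flat list of words cannot tell a million-dollar "estate" from a $20 book about estate sales.

```python
# Speculative sketch of a flat "premium word" check. The list and the
# bid amounts are invented; the point is that context is ignored.

PREMIUM_WORDS = {"estate", "mortgage", "insurance"}  # invented list

def naive_min_bid(phrase, normal=0.06, premium=1.00):
    """Bump the minimum bid if ANY word in the phrase is 'premium',
    regardless of what the phrase as a whole means."""
    return premium if PREMIUM_WORDS & set(phrase.lower().split()) else normal

naive_min_bid("buy an estate")                      # 1.0
naive_min_bid("how to buy wisely at estate sales")  # 1.0 as well
naive_min_bid("buy a rare book")                    # 0.06
```

Both "estate" phrases get the same $1.00 floor despite representing wildly different values per click, which matches the .06-to-$1.00 jump described above.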
As a programmer for about 20 years in a variety of traditional brick-and-mortar shops, I must say that the attitudes in brick-and-mortar business and e-commerce differ considerably. Never would a brick-and-mortar shop put up with the vagaries and downtime that we experience on e-commerce sites. There is a far, far more casual attitude on the internet.
If I told the boss, I was doing some beta-testing on the live system this afternoon and they may see some slowdown on the lines and some mis-labeled products while I caught the bugs, I would have been tossed out on the sidewalk.
Software had to be as absolutely bug-free as possible before being released. Otherwise, there was heck to pay, even over small problems.
If I initially let 600 products get through where a bar code was a 1/4 inch too high and had to be manually scanned rather than get picked up by an inline scanner, I would hear about it for weeks. It would take forever to earn back their trust.
Not so on the internet, where I see even big e-tailers down for hours or without images displaying for long periods, wrong prices, no prices, price discrepancies between one page and the next, exposed bad code, bad links, etc.
Most times, when major bank and retail sites upgrade, you'll endure days of "You must have reached this page in error".
Either the profit margin on the internet is so great they can absorb these common mishaps or there is a general lack of experience among internet programmers where they've yet to learn from past mistakes (which can only come with experience).
In any case, to their credit, Google and others have gotten me out of the harsh brick and mortar world and allowed this dream of being my own boss on my own terms come true. And it's all something I fell into by accident and wouldn't have dreamed of 3 years ago.
So I must thank Google and certain others for that and it's nice to have a boss that I can yell at now :)
|As a programmer I don't think the algo itself is out of control because the calculation of a quality score for a keyword is based on some mathematical calculations. |
Seems like a contradiction in terms--I don't think a "quality" score can be based on a "quantity" (mathematical) calculation--which is one of the problems I think we're seeing, as p2a and others point out.
|I don't think a "quality" score can be based on a "quantity" (mathematical) calculation |
You are mixing up the value and the meaning of things. The value is inside the mathematical formula; the meaning is in the outside world. Think of PageRank: it is a pure mathematical formula used to calculate the quality of a page. PR is calculated in a purely mathematical way, counting links and their values from a previous iteration, yet the result of it is (or was, before massive SPAM techniques) SERPs with pages ordered by quality.
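For the curious, the PageRank iteration mentioned above can be sketched in a few lines of Python. The three-page link graph is made up, and 0.85 is the damping factor from the original PageRank description; this is a minimal illustration, not Google's actual implementation.

```python
# Minimal PageRank sketch: a purely mathematical calculation (counting
# links, weighted by the previous iteration's values) used as a proxy
# for page quality.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to.
    Assumes every page links somewhere (no dangling pages)."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

# Made-up graph: A links to B and C, B links to C, C links back to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
# C ends up highest: it collects links from both A and B.
```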
What I mentioned in my last paragraph is, that even though the algorithm itself is stable, the whole process including the advertisers is not.
I don't believe, until we have true artificial intelligence, that there is any program that can look at a free-form ad and determine the following:

"Well, these keyphrases are trying to sell an e-book about how to buy wisely at estate sales. Now, while an 'estate' itself can cost millions of dollars, and advertising for selling estates can command high dollars since there will be so many competitors, an e-book like the one in this ad can only sell for a few dollars. Therefore, although the word 'estate' is used widely in both types of ads and their keywords, they deserve very different dollar values for the keywords."

So no program can properly differentiate (though this is an admittedly poor example) between the value of the keyword phrase "buy an estate" and that of "how to buy wisely at an estate sale".
This was not a real life example for me, but I'm seeing the same thing happen with a single word that can represent merchandise selling from $5.00 to $15,000.00.
The only way for this part to work itself out is to let the marketplace decide how high to bid - like we did before. And leave the CTR factor in there for fairness. I shouldn't have words with 15% CTRs "Inactive" when they're virtually synonyms for other words that are running just fine still.
I agree with p2a.
|You are mixing the value, and meaning of things. The value is inside the mathematical formula, the meaning is in the outside world. See it as PageRank. It is a pure mathematical formula used to calculate the quality of a page. |
What they're doing is attempting to quantify a quality, which by definition can't be done. They may be calling this number a quality score, but that doesn't mean it is one. (In another thread, several people have offered suggestions for more accurate names.)
You know the story about Abe Lincoln? He once asked his cabinet this question: "If you call a dog's tail a leg, how many legs has it?"
"Five," they answered.
"Four," he replied. "Calling a tail a leg doesn't make it a leg."