| 5:20 pm on Feb 11, 2007 (gmt 0)|
There's a lot to be said about the scraper theory. I manage a fairly significant number of sites for my company and our partners (mostly in hypercompetitive industries); as such I have the misfortune of encountering a lot of the funky penalties that crop up.
I've been lurking on the 950 penalty threads because I wasn't entirely sure what was going on, given that multiple factors do seem to be in play, but I have found some commonality with my own experiences.
1. Phrase-based penalties & URL-based penalties; I'm seeing both.
2. On phrase-based penalties, I can look at the allinanchor: for that KW phrase, find several *.blogspot.com sites, run a Copyscape check on the site with the phrase-based penalty, and see these same *.blogspot.com sites listed...scraping my and some of my competitors' content.
3. On URL-based penalties allinanchor: is useless because it seems to practically dump the entire site down to the dregs of the SERPs. Copyscape will still show a large amount of *.blogspot.com scraping though.
Getting rid of scrapers is a thousand page thread in and of itself, but what I've been doing so far is a mixture of modifying titles, slightly modifying on-page text, getting some new links that match the new title, and where possible, turning in the *.blogspot.com junk as spam on both the blogger and G spam report side.
Normally scrapers wouldn't be a huge problem, but with Google continually tweaking their authority knob, those *.blogspot.com sites are becoming instant authorities, which is really, really bad. That should have stopped a year ago. I don't have an answer as to why the penalty is sometimes phrase-based and sometimes URL-based, but I can say that I've seen them alternate on the same domain, I've seen just the phrase-based issue occur and resolve itself, and I've seen the URL-based issue occur and resolve itself.
Confusing isn't it?
So that's my vote...false authority scrapers that are causing temporary filtering as Google attempts to determine which is the more valid source, rectified by modification of both on-page and off-page tactics.
| 5:44 pm on Feb 11, 2007 (gmt 0)|
|Thinking that PR is the be-all and end-all is very 2002 thinking - this is 2007. |
PR is still an important anti-spam tool. If a page that has been around for a while has no links at all, it's most probably spam.
|This seems to be unrelated to PR. In fact hundreds of pages with less and even 0 PR are ahead of my missing pages. |
I wouldn't expect Google to make these changes visible in the toolbar, they are most likely only temporary.
Even if the toolbar shows PR0, a new page may, and usually does, have some PR.
Whether or not PR is the tool that Google uses to send sites to the background isn't even important. What matters is the question why better-content, non-spamming sites are temporarily deranked.
| 5:50 pm on Feb 11, 2007 (gmt 0)|
|Now, how could a pic, added on the 10th of Feb be part of the cache of page done on the 7th of Feb. |
Because it's linked to your server?
| 5:58 pm on Feb 11, 2007 (gmt 0)|
>it's a live download of your page and popped into the browser
No.... look again, especially at cache dates.
>Whether or not PR is the tool that Google uses to send sites to the background isn't even important. What matters is the question why better-content, non-spamming sites are temporarily deranked.
Welcome to 2007. This is way more than a PR issue.
[edited by: MHes at 6:03 pm (utc) on Feb. 11, 2007]
| 6:11 pm on Feb 11, 2007 (gmt 0)|
One of my pages is in the number 1 position with Site Links below it. Two of the pages (urls) occupying Site Link position 2 and 3 are ALSO repeated in the 900's. Same urls. Wouldn't that indicate that the 950 Syndrome is a separate process from the normal algo?
How else could Google rank the same url in positions 2 and 943 at the same time?
| 8:26 pm on Feb 11, 2007 (gmt 0)|
I think that may happen because the extra Site Links are determined by a separate process. Then they are "attached" to the domain root in the number one position as extended information for the user. In other words, those urls do not actually rank on the first page of the SERP according to the algo. They sort of get carried there on the coat tails of the domain root.
| 9:46 pm on Feb 11, 2007 (gmt 0)|
|Basically, the Google cache is not a cache guys, it's a live download of your page and popped into the browser. |
No, it's not. The images are live downloaded from your own server, but the HTML code is cached.
[edited by: tedster at 8:05 am (utc) on Feb. 12, 2007]
| 11:02 pm on Feb 11, 2007 (gmt 0)|
The penalty hits authority sites because it is all about the scoring of the page. Again, you can't understand the penalty unless you look at the group of sites there, not just "my site". Besides powerful niche authority sites capable of scoring highly with many pages, there normally are at least a couple dozen of those hacked/redirect puke "pages" listed. These are *extremely* high scoring pages, with tons of randomized anchor text, links from unique domains (meaning blog comment pages), randomized keyword text, and so on.
It may happen occasionally, but I've never seen a page that ranks #50 (for the poison word search) hit with this penalty (unless it was in a directory beneath a penalized page). High scoring pages are at risk, which means authority sites mistakenly get hit along with the spam sites being targeted.
| 5:27 am on Feb 12, 2007 (gmt 0)|
I got two article pages back today! I took the key phrase for each formerly missing page out of the page title, H1 tags, and any internal links. I also decreased the use of the phrase in the article text.
My missing subdirectory and a few other pages are still gone. Still working on them.
| 8:00 am on Feb 12, 2007 (gmt 0)|
annej and Steveb - I agree with your observations. We may have found (probably stumbled upon) a fix but it is way too early to know if it will stick. What gives me hope is that over the last six weeks we have made changes that have had zero effect and we just continued to pop in and out on a four day cycle with old rankings coming back as if we had changed nothing. Having undone all of that, we now have a means to at least make a difference. We are showing different pages for searches but ranking position 6 or 7 when before we were ranking 1, 2 or 3. These include hundreds of search phrases, from very competitive to longtail.
I think getting a new cache of your changes is obviously important, but there is also an offline analysis which needs to be updated as well. The big problem is knowing what data is being used offline, and whether they are now using old data with previous 'fixes' that you have since abandoned!
annej - Are you ranking as high as before.... despite your changes. Also, are you getting the same indented page combinations?
| 8:31 am on Feb 12, 2007 (gmt 0)|
From my pages that came back, no, they weren't ranking as high as they were before, more like 10 places lower or so. Previously top5, now around top15-20. Still missing one key page from what I can see, this one has been gone for many months longer than the others (probably 6-8 months or so).
| 9:21 am on Feb 12, 2007 (gmt 0)|
how do you navigate your users to those pages without linking with the correct anchor?
| 11:56 am on Feb 12, 2007 (gmt 0)|
Had one directory's worth come back today, but most are unaffected. No changes made to the ones that came back, but they did get fresh tags. Previously they died once before, after the fresh tag expired.
| 12:26 pm on Feb 12, 2007 (gmt 0)|
After reading and trying to understand many of the posts on this and other threads, it appears that Google seems to have the upper hand, not because they are smarter but because they have added phrase matching to an existing array of penalties, and possibly at the same time changed the way those penalties are implemented.
I wonder if they are now using something similar to the court system, where sentences / algo penalties are aggregated.
we all know some of the ways they use to apply boosting to pages
Page Rank / Trust Rank i.e inbounds
Standard SEO headings etc
But now we need to look at what may have existed before in the way of penalty points that would cause the problems to be aggregated
I am sure many have a better handle on all of these but here goes with no particular order
1 duplicate page content
2 duplicate domains targeting same niche
3 Over Optimisation
4 above standard deviation link building
5 Phrase based matching and usage with 1 and 2
I wonder if a new thread could be started listing in much more detail the known or suspected penalties G is applying, so webmasters could look them over and spend some time deciding whether 2 or more of those could affect SERPs in their area.
The other thing worth noting: just because other sites have not been hit does not mean the penalties do not exist; more likely G has not identified them, due to better covering of tracks by SEOs.
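The "aggregated penalties" idea above is pure speculation about Google's internals, but it can be sketched concretely. In this toy model (all signal names, weights, and the threshold are invented for illustration), mild signals are harmless alone but demote a page once their combined score crosses a cutoff:

```python
# Hypothetical sketch of "aggregated" penalties: individual signals
# are minor on their own, but their combined score can cross a
# threshold that triggers a harsh demotion. All names, weights, and
# the threshold below are invented for illustration only.

PENALTY_WEIGHTS = {
    "duplicate_content": 2,
    "duplicate_domains_same_niche": 3,
    "over_optimisation": 2,
    "unnatural_link_velocity": 3,
    "phrase_spam_match": 4,
}

DEMOTION_THRESHOLD = 6  # invented cutoff

def aggregate_penalty(signals):
    """Sum the penalty points for all signals detected on a page."""
    return sum(PENALTY_WEIGHTS[s] for s in signals)

def is_demoted(signals):
    """A page is demoted only when its combined score reaches the threshold."""
    return aggregate_penalty(signals) >= DEMOTION_THRESHOLD

# Two mild signals together trip the filter even though neither would alone:
print(is_demoted(["duplicate_content"]))                       # False
print(is_demoted(["over_optimisation", "phrase_spam_match"]))  # True
```

This kind of model would also explain why a site can "pop in and out": losing or gaining one small signal moves the total just across the threshold.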
| 12:59 pm on Feb 12, 2007 (gmt 0)|
I'm always of the logic that if Google were to continue with this current new set of results, it's not the best they could have pulled out of their hat. In the areas of the SERPs I monitor, sub-domain classifieds and other what I call "non-relevant" or secondary results appear: employment ads, sub-domain directories that are not getting clustered, did I mention classifieds? etc.
I can see things changing for a lot of sites affected by this, simply because Google's current index wouldn't be the final one IMO; it has too much crap. :-) Patience..
| 1:02 pm on Feb 12, 2007 (gmt 0)|
Tedster is (as usual IMO) talking sense.
It seems plausible to me that the "31 penalty" might be a result of a Hilltop-like analysis to pull out and promote the "best 30 results" as judged by peers (demoting other sites being a side effect of the basic rules of set theory--you can only have 30 sites in the first 30 listings).
Then the "31 - 999" penalties are simply a result of discounting signs of spam-level-optimization on phrases -- perhaps conjoined with incestuous linkage patterns (and yes, in this case NATURAL phrasing and NATURAL-looking inbound link optimization is the way out.)
The usual objection to this approach is "look at all the spam at the top of the lists now." But, remember, think about a search like "[resort city] hotels". There are, say, ten million search results for maybe a thousand real hotels and a few thousand legitimate news or review references -- that's 99.9% spam. And eliminating 90% of the spam -- a magnificent improvement, and well worth doing -- would leave the search results only 99% spam. And the 10% of spam that survived would seem to have been "promoted" by leaps and bounds.
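The arithmetic in that hotel example can be checked directly (the counts are the poster's hypothetical figures, not real data):

```python
# Worked check of the "[resort city] hotels" example above,
# using the poster's hypothetical numbers.

total_results = 10_000_000
legit = 1_000 + 9_000   # ~a thousand real hotels plus a few thousand legit references
spam = total_results - legit

print(spam / total_results)  # 0.999, i.e. 99.9% spam

# Now eliminate 90% of the spam:
remaining_spam = spam * 0.10
remaining_total = remaining_spam + legit

print(remaining_spam / remaining_total)  # ~0.99, i.e. still roughly 99% spam
```

So a cleanup that removes nine out of every ten spam pages still leaves a results set that is about 99% spam, which is why the surviving spam appears to have been "promoted".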
I think BOTH of these phenomena will make more sense if you reverse the usual logic: look at the 30 "promoted sites" for external signs of authority that a robot might spot, ignoring all other sites as irrelevant -- but if a site has dropped further than 30 places, look at it as having exhibited signs of spam that a robot might spot, probably relating to specific phrase patterns typical of high-pressure marketing.
| 1:37 pm on Feb 12, 2007 (gmt 0)|
The problem is, in seven years I've never seen an instance where Google dumped on obvious authority sites so harshly and UNIFORMLY. Demoting sites is one thing, but sending them to number 900-980? Rather harsh, I would say. "Downright strange" could also be applied.
There is some reason why these sites are ALL going to this 900-980 range.
| 1:59 pm on Feb 12, 2007 (gmt 0)|
Looks to me like Google needs to get a handle on domain clustering for sub-domains. When one particular site dominates the SERPs with more than 2 listings out of 10 for the same keyword, there is definitely a problem.
Now, with this penalty or filtering that is going on, how many of the sites that got hit are linking out to the actual 'authority' site that they list on their particular page that is getting filtered? Does the actual site you're linking to from your page rank for the terms you are targeting with your page?
Wouldn't surprise me if they're going "why should we rank you, when the site that you are linking to actually ranks for the keywords you're trying to hit with your page?"
Then again.. experience is telling me they won't settle on this current set of results. So I don't know. :-)
| 2:37 pm on Feb 12, 2007 (gmt 0)|
I can show you a search where two sites have 50% of the top 40 listings.
| 3:02 pm on Feb 12, 2007 (gmt 0)|
"Authority site" has a specific technical meaning to Google, which is not the same as the normal real-world usage -- although the Google meaning comes from a technical attempt to measure the online analogy of one real-world aspect of authoritativeness.
I think the usage in THIS forum is yet another step away from reality -- that is, what is called "authority" sites are sites from sources that in the real world, would never be called authorities, but simply ranked well under some prior Google ranking that included (among other things) Google's technical estimate of that one aspect of authoritativeness.
Google will ALWAYS be TRYING to dump -- HARD -- on sites that pass their technical tests but do not conform to real-world notions. There aren't ever any other kind of sites for them to dump. (The ones that never achieved faux-authority status can't be dumped.)
I'd propose a bold experiment -- and this may be far too radical for this forum. In analyzing the effect of Google changes, try to limit yourself to (1) sites not related to you or your clients, that you recognize as genuinely authoritative in a recognizably real-world sense, and that were dumped; and (2) genuinely non-authoritative sites, related to you or your clients, that seem to rank higher than their sources warrant.
I'm suspecting the perspective will be so different that you'll wonder if you're in the same universe.
| 3:22 pm on Feb 12, 2007 (gmt 0)|
|annej - Are you ranking as high as before.... despite your changes. Also, are you getting the same indented page combinations? |
One is now second while before the penalty it was first, no indentations. But I can live with that. The other is still shaky. It is indented and on some data centers another page from my site is showing. I think I will put the phrase back in the title and see if that helps or hurts.
|how do you navigate your users to those pages without linking with the correct anchor? |
One was easy, as it is a war that is known by more than one name. The fix on the one that is completely back is kind of awkward: I describe where it is from in the navigation rather than using its name.
BTW it just occurred to me that a lot of authority sites have simply not noticed they are missing pages. It was somewhat by chance that I did. If a site has not lost enough pages to make a noticeable dent in its traffic, I doubt anyone would go looking for missing pages. I suspect this is more widespread than it seems.
| 4:02 pm on Feb 12, 2007 (gmt 0)|
|sites not related to you or your clients, that you recognized as genuinely authoritative in a recognizeably real-world sense, (that were dumped) |
That's exactly the sort of sites I'm referring to --sites that are authority sites based on their "respect" factor (i.e. linkage from other respected authority sites), NOT sites that were previously well-ranking linkwhor*s.
|I'm suspecting the perspective will be so different that you'll wonder if you're in the same universe |
And I'm suspecting that you haven't done enough research to understand what has occurred, which in itself is no crime, since until last week I didn't understand it (or even notice it) either until I checked out what other people were talking about.
|BTW it just occurred to me that a lot of authority sites have simply not noticed they are missing pages. It was somewhat by chance that I did. |
Exact same situation here. I didn't realize what had happened until I started researching why Adsense $$$ went south. Only people who previously were top ranked seem to have noticed the change at this point. Lots of others will simply assume that it's typical Google SERP churn. Big mistake IMO. What initially threw me is that my big money pages are still ranking number 1 (with Site Links intact). It was the Adsense drop that indicated a problem. You really do have to be looking for it.
| 4:08 pm on Feb 12, 2007 (gmt 0)|
Whatever you think is causing the problem doesn't explain the fact that some days works fine and others don't.
| 7:47 pm on Feb 12, 2007 (gmt 0)|
I think that the pages that are flitting in and out are on the edge of that fine line I keep talking about. There are other pages that have been out for the duration and who knows how many pages are just a phrase or a lost inbound link away from plunging into the 900s or whatever.
Since it's hard to know what phrases are problem phrases I think the only solution for now is to go through and decrease keyword density while still trying to have navigation and text that makes sense. (I'll admit I went a bit overboard with the experiment I reported above but I wouldn't do that with most of my site.)
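For anyone wanting to measure "keyword density" before and after such edits, here is one rough way it could be computed (this is an illustrative sketch, not any tool Google is known to use; the phrase and text are made up):

```python
import re

def phrase_density(page_text, phrase):
    """Rough share of the page's words taken up by occurrences of a phrase."""
    words = re.findall(r"[a-z0-9']+", page_text.lower())
    phrase_words = phrase.lower().split()
    n = len(phrase_words)
    if not words or n == 0:
        return 0.0
    # Count non-overlapping-agnostic exact matches of the word sequence.
    hits = sum(
        1 for i in range(len(words) - n + 1)
        if words[i:i + n] == phrase_words
    )
    return hits * n / len(words)

text = "Blue widgets here. Buy blue widgets now, best blue widgets online."
print(round(phrase_density(text, "blue widgets"), 2))  # 0.55
```

Tracking a number like this across revisions makes "I decreased the use of the phrase" something you can verify, rather than eyeball.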
| 7:56 pm on Feb 12, 2007 (gmt 0)|
Anyone think redundancies on sites could be playing a part?
< continued here: [webmasterworld.com...] >
[edited by: tedster at 3:40 am (utc) on Feb. 13, 2007]
| This 175 message thread spans 6 pages |