In further research last night I (think I) came to the conclusion that the pages we lost were over-optimized. But in what way, and where, are the questions. I could only compare who is listed and who is not.
On one of my main search terms, where last week I was numbers 2 and 3, I am now #59. (Other sections are worse, but I'm sticking to this problem for my research.) The result should be one of my deeper pages, the page on the site actually related to the query. But what appears there is my main index page. The page that should be there I have not found yet.
So if I am thinking I am over-optimized, I want to see what others on the Google results page are having success with. Guess what? They are way over-optimized. But this is the difference I can see so far: in the unlinked content on the page, they repeat the search term over and over, at a key phrase density probably twice mine, maybe more.
But I think the difference is that where I repeat the term, it is in the anchor text of internal links. I think I repeat the phrase in anchor text as often as, or more often than, I do in the content on the page itself. It does not appear spammy; it is basically the navigation links I speak of.
So, is it possible that Google is looking at the anchor text links, weighing those phrases more heavily than what is in the content itself, and treating repeated phrases within links as spam? Could it be the anchor text density versus the density of the content itself?
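In case it helps anyone compare the same way, here is a rough diagnostic I could use to put numbers on it: it measures the key phrase density inside anchor text versus the rest of the visible text on a saved copy of a page. The file name and phrase are just placeholders, and the numbers are only a way of describing the page, not a claim about how Google actually weighs anything.

# Rough diagnostic: key phrase density in anchor text vs. the rest of the visible text.
# Illustrative only; needs the third-party package beautifulsoup4.
from bs4 import BeautifulSoup

def phrase_density(text, phrase):
    # Occurrences of the phrase per 100 words of text.
    words = text.lower().split()
    if not words:
        return 0.0
    return 100.0 * text.lower().count(phrase.lower()) / len(words)

def anchor_vs_body_density(html, phrase):
    soup = BeautifulSoup(html, "html.parser")
    # Ignore script/style contents so they don't pollute the "body" text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Text inside <a> elements (navigation and other links).
    anchor_text = " ".join(a.get_text(" ", strip=True) for a in soup.find_all("a"))
    # Drop the anchors, then treat whatever is left as the content text.
    for a in soup.find_all("a"):
        a.decompose()
    body_text = soup.get_text(" ", strip=True)
    return phrase_density(anchor_text, phrase), phrase_density(body_text, phrase)

if __name__ == "__main__":
    html = open("page.html", encoding="utf-8").read()     # hypothetical saved copy of a page
    anchor_d, body_d = anchor_vs_body_density(html, "red widgets")   # placeholder phrase
    print("anchor-text density: %.2f per 100 words" % anchor_d)
    print("body-text density:   %.2f per 100 words" % body_d)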
Am I making any sense to anyone?
2. Affiliate links
not for me
3. Major html errors
nope
4. Very little content on homepage
maybe - I am working on increasing that right now
5. HTML outweighs the text
no
6. Real content of the site takes 2-3 clicks to get to
it does take a few clicks to ultimately get to the buy page of a specific item
7. Crawling issues!
no problems - confirmed with Xenu and Google Sitemaps
8. Links to bad neighborhoods
no
9. Involvement of paid links
no
10. Link exchanges
yes - all on topic, only 2 pages with links, maybe 30 on each page
3. Major html errors > I just learned I misused H tags
4. Very little content on homepage > it is a subdirectory index page that was penalized and it had very short blurbs with the links.
10. Link exchanges > yes, old ones gathered over the years but not related to the subdirectory that has the problem.
11. Keyword phrases in intersite anchor text are repeated in page titles, H1 and meta descriptions > not all, but I had the title of each article as the anchor text on the contents page. I just changed that.
The way I see it is that we need to do what we can to differentiate our pages from scrapers. It's a pain but may be the only way to protect our sites and individual pages.
Those H tags will kill a site...
As Marcia said, not if used properly. Check the source code on Matt Cutts' blog. They haven't hurt Matt. They haven't hurt the sites I've used them on.
Yesterday, I posted some thoughts on Hx elements in this thread...
Large h1 - h2 Headings
[webmasterworld.com...]
...addressing a different question, but the function of Hx elements is mentioned. If they were misused, I think Google would tend to disregard them rather than penalize... just a guess. Also, I don't think onpage over-optimization would produce the kind of sharp drop-off everybody's discussing, which sounds like it's much more linking or dupe-content related... and not in all cases permanent.
OTOH, they sure weren't helping me. If all my misused H tags have been ignored, and perhaps even important links ignored along with them, then I have weakened the site's strength in the SERPs.
This penalty has sure made me take a new look at my site.
I was just trying to do some reverse research into this, and instead of looking at why my (our) sites are penalized, I began to look at the coding of other sites that rank in the top ten SERP for various keyword phrases. I chose four different keywords/phrases popular in my industry and looked at the coding for the top 10 Google SERP positions.
I know that looking at only forty sites is not very scientific, but in the top 10 results for all four key phrases, the sites either did not have <h2> tags or, if they did, they very rarely included the keywords within the tag. [Note, some of these DID have the key phrase in the title, description, keyword and <H1> tags.]
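For anyone who wants to repeat the spot-check, here is roughly how I could script it. The URLs would be collected by hand from the results pages (I would not scrape the SERPs themselves), the key phrase is a placeholder, and it needs the requests and beautifulsoup4 packages installed. It only reports whether a page has <h2> tags and whether the phrase appears inside them.

# Spot-check whether pages from a hand-collected list of top-ranking URLs
# put the key phrase inside their <h2> tags. Illustrative only.
import requests
from bs4 import BeautifulSoup

def h2_report(url, phrase):
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    h2_texts = [h.get_text(" ", strip=True) for h in soup.find_all("h2")]
    has_h2 = bool(h2_texts)
    phrase_in_h2 = any(phrase.lower() in t.lower() for t in h2_texts)
    return has_h2, phrase_in_h2

if __name__ == "__main__":
    phrase = "red widgets"                              # placeholder key phrase
    urls = ["http://www.example.com/widgets.html"]      # hand-collected top-10 URLs
    for url in urls:
        has_h2, in_h2 = h2_report(url, phrase)
        print(url, "| has <h2>:", has_h2, "| phrase in <h2>:", in_h2)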
Is anyone else seeing this for top SERP results in their particular areas?
It seems to me that there is too much movement in the results for me to try to justify any further changes. I think everything will come back next week. If things haven't shaken out by then, well, then I will start to get concerned again.
Does anyone else have data tables on their penalized pages... where repetition is basically inevitable?
What's interesting is how "page" specific it is. You can see multiple pages on the same site that seem to be doing the same things as each other, yet some get hit with the penalty and some don't.
I agree and disagree. It appeared to be page specific to me at first, but then I realized it is more than just page specific but page and query specific. Here is an example:
I have some pages that are shoved to the end of the results, but only when searching for that page using the same phrase as the title and H1 on the page. If a different query matches that page it isn't always pushed back in the results.
Likewise, I still have pages that rank extremely well even when you search using the exact title and exact H1. The only thing I've noticed about the pages that survive is that they target slightly less searched-for terms.
So when searching with phrases that are exact matches for the page, based on title and H1, I can see that many, but not all, of my pages are pushed back in the results, while other queries that match data on the page leave that page where it has always been for that query.
Therefore, here is the formula that represents my site. If the page title and the H1 tag on the page (and probably the internal anchor text) are the same, AND it is a slightly competitive search within my niche, the page has been pushed way back in the results.
If the search query isn't an exact match, and/or it is a slightly less common search phrase within my niche, then the page has a better (but not perfect) chance of surviving.
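If I had to write that pattern down as a toy rule, it would look something like the snippet below. This is just a restatement of what I'm seeing on my own pages, not a guess at Google's actual code; "competitive_in_niche" is my own made-up flag for how contested the phrase is.

# A toy restatement of the pattern on my own site -- not Google's logic.
def pushed_to_950(query, page_title, page_h1, competitive_in_niche):
    exact_match = query.lower() == page_title.lower() == page_h1.lower()
    # Exact-match query against an identical title/H1, on a competitive phrase,
    # is the combination that seems to get pushed to the end of the results.
    return exact_match and competitive_in_niche

print(pushed_to_950("red widgets", "Red Widgets", "Red Widgets", True))    # True  -> buried
print(pushed_to_950("red widgets", "Red Widgets", "Red Widgets", False))   # False -> survives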
I have some pages that are shoved to the end of the results, but only when searching for that page using the same phrase as the title and H1 on the page.
This is how searches for pages that have been duped and filtered will often behave. There are a number of interdependent factors (like inbound link text, how the page is optimized, and what kind of inbounds the dupe page has), so one doesn't always see exactly the same pattern.
Now, forgive me if I've missed this somewhere in this succinct and easy-to-read discussion ;) , but has anyone tried pasting &filter=0 onto the end of the search query (onto the URL of the first SERPs page that comes up for the query), and, if so, have the previous rankings returned?
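For example, if the results page URL is http://www.google.com/search?q=red+widgets (placeholder query), you would change it to http://www.google.com/search?q=red+widgets&filter=0 and reload. With the filter switched off, a page that is being suppressed as a duplicate will often reappear near its old position, which is a quick way to tell filtering from an ordinary ranking drop.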
Here's what I did... Nothing. I suffered from paralysis by analysis and was just getting ready to make some changes this weekend finally, and I'm sort of out. But it's still not a happy ending... yet.
I'm in one of the 5 most competitive niches there are. Let's say that the page that was hit was about "Red Widgets". I was in 950 land for the past 3 weeks for "red widget", "(synonym of red) widget", "red widget price" and "(synonym of red) widget price".
Previously I had been top 10-20 for all of these terms. Today, I am at result 250 for both "red widget" and "(synonym of red) widget", and still at 950 for "red widget price" and "(synonym of red) widget price".
So, now the question is...are the phrases for which I am on parole and finally out of jail going to return near the top without any fixing or not? Second question is, are the phrases that remain at 950 going to stay there forever until I fix something? I feel like I'm playing chicken with google waiting to see who is going to make the first move. Don't know, but simply for the sake of watching, I'm going to wait one more cache before I make any changes to see if those 250 results start to climb on their own or will need some additional help, and what happens to the 950's that remain behind bars.
No changes have been made to the site. Recovered somewhat in early Jan to #30-50 for main keywords but appears to be stuck there, so they are getting ready to make some changes now. Less repetition of anchor text mostly.
We fell out on 17 Dec...
'in' times [GMT] in January:
17:00 3rd - 17:00 4th
23:00 6th - 17:00 9th
17:00 10th - 17:00 11th
17:00 14th - 23:00 14th
~4:00 18th - ~4:00 20th
~20:00 23rd - ~02:00 25th
We have had some success in getting pages back in, but you fix one problem and create another. It is a multi-layered issue, probably with different solutions for each site. We have managed to get some pages to rank top for some search terms whilst the site is going in and out. Getting the whole site out requires a series of changes and time for all of them to be processed. Most of the problems seem to be applied at run time, although offline there must be flags involved as well. The cache of our pages will show changes, but older data is being used in the ranking. The dates of the cache are often misleading; we have done a few experiments with noindex and nofollow, but despite new cache dates on pages the changes are not in synchronisation. This suggests a degree of offline analysis.
Areas that we have had to tackle in order to get a page to move from 950 to 20th in a predictable way (despite untouched pages fluctuating between 950 and 400+) are:
1) Local rank issues. Very clear effect was seen and cured.
2) Anchor text issues. Combined with local rank it had a devastating effect.
3) Trustworthiness of links. A purely on-page factor that affects how the link/anchor text is treated, plus (possibly) the words around the anchor text. There is a lot of 'ignoring' going on if on-page factors offend.
4) Page clustering and 'similarity'. Ruthless ignoring of pages deemed similar and a potential spamming of a phrase. Two pages unrelated to each other but in combination covering a search term in a comprehensive way will do well. This is done mainly at run time, and &filter=0 helps identify 'clusters'. Some degree of offline analysis is evidently done, but the search phrase affects the application of the filter, although adding &filter=0 will not change the effect. The offline analysis has to be done before changes will be seen in the way the clustering has been affected... this leads on to a different ranking behaviour for the page.
I think Matt was telling the truth when he said at Christmas that no new algos were present. This 950 episode is all about a combination of existing algos being tweaked to work together in a more dramatic way. Some sites pop in and out, and at different times. Because the potential problems are multi-layered, a slight change in one filter or a slight change in the offline analysis will let borderline sites back in or out, whilst other sites with a combination of problems will stay out despite one issue being solved. Some sites only have a section affected, which adds credence to the clustering theory.
I always like to see the motive Google has for doing something. In the 950 penalty, each effect mentioned above appears to be focused on the trustworthiness of a page. Offline, the similarity of pages is crudely flagged, but run time determines whether the 950 effect is applied according to the search phrase. The offline analysis is probably based essentially on duplication and the quality of links going off the page. Any suspicion in these areas is flagged. At run time, these suspicions can be overridden if local rank is satisfied and clustering is not causing a problem. However, portions of the page's content may be ignored due to on-page factors. For sites (I don't think this is a page issue) that have the offline flag, the effects of local rank and clustering create an on/off switch... you either rank well or not at all. For sites that are not flagged, clustering and local rank have a more linear effect on the ranking.
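To make the shape of that concrete, here is how I would sketch it as toy code. It is pure speculation about the behaviour we think we are seeing, not a claim about Google's implementation; the flag names and the penalty numbers are made up.

# Toy sketch of the behaviour described above -- speculation only, not Google's code.
# "offline_flag" stands for the duplication/link-quality suspicion we think is computed
# ahead of time; local rank and clustering are then applied per query at run time.
def effective_rank(base_rank, offline_flag, local_rank_ok, clustering_ok):
    if offline_flag:
        # Flagged sites behave like an on/off switch: rank normally or drop to ~950.
        return base_rank if (local_rank_ok and clustering_ok) else 950
    # Unflagged sites see a more linear effect: each problem only dampens the ranking.
    penalty = (0 if local_rank_ok else 30) + (0 if clustering_ok else 30)
    return base_rank + penalty

print(effective_rank(12, offline_flag=True, local_rank_ok=True, clustering_ok=False))    # 950
print(effective_rank(12, offline_flag=False, local_rank_ok=True, clustering_ok=False))   # 42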
Because there are so many potential issues, there is no magic solution for all sites. We had a local rank problem which had to be cured before clustering experiments had any effect at all. The waters are muddied further by the fact that other sites, subject to the same local rank algo, will behave differently because they have not got a flag due to clustering or link analysis. Hence people see some sites ranking well with keyword stuffing and others ranking well with minimal keyword presence. The conclusions can be very confusing because the treatment of, for instance, word density is affected by the way you link to another site, etc. In other words, one element of SEO can affect the way other elements are treated, and this can happen within different pages of the same site. A page can rank well for one term and not another... because of its clustering relationship with other pages in the site and the subsequent way the on-page factors are valued.
Now add the 100+ other factors which change all the time and colour all experiments and conclusions... sigh. SEO just got a lot harder ;)
Of the missing article pages one of them seems to flit in and out. It's the one with the lesser known keyword. It must be right on the edge of the penalty.
I had one other contents page that was much like the one that was penalized, but much shorter. So perhaps it was something about the number of links whose anchor text exactly matched the heading title of each article it linked to. My other contents pages for subsections had much more text on them, so perhaps that is what protected them.
So number of links with identical text to the article names seems to be one factor.
The amount of text on the contents pages seems to be another.
But that doesn't account for the still-missing article pages that look identical to all the other article pages that are doing just fine. The only difference I can see, comparing the article page that has been flitting in and out with one that remains solidly out, is that the one solidly out has a much more common key phrase, 'famous war widgets'. The flitting page is about 'relatively unknown widgets'.
So here is a theory. Are they noticing that they are penalizing a lot of niche pages, and so are experimenting with taking some lesser-known phrases out of the mix?
MHes, I too would like to know more about the LocalRank problem. I thought there wasn't much we could do about LocalRank.
And, some of the pages at the bottom of the results have the "More results from example.com" under them. I thought this was one of Google's Signs of Quality, and was somewhat difficult to get. The keyphrase I searched is somewhat competitive for the niche the site is in.