Forum Moderators: Robert Charlton & goodroi

High Supplemental Results but unique content


Mobillica

3:00 pm on Aug 19, 2010 (gmt 0)

10+ Year Member



Ok, my great SEO clean-up of my site continues.

I have just discovered that my site has a very large percentage of pages in the supplemental index.

The majority of these pages have original, unique content and are Google-friendly.

I think the main reason these are in the supplemental index is that link juice is not filtering down to them, as they sit three folders deep at folder1/folder2/folder3/mypage.

I don't want to delete them or remove them from the index, as in my view Google should eat these pages up, and that should reduce the number of pages in the supplemental index.

What I'm thinking is that I create a few additional pages with original content, only one click off my homepage, each linking to around 25-35 of these quality pages.

Would that solve my problem?

tedster

3:29 pm on Aug 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd say that's a good idea. And if you can get even one solid backlink for that "directory of deep content" page, you should be doing fine.

Mobillica

3:49 pm on Aug 19, 2010 (gmt 0)

10+ Year Member



Thanks again, tedster. I'll go ahead and do that.

I think this high amount of content in the supplemental index may be one of the main reasons why my site is not ranking as well as it used to.

I'll keep you posted as to how I get on. I'm quite optimistic for a change :)

1script

5:00 pm on Aug 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't want to hijack an otherwise great thread, but it may be relevant to the topic: how do you reliably identify the "percentage of pages in the supplemental index"?

MrFewkes

8:02 pm on Aug 19, 2010 (gmt 0)



Hi - I need to know this as well, as 1script asked.

MrFewkes

8:04 pm on Aug 19, 2010 (gmt 0)



I'd say the new pages you're about to create will also go supplemental, unless you get more links...

tedster

8:20 pm on Aug 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



how do you reliably identify the "percentage of pages in the supplemental index"?

IMO, you cannot, you can only ballpark it, if you're lucky.

Just discovering the total number of indexed pages - that alone is not an easy job. And then there's the fact that there is almost certainly more than one database partition in play, rather than THE supplemental index of days gone by.

1script

8:43 pm on Aug 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



IMO, you cannot, you can only ballpark it, if you're lucky.
I'll be honest, I don't know how to even ballpark it. Is there a thread recent enough that discusses it? It used to be that you ran a search for site:example.com +example (without the .com at the end) and then saw how many more results showed up when you hit the "Show all results" link. Now I'm seeing "supplemental" applied to pages that show up in search results but have no cache and show "this term was only in links pointing to this page" or some such nonsense.

How far am I off base with the current definition of "supplemental index"?

tedster

8:48 pm on Aug 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think it's a dead word - there's no meaningful definition any more. Even the straight site: operator is a swamp with no precise use as far as I'm concerned.

TheMadScientist

8:55 pm on Aug 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Personally, I've always thought it was much ado about nothing... The supplemental results were 'query specific' meaning when a page was returned as a supplemental result for a specific query it was a supplement to that query, but I cannot remember reading anywhere that a supplemental result for one query was a supplemental result for all queries...

They supplemented the results (still do, but unlabeled IMO) at times with pages that weren't the main result, but that does (did) not necessarily mean the same page was only supplemental for each and every query it was returned for, AFAIK. It may have been, or there may have been some queries it was in the main results for while it was used to supplement 'related' results.

1script

9:12 pm on Aug 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, with the ambiguous definition (or complete lack thereof) of "supplemental index", I think this discussion would benefit from the OP elaborating on the metric used to conclude that
I have just discovered that my site has a very large percentage of pages in the supplemental index.


In fact, I'm very much interested to learn what metrics people use to assess a site's standing with Google. What I mean is that it appears there is always a combination of "individual page rank" and "site rank" at play when a SERP is created. Having a bad "site rank" will adversely affect all pages at the same time (as in penalty = site rank 0), and it would be important to know if the site as a whole is doing better or worse after, say, a site-wide change is implemented.

We have

* site: operator (unreliable as it is)
* G*bot activity (pages downloaded per day)
* GWT sitemap indexed page count
* homepage cache date

what else can be considered a valid site-wide metric?

tedster

11:17 pm on Aug 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I focus on which URLs get search traffic. It does no good if a URL is crawled but not indexed, and no good if it's indexed but never gets search traffic.

I find this is a very practical metric - and by definition it is not subject to the vagueness and windstorms that periodically blow through the rest of the metrics and report numbers. Those can still be used to gather diagnostic hints, do some troubleshooting and so on. But traffic needs to be my Google metric of choice. Couple traffic with conversions and I've got something in my hand that I can work with for results.
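For anyone who wants a starting point, something like this rough sketch would pull that metric straight out of a raw access log - just an illustration, assuming a standard combined-format log and a placeholder file name, not a finished tool:

```python
import re
from collections import Counter

# Rough sketch: count which URLs received visits referred by a Google search.
# Assumes a combined-format access log at "access.log" (placeholder name).
LOG_LINE = re.compile(r'"(?:GET|POST) (?P<url>\S+) HTTP/[\d.]+" \d+ \S+ "(?P<referrer>[^"]*)"')

search_hits = Counter()
with open("access.log") as log:
    for line in log:
        match = LOG_LINE.search(line)
        if match and "google." in match.group("referrer"):  # crude test for a Google referral
            search_hits[match.group("url")] += 1

# Any URL that never shows up here may be crawled, even indexed, but it is
# earning no search traffic - which is the number that actually matters here.
for url, hits in search_hits.most_common(25):
    print(f"{hits:5d}  {url}")
```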

TheMadScientist

11:54 pm on Aug 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I focus on which URLs get search traffic.

Yeah, I don't look at much else any more either...

I do have a script that tracks bots though, so I can see spidering of individual pages in real time and make sure they're being crawled that way. I can also see page crawl frequency and 'most often crawled' pages using the same script. I've found it's much more efficient and effective to use my own scripts for tracking coupled with traffic as a 'gauge' than using the site: garbage or most of the other tools provided to 'give a glimpse' any more.
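A stripped-down version of that kind of tracker can be as simple as the sketch below - the log name and the user-agent test are placeholders, and a real script should verify the bot via reverse DNS since the UA string is easily faked:

```python
import re
from collections import Counter

# Stripped-down sketch of per-page Googlebot crawl tracking from an access log.
# "access.log" is a placeholder; a production tracker should also confirm the
# crawler with a reverse-DNS check rather than trusting the user-agent string.
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<url>\S+) HTTP/[\d.]+".*"(?P<agent>[^"]*)"'
)

crawl_counts = Counter()
last_crawled = {}
with open("access.log") as log:
    for line in log:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            crawl_counts[match.group("url")] += 1
            last_crawled[match.group("url")] = match.group("time")

# Most frequently crawled pages, with the time of the most recent visit.
for url, hits in crawl_counts.most_common(20):
    print(f"{hits:4d}  {last_crawled[url]}  {url}")
```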

It doesn't really matter to me if it shows in the site: or not, because if it shows in the results it gets traffic and if not it doesn't, so what do I care if it shows when I search for site: or not? Personally, I don't care enough to even check, except for curiosity every 6 months or so, maybe...

1script

12:54 am on Aug 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I focus on which URLs get search traffic.

Yeah, I don't look at much else any more either...


I guess my question is born out of personal experience: I also track the performance of a few select individual pages that get enough traffic from G*. However, in my experience the trouble mostly comes from the "site-wide" part of the equation. One day you come to check the stats and most of your well-performing pages have stopped receiving any traffic from G*. After this happens, it does not make sense to look at them individually any more, because the timing clearly indicates that there is something wrong at the site-wide level.

Do you guys seriously look at a 1,000-page site as a collection of 1,000 one-page sites? I guess individual pages provide a lot of info to look at, but at the same time each individual data point is prone to fluctuations. Sometimes those flukes make the whole picture look like a bunch of noise.

So, do I hear that the frequency of G*bot visits to the site's homepage is no more important than that of any other page? Or that a sharp drop in the daily number of pages collected from the whole site is irrelevant as long as the select important pages are still visited with the same frequency? There must be a certain number of site-wide parameters worth tracking, or else you might miss the (site-wide) forest for the (individual) trees...

TheMadScientist

1:01 am on Aug 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There are a couple of sites I work with that have over 10,000 pages, and I tend to look at their stats as a 'group'. I can usually look at the top 500 pages visited 'at a glance' by keyword and know whether the pages are doing what they should or not.

I don't look at them individually, but more as the forest itself, to use your example... If I see enough people 'throughout the forest' on a given day or time period I know it's healthy, but when people start disappearing from the forest and all I see are trees, I know it's time to start looking at things a bit closer, and I will usually pick some of the top 'low performers' and look more closely at those.

E.g. if most pages are 'hit and miss' for visits depending on search frequency, but some average 5 or 10 a day, maybe 15 - not what the home page and top pages do, and not completely 'hit and miss' either... If traffic starts to drop off from those 'hit and miss' pages, I start with the best performers among them and have a closer look.

My personal approach anyway...
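In case it helps anyone picture it, here's a toy version of that kind of check - the daily_visits data is made up for illustration, and in practice it would come out of your analytics export or log processing:

```python
# Toy sketch: flag pages whose recent traffic has fallen well below their own
# baseline. The daily_visits data is hypothetical; in practice it would come
# from an analytics export or log processing.
daily_visits = {
    "/widgets/blue-widget.html": [8, 7, 9, 10, 6, 8, 9, 7, 8, 9, 2, 1, 0, 1],
    "/widgets/red-widget.html":  [5, 6, 4, 5, 6, 5, 4, 6, 5, 5, 5, 6, 4, 5],
}

BASELINE_DAYS = 10    # the "longer look"
RECENT_DAYS = 4       # the window compared against it
DROP_THRESHOLD = 0.5  # flag pages doing less than half their usual traffic

for url, visits in daily_visits.items():
    baseline = sum(visits[:BASELINE_DAYS]) / BASELINE_DAYS
    recent = sum(visits[-RECENT_DAYS:]) / RECENT_DAYS
    if baseline > 0 and recent < baseline * DROP_THRESHOLD:
        print(f"Look closer at {url}: {recent:.1f}/day vs a baseline of {baseline:.1f}/day")
```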

So, do I hear that the frequency of G*bot visits to the site's homepage is no more important than that of any other page?

Depends on your goals for your site, IMO.

Or a sharp drop in the daily number of collected pages from the whole site is irrelevant for as long as the select important pages are still visited with the same frequency?

If you're talking about the site: operator showing a sharp drop, one of the reasons I stopped looking is that I used to, and I used to 'freak out' every time I saw my page count drop significantly, but every time I checked traffic it was normal. One day I finally thought how silly it was to worry about how many pages they showed for the site when traffic was steady to increasing...

There were times it was high by about 10x the actual pages on the site... Traffic remained steady to increasing.

There were other times it was half the number of pages on the site... Traffic remained steady to increasing.

My goal is traffic, not an accurate displayed count of my pages from Google, so I stopped looking. It was actually right after I started getting totally different results from 'site:example.com' and 'site:example.com example' (no quotes; site:example.com example showed 3x the pages site:example.com did)... That was it... It screamed to me 'the site: count is not accurate, don't bother looking', and I took the hint and stopped, because traffic is what I'm after, not figuring out how they've decided to display, round, fudge, estimate, guesstimate or generate a 'dummy number' that's not really important to anyone except a webmaster anyway.

Do you really think they care if the site: displays an accurate number or do you think maybe they have more important things to do with their time than figure out why the estimate is off by a factor of 10 or even cut in half for some sites?

If it was me, that number would be FAR down on the list of priorities to fix, so far down it could literally be broken for years and I might not get to fixing it, because it doesn't seem like it would pay the bills very much better even as an exact number than it does when it's broken... Seriously, think about the site: operator from their side... What type of ROI is there for fixing it?

I would fix the calculator function or add new ones before I ever stressed on the inaccuracy of the site: operator.

1script

1:36 am on Aug 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Or a sharp drop in the daily number of collected pages from the whole site is irrelevant for as long as the select important pages are still visited with the same frequency?
Oh, I'm with you on the site: operator. I do look at it but don't usually freak out. But there are times when it shows something really weird, like I reported in this thread [webmasterworld.com], and then I start scratchin'. Also, unfortunately, in my case I can see that both the site: count and the actual traffic from G* are on the downswing and have been for almost 6 months now, so it's hard not to associate one with the other.

In the quote above I actually meant the number of pages of a site visited by G*bot per day. G*bot activity seems to me an important indicator of something (though more of it does NOT mean more G* traffic), and I tend to stress out when they fetch an unusually low number of pages. Not that it affects the traffic immediately...

TheMadScientist

1:44 am on Aug 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ah... Yes, I definitely look at GBot visits...
By the day, the week, previous 30 days and previous year.

I go more by averages than anything else on the 'longer looks', but on the daily and weekly views I can look at a very granular page level... It's definitely something I keep in mind and watch, and if I saw a pattern like yours the first thing I would probably do is check the site thoroughly for errors or crawlability issues, then I would probably go straight to deep link building.*

* Keep in mind, when I say check the site thoroughly, I mean I write the software generating the sites and I would:
Look at it line by line in the HTML source.
Validate it at the w3.org site.
Run Xenu and check all links internal and external.
Go line by line through the PHP.
Check server headers.
Fetch as GoogleBot.
Etc.

I would basically analyze the entire site line by line and rewrite the entire source code if I felt like I had to before I did anything else.
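For the 'check server headers' step in particular, even a throwaway script along these lines does the job - the URL list is obviously just a placeholder:

```python
# Throwaway sketch for the "check server headers" step: request each URL and
# print the status code plus a few headers worth eyeballing. The URL list is a
# placeholder; a real check should cover every template and redirect chain.
import urllib.error
import urllib.request

URLS = [
    "http://www.example.com/",
    "http://www.example.com/folder1/folder2/folder3/mypage",
]

for url in URLS:
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request) as response:
            print(url, response.status,
                  response.headers.get("Content-Type"),
                  response.headers.get("Last-Modified"),
                  response.headers.get("X-Robots-Tag"))
    except urllib.error.HTTPError as error:
        print(url, "HTTP error:", error.code)
    except urllib.error.URLError as error:
        print(url, "unreachable:", error.reason)
```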

tedster

3:22 am on Aug 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In various situations I would use any of those approaches and others - but my regular FOCUS is traffic.

And to bring the discussion back around to the opening post, the best way to get more unique content into the main index is to circulate more PageRank to those pages - and that might be direct backlinks or it might be internal linking or a combination of the two.

Mobillica

8:25 am on Aug 20, 2010 (gmt 0)

10+ Year Member



Thanks for those replies.

I'm using common sense when it comes to the supplemental index.

When we use the site: operator, then click through to the final pages, Google says 'we have omitted some entries very similar to...' - then we can repeat the process (repeat the search with the omitted results included) so we can see those supplemental, or 'lesser quality', pages in the index.

To me, if I have a large percentage of pages in this 'supplemental' index, that tells me Google has found a large number of pages that it does not value - be it for lack of PR, duplicate content or whatever - which cannot do a site's ranking any good at all.

I have a list of pages which have appeared in the index as supplemental, and I am going through each one, updating content, changing title tags etc. and getting internal links to these pages from a page with high PR.

scottsonline

3:43 pm on Aug 20, 2010 (gmt 0)

10+ Year Member



One of the huge changes we saw prior to Mayday was that our presumed hard-indexed page count was halved overnight. We lost a lot of traffic at the same time.

I.e., site:name returned 3500 pages
site:name/* returned 1500

Afterwards, site:name returned 2700
site:name/* returned maybe 700

The real number is anyone's guess, but what we've noticed is that when the /* count drops we get less junk traffic. I think the /* results represent the hard-indexed pages that will show up for something other than exact-match searches. The number of /* pages is directly related to traffic.

tedster

6:03 pm on Aug 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I saw that too - as did many other webmasters. I assume that it was actually the new Caffeine infrastructure, rather than the Mayday algorithm change - they both happened pretty much at the same time.

As I see it, Webmaster Tools report data is a secondary level or layer of information. The WMT team needs to find ways to pull our reports from an infrastructure whose primary design is to facilitate the actual search results - and webmaster reports are not the core purpose of that infrastructure.

Historically, WMT data has often blown up after Google made back-end changes to improve their search offering. The same happens with the various special operators.