You might be on to something there. Remember the old days when there was a single, clearly labeled, Supplemental Index? And then it went away - with many of us suspecting that several different kinds of "supplemental" partitions had been introduced into Google's data storage?
So what if... low-priority URLs are tucked away in second- or third-level storage somewhere, and the regular site: operator doesn't access that data. But make the search more complex by asking for site:example.com/directory/ and voila! You force the look-up to go into the dark corners of the data center.
It looks likely that the site: tool has been accurate all along, and that it's potentially a good health checker for seeing at a glance what can and can't rank easily.
Is this the only way to check if a site's URLs are in the "secondary" index? (I don't think it's known as the supplemental index anymore.)
I think there's more than one "secondary" index at this point, and I don't know of any accurate way to probe into them - wish I did.
Supplementary index for sure; I think this is also tied into Mayday. Either the supplementary index is being used as a test bed for Caffeine before going live, or the Mayday update was in fact a tweak to the minimum link juice a page requires before it's in the main index. It is likely your pages have gone to the supplemental index for reevaluation and will be added back to the main index shortly.
WebmasterWorld has around a million pages in this supplementary index, and it is easy to probe: site:domain.com will return main index pages, while site:domain.com -lasdkjf (or any spelling mistake) will confuse Google so it looks in the supplementary index and returns the total pages it has for you. A simple trick, but it seems to be accurate.
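If you wanted to script that probe, here's a rough sketch in Python. It only builds the two query strings; fetching and parsing the result counts is left out, and the -lasdkjf nonsense-string behavior is just the observation described above, nothing Google documents.

```python
def probe_queries(domain: str, nonsense: str = "lasdkjf") -> tuple[str, str]:
    """Return (main-index query, supplementary-probe query) for a domain.

    The plain site: query is said to return only main-index pages; adding
    an exclusion for a nonsense string is said to return the total page
    count including the supplementary index. Observed behavior only.
    """
    main = f"site:{domain}"
    supplementary = f"site:{domain} -{nonsense}"
    return main, supplementary

main_q, supp_q = probe_queries("example.com")
# main_q -> "site:example.com"
# supp_q -> "site:example.com -lasdkjf"
```

You'd paste each query into the search box (or a rank-checking tool) and compare the two reported totals.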
I think what happened with Mayday is that the minimum link juice required to enter the main index was tweaked upwards. Therefore the need for deep linking, to push link juice further down into the body of a site, has increased substantially; the link juice ripples from the home page are no longer strong enough. It also means that for larger sites with lots of pages, the link juice is getting spread so thin that many pages are now below the main index threshold.
|It also means that for larger sites with lots of pages, the link juice is getting spread so thin that many pages are now below the main index threshold.|
If that is the case, now more than ever is the time for e-commerce sites to link directly from the main body of text on their homepage to their items which generate the highest revenue. I would even link from category pages that have a lot of link juice but may not be related to the product.
To validate the -wording in the string, I have used:
for several years with great results. By great, I mean:
site:examplesite.tld = 150 results
-adfhadfhadha site:examplesite.tld = 400 results (Shows supplemental results count)
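If, as described earlier in the thread, the exclusion query returns the total page count (main plus supplemental), the supplemental portion is just the difference between the two totals. A trivial sketch using the example counts above:

```python
def supplemental_estimate(total_with_exclusion: int, main_index_count: int) -> int:
    """Estimated supplemental pages: probed total minus main-index total.

    Assumes the thread's interpretation holds: the -nonsense query reports
    all indexed pages, the plain site: query only main-index pages.
    """
    return total_with_exclusion - main_index_count

# With the counts above: 400 probed - 150 main = 250 supplemental pages.
supplemental_estimate(400, 150)
```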
Also - to ask a semi-relevant question about the site: command.
Is there a specific sort order used when the results are displayed? I see the list of results, but does anyone know in what order they are displayed?
|Is there a specific sort order used when the results are displayed? |
I can't say for sure, but the folders with the strongest link juice distribution seem to be displayed first, and it's no coincidence, I suppose, that those pages rank better.
Sorry for the bump, but I couldn't find a more up-to-date topic.
Do you guys think there is a relationship between the authority of a page listed via the site: operator and potential penalties?
Date: Jan 1st 2010
Site operator homepage position: 1
Page one rankings: 10
Date: Feb 1st 2010
Site operator homepage position: 10
Page one rankings: 2
So my question really is: if a page loses its authority through a penalty, do you think it will lose prominence when using the site: operator?
Results from the site: operator have decreased for almost all major sites, no?
One site had millions of pages returned from the site: operator only a month back; now I only get a few hundred thousand.
It looks to me like the order of the pages returned by the site: operator is also being influenced by personalized search, so if your main page isn't first, try logging out or adding &pws=0 to the URL.
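For anyone who wants to build the depersonalized URL rather than tack the parameter on by hand, here's a minimal Python sketch. Note that pws=0 is an informal parameter people have observed to disable personalization, not an officially documented one.

```python
from urllib.parse import urlencode

def depersonalized_search_url(query: str) -> str:
    """Build a Google search URL with pws=0 appended.

    pws=0 is the unofficial switch mentioned above for turning off
    personalized results; urlencode handles escaping the query string.
    """
    return "https://www.google.com/search?" + urlencode({"q": query, "pws": "0"})

depersonalized_search_url("site:example.com")
# -> "https://www.google.com/search?q=site%3Aexample.com&pws=0"
```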
Nope. Not signed in. I honestly think this is a reflection of authority.
|relationship with the authority of a page being listed using the site: operator and potential penalties? |
If the site: operator is indicating the "buckets" of authority at each folder level, then yes, it follows that sites with less authority are more susceptible to penalties than those with strong authority.
If site: is only showing a small number of URLs at the root level relative to the site's total URLs, then I would do everything possible to distribute stronger link juice to deeper areas of the folder structure.
If you do this, I think site: will report more URLs at the root level as an indication of improved authority.
Just a hunch though ....
I see no evidence of any patterns that I would depend on for analysis, no correlations that hold up across different sites, nothing at all. I wish it was different - I really do. I've followed up on several hunches, but they just didn't hold up.
So now the idea of using the site operator for diagnostic work feels like underwater skateboarding to me.
I see different stuff every time I look! I'm wondering if the canonical tag is messing with it too.
Sometimes I wonder: does site:example.com -stopword (e.g. site:example.com -the) return all excluded (supplemental) results? How can we gauge its accuracy?
@suratmedia Yes, it definitely used to be something like that. However, I use -asdf or some other nonsense string that won't ever show up in the content. Using -the or another real stop word can strongly affect the results.
The problem is that the number of results is slightly different for different nonsense strings. So I'm not sure what we're looking at any more.
@netmeg I get different results too. I think that - in addition to this supplementary index (kudos on the -jidjwaos tip by the way, I just tried it and it works very well!), the Google results are also spread out over more than one database.
And it seems like each database isn't fully in sync.
I've got a fairly new site (Google's known about it for 19 days now) and whenever I do a site: search (without the -typo tag), the number of pages returned varies loads.
So yeah, I wouldn't worry about it. It just seems like Google are splitting things up more (perhaps because it's quicker to perform searches that way; i.e. Google were planning ahead with regard to Google Instant?) and as a result, inconsistent results are more likely.