Google's Site: Operator - Digging Deeper Returns More URLs

         

tedster

4:34 am on Feb 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This week I did some deep digging into a relatively small website and learned something new about the site: operator in Google.

Using site:example.com there were "about 683 pages." But clicking through to the last page, there were only "381 pages" reported. Many of us have been there, I know. Even after clicking on the "omitted results" link, we get nowhere near the total we hoped for, or indeed the total that seemed to be promised on page 1.

So having collected those 381 urls, I was a bit frustrated -- 683 was a lot closer to the reality of the website that I knew. So I decided to use the site: operator directory by directory -- and that way I actually got Google to report almost every one of the original 683 urls!

To put a fine point on this, when I used the query site:example.com, there were only 181 urls returned from directory-a. But when I used site:example.com/directory-a/ 341 urls were returned - an additional 160 urls all in directory-a.

By doing this for every directory in the domain, I managed to find almost all the "missing" urls. Some of them even had decent PR and backlinks. It was good to know that they were really in the index. Two of them even show up in AOL Search, so I assume they must be in the regular index.
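
For anyone who wants to script the bookkeeping, here is a minimal sketch in Python of the comparison described above. It assumes the URLs from each query have already been collected by hand into plain lists; the function names and the sample URLs are hypothetical, and nothing here queries Google.

```python
def find_extra_urls(root_results, directory_results):
    """Map each directory query to the URLs it returned that the
    root-level site: query did not return."""
    root = set(root_results)
    extras = {}
    for query, urls in directory_results.items():
        missing = sorted(set(urls) - root)
        if missing:
            extras[query] = missing
    return extras


def combined_total(root_results, directory_results):
    """Count the distinct URLs seen across the root query and all
    directory-level queries."""
    all_urls = set(root_results)
    for urls in directory_results.values():
        all_urls.update(urls)
    return len(all_urls)


# Hypothetical, hand-collected sample data (not real results).
root_results = [
    "http://example.com/",
    "http://example.com/directory-a/one.html",
]
directory_results = {
    "site:example.com/directory-a/": [
        "http://example.com/directory-a/one.html",
        "http://example.com/directory-a/two.html",
    ],
}

extras = find_extra_urls(root_results, directory_results)
total = combined_total(root_results, directory_results)
print(extras)
print(total)
```

The set difference is what yields a figure like the 160 additional directory-a urls mentioned above (341 returned by the directory query minus the 181 already seen in the root query).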

johnlim9988

5:11 am on Feb 29, 2008 (gmt 0)

10+ Year Member



I also use site:mydomain.com to check, and Google shows "Results 1 - 10 of about 65,000 from mydomain.com".

Then I click through to see the last page, but the last page is page 55:
Results 541 - 547 of about 65,000 from mydomain.com

What happened? Why can't I even see page 99? I also don't see any omitted results link.

tedster

5:24 am on Feb 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Exactly. So try what I wrote about: add the directories to your search --

site:example.com/directory1/
site:example.com/directory2/
...and so on.

Or dig even deeper with site:example.com/directory1/subdirectory1/

You'll collect more data to do your analysis.
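
If you already have a list of your own urls (from a sitemap or a crawl, say), the directory queries can be generated rather than typed. This is only a sketch: `directory_queries` and the sample urls are made up for illustration, and the script just builds the query strings for you to paste into Google by hand.

```python
from urllib.parse import urlsplit


def directory_queries(urls, depth=1):
    """Return sorted site: queries, one per directory prefix found in
    `urls`, going at most `depth` directory levels deep."""
    queries = set()
    for url in urls:
        parts = urlsplit(url)
        segments = [s for s in parts.path.split("/") if s]
        # Drop the last segment when it is a page rather than a directory.
        if not parts.path.endswith("/"):
            segments = segments[:-1]
        for level in range(1, min(depth, len(segments)) + 1):
            prefix = "/".join(segments[:level])
            queries.add(f"site:{parts.netloc}/{prefix}/")
    return sorted(queries)


# Hypothetical list of known urls for the site.
known_urls = [
    "http://example.com/directory1/page-a.html",
    "http://example.com/directory1/subdirectory1/page-b.html",
    "http://example.com/directory2/",
    "http://example.com/index.html",
]

for query in directory_queries(known_urls, depth=2):
    print(query)
```

Raising `depth` corresponds to digging into subdirectories; pages sitting in the root (like index.html here) produce no directory query.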

steveb

7:09 am on Feb 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This has been the case for a long time. It makes you scratch your head, and then you have to monotonously see which ones are excluded the first time around, and then you have to see if the pages show signs of trouble (grey bar, bad ranking...). Pretty tiresome when you know the better alternative would be google doing it correctly when they can.

oodlum

3:51 pm on Feb 29, 2008 (gmt 0)

10+ Year Member



That's really handy - thanks Ted! I just found a bunch of pages I didn't think were indexed yet.

doughayman

3:59 pm on Feb 29, 2008 (gmt 0)

10+ Year Member



Very, very useful, Tedster. I have been baffled many times by the "site:" operator being applied to the root directory of a domain, as this number seems to fluctuate by the minute, and there was always an issue of inclusion or not. This requires a little bit more work, but is extremely useful. Thanks so much!

rekitty

4:23 pm on Feb 29, 2008 (gmt 0)

10+ Year Member



Thanks tedster.

One question. Do you think this query shows that directory's pages in the main index?

site:example.com/directory1/*

While this query shows the number in the main and supplemental indexes combined?

site:example.com/directory1/

tedster

4:30 pm on Feb 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This has been the case for a long time.

I thought it might have been, but I so rarely work on a site that's under 1,000 urls that I couldn't be sure. I figured if I wasn't sure, it was worth a thread.

Do you think this query shows that directory's pages in the main index?

site:example.com/directory1/*

When I was doing this particular study, the /* hack was giving me the exact same results as the regular search. So I went to AOL to grab the main index urls (and that is tedious!)

steveb

12:04 am on Mar 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It IS much more significant now because Google is deliberately not showing supplementals sometimes, but in a variety of different and contradictory ways. So, the page search via directory is much more valuable now than say a year ago.

All this stuff relates to the false statements Google employees made about supplemental handling a few months ago. "Now you see 'em, now you don't" is the way it works, but even finding them via that site search doesn't mean they can be found for queries.

Receptional Andy

12:38 am on Mar 1, 2008 (gmt 0)



This has been around for some time. I attributed it to an extension of the same effect as advanced searches returning contradictory results, and that's been around (and frustrating me) for years!

It's hard to come up with postable examples, but here's one that works currently:

[google.com...]

No www, eh?

tedster

12:51 am on Mar 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



All this stuff relates to the false statements Google employees made about supplemental handling a few months ago.

You know, I've read those statements many times - and I suspect now that they are technically true, but misleading. I do think that Google changed something so that now the Supplemental Index is available and "searched on" for every query rather than just for the obscure queries, as was previously the case. That much, I think, is literally true.

And yet, you can search on unique phrases from within the content of a Supplemental URL and see that Google does not return ANY results at all. The Supplemental URL is not returned -- nothing is returned. Now why not, when the unique phrase is sitting right there, observable in the cached page?

My theory is that there's a shortfall in how those Supplemental URLs are tagged and stored for search retrieval in the first place. When the collected data is sharded, and those shards are tagged and stored for retrieval, a Supplemental URL gets an incomplete tagging compared to a URL in the regular index. That's how I see it right now.

So it may be literally true that every search now hits the tags for the Supplemental Index. But since there are fewer tags created in the first place -- call it an "incomplete indexing" if you will, just as was always the case -- a lot of the Supplemental content is still not really, truly accessible via search.

So did Google achieve a technical improvement? Yes, and I'm sure it took a major bit of programming to make that happen. Does this mean a Supplemental URL is now on a totally equal footing with a URL in the main index? No.

steveb

3:35 am on Mar 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Technical improvement? Definitely not. In fact it is another drastic downgrading.

And no, the statements were not technically true, unless you presume they were NOT referring to google search, which is a silly presumption.

They may now look at the pages and choose not to display them, but "look at and decide not to" was not the emphasis. The emphasis was on pages appearing in the results.

What Google has done has made it so supplementals appear for fewer results than before.

At least before if your page was the only one that said something, it would be returned. Now, they will often not return a page under any circumstances.

Put simply, they have applied greater technology to display fewer results.

And put more simply, a page being supplemental now is much, much worse off than previously.

tedster

4:05 am on Mar 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I should have called it a "technical innovation" rather than an improvement. I'm sure the large-scale database engineers were proud of what they achieved from an academic viewpoint. But the end result is still frustrating and the Supplemental Index is getting more impenetrable.

whatson

11:56 pm on Apr 24, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So the results not shown are all in the supplemental index? Will they eventually come out? I do not understand why my pages would be in there as they all have unique, useful content.

g1smd

12:19 am on Apr 25, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What is weird is seeing "1 to 10 of 50" and then clicking on the "omitted results" link and seeing "1 to 10 of 400", the extra count not appearing in the original search. It used to give roughly the same number (the higher number) for both searches but that behaviour ended a few months ago.

What I have been seeing a lot of is where a site:domain.com keyword search returns a few results that are NOT from domain.com, though the problem seems to be going away again as of recent days.

tedster

2:11 am on Apr 25, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



they all have unique, useful content.

How about PageRank, link juice (even from your own site), unique title elements and unique meta descriptions?

Google is really shuffling off a lot of good content into supplemental in recent times, and it's up to us to send strong signals for the content we hope will rank.