I get inconsistencies or restricted information when using the following:
- site:mywebsite.com : I get 130,000 pages [ approx correct ]
- site:mywebsite.com/Widget1/ : I get 1 page [ I should see 2,500 ]
- Checking the above via Google Sitemaps, I get something similar.
- Using the Google toolbar to view the cache of a particular page that does not appear in the site:mywebsite.com search, I find the page cached and serving results.
- In Google's SERPs for our sites, I find pages listed with "caches" and sometimes without caches.
Are these bugs or is Google holding back info from webmasters?
Reason: supplementals don't show for directory level site: searches.
- site:example.com -inurl:www returning both www and non-www results
I have no idea on this one. I checked whether it's just the "preferred domain" setting relabeling all non-www URLs as www for the sake of displaying to users, but no. These are both www and non-www URLs. ( Some never even existed as non-www. Also, the cache seems to be a better indicator of whether Gbot saw the address with the subdomain or not. It's a mixture of both, with dates showing correctly: no non-www URLs cached after the redirect. )
- site:example.com -inurl:www returning results 1 to 100 of about 1000
Which is incorrect, and I too have thought about what you said. When I checked manually, it added up to 1600+ ... what made this suspicious was that cute little natural number, "exactly as many as we can display" :P
The count stays the same, btw, if I apply a filter for a directory which would produce about 300+ results, so... ;)
- ditto on the cache problems.
We too see a lot of cases where the cache is there and the page doesn't come up for certain site: searches, not even as supplemental. However, when searching for text that's unique to the page, they're in the index. All are shown as supplemental, of course.
- quadruple-triple-double snippets taking over once again, with pages cached in mid November. Before, only pages that hadn't been crawled or cached recently did this. Now newly cached URLs show them all the same. All are supplemental. I checked whether this could be an indication of something but couldn't figure it out, apart from the fact that only supplementals show such descriptions.
- site: shows 1570 pages with strict filtering.
- site: shows 2700 pages with moderate filtering.
No idea why. Especially because right now the first 540+ results are all supplemental ( low PR, innermost URLs ), so these are the album pages, meaning the text is about the place, subject, or theme the pic was taken of. And since there are no words indicating anything that would need to be filtered... it made me wonder. There aren't 1200+ photos that would be over PG :P THIS is probably a problem only for us. There has to be a word or words that trigger this. We'll look once again.
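For anyone wanting to repeat the site: checks above in a consistent way, here is a rough Python sketch that just builds the query strings used in this thread. The helper name `build_site_query` and its parameters are made up for illustration; it does not talk to Google or use any Google API, it only assembles the text you would paste into the search box (plus a URL-encoded form).

```python
from urllib.parse import quote_plus


def build_site_query(domain, directory=None, exclude_inurl=None):
    """Assemble a site: query string (hypothetical helper, not a Google API)."""
    query = "site:" + domain
    if directory:
        # Restrict to one directory, e.g. site:mywebsite.com/Widget1/
        query += "/" + directory.strip("/") + "/"
    if exclude_inurl:
        # Exclude URLs containing a term, e.g. -inurl:www
        query += " -inurl:" + exclude_inurl
    return query


# The three query shapes mentioned in the thread.
queries = [
    build_site_query("mywebsite.com"),
    build_site_query("mywebsite.com", directory="Widget1"),
    build_site_query("example.com", exclude_inurl="www"),
]

for q in queries:
    # quote_plus gives the form suitable for a URL query parameter.
    print(q, "->", quote_plus(q))
```

Running the same set of queries each time makes it easier to compare the reported counts from one check to the next.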
And the reason why I'm looking hard enough to notice these things:
Our site is shrinking in the index. Since it was redirected to www only, it climbed to 3300+ indexed, then started to fall back gradually. Two weeks ago it was 2200+, last week it was 1700+, now it's 1500+. Mostly it's the supplementals that are disappearing, but about 400 pages have been marked supplemental in the last couple of days. They have very low PR though, so... this was more or less predictable. But supplementals disappearing was not.
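To put rough numbers on that decline, here is a small Python sketch of the step-by-step change. It treats the "N+" figures above as exact counts, which is an assumption (they are really lower bounds), and the variable and function names are only illustrative.

```python
# Indexed-page counts from the post, with each "N+" taken as N (assumption).
counts = [
    ("peak", 3300),
    ("two weeks ago", 2200),
    ("last week", 1700),
    ("now", 1500),
]


def changes(series):
    """Return (from, to, absolute change, percent change) for each step."""
    out = []
    for (prev_label, prev), (label, cur) in zip(series, series[1:]):
        delta = cur - prev
        pct = round(100.0 * delta / prev, 1)
        out.append((prev_label, label, delta, pct))
    return out


for prev_label, label, delta, pct in changes(counts):
    print(f"{prev_label} -> {label}: {delta} pages ({pct}%)")
```

On these figures the weekly drop is slowing (500 pages, then 200), though the count is still less than half of the peak.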
All in all, I think some of these are no doubt bugs. The rest are... up to the paranoia of the webmaster ;) ( my guess is that these are all bugs )
Even if these were intentional restrictions: as far as I recall, it was G that started offering the site: command... or any command at all, which I think many people were thankful for. If they are going to limit its use to "what's indexed PROPERLY" or something, that would be a shame, but it's their call. If/when this happens I'd like to know, though, so I can STOP looking at it in any other way. ( Did you ask because of this? ) You know... the usual new definition on g.com, just like the one for link: that was rephrased recently.
edit reasons: forgot to take pills against posting off-topic all the time