Forum Moderators: Robert Charlton & goodroi
An update on the site: operator
6/02/2006 07:28:00 PM
Posted by Vanessa Fox, Google Engineering
We've fixed the issue with site: queries for domains with punctuation in them. We are still working on site: operator queries for domains that include a trailing slash at the end (such as site:www.example.com/ ), so you may get better results for now by omitting the trailing slash in your queries. The Index Stats page of Google Sitemaps no longer uses the trailing slash for its queries, so you should see correct results when using this page.
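For anyone scripting these checks, here is a minimal sketch of building the query without the trailing slash, as the post suggests. The function name `site_query` is made up for illustration and is not part of any Google tool; it only formats the query string and does not talk to Google.

```python
def site_query(url: str) -> str:
    """Build a site: query string, dropping the scheme and any trailing slash.

    Hypothetical helper for illustration only.
    """
    host = url.strip()
    # Strip a leading scheme if the URL was pasted from a browser.
    for prefix in ("http://", "https://"):
        if host.lower().startswith(prefix):
            host = host[len(prefix):]
            break
    # Omit the trailing slash, per the advice in the post above.
    return "site:" + host.rstrip("/")


print(site_query("http://www.example.com/"))  # site:www.example.com
```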
Thanks for your feedback and patience.
I checked our site by copying and pasting the link from the Sitemaps page into our default Google (64.233.161.104), and it shows 770 pages without the trailing slash, 779 with the slash. Isn't this just the opposite of what Vanessa said would happen?
To put it politely, I think nobody knows what's going on in certain areas of functionality, at Google or even out here on WebmasterWorld at times, which is a worry when trying to establish some stability - it could take ages.
I mean, look at this for example:
I do a site: query on our sitemap pages on one site and get 2 pages,
on another site with the same structure [ but unique content ] updated 2 months later I get a correct 28 pages. The first site is showing a drop in sitemap pages, but actually shows more pages cached across the site.
How can Google say that sitemap content, which is used to assist the bots through the site, should be excluded from caching [ or maybe I've missed a new innovation!?!? ]
We still see NO CHANGES..
No changes on our sites either in terms of number of pages showing up in the index of a generic google.com site: search.
All pages seem to be indexed when doing a site: search on various specific data centres, complete with unique descriptions for each page.
One thing I have noticed when using the generic google.com search as opposed to a specific DC, is that all pages which do show up display only the (same) general site description rather than the individual page description.
Has anyone else noticed this, or is it just me? :(
I also tried this one 216.239.59.104.
Some datacenters include n supplemental results in it, some don't..
True. And doesn't help one iota if Joe Public surfer is getting the datacentre which shows only n (very small number) of results.
Weird thing is, Googlebot's been all over my sites in the last few days like some 06/06/06 demon! I gave up trying to fathom Google's algos after Florida, and am just thankful we get most of our traffic from Yahoo and MSN.
So now, I get:
site:www.mysite.com -> 5 pages + lots of supplementals
site:www.mysite.com/ -> 4 pages + lots of supplementals
inurl:www.mysite.com -> 8 pages no supplementals
inurl:www.mysite.com/ -> 8 pages no supplementals
I don't have mysite.com except in urls from my website.
BUT, although there are now significantly more results, it's still showing the general site description for each result instead of the individual descriptions I'm seeing for each page if I use a specific datacentre.
No change so far on the other sites.
F-Rose - no, I don't use Google Sitemaps, but I did recently add my own sitemap component in an attempt to get the bot through the site. Given Googlebot has been through the site like a dose of salts since I added it, maybe that's why more results are showing in the index. Hopefully this improvement will roll out across my other sites.
Is it a regular site map which should be included on every site, or is it something else?
My sites run on Joomla, and I added a sitemap component made to run with it. No faffing around trying to design one myself, just uploaded and there it was done! I should perhaps have mentioned I did have a sitemap beforehand, but this one is much, much better.
Mainly did so for two reasons:
1. There was a debate going on speculating whether Google sitemaps were a good idea, and as I don't like giving G too much information, I preferred implementing the Joomla one,
2. There was another thread discussing disappearing pages being from predominantly level 3 and 4, so added site map so everything is spiderable at level 2.
I was a part of that discussion. Some webmasters who have created rather large site maps seem to have lost pages again. Since our site is much smaller than theirs (a couple thousand pages) we were able to split up the site maps for each of our 15 sections. The largest site map has between 100 and 200 links and seems to have no problem (knock on wood). The bulk of our site did get re-indexed and is now being crawled in full about every day. Rankings are slowly going back up. When our site does come back we may take those site maps and redo them so that there are no more than 100 links each by drilling down a level, but for now they seem to be working to get our pages crawled frequently and indexed.
We do have a Google XML sitemap that has been in place since it started. We never really saw a huge benefit crawl-wise from it, and never tried a plain text sitemap. New pages seem to be picked up a bit quicker without having to go through the site, but that is about it. The onsite site maps seem to work a lot better to get heavier crawling/re-crawling of existing pages. This is my take on it.
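If anyone wants to try the "no more than 100 links per site map" idea, the split itself is simple. A rough sketch, assuming you already have a flat list of URLs for a section; the name `split_sitemap` and the example URLs are invented for illustration, and the limit of 100 is just the figure mentioned above, not a confirmed Google threshold.

```python
def split_sitemap(links, max_links=100):
    """Split a flat list of links into site map pages of at most max_links each."""
    if max_links < 1:
        raise ValueError("max_links must be at least 1")
    # Slice the list into consecutive chunks; the last chunk may be shorter.
    return [links[i:i + max_links] for i in range(0, len(links), max_links)]


# Hypothetical section with 250 article URLs.
section_links = [f"/article-{n}.html" for n in range(250)]
pages = split_sitemap(section_links)
print([len(page) for page in pages])  # [100, 100, 50]
```

Each resulting chunk would become its own on-site site map page, linked from a parent page.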
We were fortunate enough that Google was typically crawling 3 levels. A 2-level site may have some problems, since the site map would reside on level 2 and any links would be level 3, which Gbot may not be so quick to crawl, but it would not hurt to try (Google has always recommended one anyway). Our site maps make sure that all links are on level 3 (again, that will change at a later date) with a simple outline structure and no design elements.
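To see what level each page ends up on, you can walk your own link structure breadth-first from the home page. This is only a sketch under assumptions: the home page counts as level 1, the link graph is a plain dict you build yourself, and the names `page_levels` and the sample URLs are made up. It says nothing about how Googlebot actually decides what to crawl.

```python
from collections import deque


def page_levels(links, home):
    """Return the click level of every reachable page, with the home page as level 1.

    links maps each page to the list of pages it links to.
    """
    levels = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in levels:  # first visit is the shortest path in BFS
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels


# Hypothetical site: the article sits on level 4 via the section pages,
# but the site map linked from the home page pulls it up to level 3.
site = {
    "/": ["/sitemap", "/section-a"],
    "/section-a": ["/section-a/sub"],
    "/section-a/sub": ["/section-a/sub/article"],
    "/sitemap": ["/section-a/sub", "/section-a/sub/article"],
}
levels = page_levels(site, "/")
print(levels["/section-a/sub/article"])  # 3
```

Removing the "/sitemap" entry from the home page's links in this example puts the article back on level 4.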
The onsite site map has worked wonders so far (knock on wood again) and pages seem to be "sticking" now.
The discussion that was being referred to was about using an on-site sitemap linked directly off the home page, so that deep pages would reside on a higher level (level 3). For some it worked well and for others not. I don't know the factors why, though.
That's right, and that's what I did, except I linked the sitemap from the main menu so it's accessible from every page. The visitors seem to like it too ;) So with one click, they and bots can see a link to every article on the site.