Forum Moderators: Robert Charlton & goodroi
An update on the site: operator
6/02/2006 07:28:00 PM. Posted by Vanessa Fox, Google Engineering
We've fixed the issue with site: queries for domains with punctuation in them. We are still working on site: operator queries for domains that include a trailing slash (such as site:www.example.com/ ), so you may get better results for now by omitting the trailing slash from your queries. The Index Stats page of Google Sitemaps no longer uses the trailing slash for its queries, so you should see correct results when using this page.
Thanks for your feedback and patience.
That's right, and that's what I did, except I linked the sitemap from the main menu so it's accessible from every page. The visitors seem to like it too. So with one click, they and bots can see a link to every article on the site.
I did ours in the footer since the site is pretty much self-explanatory.
Got 15 site maps on the home page, though I'd rather call them tables of contents. These tables of contents outline the contents of the 15 main categories. In each main category I only link to the table of contents that covers just that category, to stay on theme. Those links are placed in the footer of each page that resides in that category, so there is good coverage. Now the cool part is that when Googlebot hits our home page it almost immediately hits those maps FIRST. Why, I have no idea, but then it goes crazy.
Without the on-site sitemap, Googlebot would just crawl here and there in no particular order, hitting one section, then crossing to another, and so on. It never really drilled down into the site. With the site map it consumes huge chunks of content in each category, pretty much the whole thing right down the line.
I distrust the accuracy of the site: command to identify this, but if it isn't working, per Vanessa Fox's information release, is there a correlation between it and general indexing problems?
Have a look at this
[webmasterworld.com...]
...I believe there's a potential tie-in with sitemaps not being spidered properly.
I can only comment on what's happened with my sites, but since pages started disappearing, I've delved deep into various forums on here that I wouldn't normally frequent, trying to find out what's happened and whether there's anything I can try to bring 'em back.
Should add that 1. these are old, well established sites, and 2. I'm seeing a lot of the same things happening now as happened after Florida.
Essentially, prior to this "update" or whatever it was, all pages were indexed and showed up on a site: search. Post catastrophe, the only pages which showed up were those which did appear on the site map I had, which wasn't a very good one.
So in my case, I'm wondering if the old site map (which went down only to level 2) worked too well? Did Google think that's all there was?
All I know is Googlebot has once again spidered right down to level 4 since I added the new sitemap, and that I'm now seeing (as of this morning) all pages except those most recently added showing in the site: search results on datacentre 66.102.9.104, which is the one that was only showing a few pages a couple of days ago.
E.g. showing supplementals again.
Not true of allinurl:domain.com searches - but Google have not said anything about that anyway.
I hope you're right about ranking, but I have my doubts it's going to provide a significant boost. It only makes sense that a problem in one section has an effect elsewhere, but I'm not sure how large it will be.
I've got the double-whammy problem: a hyphenated domain and a redirect to a trailing slash. After dropping a lot of pages, I'm back up to 920 of 1,200. Maybe this is as high as it's going to get. But what concerns me is that I've added something like 50 pages in the last 2 months and the pages-indexed count seems stuck at 920.
I believe that many sites that did the non-www to www redirects as instructed by Matt and many others have fallen into a "Google trap". Many of the redirects were done to www.domain.com/ and not www.domain.com
Seems strange, but I think that the trailing / has something to do with it... I know it shouldn't make any difference, but Google has been doing some strange things lately!
(Don't everyone attack me... it's just my opinion; I don't have any hard evidence.)
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^mydomain\.com [NC]
RewriteRule (.*) [mydomain.com...] [R=301,L]
When I use a server header checker it appears to be redirecting fine. Here is what the server header check shows:
#1 Server Response: [mydomain.com...]
HTTP Status Code: HTTP/1.1 301 Moved Permanently
Server: Apache/1.3.36 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.2 FrontPage/5.0.2.2635.SR1.2 mod_ssl/2.8.27 OpenSSL/0.9.7a
Location: [mydomain.com...]
Connection: close
Content-Type: text/html; charset=iso-8859-1
Redirect Target: [mydomain.com...]
#2 Server Response: [mydomain.com...]
HTTP Status Code: HTTP/1.1 200 OK
Date: Fri, 09 Jun 2006 04:09:51 GMT
Server: Apache/1.3.36 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.2 FrontPage/5.0.2.2635.SR1.2 mod_ssl/2.8.27 OpenSSL/0.9.7a
Anyone have any other thoughts on whether this is now the right or wrong way to do the 301?
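For comparison only, since the actual redirect target above was elided by the forum software: a commonly seen form of the non-www to www 301 uses an explicit backreference so the original path is carried through unchanged, with no extra trailing slash added. This is just a sketch using example.com as a placeholder domain, not a claim about what the poster's rule actually says:

```apache
Options +FollowSymLinks
RewriteEngine on
# Match only the bare (non-www) hostname
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# $1 carries the requested path through, so /page.html
# redirects to http://www.example.com/page.html exactly,
# rather than to the bare domain or a slash-appended URL
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

With a rule like this, a header check on example.com/page.html should show a single 301 straight to www.example.com/page.html, followed by a 200 on the target, much like the two responses quoted above.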
but Google has been doing some strange things lately!
It certainly has; like showing up in my logs as having spidered pages which don't exist, some of which have never existed, with some really weird URLs I've never had. Example: mysite.com/dfridkrnspv.html. None of my URLs are random letters; they're all like "www.mysite.com/blue-widgets/"
Every single AdSense site is at least cached in Google's index. Thoughts?
Interesting. That answers a question I'd wondered about. Presumably Google wouldn't shoot itself in both feet by entirely removing sites that earn it money, although it's having a damned good try. :(
Were the cached sites you looked at displaying AdSense on their pages? I ask because, after more digging - including actually clicking some of the cached links to my sites - the cache turns out to be so old that it dates from before I added AdSense.
Too many websites return "200 OK" or a "302" redirect when you try to access a URL that should not return any content. Google may be trying to overcome those that fail to send a "404" in the HTTP header.
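One common way sites end up never sending a real 404, offered here as a hedged Apache example (the error-page path is a placeholder): pointing ErrorDocument at a local path preserves the 404 status code, while pointing it at a full URL makes Apache answer with a redirect instead, so crawlers probing a nonexistent URL never see an error status:

```apache
# Local path: Apache serves this page AND keeps the 404 status,
# so crawlers can tell the URL does not exist
ErrorDocument 404 /errors/not-found.html

# Full URL: Apache sends a redirect instead of a 404, so every
# bad URL looks like a live, redirecting page to a crawler
# ErrorDocument 404 http://www.example.com/errors/not-found.html
```

Requesting a random URL like the mysite.com/dfridkrnspv.html example above and checking the status line with a header checker is a quick way to see which behavior a site has.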
There is no path to those pages other than through the entry page, plus the site: tool has to register the entry page to produce a result.
So my conclusion is that reports of the site: tool being fixed are still very dubious, or at least it's not showing what it should.
What this means as an indication of the general reliability of indexing and results - I don't know.
Whether the "entry page" is indexed is another issue - is the "entry page" in the path oursite.com/sitemap/?
FYI - we now have 425 pages, up from yesterday, but still short of 9,500 pages, and the site:oursite.com.au/sitemap/ page is also showing
The cache date of the above is 09Jun06! Where was it yesterday?
It's not just the trailing slash that was broken - I think there's a lot more going on.