Forum Moderators: Robert Charlton & goodroi


Google - latest on site: queries fix

Google Site: Query fix

         

Whitey

11:28 am on Jun 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thought you'd all like to be aware of these details on the latest fix for site: queries.

An update on the site: operator
6/02/2006 07:28:00 PM

Posted by Vanessa Fox, Google Engineering

We've fixed the issue with site: queries for domains with punctuation in them. We are still working on site: operator queries for domains that include a trailing slash at the end (such as site:www.example.com/ ), so you may get better results for now by omitting the trailing slash in your queries. The Index Stats page of Google Sitemaps no longer uses the trailing slash for its queries, so you should see correct results when using this page.

Thanks for your feedback and patience.

[sitemaps.blogspot.com...]

arubicus

10:19 pm on Jun 7, 2006 (gmt 0)

10+ Year Member



That's right, and that's what I did, except I linked the sitemap from the main menu so it's accessible from every page. The visitors seem to like it too. So with one click, they and bots can see a link to every article on the site.

I did ours in the footer since the site is pretty much self-explanatory.

Got 15 site maps on the home page. Rather I call them tables of contents. These tables of contents are the contents of the 15 main categories created in an outline manner. In each main category I only link to the table of contents that outlines just that category to stay on theme. Those are placed in the footer of each page that resides in that category. So there is good coverage. Now the cool part is that when googlebot hits our home page it almost immediately hits those maps FIRST. Why I have no idea but then it goes crazy.

Without the on-site sitemap, Googlebot would just crawl here and there in no particular order, hitting one section, then across to another, and so on. It never really drilled down into the site. With the site map it consumes huge chunks of content in each category, pretty much the whole thing right down the line.

Whitey

6:46 am on Jun 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You might want to look at this thread I started because I believe there's a potential tie in with sitemaps not being spidered properly.

I distrust the accuracy of the site: command to identify this, but if it isn't working, per Vanessa Fox's information release, is there a correlation between it and general indexing problems?

Have a look at this
[webmasterworld.com...]

malachite

9:48 am on Jun 8, 2006 (gmt 0)

10+ Year Member



...I believe there's a potential tie in with sitemaps not being spidered properly.

I can only comment on what's happened with my sites, but since pages started disappearing, I've delved deep into various forums on here that I wouldn't normally frequent, trying to find out what's happened and whether there's anything I can try to bring 'em back.

Should add that 1. these are old, well established sites, and 2. I'm seeing a lot of the same things happening now as happened after Florida.

Essentially, prior to this "update" or whatever it was, all pages were indexed and showed up on a site: search. Post catastrophe, the only pages which showed up were those which did appear on the site map I had, which wasn't a very good one.

So in my case, I'm wondering if the old site map (which went down only to level 2) worked too well? Did Google think that's all there was?

All I know is Googlebot has once again spidered right down to level 4 since I added the new sitemap, and that I'm now seeing (as of this morning) all pages except those most recently added showing in the site: search results on datacentre 66.102.9.104, which is the one that was only showing a few pages a couple of days ago.

trinorthlighting

4:14 pm on Jun 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Whitey,

I do have the feeling that the / is affecting indexing. If it's affecting the site: command it has to be affecting other areas as well. I suspect that is why Google has gone quiet for the most part.

Dayo_UK

4:15 pm on Jun 8, 2006 (gmt 0)



site:domain.com/ searches look like they are back to normal now.

E.g. they are showing supplementals again.

Not true of allinurl:domain.com searches - but Google have not said anything about that anyway.

BillyS

4:35 pm on Jun 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



trinorthlighting -

I hope you're right about ranking, but I have my doubts it's going to provide a significant boost. It only makes sense that a problem in one section has an effect elsewhere, but I'm not sure how large it will be.

I've got the double whammy problem: a hyphenated domain and a redirect to a trailing slash. After dropping a lot of pages, I'm back up to 920 of 1,200. Maybe this is as high as it's going to get. But what concerns me is that I've added about 50 pages in the last 2 months and the number of pages indexed seems stuck at 920.

F_Rose

5:39 pm on Jun 8, 2006 (gmt 0)

10+ Year Member



BillyS,

You are still considered lucky..

We are stuck with 24 pages only..And it's just not moving..

The same results ever since the major drop in April..

tiori

10:12 pm on Jun 8, 2006 (gmt 0)

10+ Year Member



It's not just the site: command with hyphenated domains and a trailing / that is screwed up.

I believe that many sites that did the non www to www redirects as instructed by Matt and many others have fallen into a "google trap". Many of the redirects were done to www.domain.com/ and not www.domain.com

Seems strange, but I think that the trailing / has something to do with it...... I know it shouldn't make any difference, but Google has been doing some strange things lately!

(don't everyone attack me....it's just my opinion; I don't have any hard evidence)

F_Rose

10:26 pm on Jun 8, 2006 (gmt 0)

10+ Year Member



arubicus,

We have links to our main pages from our home page.

However, Google is not listing these pages in their index..

I don't have any pattern for the pages that are listed to even know why Google has decided to list these specific pages out of other pages..

trinorthlighting

10:27 pm on Jun 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google has been doing some very strange stuff. Funny how they are starting to recrawl the supplemental index yet the / is still not fixed.

One of my sites was hit hard with dropped links and pages.

trinorthlighting

10:33 pm on Jun 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You know, I have been researching a lot of AdSense sites. Every single AdSense site is at least cached in Google's index. Thoughts?

texasville

12:22 am on Jun 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>>I believe that many sites that did the non www to www redirects as instructed by Matt and many others have fallen into a "google trap". Many of the redirects were done to www.domain.com/ and not www.domain.com <<<
It appears mine is that way.

F_Rose

1:20 am on Jun 9, 2006 (gmt 0)

10+ Year Member



done to www.domain.com/

My site is done that way too..

Could someone please explain what the problem may be..

longen

2:51 am on Jun 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



From GoogleGuy:
I would always recommend the trailing slash

See msgs 5 & 11
[webmasterworld.com ]

dmje

4:22 am on Jun 9, 2006 (gmt 0)

10+ Year Member



Well then I will chime in and say that mine could be wrong also...here is what I have:

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^mydomain\.com [NC]
RewriteRule (.*) [mydomain.com...] [R=301,L]

When I use a server header checker it appears to be redirecting fine. Here is what the server header check shows:

#1 Server Response: [mydomain.com...]
HTTP Status Code: HTTP/1.1 301 Moved Permanently
Server: Apache/1.3.36 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.2 FrontPage/5.0.2.2635.SR1.2 mod_ssl/2.8.27 OpenSSL/0.9.7a
Location: [mydomain.com...]
Connection: close
Content-Type: text/html; charset=iso-8859-1
Redirect Target: [mydomain.com...]

#2 Server Response: [mydomain.com...]
HTTP Status Code: HTTP/1.1 200 OK
Date: Fri, 09 Jun 2006 04:09:51 GMT
Server: Apache/1.3.36 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.2 FrontPage/5.0.2.2635.SR1.2 mod_ssl/2.8.27 OpenSSL/0.9.7a

Anyone have any other thought on whether this is now the right or wrong way to do the 301?
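For anyone wanting to sanity-check a header report like the one above, here is a minimal Python sketch. It assumes the hops have already been collected as (status, URL) pairs by whatever header checker you use. The function name `check_redirect_chain` and the example URLs are hypothetical placeholders, not the poster's real addresses. It only checks what this thread cares about: a single 301 hop ending in a 200 on the intended host.

```python
from urllib.parse import urlsplit


def check_redirect_chain(hops, canonical_host):
    """Return a list of problems found in a recorded redirect chain.

    hops is a list of (status_code, url) tuples in the order they were
    visited, e.g. copied by hand from a server header checker.
    canonical_host is the hostname the chain is supposed to end on.
    """
    if not hops:
        return ["no responses recorded"]

    problems = []
    redirects, final = hops[:-1], hops[-1]
    final_status, final_url = final

    # Every hop before the last one should be a permanent redirect.
    for status, url in redirects:
        if status != 301:
            problems.append(f"{url} answered {status}, expected 301")

    # More than one redirect hop means a chain, which is worth a look.
    if len(redirects) > 1:
        problems.append(f"{len(redirects)} redirect hops, expected at most 1")

    # The chain should end in a normal page on the intended host.
    if final_status != 200:
        problems.append(f"final response was {final_status}, expected 200")
    if urlsplit(final_url).hostname != canonical_host:
        problems.append(f"chain ended on {final_url}, not on {canonical_host}")

    return problems


# Hypothetical example data, standing in for a real header check report.
example_hops = [
    (301, "http://mydomain.com/"),
    (200, "http://www.mydomain.com/"),
]
print(check_redirect_chain(example_hops, "www.mydomain.com"))
```

An empty list means the chain has the same shape as the report above: one 301, then a 200 on the www host. It says nothing about whether Google treats a trailing-slash target differently, which is the open question in this thread.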

g1smd

6:50 pm on Jun 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Looks fine to me.

malachite

10:27 pm on Jun 9, 2006 (gmt 0)

10+ Year Member



but Google has been doing some strange things lately!

It certainly has; like showing in my logs as having spidered pages which don't exist, some of which have never existed, with some really weird URLs I've never had. Example: mysite.com/dfridkrnspv.html. None of my URLs are random letters, they're all like "www.mysite.com/blue-widgets/"

Every single AdSense site is at least cached in Google's index. Thoughts?

Interesting. That answers a question I'd wondered about. Presumably Google wouldn't shoot itself in both feet by entirely removing sites that earn it money, although it's having a damned good try. :(

Were the cached sites you looked at displaying adsense on their pages? I ask, because after more digging - including actually clicking some of the cached links to my sites - the cache turns out to be so old, it dates from before I added adsense.

g1smd

10:40 pm on Jun 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The "random letter URL pages" may be a crude attempt to discover what your "404 page" actually "looks" like in order to filter out any other URLs that return the same, or similar, content.

Too many websites return "200 OK" or "302 moved" when trying to access a URL that should not return any content. Google may be trying to overcome those that fail to send "404" in the HTTP header.
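If you want to see how your own site answers that kind of probe, here is a rough Python sketch. It is only a guess at the idea described above, not how Googlebot actually works. The names `random_probe_url` and `looks_like_soft_404` are made up for illustration, and `fetch_status` is a callable you supply yourself (anything that takes a URL and returns the HTTP status code as an integer), so the sketch itself makes no network requests.

```python
import random
import string


def random_probe_url(base_url, rng=random, length=11):
    """Build a URL that almost certainly does not exist on the site.

    The result has the same shape as the odd log entries mentioned
    earlier in the thread: random lowercase letters plus ".html".
    """
    name = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return base_url.rstrip("/") + "/" + name + ".html"


def looks_like_soft_404(fetch_status, base_url, rng=random):
    """Return True if a nonsense URL does NOT get a 404 or 410 status.

    fetch_status is a caller-supplied function: URL in, integer status out.
    A site that answers 200 or 302 for a nonsense URL is the case the
    post above is worried about.
    """
    status = fetch_status(random_probe_url(base_url, rng))
    return status not in (404, 410)


# Stand-in fetchers for demonstration only; swap in a real HTTP check.
def always_ok(url):
    return 200


def always_missing(url):
    return 404


print(looks_like_soft_404(always_ok, "http://www.mysite.com/"))
print(looks_like_soft_404(always_missing, "http://www.mysite.com/"))
```

Passing the fetcher in as an argument keeps the check easy to try against a fake before pointing it at a live server.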

Whitey

5:50 am on Jun 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I just did a search using site:oursite.com/sitemap/ and received a response showing 9 results, but not the entry page.

There is no other path to those pages other than through the entry page, plus the site: tool has to register the entry page to produce a result.

So my conclusion is that reports of the site: tool being fixed are still very dubious, or at least it's not showing what it should.

What this means as an indication of the general reliability of indexing and results, I don't know.

daveVk

6:48 am on Jun 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Does site:oursite.com/sitemap/ return different results from site:oursite.com/sitemap? I think this is what the problem is perceived to be. Seems to me all they need to do is ignore trailing slashes; what other meaning does a trailing slash have in this context? Yes, I know its meaning in a URL, but in this context?

Whether the "entry page" is indexed is another issue. Is the "entry page" in the path oursite.com/sitemap/?
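A minimal Python sketch of the "ignore trailing slashes" idea, for anyone who wants to clean up their own site: queries before running them. It is an illustration only and says nothing about how Google handles the operator internally. The function name `normalize_site_query` is hypothetical.

```python
def normalize_site_query(query):
    """Strip trailing slashes from the target of a site: query.

    Queries that do not start with "site:" are returned unchanged.
    """
    prefix = "site:"
    if not query.lower().startswith(prefix):
        return query
    # Keep everything after "site:", minus surrounding spaces and
    # any slashes at the very end of the target.
    target = query[len(prefix):].strip()
    return prefix + target.rstrip("/")


for q in ("site:oursite.com/sitemap/", "site:oursite.com/sitemap"):
    print(normalize_site_query(q))
```

Both example queries come out identical, which matches the workaround in the Google post quoted at the top of the thread: omit the trailing slash for now.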

Whitey

11:42 pm on Jun 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



daveVk - with and without the trailing slash I get the same.

FYI - we now have 425 pages, up from yesterday, but short of 9,500 pages, and the site:oursite.com.au/sitemap/ page is also showing.

The cache date of the above is 09Jun06! Where was it yesterday?

It's not just the trailing slash that was broken - I think there's a lot more going on.
