| 9:03 am on Aug 6, 2006 (gmt 0)|
whitenight, I don't expect any, although I'll certainly keep my eyes open. hvacdirect, it's possible it will be 3/06 and more recent (mostly more recent).
And just to be clear, our main web index is usually updated daily, and is typically refreshed completely every 2-3 weeks.
[edited by: GoogleGuy at 9:06 am (utc) on Aug. 6, 2006]
| 9:12 am on Aug 6, 2006 (gmt 0)|
"I am with Steveb - I will send flowers to Google"
In case you and steveb need an address to send those flowers to :-)
Google Sitemaps Team
Central Way Plaza
720 4th Avenue
Kirkland, WA 98033
[edited by: reseller at 9:15 am (utc) on Aug. 6, 2006]
| 11:34 am on Aug 6, 2006 (gmt 0)|
That would be outstanding to get rid of those stale sups. I imagine I would send flowers too.
| 1:09 pm on Aug 6, 2006 (gmt 0)|
I bet they would like pizza better.
| 1:38 pm on Aug 6, 2006 (gmt 0)|
|What kind of kinks did you see? |
(Sorry, had to fly to San Jose before I could answer this.)
Showing a new, second set of websites after specifying a preferred domain is confusing. I'm sure it does this for a reason, and I suppose it could be called a feature rather than a bug, but I have no idea what the benefit would be. How many people have 'www.domain.com' set up as a different web site from 'domain.com'?
The other issue I've seen appears when you have a site on a third-level domain (sub-domain). I have a site which is separated into two sections, one of them using a sub-domain to specify one of the sections, like Google does with 'dir.google.com'. This points to a different set of servers, mainly for load-balancing considerations, and has its own set of sitemaps.
In the new sitemaps interface, if I select that site and select 'dir.domain.com' as the preferred domain, I get a verification error stating that there was an error adding 'www.dir.domain.com'. Of course there was an error: there is no NS record set up for 'www.dir.domain.com', and there never will be, but what does this have to do with anything, since I am trying to specify 'dir.domain.com' as the preferred domain name? Incidentally, there doesn't appear to be an NS record set up for 'www.dir.google.com' either, so I would assume that if someone at Google wanted to specify 'dir.google.com' as their preferred domain they would get the same error message.
The end result is that when I try to select 'dir.domain.com' as the preferred domain I get an error message saying that they could not verify 'www.dir.domain.com' which is jarring because I didn't select 'www.dir.domain.com'. Of course I could be missing something and this could be a feature and not a bug, but from where I sit it sure looks like a bug.
Don't get me wrong, this is definitely another step in the right direction and I applaud the engineers for implementing this, but these issues along with the dead "Crawl Rate" link make me think this isn't quite finished yet.
| 4:06 pm on Aug 6, 2006 (gmt 0)|
GoogleGuy, thank you very much for your very personal reply; however, I'm still as confused as dataguy about what to do with the newly created non-www account. Delete it? Submit the same sitemap, with all the www URLs, a second time?
As for the crawl-rate option, let me state my own expectations for a start: I have a few very well-indexed evergreen pages, highly appreciated and visited by my customers. These hardly ever change. In addition there are a number of shop-category pages, which change from time to time due to price changes or new listings/de-listings of products. So not all of my pages need to be crawled on a regular schedule. Whenever I make broader changes (adding new pages) I resubmit the sitemap, and in most cases it is crawled within 24 hours, to my great satisfaction. Changes in my products database, which result in changes to the product-category pages, are covered by the last-modified date in the sitemap. (I use a self-written script to generate that sitemap.) So for me personally, there is no need to define a specific crawl rate. I'm quite happy if a resubmitted sitemap is accepted within 48 hours and new pages get indexed within a week, as I have experienced in the past months. Thank you for that.
I think this is different for pages that change every few minutes, like news sites or sites like WebmasterWorld, but I also think these play a special role in the crawling procedures anyway, because Google already knows about their importance. So cui bono? All I can see is webmasters trying to artificially push their website's importance by submitting a high crawl rate plus adding some RSS news feeds to their poor content. In my opinion the last-modified date perfectly suffices, as long as it is well chosen by us webmasters. Please enlighten my worm's-eye view.
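A self-written sitemap generator relying on the last-modified date, like the one described above, can be quite small. Here is a hedged sketch; the URL, the date, and the namespace choice are illustrative assumptions, not anyone's actual script:

```python
# Minimal sketch of a sitemap generator that emits <lastmod> per URL.
# The namespace shown is the current sitemaps.org one; the Google-specific
# schema URLs used in 2006 differ, so treat it as an assumption.
from xml.sax.saxutils import escape

def build_sitemap(pages):
    """pages: list of (url, lastmod) tuples, lastmod as 'YYYY-MM-DD'."""
    out = ['<?xml version="1.0" encoding="UTF-8"?>']
    out.append('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">')
    for url, lastmod in pages:
        out.append("  <url>")
        out.append("    <loc>%s</loc>" % escape(url))       # escape &, <, >
        out.append("    <lastmod>%s</lastmod>" % lastmod)   # last change date
        out.append("  </url>")
    out.append("</urlset>")
    return "\n".join(out)

print(build_sitemap([("http://www.example.com/category/widgets.html",
                      "2006-08-01")]))
```

Regenerating this whenever the products database changes keeps the `<lastmod>` values honest, which is the whole point of the tag.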
| 4:12 pm on Aug 6, 2006 (gmt 0)|
Hmmm..very interesting but I deleted my sitemap two days ago. My reason was that the bot wasn't using it.
On this particular 32-page site, only the index page was non-supplemental. Ten pages were completely missing. There were 8 pages that hadn't existed for a year; those 8 I deleted from Google 8 months ago, and they have 404'd since then. But Google brought them back.
So, what was the point? I figure if I delete it, maybe Google will crawl the site on its own. In the sitemap, you are supposed to tell Google which pages are most important, and the ten missing pages were ranked second only to the index page.
If the bot is going to take none of this into account, then what is the purpose of a sitemap?
| 4:46 pm on Aug 6, 2006 (gmt 0)|
The preferred domain option is limited to some domain and its www variant. So you can tell it you prefer www.domain.com over domain.com, or foo.domain.com over www.foo.domain.com. If either doesn't exist or isn't properly configured, you encounter a somewhat confusing error message saying they couldn't validate the domain.
IMHO, a better implementation would have been to allow people to define a group of mirrored servers and specify which is preferred. For example:
[domain.com...] <- Preferred
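The mirror-group idea could be sketched as follows. This is purely illustrative of the suggestion above, not anything Google offers; the host names and the preferred choice are made-up assumptions:

```python
# Hypothetical sketch: declare a group of mirrored hosts and one preferred
# host, then normalize any URL on a mirror to the preferred version.
from urllib.parse import urlsplit, urlunsplit

MIRRORS = {"example.com", "www.example.com", "www2.example.com"}  # assumption
PREFERRED = "www.example.com"                                     # assumption

def canonicalize(url):
    """Rewrite a URL onto the preferred host if it points at a known mirror."""
    parts = urlsplit(url)
    if parts.hostname in MIRRORS:
        parts = parts._replace(netloc=PREFERRED)
    return urlunsplit(parts)

print(canonicalize("http://example.com/page.html"))
# -> http://www.example.com/page.html
```

The current tool only lets you express the two-host (www vs. non-www) case of this mapping.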
Unless I misunderstand the purpose of the crawl-speed setting they're working on, I'm not sure why they don't just implement it like everyone else and recognize a 'Crawl-delay' setting in robots.txt.
| 4:51 pm on Aug 6, 2006 (gmt 0)|
You may wish to just leave the new (non-www) account under "Site" as it is. However, it should show as verified under "Site Verified?".
"you need to prove that you control both versions (www and non-www)", as GoogleGuy mentioned in his reply to you.
No need to add a sitemap for the non-www.
| 6:02 pm on Aug 6, 2006 (gmt 0)|
texasville, did you read what quadrille initiated under
Perhaps it was just your sitemaps generator.
| 8:49 pm on Aug 6, 2006 (gmt 0)|
what reseller said. Think of it like Google Webmaster Central now has this tool to prove that you control domain.com and www.domain.com. Once you prove you control both, you can set your preferred domain.
Sitemaps is now just a part of Google Webmaster Central, and you shouldn't need to make any changes to your sitemap data.
| 9:25 pm on Aug 6, 2006 (gmt 0)|
Is it advisable to resubmit sitemaps every time you update your website, including when there is no change to file names and no new files have been added? I ask because sitemaps are supposed to be a mechanism to inform search engines that the site has been updated.
| 9:50 pm on Aug 6, 2006 (gmt 0)|
|prove that you control domain.com and www.domain.com. Once you prove you control both, you can set your preferred domain |
Houston, we have a problem.
a.) Of the millions of websites, how many of them have different pages for www.domain.com and domain.com?!
b.) Assuming the answer to the above is none, why haven't G engineers figured this out yet?
c.) Outside of the rather biased subset of WebmasterWorld members, how many webmasters think of "www." as a subdomain of their domain.com?
d.) Are you admitting that having www.domain.com and domain.com not redirected in the .htaccess causes problems for sites?
| 9:51 pm on Aug 6, 2006 (gmt 0)|
There is a green check mark under 'website verified', so I'll leave it as it is. Thank you.
Tonight I saw two dropdown boxes meant to narrow down the crawl timespan on the error reports. I think this is also new, but I don't quite get what it is aiming at. As I mentioned earlier, it is very hard for me to correct these mistakes without knowing where the link came from.
To elaborate an idea put up by KenB above: What about a sort of <ignore url>tag</ignore url> in sitemaps, designed for broken links from other websites? Might save a lot of crawl-bandwidth and help to get clean sitemaps.
| 10:02 pm on Aug 6, 2006 (gmt 0)|
There are plenty of sites that have pages/versions under both www.domain.com and domain.com. Whether due to failing to think about the distinction (treating them as if they were purely synonymous and irrelevant) or for other reasons (accidents), Google has struggled to figure out which version is the correct ("canonical") one.
This new tool provides a simple method for the webmaster to tell google what their intent was.
Nice feature, and a very encouraging step in the right direction.
| 10:09 pm on Aug 6, 2006 (gmt 0)|
Whitenight, you're onto something, particularly with c, but who cares about d? I guess the whole thing has to do with some filters concerning subdomain spamming; yes, www is only one possible subdomain among many, many others. The tool is meant as a vehicle for white-hats to make sure their websites are excluded from those filters. We help Google fine-tune these filters and receive interesting statistical data about our websites without paying for that information. It's a fair deal, and the SERPs and searchers can only benefit.
| 10:31 pm on Aug 6, 2006 (gmt 0)|
I really like the new diagnostic pages and additional options. Especially the fact that the error distribution bargraph actually reflects what's in the error lists now. That was one of my main complaints before. :)
| 10:37 pm on Aug 6, 2006 (gmt 0)|
Angonasec, what browser are you using? I'll check into why the tool doesn't expand for you when you click the +.
vite_rts, if you want us to know about new pages on your site right away, it's a good idea to update your Sitemap with those new pages. You can then either wait for us to download the Sitemap file again (which we do periodically and automatically), resubmit the Sitemap (by clicking the resubmit button on the Sitemaps tab), or ping us.
Regarding the preferred domain feature, as GoogleGuy mentioned, we first ensure you own both the www and non-www versions of a domain before we set the preference. In most cases these are the same, but not 100% of the time. We add the non-preferred version to your account (we'll work on making the reason for that clearer in the interface). Note that adding a URL to your account doesn't submit that URL for crawling; it just allows you to see information about it. You don't want to add a Sitemap for the non-preferred version, although it may be interesting to look at any stats that exist for that version.
The last section of this page contains general information about the feature:
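The resubmit-or-ping option mentioned above can also be scripted. A minimal sketch that only builds the ping URL; the endpoint path reflects the Sitemaps documentation of the time and should be treated as an assumption:

```python
# Build the Google Sitemaps "ping" URL for a given sitemap location.
# Fetching this URL tells Google the sitemap has changed.
from urllib.parse import urlencode

def sitemap_ping_url(sitemap_url):
    base = "http://www.google.com/webmasters/sitemaps/ping"  # assumed endpoint
    return base + "?" + urlencode({"sitemap": sitemap_url})  # URL-encode param

print(sitemap_ping_url("http://www.example.com/sitemap.xml"))
```

A cron job could call this right after the sitemap is regenerated, instead of waiting for the periodic re-download.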
| 10:45 pm on Aug 6, 2006 (gmt 0)|
|Whitenight, you're onto something, particularly with c, but who cares about d? |
Well I think many webmasters would find such an admission useful.
In fact, most of the webosphere should know this, if indeed it's true.
As it would (or could) account for many of the "problems" people have been having with their sites.
I'm not sure how many people are "spamming" the index using their www. as a subdomain with different pages than their domain.com
My guess is none.
And no offense to Google, but naturally putting "www." as the official domain name was around far before G was a sparkle in Brin's and Page's eyes.
Don't you think G should conform to "standard practices", rather than expecting everyday webmasters who don't visit sites like this to figure out that their site has been penalized because G can't code properly?
| 10:49 pm on Aug 6, 2006 (gmt 0)|
|we first ensure you own both the www and non-www version of a domain before we set the preference. In most cases, these are the same, but it's not the case 100% of the time. |
Umm ok. So don't you think this is a pretty big announcement considering how most webhost/servers are set up?
Shouldn't G be posting this type of information everywhere it can, considering the potential consequences it might have on a site?
| 11:17 pm on Aug 6, 2006 (gmt 0)|
Ok just to make a quick example.
are two different domains...with different PR, etc.
If G's algo's work like I think they do. Links from "www.nytimes.com" and "nytimes.com" are counted twice and with 2 different PRs.
nytimes.com is complete duplicate content of www.nytimes.com.
Geez, don't G filters use dupe content as a sign of "spam", etc. etc.
Am I being slow or is no one seeing the consequences of this?!
| 11:35 pm on Aug 6, 2006 (gmt 0)|
They are duplicate content, and a 301 redirect from non-www to www is the fix for this problem, as has been discussed in every webmaster forum time and time again going back 3 years or more......
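For anyone landing here looking for that fix, a common Apache `.htaccess` sketch; it assumes mod_rewrite is enabled, and "example.com" is a placeholder for your own domain:

```apache
# Permanently (301) redirect every non-www request to the www host.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Swap the condition and the target to prefer the non-www version instead.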
| 11:41 pm on Aug 6, 2006 (gmt 0)|
lol it shouldn't need a "fix"...
If the webmaster of nytimes doesn't know about it, what are the chances other webmasters (who don't waste their time ;) reading these boards) would?
And again, the bigger point is missed.
think using www.domain.com and domain.com for blackhat,
think dupe content, dupe links.
Like I said, if indeed G is admitting their inability to code this properly, that should be announced on MC blog, WebWorld, Sitemaps, GoogleGroups and by every G employee till the end of all time, because nobody outside of a few forums knows about it!
That's infinitely more important to the ENTIRE web community than some bells and whistles on the sitemap panel....
| 11:54 pm on Aug 6, 2006 (gmt 0)|
You have the ability to show content at any URL that you want to.
Duplicate content occurs when that is page11.html versus page75.html
or domain.com versus domain.org
or domain.com versus www.domain.com
All of these will catch you out.
Think of this in reverse. If Google started making assumptions that X and Y are really the same site, then you'll get the same problems that occurred with the 302 URL hijack problem, where your content appears in the SERPs with the URL of another site: a site that redirects to you and is therefore associated with you, and replaces you in the SERPs.
We do not want to go that route again.
| 12:07 am on Aug 7, 2006 (gmt 0)|
Aside from being annoyed that the entire web community needs to change to fit G's coding issues...
they need to address this publicly, either way.
As hosting servers need to have options to 'automatically' put .htaccess 301s in effect.
Whitehatters need to know that a competitor (or an innocent non-professional) can destroy their rankings with a simple link to the "non-preferred" domain
PR 7 outbound links on non-301'ed sites will also have another outbound link on the PR 4-5 version of the page on the "other" domain
one can create "split-testing" with the www. version and non-www version (use your imagination)
| 12:57 am on Aug 7, 2006 (gmt 0)|
I pulled all my sites from Google Sitemaps today... I just don't see any benefit besides giving Google more information than they need, regardless of what is said about sitemaps having no influence on rankings, etc. I own too many sites not to see patterns. I've also worked in enough large corporations to know that the Google people who post on these boards probably actually believe what they are saying, because that is what they are told. But the reality could very well be that somewhere, somehow, Google Sitemaps feeds data into the algorithm, which alters rankings or forces results supplemental.
If your site is well designed from head to toe, there is no need for sitemaps, and that's the truth.
| 2:40 am on Aug 7, 2006 (gmt 0)|
" They are duplicate content, and a 301 redirect from non-www to www is the fix for this problem as has been discussed in every webmaster forum time and time gain going back 3 years or more...... "
Problem is, these redirects don't always resolve the issue quickly. I've seen some sites penalized for up to 9 months while trying to fix a www vs. non-www issue, with many of the old domain's pages still in the cache up to 3 years later, never being updated.
The fact that this is an option now in the sitemaps shows the problem is present.
So even though the "fix" exists, it can take an eternity for some sites, while others take only weeks with the same exact redirect in place. This is especially apparent with a refresh of the supplemental index, as the old domain takes precedence over the newer domain. Now, since GG said the updated supplemental index should only have a cache date of 2-3 months, the problem could easily be fixed. We just sit and wait now.
| 3:21 am on Aug 7, 2006 (gmt 0)|
|As hosting servers need to have options to 'automatically' put .htaccess 301s in effect. |
I think one problem is that, in theory (or maybe to a search engineer with a very literal mind), www.mysite.com and mysite.com aren't necessarily the same site. They can be, but we shouldn't assume they are.
To put it another way, just because 99.99% of the world's Webmasters would assume that "www.mysite.com" and "mysite.com" are the same site doesn't necessarily make it so. (At least, that's more or less what one very literal-minded technical guy told me--and it might explain why Google is reluctant to make assumptions about www.mysite.com, mysite.com, and the other possible variations on My Site's URL representing the same site or page.)
Problem two is that, from the hosting service's point of view, there's no difference between www.mysite.com and mysite.com. If someone types in either variant, the server will dish up the same home page (unless you've gone out of your way to make things happen differently), so why should the hosting service or the default server software bother with redirects? (Sure, we know why, but hosting services and people who write server software may feel--with some justification--that it isn't their problem to solve.)
Don't misunderstand me: I think what you've suggested is a great idea, though I'd go one step further and make redirects of everything to mysite.com/ or www.mysite.com/ (depending on the Webmaster's preference) occur by default.
ADDENDUM: I lost 70-90% of my Google referrals between late March and late May of 2005, presumably because of the www vs. non-www duplication problem. I fixed my .htaccess file a week or two after the initial disaster (thanks to tips from lammert and dazzlindona), and my site recovered in Google within six or seven weeks. So I'm a firm believer in those 301 redirects to www.mysite.com/ or mysite.com/, as the site owner or Webmaster prefers.
| 6:03 am on Aug 7, 2006 (gmt 0)|
Good morning Folks
Always Look On the Bright Side of Life :-)
These days are great days in the life of the webmaster community. How many of you ever dreamed that the day would come when Google and the Googlers would provide us with a Google Webmaster Central, and solutions to problems such as supplementals and canonicals that we have been discussing for years, asking GOOG to help resolve!
Furthermore... we see our good friends the Googlers spending more and more of their time answering our questions. And they have all been contributing here on forum 30: GoogleGuy, Matt, Vanessa and Adam!
Lets keep those positive creative thoughts coming. Life is too short for anything else :-)
| 6:41 am on Aug 7, 2006 (gmt 0)|
No..the sitemap was correct and not a problem. It contained all the correct information. It just was basically ignored by google. So why have it?
| 8:16 am on Aug 7, 2006 (gmt 0)|
There is also a UK version.
Should UK sites be managed here, or does it make no difference?
| This 167 message thread spans 6 pages |