And just to be clear, our main web index is usually updated daily, and is typically refreshed completely every 2-3 weeks.
[edited by: GoogleGuy at 9:06 am (utc) on Aug. 6, 2006]
"I am with Steveb - I will send flowers to Google"
In case you and steveb need an address to send those flowers to :-)
Google Sitemaps Team
Central Way Plaza
720 4th Avenue
Kirkland, WA 98033
[edited by: reseller at 9:15 am (utc) on Aug. 6, 2006]
What kind of kinks did you see?
(Sorry, had to fly to San Jose before I could answer this.)
Showing a new, second set of websites after specifying preferred domains is confusing, but I'm sure it does this for a reason and I suppose it could be called a feature and not a bug, but I have no idea what the benefit would be. How many people have 'www.domain.com' set up as a different web site than 'domain.com'?
The other issue I've seen appears when you have a site on a third-level domain (sub-domain). I have a site which is separated into two sections, one of them using a sub-domain to specify one of the sections, like Google does with 'dir.google.com'. This points to a different set of servers, mainly for load-balancing considerations, and has its own set of sitemaps.
In the new sitemaps interface, if I select that site and select 'dir.domain.com' as the preferred domain, I get a verification error stating that there was an error in adding 'www.dir.domain.com'. Of course there was an error: there is no NS record set up for 'www.dir.domain.com', and there never will be. But what does this have to do with anything, since I am trying to specify 'dir.domain.com' as the preferred domain name? Incidentally, there doesn't appear to be an NS record set up for 'www.dir.google.com' either, so I would assume that if someone at Google wanted to specify 'dir.google.com' as their preferred domain they would get the same error message.
The end result is that when I try to select 'dir.domain.com' as the preferred domain I get an error message saying that they could not verify 'www.dir.domain.com' which is jarring because I didn't select 'www.dir.domain.com'. Of course I could be missing something and this could be a feature and not a bug, but from where I sit it sure looks like a bug.
Don't get me wrong, this is definitely another step in the right direction and I applaud the engineers for implementing this, but these issues along with the dead "Crawl Rate" link make me think this isn't quite finished yet.
As for the crawl-rate option, let me state my very personal expectations for a start: I have a few very well-indexed evergreen-pages, highly appreciated/visited by my customers. These hardly ever change. In addition to that there are a number of shop-category-pages, which change from time to time due to price-changes or new/de-listings of products. So all of my pages need not be crawled in a regular timespan. Whenever I perform broader changes (adding new pages) I resubmit the sitemap and in most cases it is crawled within 24 hours to my very satisfaction. Changes in my products-database, which will result in changes in the product-category-pages are covered by the date-last-modified-tag in the sitemap. (I use a self-written script to generate that sitemap). So for me personally, there is no need to define a specific crawl-rate. I'm quite fine if resubmitting the sitemap is accepted within 48 hours and new pages get indexed within a week, as I experienced in the past months. Thank you for that.
I think this is different for pages that change every few minutes like news-sites or sites like webmasterworld, but I also think these play a special role in the crawling-procedures anyway, because google already knows about their importance. So cui bono? All I can see is webmasters trying to artificially push their website's importance by submitting a high crawl-rate plus adding some rss-news-feeds to their poor content. In my opinion the date-last-modified-tag perfectly suffices, as long as it is well chosen by us webmasters. Please enlighten my frog-perspective.
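For readers who want to see what a self-written sitemap script with last-modified dates might look like, here is a minimal sketch in Python. It is not the poster's script. The page list, the `build_sitemap` name and the example.com URLs are hypothetical, and the namespace is the sitemaps.org 0.9 one, which is an assumption about which schema you target.

```python
from datetime import date
from xml.etree import ElementTree as ET

# Assumption: the sitemaps.org 0.9 schema namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified) pairs.

    last_modified is a datetime.date; it is written as YYYY-MM-DD.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical data: in practice the dates would come from the products database.
pages = [
    ("http://www.example.com/", date(2006, 8, 1)),
    ("http://www.example.com/category/widgets", date(2006, 8, 5)),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Only the category pages whose underlying data changed need a newer date; the evergreen pages keep their old one.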
IMHO, a better implementation would have been to allow people to define a group of mirrored servers and specify which is preferred. For example:
Unless I don't understand the purpose of the crawl speed setting they're working on, I'm not sure why they don't just implement it like everyone else and recognize a 'Crawl-delay' setting in robots.txt.
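To make the robots.txt idea concrete, here is a small sketch of how a crawler could read a Crawl-delay value, using Python's standard `urllib.robotparser`. The robots.txt content and the "ExampleBot" user agent are made up for illustration, and whether any particular search engine honors the directive is up to that engine.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all crawlers to wait 10 seconds between requests.
robots_txt = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# crawl_delay() returns the delay for the given user agent, or None if unset.
delay = parser.crawl_delay("ExampleBot")

# can_fetch() applies the Disallow rules to a concrete URL.
allowed = parser.can_fetch("ExampleBot", "http://www.example.com/private/page.html")

print(delay, allowed)
```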
You may wish to just leave the new account (non-www) under "site" as it is. However, it should show that it's verified under "Site Verified?".
"you need to prove that you control both versions (www and non-www)", as GoogleGuy mentioned in his reply to you.
No need to add a sitemap for the non-www.
Sitemaps is now just a part of Google Webmaster Central, and you shouldn't need to make any changes to your sitemap data.
Including where there is no change to file names or addition of new files?
I ask cos sitemaps are supposed to be a mechanism to inform search engines that the site has been updated,
prove that you control domain.com and www.domain.com. Once you prove you control both, you can set your preferred domain
Houston, we have a problem.
a.) Of the millions of websites, how many of them have different pages for www.domain.com and domain.com?!
b.) Assuming the answer to the above is none, why haven't G engineers figured this out yet?
c.) Outside of the rather biased subset of WebmasterWorld members, how many webmasters think of "www." as a subdomain of their domain.com?
d.) Are you admitting that having www.domain.com and domain.com not redirected in the .htaccess causes problems for sites?
Tonight I saw two dropdown-boxes meant to narrow down the crawl-timespan on the error-reports. I think this is also new, but I don't quite get what this is aiming at. As I mentioned earlier, it is very hard for me to correct these mistakes without knowing where the link came from.
To elaborate an idea put up by KenB above: What about a sort of <ignore url>tag</ignore url> in sitemaps, designed for broken links from other websites? Might save a lot of crawl-bandwidth and help to get clean sitemaps.
This new tool provides a simple method for the webmaster to tell google what their intent was.
Nice feature, and a very encouraging step in the right direction.
vite_rts, if you want us to know about new pages on your site right away, it's a good idea to update your Sitemap with those new pages. You can then either wait for us to download the Sitemap file again (which we do periodically and automatically), resubmit the Sitemap (by clicking the resubmit button from the Sitemaps tab), or ping us.
Regarding the preferred domain feature, as GoogleGuy mentioned, we first ensure you own both the www and non-www version of a domain before we set the preference. In most cases, these are the same, but it's not the case 100% of the time. We add the non-preferred version to your account (we'll work on making the reason for that clearer in the interface). (Note that adding a URL to your account doesn't submit that URL for crawling, it just allows you to see information about it.) You don't want to add a Sitemap for the non-preferred version, although it may be interesting to look at stats that may exist for that version.
The last section of this page contains general information about the feature:
Whitenight, you're onto something, particularly with c, but who cares about d?
Well I think many webmasters would find such an admission useful.
In fact, most of the webosphere should know this, if indeed, it's true.
As it would(could) account for much of the "problems" people have been having with their sites
I'm not sure how many people are "spamming" the index using their www. as a subdomain with different pages than their domain.com
My guess is none.
And no offense to Google, but naturally putting "www." as the official domain name was around far before G was a sparkle in Brin's and Page's eyes.
Don't you think G should conform to "standard practices" rather than everyday webmasters who don't visit sites like this, figuring out that their site has been penalized because G can't code properly?
we first ensure you own both the www and non-www version of a domain before we set the preference. In most cases, these are the same, but it's not the case 100% of the time.
Umm ok. So don't you think this is a pretty big announcement considering how most webhost/servers are set up?
Shouldn't G be posting this type of information everywhere it can, considering the potential consequences it might have on a site?
are two different domains...with different PR, etc.
If G's algos work like I think they do, links from "www.nytimes.com" and "nytimes.com" are counted twice and with 2 different PRs.
nytimes.com is a complete "duplicate" content as www.nytimes.com.
Geez, don't G filters use dupe content as a sign of "spam", etc. etc.
Am I being slow or is no one seeing the consequences of this?!
If the webmaster of nytimes doesn't know about it, what are the chances other webmasters (who don't waste their time ;) reading these boards) would?
And again, the bigger point is missed.
think using www.domain.com and domain.com for blackhat,
think dupe content, dupe links.
Like I said, if indeed G is admitting their inability to code this properly, that should be announced on MC blog, WebWorld, Sitemaps, GoogleGroups and by every G employee til the end of all time, because nobody outside of a few forums knows about it!
That's infinitely more important to the ENTIRE web community than some bells and whistles on the sitemap panel....
Duplicate content occurs when that is page11.html versus page75.html
or domain.com versus domain.org
or domain.com versus www.domain.com
All of these will catch you out.
Think of this in reverse. If Google started making assumptions that X and Y are really the same site, then you'll get the same problems that occurred with the 302 URL hijack problem, where your content appears in the SERPs with the URL of another site: a site that redirects to you and is therefore associated with you, and replaces you in the SERPs.
We do not want to go that route again.
As hosting servers need to have options to 'automatically' put .htaccess 301s in effect.
Whitehatters need to know that a competitor (or innocent non-professional) can destroy their rankings with a simple link to the "non-preferred" domain
PR 7 outbound links on non-301'ed sites will also have another outbound link on the PR 4-5 version of the page on the "other" domain
one can create "split-testing" with the www. version and non-www version (use your imagination)
If your site is well designed from head to toe, there is no need for sitemaps and that's the truth.
Problem is these redirects don't always resolve the problem very fast. I've seen some sites penalized up to 9 months when trying to fix a www vs. non-www issue with many of the old domain pages still in cache up to 3 years later never being updated.
The fact that this is an option now in the sitemaps shows the problem is present.
So even though the "fix" is present, it can take an eternity for some sites while others take only weeks with the exact same redirect in place. This is especially noticeable with a refresh of the supplemental index, as the old domain takes precedence over the newer domain. Now, since GG said the updated supplemental index should only have a cache date of 2-3 months, the problem could easily be fixed. We just sit and wait now.
As hosting servers need to have options to 'automatically' put .htaccess 301s in effect.
I think one problem is that, in theory (or maybe to a search engineer with a very literal mind), www.mysite.com and mysite.com aren't necessarily the same site. They can be, but we shouldn't assume they are.
To put it another way, just because 99.99% of the world's Webmasters would assume that "www.mysite.com" and "mysite.com" are the same site doesn't necessarily make it so. (At least, that's more or less what one very literal-minded technical guy told me--and it might explain why Google is reluctant to make assumptions about www.mysite.com, mysite.com, and the other possible variations on My Site's URL representing the same site or page.)
Problem two is that, from the hosting service's point of view, there's no difference between www.mysite.com and mysite.com. If someone types in either variant, the server will dish up the same home page (unless you've gone out of your way to make things happen differently), so why should the hosting service or the default server software bother with redirects? (Sure, we know why, but hosting services and people who write server software may feel--with some justification--that it isn't their problem to solve.)
Don't misunderstand me: I think what you've suggested is a great idea, though I'd go one step further and make redirects of everything to mysite.com/ or www.mysite.com/ (depending on the Webmaster's preference) occur by default.
ADDENDUM: I lost 70-90% of my Google referrals between late March and late May of 2005, presumably because of the www vs. non-www duplication problem. I fixed my .htaccess file a week or two after the initial disaster (thanks to tips from lammert and dazzlindona), and my site recovered in Google within six or seven weeks. So I'm a firm believer in those 301 redirects to www.mysite.com/ or mysite.com/, as the site owner or Webmaster prefers.
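The redirect logic itself is simple enough to sketch outside of .htaccess. The Python below is only an illustration of the decision a server makes when it issues that 301: the `canonical_redirect` function and the mysite.com hosts are hypothetical, and the sketch ignores port numbers and treats only the "www." prefix as a variant of the same site.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_www(host):
    """Drop a leading 'www.' so both variants of a host compare equal."""
    return host[4:] if host.startswith("www.") else host


def canonical_redirect(url, preferred_host):
    """Return the URL to 301-redirect to, or None if no redirect is needed.

    A redirect is suggested only when the requested host is the www/non-www
    counterpart of preferred_host. Unrelated hosts are left alone.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == preferred_host:
        return None  # already on the preferred version
    if strip_www(host) != strip_www(preferred_host):
        return None  # a different site entirely
    return urlunsplit(
        (parts.scheme, preferred_host, parts.path or "/", parts.query, parts.fragment)
    )


print(canonical_redirect("http://mysite.com/page.html?x=1", "www.mysite.com"))
print(canonical_redirect("http://www.mysite.com/", "www.mysite.com"))
```

The same function works in the other direction if the site owner prefers the non-www version: pass "mysite.com" as the preferred host.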
Always Look On the Bright Side of Life :-)
These are great days in the life of webmaster communities. How many of you ever dreamed that the day would come when Google and the Googlers would provide us with a Google Webmaster Central and solutions to problems such as supplementals and canonicals that we have been discussing for years and asking GOOG to help resolve!
Furthermore.. we see our good friends the Googlers spending more and more of their time answering our questions. And they have all been contributing here on forum 30: GoogleGuy, Matt, Vanessa and Adam!
Let's keep those positive creative thoughts coming. Life is too short for anything else :-)