Interesting example with the NYTimes - it is hard to find examples which are OK to discuss at WebmasterWorld, but the NYTimes should be fine.
In a way it is interesting that they have not already been penalized like the majority of sites that have had this issue.
I guess a PR10 on the www and a PR7 on the non-www are enough to overcome a lot of penalties.
Lots of webmasters have been crying out for a fix for this issue for ages. Whether this Sitemaps preference results in improvements to the ranking of sites affected by the bug is the real question, in my opinion. If it is successful, then perhaps it would be right for Google to make a general announcement that using Webmaster Central is the way to deal with such issues.
Granted, that has always been out there, but being able to go to one place on Google and sort it out in a couple of clicks is pretty cool.
I'm using FF 220.127.116.11 (the latest) on Mac OS X 10.4.7, also the latest.
It is a browser compatibility bug, because I do see the drop-down menu when I use Safari.
If only I could get a helpful reply to my emails asking why our NFP site is being punished by the algo...
As far as I can tell, seeing old non-www URLs in the SERPs as Supplemental Results isn't usually a problem in and of itself. Google holds on to those for years. That is to be expected, but GoogleGuy has just confirmed that they are going to be updated to more recently spidered data soon.
The correct measure of whether the problem is fixed is to see how many www pages get listed. Every time that I have added a 301 redirect to a site that has never had one before, the number of www pages listed has rapidly increased, the URL-only www listings have turned into full listings, and the number of www Supplemental Results has rapidly decreased.
The non-www URL-only results have declined in number too. The number of non-www Supplemental Results has varied, sometimes staying static or sometimes falling, but often increasing by a small amount a month or two after the redirect was first implemented.
Where those non-www Supplemental Results appeared in the SERPs, the redirect on the site still manages to deliver the visitor to the correct www version of the page.
On most sites without the redirect in place, I have almost always also found poor use of the <title> tag, the same meta description in use on multiple pages, and an incorrect link back to the root index page. For that link, always use the "http://www.domain.com/" format, omitting the index.html filename from the link itself.
Clearing up all of those issues has always helped a site get over the listings problems. Xenu LinkSleuth has often been a great help here too.
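For anyone who wants to check the duplicate title and meta description issue by script, here is a minimal Python sketch. It assumes you already have the HTML of each page in a dict keyed by URL (how you fetch it is up to you); the names HeadParser and find_duplicates are made up for this example, and pages with an empty title or description are simply skipped rather than reported.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadParser(HTMLParser):
    """Collects the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_duplicates(pages):
    """pages maps URL -> HTML source (an assumed input format).

    Returns (dup_titles, dup_descriptions); each maps a repeated value
    to the sorted list of URLs that use it. Empty values are ignored.
    """
    titles = defaultdict(list)
    descriptions = defaultdict(list)
    for url, html in pages.items():
        parser = HeadParser()
        parser.feed(html)
        titles[parser.title.strip()].append(url)
        descriptions[parser.description].append(url)

    def repeated(groups):
        return {k: sorted(v) for k, v in groups.items() if k and len(v) > 1}

    return repeated(titles), repeated(descriptions)
```

Any value that comes back with two or more URLs against it is a candidate for rewriting.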
Finally, remember always to include a trailing / on the URL when you link to a folder. A link to just /folder on a page at www.domain.com points to www.domain.com/folder, which could then get automatically redirected to domain.com/folder/ (without the www!) to add the trailing /, and then redirected onwards to www.domain.com/folder/ to add the www back on again.
The intermediate step at domain.com/folder/ could kill your listings.
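One way to avoid that intermediate hop is to work out the final URL in one go, so a single 301 can point straight at it. A rough Python sketch of the idea follows. The hostname, the index filenames and the set of folder paths are all assumptions for illustration (a URL alone does not say whether it names a folder or a file), and canonical_url is a made-up helper name, not anything a server provides.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.domain.com"           # assumed preferred hostname
INDEX_FILES = ("index.html", "index.htm")   # assumed index filenames


def canonical_url(url, folders):
    """Return the URL that a single 301 should point at.

    folders is a set of paths (without trailing slash) known to be
    directories, e.g. {"/folder"}. This is an assumption of the sketch.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    # Drop an index filename so the root and each folder have one URL.
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    # Add the trailing slash for known folders.
    if path in folders:
        path += "/"
    # Force the www hostname in the same step; the fragment is dropped
    # because browsers never send it to the server anyway.
    return urlunsplit((parts.scheme, CANONICAL_HOST, path, parts.query, ""))


print(canonical_url("http://domain.com/folder", {"/folder"}))
print(canonical_url("http://www.domain.com/index.html", set()))
```

With both fixes applied in one step, a request for domain.com/folder never has to pass through domain.com/folder/ on its way to the www version.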
The URLs are redirected properly, and the title, meta, and slashes are all correct. The only problem is what is showing in the index. All links have been checked and give the proper 301 through Xenu. This was all checked a very long time ago.
Google simply has problems and it's obvious. Hopefully the new option fixes the issue.
a.) Of the millions of websites, how many of them have different pages for www.domain.com and domain.com?!
b.) Assuming the answer to the above is none, why haven't G engineers figured this out yet?
That is a terrible assumption to make. I can think of quite a few very important domains where that is not the case. This is especially true with domains that have several levels of subdomains, like large corporations and .edu domains.
Historically, there weren't many entities that would route the traffic to different machines based on the port; you routed it based on the domain name. If anything, the default domain was used as the email machine. Then you would have a few default machines with domain names for the services that they offered, such as ftp.example.com. The new, funky thing called the World Wide Web got its own server, usually set up by some geek in a lab who just wanted to play with it. Naturally they gave it a subdomain of www.
Granted, the majority of sites have both the primary domain and the www subdomain pointing to the same site, but that does not mean that assuming they are the same is the correct way to handle it.
While Google is aware that they have a problem that they need to sort out, I would rather have them come up with something that will work right in all cases, not only the cases where the webmasters or hosting services are ignorant.
As data gets copied to more places, the fresher supplemental results should eventually be visible everywhere, not just the U.S.
"fresher supplemental results" is that good or bad?
Is that data heading out from the 18.104.22.168 dc (it has hit a fair few more now too)?
It is just that for me the data refresh on the 27th July (the second 27th one) seemed to hit all DCs but not that one - so is that DC (and associated ones) still at a less advanced stage in some aspects but a more advanced stage in others?
Now the million dollar question - when / how to get rid of Supplemental Results after 02-Jan-2006?
Ideas for sitemaps team:
1) Realtime URL removal tool for the index, or any kind of URL removal / XML support
2) Duplicate content indicator
GG, it'd be a good thing if they added the URL Removal tool to WebmasterCentral, and I'm all for it, since I really hate how many different times I have to log in to different places (under different Google accounts, mind you) in order to get things done. But my post was actually about the fact that I was having TROUBLE with the URL removal tool, and there's no place I can report it or get help with it. That's mainly what I was asking to be added to the overall picture.
Regarding URL removal - noindex is fine for pages which are accessible. What happens if some pages were indexed by my mistake? I remove them, but Google still shows the pages that I don't want it to.
My comment about the current URL removal tool: SLOW SLOW SLOW.
It took me over 40 days to remove 50 pages. Let's say that's okay.
But man - it took me 1 hour to submit those 50 pages to the Google removal tool! That's why I say - make something like remove-me.XML and problem solved. The other option is to use the current sitemap.xml but add a new flag: NOINDEX
If we can help Google to find pages, then we can help them to remove pages too :)
"Display URLs as www.mysite.com (for both www.mysite.com and mysite.com)
Display URLs as mysite.com (for both www.mysite.com and mysite.com) "
Is there any capability (maybe it is already there as part of the above?) of setting the https to http? Right now when you search site:mydomain.com you get [mydomain.com...] at the top of the list returned.
He also said that PR gets merged too.
Removing multiple pages is best done by listing them in the robots.txt file and submitting that, but I am not sure if that touches old Supplemental Results in any way.
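If the list of pages is long, the robots.txt rules can be generated rather than typed by hand. Below is a small Python sketch, with hypothetical paths and made-up helper names (build_robots_txt, is_blocked), that writes the Disallow lines and then uses the standard library's urllib.robotparser to double-check which URLs the rules would block. It only shows the file side of things; how the removal tool treats the file is not something this code can tell you.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(paths, user_agent="*"):
    """Return robots.txt text that disallows each path for user_agent."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in paths]
    return "\n".join(lines) + "\n"


def is_blocked(robots_txt, url, user_agent="*"):
    """Check a URL against the rules using the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical paths, for illustration only.
robots = build_robots_txt(["/old-page.html", "/private/"])
print(robots)
print(is_blocked(robots, "http://www.domain.com/old-page.html"))
print(is_blocked(robots, "http://www.domain.com/keep.html"))
```

Note that Disallow rules match by prefix, so a rule for /private/ covers everything underneath that folder as well.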
I am hesitant to do a quick fix in Sitemaps because the site ranks well in the SERPs, and if the setting does not pass PR and backlinks, it could tank in the SERPs.
It is one thing to say we think it will pass PR and backlinks, and another thing to say Google tested it and it seems to work.