A very interesting point.
It also occurred to me that all Google probably does is check for a 200 response to that file request. So sites that deliver 200 responses to 404 errors (I've seen a few) can easily be 'hijacked' for the Sitemaps statistics information.
I just tested this, and it does work...
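The misconfiguration behind this is easy to reproduce and probe. Here's a minimal sketch (Python, with a deliberately broken local test server standing in for a real site; the path and names are made up for illustration):

```python
import http.server
import threading
import urllib.request

# A deliberately misconfigured handler that answers 200 OK for EVERY
# path -- the same behaviour as the broken sites discussed above, where
# a catch-all page is served instead of a proper 404.
class Always200Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(b"<html>Sorry, page not found</html>")

    def log_message(self, *args):
        pass  # silence per-request logging

def status_for_missing_page(host, port):
    """Request a path that certainly does not exist and return the status."""
    url = f"http://{host}:{port}/GOOGLE-nonexistent-verify-file.html"
    with urllib.request.urlopen(url) as resp:
        return resp.status

server = http.server.HTTPServer(("127.0.0.1", 0), Always200Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

code = status_for_missing_page("127.0.0.1", server.server_address[1])
print(code)  # 200 -- a status-only verification check would pass here
server.shutdown()
```

A correctly configured server would answer 404 here, and a status-only verification check would rightly fail.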
I thought the file had to be in the root directory of the site.
Any site that allows visitors to create files in the root directory is a site asking for security trouble with a capital T.
try adding aol.com :)
Methinks Google need to start checking the content of the verification file ;)
Looks like this has been taken down - I got an "Our system has experienced a temporary problem." error trying to add a certain MS site (another "200 OK response to missing pages" casualty) ;)
Still, all quite amusing, and fortunate that the information displayed was of the harmless variety.
How is this a "serious problem"? All you see are the top 5 keywords, some page statistics -- oooooooh PR distribution :-) - where's the big deal?
Also, keep in mind that although Google set up the validation system the way it is (you must create a specified file on your server), it's the remote server that claims the validation file exists when in fact it doesn't. Yes, Google will have to change the system a bit. They'll probably add the (previously included) check for valid 404 codes on non-existing pages to the non-sitemap-statistics signup. That means these servers will NOT be able to get their Google statistics at all, since they do NOT return the standard 404 (or 403, 410) result codes for pages that do not exist.
So in the end, Google will go from letting everyone see the stats to letting nobody see the stats for those files. Yes, those statistics are neat, but they're not so private that they need to be kept under lock and key. That might change in the future of course :-)
What would be nice would be an alternative way of validating your site, perhaps even a required 2-way validation. I could imagine Google sending an email to the registrant in the whois information (unless it's unknown :-)) or asking the webmaster to add a private meta-tag to the html-header of one of the files. There are probably even other ways.
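That meta-tag idea could be checked with something as simple as the sketch below (the tag name "verify-v1" and the matching logic are assumptions for illustration, not anything Google has published):

```python
import re

def has_verification_meta(html, token):
    """Look for an owner-added tag like
    <meta name="verify-v1" content="TOKEN"> in a page's markup.
    Only someone who can edit the page itself could add the tag, which
    makes this a reasonable ownership proof. (Tag name is made up.)"""
    pattern = rf'<meta\s+name="verify-v1"\s+content="{re.escape(token)}"\s*/?>'
    return re.search(pattern, html, re.IGNORECASE) is not None

page = '<html><head><meta name="verify-v1" content="abc123"/></head></html>'
print(has_verification_meta(page, "abc123"))       # True
print(has_verification_meta(page, "stolen-token")) # False
```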
[edited by: engine at 4:38 pm (utc) on Nov. 18, 2005]
[edit reason] TOS [/edit]
|Yes, those statistics are neat, but they're not so private that they need to be kept under lock and key. That might change in the future of course :-) |
The key comment there is the last line. Regardless of whose fault the incorrect response headers are, it's still not a very good validation of who owns a particular website.
And while the statistics aren't currently very comprehensive, would you want all and sundry viewing this information about your sites? I know I wouldn't.
Btw - you should remove the links to your own website in your post as they violate the TOS [webmasterworld.com].
You're right, they don't SHOW much YET. However, seeing how Google Sitemaps and Google Base are the first real two-way interfaces between Joe Webmaster and the search engines, who's to say that it will stop with statistics?
With a good validation / authentication system in place, Google could add other things ... perhaps the possibility of letting the webmaster change existing, indexed URLs (e.g. redirect bad URLs to the correct ones), or even allowing him to directly remove single bad URLs (without the funny 180-day scheme now), or perhaps letting the webmaster specify other structured pieces of information (geographic targeting of URLs within a website, etc.). A good validation scheme, one any webmaster can comply with, is the first step toward good two-way communication between the two important parts of a good information retrieval system (clean, good indexing + good content). I wrote up some more about this on my blog, but I think I can't post the link (it's in my profile).
sorry about those links - didn't see that in the TOS; mods: feel free to remove them - I have no idea how to edit posts, perhaps I'm just overdue for the weekend :-)
The bug has been fixed. I noticed this earlier today, and here is the official statement:
Delighted to see it fixed. It wouldn't have been nice for webmasters to suddenly find their competition had got hold of their sitemap data, crawl error/type data, top search terms, etc...
Google should really verify the content of the file.
My understanding is that if your site is configured to return a generated page for a 404 error instead of just the plain error, you cannot get your site verified.
I don't think it's worth turning off that feature just to get that tiny extra bit of info. All Google would have to do is check the content of the verify file to know that I do indeed own the site. Very frustrating.
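A content check along those lines is trivial. A minimal sketch (hypothetical token and function names, not Google's actual implementation):

```python
def content_check(status, body, token):
    """Verification that inspects the body, not just the status code.
    A misconfigured server that answers 200 to every request passes a
    status-only check, but it cannot echo back a token the site owner
    never uploaded. (Hypothetical logic, not Google's actual code.)"""
    return status == 200 and token in body

# A catch-all server returns 200 with its generic "not found" page:
print(content_check(200, "<html>Sorry, nothing here</html>", "GOOGLE1234abcd"))  # False
# A site whose owner actually uploaded the verification file:
print(content_check(200, "GOOGLE1234abcd", "GOOGLE1234abcd"))  # True
```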
"To ensure that no one can take advantage of this configuration to view statistics to sites they don't own, we only verify sites that return a status of 404 in the header of 404 pages."
This has nothing to do with whether you have a generated 404 page or a plain one, it only cares that your 404 page does indeed return a 404 header. There's nothing to stop you returning a 404 header *and* a customized page for the end user.
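To make that distinction concrete, here's a minimal sketch using Python's built-in http.server as a stand-in for a real site (in practice this is usually one line of server config, e.g. Apache's `ErrorDocument 404 /custom404.html`, which keeps the 404 status):

```python
import http.server
import threading
import urllib.error
import urllib.request

# A handler that serves a friendly, customized error page for visitors
# while still sending the correct 404 status code in the header.
class Custom404Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"<html>Home</html>")
        else:
            self.send_response(404)  # correct status code in the header...
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            # ...plus a custom, human-friendly body for the visitor
            self.wfile.write(b"<html>Oops! Try the site map instead.</html>")

    def log_message(self, *args):
        pass  # silence per-request logging

server = http.server.HTTPServer(("127.0.0.1", 0), Custom404Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

try:
    urllib.request.urlopen(f"http://127.0.0.1:{port}/missing.html")
    code = None
except urllib.error.HTTPError as e:
    code = e.code
print(code)  # 404 -- verification checks see the status, visitors see the page
server.shutdown()
```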
Seems like the best strategy is to add your own sites and validate them yourself and then nobody else can claim them.
Although they would also have to submit a valid sitemap file before they could even verify anything, so I'm not sure why anyone would go to this trouble except just to be annoying.
"We fixed this as soon as we found out about the problem"
I wonder where they found that out? ;)
Wow, that is quite a big thing to miss. I'm amazed I didn't think of this myself, but anyhow, it's nice to hear that they fixed the problem.
I'm not sure but I believe this exploit was first published on another website (which is where I first read about it).
There was also a bug in Google Base allowing XSS (cross-site scripting), in essence opening up Gmail accounts to hackers.
Mini update: they fixed the problem with sites that don't give 404 for non-existent pages.
They have not yet looked at the problem where a site user can create their own page.
Many wikis give you a page allowing you to start a new article when you request an unknown page, but, as they should, they return a 404. Just starting the article is enough to get a 200 OK and validate the site for Google Sitemaps. (Wikis of the form widget.com/widgets (i.e. no .html) are not exempt: many will still willingly create pages with .html if requested.)
I have found 3 sites which provide site.com/membername.html style profile pages. Registering GOOGLEser1684ser84 as your username will give you the required site.com/GOOGLEser1684ser84.html page, which will validate.
If you fall into any of these categories, you must do something about it before "all your stats are belong to" someone else!