If you own or run any wiki, any system that allows user-uploadable files (with a user-specified file name), or any system that lets users create pages named after their titles ('SEO-friendly' pages), then I urge you to take steps to ensure that URLs of the form GOOGLE[a-z0-9]+.html (the Google Sitemaps verification filename pattern) cannot be created.
Regular expressions (for rapid implementation):
A PHP snippet, assuming $name holds the new file name:
if (preg_match("/GOOGLE[a-z0-9]+\.html/", $name)) die("Sorry, name not allowed");
It also occurred to me that probably all Google does is check for a 200 response to that file request. So sites that deliver 200 responses to 404 errors (I've seen a few) can easily be 'hijacked' for the Sitemaps statistics information.
I just tested this, and it does work...
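If you want to check whether your own server has this problem, a quick sketch in PHP (the function name is my own; the URL is a placeholder — substitute a random, definitely-nonexistent path on your site):

```php
<?php
// Parse the numeric status code out of an HTTP status line such as
// "HTTP/1.1 404 Not Found".
function parseStatusCode($statusLine) {
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $statusLine, $m)) {
        return (int)$m[1];
    }
    return null;
}

// To probe your own site, fetch the headers of a deliberately bogus URL:
//   $headers = get_headers('http://www.example.com/no-such-page-xyzzy.html');
//   $code    = parseStatusCode($headers[0]);
//   if ($code === 200) echo "Warning: missing pages answer 200 here.\n";
```

If that prints the warning, your server is one of the 'hijackable' ones described above.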
Still, all quite amusing, and fortunate that the information displayed was of the harmless variety.
Also, keep in mind that although Google set up the validation system the way it is (you must create a specified file on your server), it is the remote server that claims the validation file exists when in fact it doesn't. Yes, Google will have to change the system a bit. They'll probably extend the (previously included) check that non-existent pages return valid 404 codes to the non-sitemap-statistics signup. That means these servers will NOT be able to get their Google statistics at all, since they do NOT return the standard 404 (or 403, 410) result codes for pages that do not exist.
So in the end, Google will go from letting everyone see the stats to letting nobody see the stats for those files. Yes, those statistics are neat, but they're not so private that they need to be kept under lock and key. That might change in the future of course :-)
What would be nice would be an alternative way of validating your site, perhaps even a required two-way validation. I could imagine Google sending an email to the registrant in the whois information (unless it's unknown :-)) or asking the webmaster to add a private meta tag to the HTML header of one of the files. There are probably other ways.
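The meta-tag idea could look something like this — purely hypothetical, with the tag name and token format invented for illustration, not an actual Google mechanism:

```html
<!-- Hypothetical verification tag; a crawler would fetch the page and
     confirm the secret token, rather than checking for a file. -->
<head>
  <meta name="google-verification" content="a1b2c3d4e5f6">
</head>
```

The advantage over the file-based scheme is that it doesn't depend on the server's 404 behaviour at all: the token is either in the page source or it isn't.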
[edited by: engine at 4:38 pm (utc) on Nov. 18, 2005]
[edit reason] TOS [/edit]
Yes, those statistics are neat, but they're not so private that they need to be kept under lock and key. That might change in the future of course :-)
The key comment there is the last line. Regardless of whose fault the incorrect response headers are, it's still not a very good validation of who owns a particular website.
And while the statistics aren't currently very comprehensive, would you want all and sundry viewing this information about your sites? I know I wouldn't.
Btw - you should remove the links to your own website in your post as they violate the TOS [webmasterworld.com].
With a good validation / authentication system in place Google could add other things ... perhaps like a possibility of letting the webmaster change existing, indexed URLs (eg redirect bad URLs to the correct ones) or even allow him to directly remove single bad URLs (without the funny 180 day scheme now), perhaps allow the webmaster to specify other structured pieces of information (geographic targeting of URLs within a website, etc.). A good validation scheme, which any webmaster can comply with, is the first step to a good two-way communications between the two important parts of a good information retrieval system (clean, good indexing + good content). I wrote up some more about this on my blog, but I think I can't post the link (it's in my profile).
sorry about those links - didn't see that in the TOS; mods: feel free to remove them - I have no idea how to edit posts, perhaps I'm just overdue for the weekend :-)
I don't think it's worth it to turn off that feature just for that tiny extra bit of certainty. All Google would have to do is check the content of the verification file to know that I do indeed own the site. Very frustrating.
"To ensure that no one can take advantage of this configuration to view statistics to sites they don't own, we only verify sites that return a status of 404 in the header of 404 pages."
This has nothing to do with whether you have a generated 404 page or a plain one, it only cares that your 404 page does indeed return a 404 header. There's nothing to stop you returning a 404 header *and* a customized page for the end user.
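For anyone wondering how to do that: one common way under Apache (assuming a custom error page exists at /errors/notfound.html on your site) is the ErrorDocument directive, which serves your friendly page while keeping the 404 status in the response headers:

```apache
# Visitors see the custom page, but the response status is still 404,
# so Google's verification check behaves correctly.
ErrorDocument 404 /errors/notfound.html
```

Avoid redirecting to the error page instead of serving it in place, since a redirect typically replaces the 404 status with a 302.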
They would also have to submit a valid sitemap file before they could verify anything, so I'm not sure why anyone would even go to this trouble except just to be annoying.
They have not looked at the problem of where a site user can create their own page.
Many wikis show a page allowing you to start a new article when you request an unknown page, and, as they should, they return a 404 for it. But just starting the article is enough to produce a 200 OK and validate the site for Google Sitemaps. (Wikis whose URLs take the form widget.com/widgets, i.e. with no .html, are not exempt: many will still willingly create pages with .html if requested.)
I have found 3 sites which provide site.com/membername.html style profile pages. Registering GOOGLEwe4r54fe54r as your username will give you the required site.com/GOOGLEwe4r54fe54r.html page, which will validate.
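Sites like that could block such usernames at signup with a check along these lines — a sketch in PHP, with the function name my own invention:

```php
<?php
// Reject any requested username that could turn into a profile page
// matching Google's verification filename pattern (GOOGLE + alphanumerics).
// Case-insensitive, since the profile URL may be lowercased anyway.
function isForbiddenUsername($username) {
    return preg_match('/^google[a-z0-9]+$/i', $username) === 1;
}
```

With member pages at site.com/membername.html, this blocks names like GOOGLEwe4r54fe54r while leaving ordinary usernames alone.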
If you fall into any of these categories, then you must ensure you do something about it before "all your stats are belong to" someone else!