


Google Sitemaps: Serious Problem

Webmasters advised to take immediate action

     
5:39 pm on Nov 17, 2005 (gmt 0)

Senior Member from MY 

WebmasterWorld Senior Member vincevincevince is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 1, 2003
posts:4847
votes: 0


Without giving specifics, there is a major problem with the site map system which allows anyone who can create a file to 'claim' ownership of a site, or a domain within a site.

If you own or run any wiki, any system that allows user-uploadable files (with a specified file name), or any system that lets users create pages named from the title ('SEO-friendly' URLs), then I urge you to take steps to ensure that URLs of the form:

GOOGLEb74h6s3m49v87f.html

cannot be created.

Regular expression (for rapid implementation):


/GOOGLE[a-z0-9]+\.html/

PHP snippet, assuming $name is the new file name:


if (preg_match('/GOOGLE[a-z0-9]+\.html/i', $name)) die("Sorry, name not allowed");
11:54 am on Nov 18, 2005 (gmt 0)

Senior Member

joined:Jan 27, 2003
posts:2534
votes: 0


A very interesting point.

It also occurred to me that probably all Google will do is check for a 200 response for that file request. So sites that deliver 200 responses to 404 errors (I've seen a few) can easily be 'hijacked' for the Sitemaps statistics information.

I just tested this, and it does work...
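The soft-404 behaviour described above is easy to probe from the client side. A minimal sketch in Python (a modern illustration, not from the thread; the URL you pass is any page that should not exist on the target site):

```python
# Probe a URL that should NOT exist. If the server answers 200 OK,
# it serves "soft 404s" and could have been claimed in Sitemaps.
import urllib.error
import urllib.request

def is_soft_404(url: str) -> bool:
    """Return True if a request for a missing page comes back 200 OK."""
    try:
        with urllib.request.urlopen(url) as resp:
            # A well-behaved server never reaches here for a missing page.
            return resp.status == 200
    except urllib.error.HTTPError:
        # A real error status (404, 410, ...) raises HTTPError.
        return False
```

For example, `is_soft_404("http://example.com/GOOGLE0000000000.html")` should come back False on a correctly configured server.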

2:37 pm on Nov 18, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 4, 2005
posts:70
votes: 0


I thought the file had to be in the root directory of the site.

Any site that allows visitors to create files in the root directory is a site asking for security trouble with a capital T.

2:43 pm on Nov 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 5, 2001
posts:2466
votes: 0


try adding aol.com :)

DaveN

2:48 pm on Nov 18, 2005 (gmt 0)

Senior Member

joined:Jan 27, 2003
posts:2534
votes: 0


Lol?!

Methinks Google needs to start checking the content of the verification file ;)

3:08 pm on Nov 18, 2005 (gmt 0)

Senior Member

joined:Jan 27, 2003
posts:2534
votes: 0


Looks like this has been taken down - I got an "Our system has experienced a temporary problem." error trying to add a certain MS site (another "200 OK response to missing pages" casualty) ;)

Still, all quite amusing, and fortunate that the information displayed was of the harmless variety.

3:11 pm on Nov 18, 2005 (gmt 0)

New User

10+ Year Member

joined:May 27, 2005
posts:12
votes: 0


How is this a "serious problem"? All you see are the top 5 keywords, some page statistics -- oooooooh PR distribution :-) - where's the big deal?

Also, keep in mind that although Google set up the validation system the way it is (you must create a specified file on your server), it's the remote server that claims the validation file exists when in fact it doesn't. Yes, Google will have to change the system a bit. They'll probably add the (previously included) check for valid 404 codes on non-existent pages to the non-sitemap-statistics signup. That means those servers will NOT be able to get their Google statistics at all, since they do NOT return the proper 404 (or 403, 410) result codes for pages that do not exist.

So in the end, Google will go from letting everyone see the stats to letting nobody see the stats for those files. Yes, those statistics are neat, but they're not so private that they need to be kept under lock and key. That might change in the future of course :-)

What would be nice would be an alternative way of validating your site, perhaps even a required two-way validation. I could imagine Google sending an email to the registrant in the WHOIS information (unless it's unknown :-)) or asking the webmaster to add a private meta tag to the HTML header of one of the files. There are probably other ways, too.
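The meta-tag idea can be sketched quickly. This is a hypothetical verifier, not Google's actual mechanism; the `verify` meta-tag name and the token format are made up for illustration:

```python
# Hypothetical meta-tag site verification: fetch a page and look for
# a site-specific token in a <meta> tag. The tag name "verify" and
# the token are invented for this sketch.
import re
import urllib.request

def has_verification_meta(url: str, token: str) -> bool:
    """Return True if the page at `url` carries the expected meta tag."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    pattern = (r'<meta\s+name=["\']verify["\']\s+content=["\']'
               + re.escape(token) + r'["\']')
    return re.search(pattern, html, re.IGNORECASE) is not None
```

Unlike the file-existence check, this cannot be spoofed by a soft-404 server, because the verifier inspects the response body rather than just the status code.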

Cheers!

[edited by: engine at 4:38 pm (utc) on Nov. 18, 2005]
[edit reason] TOS [/edit]

3:21 pm on Nov 18, 2005 (gmt 0)

Senior Member

joined:Jan 27, 2003
posts:2534
votes: 0


Yes, those statistics are neat, but they're not so private that they need to be kept under lock and key. That might change in the future of course :-)

The key comment there is the last line. Regardless of whose fault the incorrect response headers are, it's still not a very good validation of who owns a particular website.

And while the statistics aren't currently very comprehensive, would you want all and sundry viewing this information about your sites? I know I wouldn't.

Btw - you should remove the links to your own website in your post as they violate the TOS [webmasterworld.com].

4:31 pm on Nov 18, 2005 (gmt 0)

New User

10+ Year Member

joined:May 27, 2005
posts:12
votes: 0


You're right, they don't SHOW much YET. However, seeing how Google Sitemaps and Google Base are the first real two-way interfaces between Joe Webmaster and the search engines, who's to say that it will stop with statistics?

With a good validation/authentication system in place Google could add other things... perhaps the possibility of letting the webmaster change existing, indexed URLs (e.g. redirect bad URLs to the correct ones), or even allow him to directly remove single bad URLs (without the funny 180-day scheme we have now), or perhaps let the webmaster specify other structured pieces of information (geographic targeting of URLs within a website, etc.). A good validation scheme, which any webmaster can comply with, is the first step to good two-way communication between the two important parts of a good information retrieval system (clean, good indexing + good content). I wrote up some more about this on my blog, but I think I can't post the link (it's in my profile).

Sorry about those links - I didn't see that in the TOS. Mods: feel free to remove them - I have no idea how to edit posts; perhaps I'm just overdue for the weekend :-)

9:39 pm on Nov 18, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:July 18, 2002
posts:154
votes: 0


The bug has been fixed - I noticed this earlier today. Here is the official statement:
[sitemaps.blogspot.com...]
10:16 am on Nov 19, 2005 (gmt 0)

Senior Member from MY 

WebmasterWorld Senior Member vincevincevince is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 1, 2003
posts:4847
votes: 0


Delighted to see it fixed. It wouldn't have been nice for webmasters to suddenly find their competition had got hold of their sitemap data, crawl error/type data, top search terms, etc...
3:37 pm on Nov 19, 2005 (gmt 0)

New User

10+ Year Member

joined:Sept 22, 2005
posts:22
votes: 0


Google should really verify the content of the file.
My understanding is that if your site is configured to return a generated page for a 404 error instead of just the plain error, you cannot get your site verified.

I don't think it's worth turning off that feature just to get that tiny extra bit of info. All Google would have to do is check the content of the verification file to know that I do indeed own the site. Very frustrating.

5:13 pm on Nov 19, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 4, 2005
posts:70
votes: 0


From google:

"To ensure that no one can take advantage of this configuration to view statistics to sites they don't own, we only verify sites that return a status of 404 in the header of 404 pages."

This has nothing to do with whether you have a generated 404 page or a plain one; it only matters that your 404 page does indeed return a 404 header. There's nothing stopping you from returning a 404 header *and* a customized page for the end user.
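To illustrate the point: the status header and the page body are independent, so a server can send a real 404 status together with a friendly custom error page. A minimal sketch using Python's http.server (any web server can do the same; in PHP it's a header() call before the page output):

```python
# A custom "friendly" 404 page that still sends the real 404 status,
# showing that the status header and the page body are independent.
import http.server

class Friendly404Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Send the genuine 404 status line and headers...
        self.send_response(404)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        # ...and a customized page body for the visitor.
        self.wfile.write(b"<h1>Oops!</h1><p>We couldn't find that page.</p>")

    def log_message(self, *args):
        pass  # keep the demo quiet
```

A crawler checking the status code sees a proper 404, while a human visitor still gets the helpful page.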

5:51 pm on Nov 19, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:July 26, 2005
posts:486
votes: 0


Seems like the best strategy is to add your own sites and validate them yourself and then nobody else can claim them.

Although they would also have to submit a valid sitemap file before they could even verify anything, so I am not sure why anyone would go to this trouble except to be annoying.

6:13 pm on Nov 19, 2005 (gmt 0)

Full Member

10+ Year Member

joined:May 24, 2003
posts:242
votes: 0


"We fixed this as soon as we found out about the problem"

I wonder where they found that out? ;)

6:48 pm on Nov 19, 2005 (gmt 0)

New User

10+ Year Member

joined:Oct 26, 2005
posts:14
votes: 0


Wow, that is quite a big thing to miss. I'm amazed I didn't think of this myself, but anyhow, it's nice to hear that they fixed the problem.
7:03 pm on Nov 19, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 8, 2002
posts:2335
votes: 0


Earwig,

I'm not sure but I believe this exploit was first published on another website (which is where I first read about it).

There was also an XSS (cross-site scripting) bug in Google Base, in essence opening up Gmail accounts to hackers.

8:44 pm on Nov 19, 2005 (gmt 0)

Senior Member from MY 

WebmasterWorld Senior Member vincevincevince is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 1, 2003
posts:4847
votes: 0


Mini update: they fixed the problem with sites that don't give 404 for non-existent pages.

They have not looked at the problem where a site user can create their own page.

Examples:

1.

Many wikis present a page letting you start a new article when you request an unknown page, and - as they should - they return 404 for it. But just starting the article is enough to get a 200 OK and validate the site for Google Sitemaps. (Wikis of the form widget.com/widgets (i.e. no .html) are not exempt - many will still willingly create pages ending in .html if requested.)

2.

I have found 3 sites which provide site.com/membername.html style profile pages. Registering GOOGLEser1684ser84 as your username will give you the required site.com/GOOGLEser1684ser84.html page, which will validate.
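For sites with membername.html profile pages, the filter suggested at the top of the thread applies to usernames as well. A sketch in Python mirroring the earlier PHP regex (the GOOGLE-prefix pattern comes from this thread; a real deployment might want to match more broadly):

```python
# Reject usernames that could generate a Google Sitemaps
# verification-style filename (GOOGLE followed by letters/digits),
# matching case-insensitively to be safe.
import re

FORBIDDEN = re.compile(r"^GOOGLE[a-z0-9]+$", re.IGNORECASE)

def username_allowed(name: str) -> bool:
    """Return False for usernames that would yield GOOGLExxxx.html pages."""
    return FORBIDDEN.match(name) is None
```

Apply the same check to any user-supplied file or page name, not just usernames, since any route to creating GOOGLExxxx.html counts.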

If you fall into any of these categories, then you must do something about it before "all your stats are belong to" someone else!