
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

G-Strings Revisited

 11:09 pm on Oct 17, 2011 (gmt 0)

Going on almost a year ago, we talked about the "url(data:image/" log-exploding string, layout-busting examples of which can be seen here: [webmasterworld.com...]

At the time, consensus sort of suggested the URI might be related to the Google Toolbar -- specifically "GTB6.6" -- and Explorer.

I was wondering if anyone still sees (m)any of the URIs, or has any new thoughts/theories?

I see one or two every month (more when the person reuses the referrer during a visit), and the hits share the following characteristics (casually compiled since June):

URI Length: 1,691 characters
First 40 chars: url(data:image/png;base64,iVBORw0KGgoAAA
Location: Never root. Always /dir/url(data:image/
Type: Never jpg or gif. Always png

Referrer: Google SERPs
First 45 chars: http://www.google.com/url?sa=t&source=web&cd=
One was: http://www.google.co.uk/url?sa=t&source=web&cd=

UA: Explorer
MSIE 8.0? All but one
Trident/4.0? All but one
GTB7.x? All but one
The 'one': Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)
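
For anyone who wants to tally the same characteristics from their own logs, here is a minimal sketch. It assumes Apache-style Combined Log Format; the regex, the `summarize` name, and the token list are illustrative choices, not anything posted in the thread, and would need adjusting for other log layouts.

```python
import re
from collections import Counter

# Assumption: Combined Log Format, i.e.
#   ... "METHOD /path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
MARKER = "url(data:image/"
# User-agent substrings mentioned in the thread; extend as needed.
UA_TOKENS = ("MSIE 7.0", "MSIE 8.0", "MSIE 9.0", "Trident/4.0", "Trident/5.0", "GTB")


def summarize(lines):
    """Return (uri_lengths, ua_token_counts) for requests containing MARKER."""
    lengths = []
    tokens = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if not m or MARKER not in m.group("path"):
            continue
        path = m.group("path")
        # Length of the URI counted from the start of "url(data:image/"
        lengths.append(len(path) - path.index(MARKER))
        for tok in UA_TOKENS:
            if tok in m.group("agent"):
                tokens[tok] += 1
    return lengths, tokens
```

Feeding it an open log file (`summarize(open("access.log"))`, file name hypothetical) gives the URI lengths and a count of how often each UA token shows up on those hits.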

In the original thread, "chedar_ed" gave instructions on how to see what's represented by the base64 code but I stumble on those steps. What do you see?

By the way, if you want to see an example of a live "url(data:image/" image/URL, Google the following as-is --

Webmaster World

-- and scroll/advance to "Matt Cutts" or his tiny smiling face. Now check out that image's address. Hint: It's a whopping 2,199 characters long and begins -- wait for it --


I'm a Mac person. Is Explorer the only UA (still) choking on these things?





 11:50 pm on Oct 17, 2011 (gmt 0)

I regularly see them in the logs of sites running MediaWiki.


 1:35 am on Oct 18, 2011 (gmt 0)

A quick look through the last few days' logs, and yes, there was one.

UA: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)

The image is similar but not identical to what I described in the earlier thread.
Originally "a question mark, a minus sign, a check box and then the small square Google logo (like the favicon) with a magnifying glass. "
Now: A question mark, a minus sign, a grey x (previously an x in a box, sort of like a check box), the small Google logo with magnifying glass, a blue X.

To see the image, just copy the whole data:image etc. string into a bit of HTML, and open it up in your browser.
<img alt="Embedded Image"
src="data:image/png;base64,iVBORw0KGgo..." />
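
If pasting into HTML is a stumbling block, the same payload can be decoded directly. This is a rough sketch only: `png_from_log_path` is a made-up name, and it assumes the logged path still contains the complete base64 string, possibly with `+`, `/` or `=` percent-escaped by the server.

```python
import base64
import re
from urllib.parse import unquote

# Matches the base64 payload, allowing for percent-escaped characters.
DATA_URI = re.compile(r"data:image/png;base64,([A-Za-z0-9+/=%]+)")
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_from_log_path(path):
    """Extract and decode the base64 PNG payload from a logged request path."""
    m = DATA_URI.search(path)
    if m is None:
        raise ValueError("no data:image/png;base64 payload found")
    payload = unquote(m.group(1))         # undo %2B, %2F, %3D if present
    payload += "=" * (-len(payload) % 4)  # restore padding that may be missing
    raw = base64.b64decode(payload)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded bytes do not start with the PNG signature")
    return raw
```

Writing the returned bytes to a file opened in binary mode (say `gstring.png`, name arbitrary) gives something any image viewer can open. Note that `unquote` leaves a literal `+` alone; a decoder that turns `+` into a space would corrupt the payload.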



 4:07 am on Oct 18, 2011 (gmt 0)

This issue has me scratching my head.

It only affects two out of the 10 sites I monitor - all the rest are mercifully free of the scourge.

But why is that? They all receive visitors using MSIE 8 and 9 the same or similar to Pfui's UAs.

I've looked very carefully at the coding for the pages that seem to trigger the png request, and can't see anything any different to other pages. These are all plain HTML pages, nothing fancy.

I do wish there was a way to keep the code out of logs.


 5:04 am on Oct 18, 2011 (gmt 0)

Ditto. Few and far between, image exactly as described, why the ### can't Spotlight find it without opening the files?

GET /ebooks/alida/url(data:image/ {you may zone out here} ) HTTP/1.0" 404 1496 "http://www.example.com/ebooks/alida/Alida.html" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)"
[Referer] ...www.google.com/search?sourceid=navclient&ie=UTF-8&rlz=1T4GGLL_enUS346US347&q={blahblah}

GET /ebooks/alida/url(data:image/ {been there, done that} ) HTTP/1.1" 404 1317 "http://www.example.com/ebooks/alida/Alida.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; {et cetera, et cetera, do I really give a ### what kind of NET CLR you have?} )"
[Referer] ...www.google.com/search?q={blahblah}&rlz=1I7GPEA_en&ie=UTF-8&oe=UTF-8&sourceid=ie7

Comes after all other requests including favicon. Is it some type of folder icon that might potentially live in the directory? Definitely nothing else in common except (coincidentally?) the same filename. Totally different searches.

If you disencode the percent encodings it no longer works as an image. Lots of online explanations of what it is, but no explanation of what g### wants with it.


 5:17 am on Oct 18, 2011 (gmt 0)

... but no explanation of what g### wants with it.


It is immaterial to me why G. wants it (I think it is something to do with their Toolbar).

What I want to know, is WHY G. thinks they'll find their image on MY server!
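
One possible reading, and it is only a guess that nobody in the thread has confirmed: if some piece of client software treats the CSS-style `url(data:...)` text as an ordinary relative link, standard URL resolution would point it at the directory of the page being viewed. The sketch below just shows that resolution step with Python's `urljoin`; the page URL is the example.com one from the log excerpt above, and the payload is truncated for illustration.

```python
from urllib.parse import urljoin


def resolve_stray(page_url, stray):
    """Resolve a stray string against a page URL the way a relative link would be."""
    return urljoin(page_url, stray)


page = "http://www.example.com/ebooks/alida/Alida.html"
stray = "url(data:image/png;base64,iVBORw0KGgoAAA)"  # truncated, illustrative payload

# "url(data" is not a valid URL scheme (because of the parenthesis),
# so the whole string is handled as a relative path.
print(resolve_stray(page, stray))
```

The resolved address falls under `/ebooks/alida/`, which would fit the "never root, always /dir/" pattern and the 404s, though it says nothing about which program is doing it.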


 7:25 am on Oct 18, 2011 (gmt 0)

Come to think of it... Under what circumstances do we most often find people looking for nonexistent files in our webspace?

#1 They're trying to poke holes in php files belonging to people who have rashly left everything with its default name.

#2 They're looking for evidence of prior visits by malignant robots who left behind something with a characteristic name.

Yes, OK, sometimes the browser asks for that Cross-Platform Thingummy which is apparently legit, but on the whole, on balance, most of the time...



 7:46 am on Oct 18, 2011 (gmt 0)

Lucy24 wrote:
Under what circumstances do we most often find people looking for nonexistent files in our webspace?

This issue is not about "people" asking for non-existent files, nor even any sort of bot asking for the image. I'd lay a huge bet that none of our human visitors have any remote idea that the png is being requested on their behalf.

Plus, I have never seen "bot-like" activity ask for it either. It is always human-visit related.

It appears to be some sort of accidental mal-programming - either by GTB or MSIE, or an unholy mix of the two.

What is most disappointing, is that this issue was first reported more than a year ago, but it seems that neither G. nor M$ has done anything to either identify, own or rectify the problem.


 8:54 am on Oct 18, 2011 (gmt 0)

I've never seen it [the request for this file] on any of the 3 sites I currently manage, but I do of course see MSIE w/ GTB variants.


 2:14 pm on Oct 18, 2011 (gmt 0)

Mokita, I concur with all you've said about this head-scratcher of a log scourge -- intermittent sites; real, unwitting visitors; bad G/MS programming; long-reported and ignored.


 3:58 pm on Oct 18, 2011 (gmt 0)

Dijkgraaf: Thanks for explaining how-to (again:) I also see differences between one from last December (which I could still embed) and one from this month:

2011: 70x14 pixels PNG: five 'parts':
? - x(grey) Gfavicon x(blue)

2010: 56x14 pixels PNG: four 'parts':
? - x(box) Gfavicon
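
The pixel dimensions can be read straight from the decoded bytes without an image editor, since a PNG stores width and height in its first (IHDR) chunk. A small sketch, with `png_size` as an illustrative name:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(raw):
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    if len(raw) < 24 or not raw.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG byte string")
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    # then IHDR data beginning with big-endian width and height.
    if raw[12:16] != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    width, height = struct.unpack(">II", raw[16:24])
    return width, height
```

Comparing the result for strings captured in different months is a quick way to spot when the sprite strip changes.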

Those parts are all in/from Google's sprites. Search for "Google sprites" to see arrays of images. There's early info here, too: "Google, Image replacement & CSS Sprites" [webmasterworld.com...]

'Google Sprite's G-Strings' -- Sounds like an iffy Halloween costume inspired by a bad cable TV show. :)


 8:19 pm on Oct 18, 2011 (gmt 0)

This issue is not about "people" asking for non-existent files, nor even any sort of bot asking for the image.

Fer hevvins sakes. You know perfectly well I didn't mean literal people. Real humans are also not individually asking for image files and favicons-- all the stuff that magically shows up when you load a page.

My point was that asking for nonexistent files is a type of behavior that we would normally associate with evil robots. This is not the first time, and will not be the last, that g### has done something which would get anyone else locked out on the spot.


 8:54 pm on Oct 18, 2011 (gmt 0)

I'm seeing the occasional one. I have the system set to return a 403 but not block the IP. Seems a reasonable compromise.
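
The post doesn't say which server or rule is doing this. Purely as an illustration of "403 but no IP block", here is a hypothetical WSGI wrapper in Python; `reject_data_uri_requests` is an invented name, and most people would do the equivalent in their web server's own configuration instead.

```python
MARKER = "url(data:image/"


def reject_data_uri_requests(app):
    """Wrap a WSGI app: answer 403 to data-URI paths, pass everything else through."""
    def wrapper(environ, start_response):
        if MARKER in environ.get("PATH_INFO", ""):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # No per-IP state is kept, so the visitor's other requests are unaffected.
        return app(environ, start_response)
    return wrapper
```

Because the check looks only at the requested path, the real visitor behind the stray request still gets every legitimate page and image.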

But the consensus is correct: it's time someone came clean on this one.


 6:44 pm on Feb 29, 2012 (gmt 0)

These things are ridiculous:


with, an ending of.... wait for it......


Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)


 8:05 pm on Feb 29, 2012 (gmt 0)

There's been discussion in several WebmasterWorld threads about these
data: URIs over the last few months.

There's a whole section about the Apache module
mod_data -- Convert response body into an RFC2397 data URL -- linked from [httpd.apache.org...]

See also: [httpd.apache.org...]
Now we have an RFC to refer to too. Wasn't aware of that one before.
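
For reference, the RFC 2397 form is `data:[<mediatype>][;base64],<data>`. A short Python sketch of building one from raw bytes follows; `to_data_url` is just an illustrative helper and has nothing to do with how mod_data itself is implemented.

```python
import base64


def to_data_url(raw, media_type="image/png"):
    """Build an RFC 2397 data URL with a base64-encoded body."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


# The 8-byte PNG signature alone already produces the familiar "iVBORw0KGgo" prefix.
print(to_data_url(b"\x89PNG\r\n\x1a\n"))
```

That prefix is why every one of the logged strings starts the same way: they are all PNGs.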
