
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
url(data:image/
G-Strings Revisited
Pfui




msg:4375644
 11:09 pm on Oct 17, 2011 (gmt 0)

Going on almost a year ago, we talked about the "url(data:image/" log-exploding string, layout-busting examples of which can be seen here: [webmasterworld.com...]

At the time, consensus sort of suggested the URI might be related to the Google Toolbar -- specifically "GTB6.6" -- and Explorer.

I was wondering if anyone still sees (m)any of the URIs, or has any new thoughts/theories?

I see one or two every month (more when the person reuses the referrer during a visit), and the hits share the following characteristics (casually compiled since June):

URI Length: 1,691 characters
First 40 chars: url(data:image/png;base64,iVBORw0KGgoAAA
Location: Never root. Always /dir/url(data:image/
Type: Never jpg or gif. Always png

Referrer: Google SERPs
First 45 chars: http://www.google.com/url?sa=t&source=web&cd=
One was: http://www.google.co.uk/url?sa=t&source=web&cd=

UA: Explorer
MSIE 8.0? All but one
Trident/4.0? All but one
GTB7.x? All but one
The 'one': Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)
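
A tally like the one above can be pulled from an access log with a short script. This is a minimal sketch, assuming the log is in Apache "combined" format; the regular expression and the function names `data_uri_hits` and `summarize` are illustrative, not anything used in this thread.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# host ident user [time] "METHOD path PROTO" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>.*?) (?P<proto>HTTP/[\d.]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

MARKER = "url(data:image/"


def data_uri_hits(lines):
    """Return one dict per log line whose request path contains the marker."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and MARKER in match.group("path"):
            hits.append(match.groupdict())
    return hits


def summarize(hits):
    """Collect the properties compared in the post: length, referrer, UA."""
    return {
        "count": len(hits),
        "path_lengths": sorted({len(hit["path"]) for hit in hits}),
        "referrers": Counter(hit["referrer"][:45] for hit in hits),
        "agents": Counter(hit["agent"] for hit in hits),
    }
```

Feeding it an open log file, e.g. `summarize(data_uri_hits(open("access.log")))` with a hypothetical file name, gives the hit count, the distinct URI lengths, and counts per referrer prefix and user agent.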

In the original thread, "chedar_ed" gave instructions on how to see what's represented by the base64 code but I stumble on those steps. What do you see?

By the way, if you want to see an example of a live "url(data:image/" image/URL, Google the following as-is --

Webmaster World

-- and scroll/advance to "Matt Cutts" or his tiny smiling face. Now check out that image's address. Hint: It's a whopping 2,199 characters long and begins -- wait for it --

data:image/jpeg;base64

I'm a Mac person. Is Explorer the only UA (still) choking on these things?

Anyone?

Bueller?

 

g1smd




msg:4375655
 11:50 pm on Oct 17, 2011 (gmt 0)

I regularly see them in the logs of sites running MediaWiki.

Dijkgraaf




msg:4375688
 1:35 am on Oct 18, 2011 (gmt 0)

A quick look through the last few days' logs, and yes, there was one.

UA: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)

The image is similar but not identical to what I described in the earlier thread.
Originally "a question mark, a minus sign, a check box and then the small square Google logo (like the favicon) with a magnifying glass. "
Now: A question mark, a minus sign, a grey x (previously an x in a box, sort of like a check box), the small Google logo with magnifying glass, a blue X.

To see the image, just copy the whole data:image etc. string into a bit of HTML, and open it up in your browser.
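<html>
<head>
<title>Embedded Image</title>
</head>
<body>
<!-- paste the complete data: URI from the log in place of the elided string -->
<img alt="Embedded Image"
src="data:image/png;base64,iVBORw0KGgo..." />

</body>
</html>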
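
For anyone who stumbles on the HTML route, the same thing can be done by decoding the logged string straight into a PNG file. This is a rough sketch only: it assumes the request path looks like the ones quoted in this thread (a `data:image/png;base64,` payload with percent-encoded `/`, `+` and `=` and a trailing `)`), and the function name `png_bytes_from_request` is made up for illustration.

```python
import base64
from urllib.parse import unquote

MARKER = "data:image/png;base64,"


def png_bytes_from_request(path):
    """Extract and decode the base64 PNG embedded in a logged request path."""
    start = path.find(MARKER)
    if start == -1:
        raise ValueError("no PNG data: URI in this request path")
    payload = path[start + len(MARKER):]
    # Logs show %2f, %2b and %3d where base64 needs "/", "+" and "=".
    payload = unquote(payload)
    # Drop any line breaks introduced when the string was pasted or wrapped.
    payload = "".join(payload.split())
    # The request ends with the ")" that closed the CSS url( ... ) wrapper.
    if payload.endswith(")"):
        payload = payload[:-1]
    return base64.b64decode(payload)
```

Writing the result out with `open("logged.png", "wb").write(png_bytes_from_request(path))` (file name hypothetical) gives a file any image viewer can open.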

Mokita




msg:4375734
 4:07 am on Oct 18, 2011 (gmt 0)

This issue has me scratching my head.

It only affects two out of the 10 sites I monitor - all the rest are mercifully free of the scourge.

But why is that? They all receive visitors using MSIE 8 and 9, with UAs the same as or similar to Pfui's.

I've looked very carefully at the coding for the pages that seem to trigger the png request, and can't see anything any different to other pages. These are all plain HTML pages, nothing fancy.

I do wish there was a way to keep the code out of logs.

lucy24




msg:4375751
 5:04 am on Oct 18, 2011 (gmt 0)

Ditto. Few and far between, image exactly as described, why the ### can't Spotlight find it without opening the files?

July:
GET /ebooks/alida/url(data:image/ {you may zone out here} ) HTTP/1.0" 404 1496 "http://www.example.com/ebooks/alida/Alida.html" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)"
[Referer] ...www.google.com/search?sourceid=navclient&ie=UTF-8&rlz=1T4GGLL_enUS346US347&q={blahblah}

August:
GET /ebooks/alida/url(data:image/ {been there, done that} ) HTTP/1.1" 404 1317 "http://www.example.com/ebooks/alida/Alida.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; {et cetera, et cetera, do I really give a ### what kind of NET CLR you have?} )"
[Referer] ...www.google.com/search?q={blahblah}&rlz=1I7GPEA_en&ie=UTF-8&oe=UTF-8&sourceid=ie7

Comes after all other requests including favicon. Is it some type of folder icon that might potentially live in the directory? Definitely nothing else in common except (coincidentally?) the same filename. Totally different searches.

If you decode the percent encodings it no longer works as an image. Lots of online explanations of what it is, but no explanation of what g### wants with it.

Mokita




msg:4375758
 5:17 am on Oct 18, 2011 (gmt 0)

... but no explanation of what g### wants with it.


Lucy,

It is immaterial to me why G. wants it (I think it is something to do with their Toolbar).

What I want to know, is WHY G. thinks they'll find their image on MY server!

lucy24




msg:4375789
 7:25 am on Oct 18, 2011 (gmt 0)

Come to think of it... Under what circumstances do we most often find people looking for nonexistent files in our webspace?

#1 They're trying to poke holes in php files belonging to people who have rashly left everything with its default name.

#2 They're looking for evidence of prior visits by malignant robots who left behind something with a characteristic name.

Yes, OK, sometimes the browser asks for that Cross-Platform Thingummy which is apparently legit, but on the whole, on balance, most of the time...

Hmm.

Mokita




msg:4375801
 7:46 am on Oct 18, 2011 (gmt 0)

Lucy24 wrote:
Under what circumstances do we most often find people looking for nonexistent files in our webspace?


This issue is not about "people" asking for non-existent files, nor even any sort of bot asking for the image. I'd lay a huge bet that none of our human visitors have any remote idea that the png is being requested on their behalf.

Plus, I have never seen "bot-like" activity ask for it either. It is always human-visit related.

It appears to be some sort of accidental mal-programming - either by GTB or MSIE, or an unholy mix of the two.

What is most disappointing, is that this issue was first reported more than a year ago, but it seems that neither G. nor M$ has done anything to either identify, own or rectify the problem.

keyplyr




msg:4375839
 8:54 am on Oct 18, 2011 (gmt 0)

I've never seen it [the request for this file] on any of the 3 sites I currently manage, but I do of course see MSIE w/ GTB variants.

Pfui




msg:4375931
 2:14 pm on Oct 18, 2011 (gmt 0)

Mokita, I concur with all you've said about this head-scratcher of a log scourge -- intermittent sites; real, unwitting visitors; bad G/MS programming; long-reported and ignored.

Pfui




msg:4375995
 3:58 pm on Oct 18, 2011 (gmt 0)

Dijkgraaf: Thanks for explaining how-to (again:) I also see differences between one from last December (which I could still embed) and one from this month:

2011: 70x14 pixels PNG: five 'parts':
? - x(grey) Gfavicon x(blue)

2010: 56x14 pixels PNG: four 'parts':
? - x(box) Gfavicon

Those parts are all in/from Google's sprites. Search for "Google sprites" to see arrays of images. There's early info here, too: "Google, Image replacement & CSS Sprites" [webmasterworld.com...]

'Google Sprite's G-Strings' -- Sounds like an iffy Halloween costume inspired by a bad cable TV show. :)

lucy24




msg:4376171
 8:19 pm on Oct 18, 2011 (gmt 0)

This issue is not about "people" asking for non-existent files, nor even any sort of bot asking for the image.

Fer hevvins sakes. You know perfectly well I didn't mean literal people. Real humans are also not individually asking for image files and favicons-- all the stuff that magically shows up when you load a page.

My point was that asking for nonexistent files is a type of behavior that we would normally associate with evil robots. This is not the first time, and will not be the last, that g### has done something which would get anyone else locked out on the spot.

dstiles




msg:4376196
 8:54 pm on Oct 18, 2011 (gmt 0)

I'm seeing the occasional one. I have the system set to return a 403 but not block the IP. Seems a reasonable compromise.

But the consensus is correct: it's time someone came clean on this one.
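
The post doesn't say which server or mechanism returns the 403, so the following is only one possible shape of the idea, sketched as a Python WSGI wrapper. The name `reject_data_uri_requests` and the marker default are assumptions for illustration. It refuses the request and does nothing to the visitor's IP.

```python
def reject_data_uri_requests(app, marker="url(data:image/"):
    """Wrap a WSGI app so stray data: URI requests get a 403 response."""

    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if marker in path:
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Everything else passes through untouched; no IP is blocked.
        return app(environ, start_response)

    return middleware
```

Because the check is a plain substring match on the path, ordinary page requests from the same visitor are still served normally.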

Seedy




msg:4423207
 6:44 pm on Feb 29, 2012 (gmt 0)

These things are ridiculous:

/html/url(data:image/png;base64,


with, an ending of.... wait for it......

iVBORw0KGgoAAAANSUhEUgAAAEYAAAAOCAYAAACSJWqFAAAABGdBTUEAAL
GPC%2fxhBQAAAAlwSFlzAAALEgAACxIB0t1%2b%2fAAAAAd0SU1FB9oGAhENK17
O5ogAAAAZdEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIEdJTVBXgQ4XAAAAGXRFW
HRTb2Z0d2FyZQBQYWludC5ORVQgdjMuNS44NzuAXQAAA99JREFUSMfdlmtMm1UY
x%2fkOM1vmBck2EIyZOiLZJRLlMhaXhQSjmRiHiVmi7gKpjshlBAiMhYG0C3UdinyYCUY
Mgm6MsYVLuV%2fLaArl1pYWGLQwJ6OIEeKnn%2bcdZc4CS6vDyJ7kn%2feck5Nz3vP
L%2f3nO8fDYIDFpnwsZvjPL2Ss38T9eTFTmdzT0ahi9paUmJgKNvxejfl6M%2b3phFt8BX
0%2b0vp4hHo97WO1zNJtvE5rdyLOJbXjHVZBbpsZo1TPSUIY%2byIdxAWRMyCSk932C
Tj9PHsnmcRk%2f%2bgv1COGQ1N7zfwAztbBIrWGKoKwWfFJv8vSpBqIVlWiNOkaH2tBG
BWPxW3KNwXcTOvFtE3pUYN4QqhPa4uhLbct6HfaCUsmF%2fPwVPy%2bNOY%2fbFh
ZoH79DZH4nPkkd%2bCS0E5lbg2aoF4OuhcZDQbTt2kxL4Fbad26mS6RWk9%2bmv60
RILciyXm%2fAIUYV1hxB9RpyTnrBsYBQALkDEv1wJgUgz%2ffxXB3jqzKAQKSGtnxWRM
JpVq6BnUoFbnEhu7lZOgrfPD%2bXg7HvYTsyPOogp9iJQDbPTlDcRmM5BpHKpWtZ4pI
AFQOGPegrOGi44X11A9PoLXNUNAwgrLOIPpjpJ3JoOr6DVo1AlJ3D58X5PFCShjbFK
%2bzK%2fXVVdxh4y9A7kORUsriALPFhfk8TC47Z5UUWo6X4y8TlnaZ%2bEtqrvWZsPxq
J1ueR8XVq0zMLlCsn6dk8Hc6zb%2bQcz6HgPT9bJWHrbqWBMJf7iaUB2pLkavzk5Q%2
f%2bT1MboFxSqHl2PdpKYGycnbGlpJW0olRpFVCSgrmyWkuaueZ%2fe0PrFN2SnR26s
cXkZ%2fPZ1tu%2bKpr3YciXwK0bmD%2bdZ1ZTiVHbVkNzlvpPyD7Ss3Zsg6%2bb%2b1
FP2lBdkrG6ISNIq0d48g0tW0mCppttEwskph8Gu%2bs8DUL7X3XuAvnP4PiVGhVa9SZm
n4LA5NGBkzl9Pcp6OlKISP9JDcqK%2flGgPm2eZxLHdMoNPOUNunIPJPF9nMHVtxKS2
5xLr421%2bA4HFO37mDcuK7NM5NoWtPR1gQz1BDIoDqQ8qJDvPfum3SIwlthWqB2bJ
Hi2m4%2b%2fPgYkZ%2fE8ExmBK5cy86wNlQM3eqh48p%2bTOog8dINwqjeTX%2fVPgr
PHSQ5OZmcPAVfXPyS9IS3UaX6Ex0bwZPxoRvzsO6E6baZ7usxDAkYw9XBGKpfw1gV
Tp86kbpujbit5LyY9g4HThwkP8mH6q%2bf4%2fBHkY8%2fGMuMNcQ42sxAU5IAc5Thmq
OY2nOwTA9zTbx%2b9yhP4K2IZnv2EaJk0ahSvKku3PGPwfwJAW1Er1YzJJgAAAAASU
VORK5CYII%3d)


host86-147-182-nnn.range86-147.btcentralplus.com
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
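
The image size can be read from the first few characters of a string like this without decoding the whole thing, because a PNG stores its width and height in the IHDR chunk right after the 8-byte signature. A small sketch, with `png_dimensions` as an illustrative name:

```python
import base64
import struct
from urllib.parse import unquote

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(base64_text):
    """Return (width, height) from the PNG header, or None if it is not a PNG."""
    cleaned = "".join(unquote(base64_text).split())
    if len(cleaned) < 32:
        return None
    # 32 base64 characters decode to 24 bytes:
    # signature (8) + chunk length (4) + "IHDR" (4) + width (4) + height (4).
    header = base64.b64decode(cleaned[:32])
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", header[16:24])


first_line = "iVBORw0KGgoAAAANSUhEUgAAAEYAAAAOCAYAAACSJWqFAAAABGdBTUEAAL"
print(png_dimensions(first_line))  # (70, 14)
```

The first line of the string above decodes to 70 x 14, the same size Pfui reported for the 2011 version of the image.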

g1smd




msg:4423232
 8:05 pm on Feb 29, 2012 (gmt 0)

There's been discussion in several WebmasterWorld threads about these data: URIs over the last few months.

There's a whole section about the Apache module mod_data -- Convert response body into an RFC2397 data URL -- linked from [httpd.apache.org...]

See also: [httpd.apache.org...]
Now we have an RFC to refer to too. Wasn't aware of that one before.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved