Why the obsession with Favicon?

         

Mokita

4:00 am on Aug 9, 2010 (gmt 0)

10+ Year Member



For some time now I have been seeing frequent requests for favicon.ico, which always get a 403 because they provide no referer and no user-agent. They don't request any other files.

The vast majority come from various amazonaws IPs, but I have just seen one from Proxad France.

What is it about favicon that they want it so badly? There must be something I am not understanding.

blend27

4:56 am on Aug 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Testing whether their IP range is blocked at the .htaccess level? Favicon is usually the smallest file on the site? Toolbars try to grab it to stay nice, pretty and colorful? Taste the Favi Rainbow?

dstiles

7:18 pm on Aug 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Lots of things collect favicon: SEs to light up their SERPs, browsers (restart browser with old page), browser add-ins/toolbars, bookmarkers.

There is (was?) at least one sort-of-SE that collects icons to display on a links page - never went to any pages, just assumed favicon was in the root (which is why I moved mine out of the root).

keyplyr

7:53 pm on Aug 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I allow blank referrer for favicon, robots.txt, and several other info type flat files.

Although the favicon.ico gets scraped often for favicon collection type sites and bullets in blogs/directories, I still tolerate categorical remote linking since, after all, it is part of my branding and I'd rather control it than give that power to someone's unknown discretion.

Mokita

4:27 am on Aug 10, 2010 (gmt 0)

10+ Year Member



Hi Guys,

Thanks for your responses.

I am familiar (and comfortable) with toolbars, browsers etc fetching favicon - but they usually have a user-agent

e.g. "GET /favicon.ico HTTP/1.1" 200 2238 "-" "Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http:// desktop.google .com/)"

The requests that are bugging me are mostly coming from amazonaws IP ranges (as mentioned in my OP). Those are definitely not "normal" requests, from humans, browsers, toolbars, SEs etc.

e.g. ec2-67-202-61-111.compute-1.amazonaws.com - - [09/Aug/2010:08:34:41] "GET /favicon.ico HTTP/1.1" 403 - "-" "-"

The frequency of the many requests, by what are very obviously bots, is what got me wondering "why are they so interested in favicon".
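For anyone who wants to measure how often this happens, here is a rough sketch of how the log could be tallied. It assumes every line follows the same layout as the amazonaws example quoted above (host, two ident fields, bracketed time, request, status, size, referer, user-agent). The names `LOG_PATTERN`, `parse_line` and `count_anonymous_favicon_hosts` are made up for illustration, and the pattern would need adjusting for a different log format.

```python
import re
from collections import Counter

# Assumed layout, modelled on the amazonaws line quoted in this post:
# host ident user [time] "METHOD path protocol" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Values that count as "not supplied" in the log.
BLANK = ("", "-")


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def count_anonymous_favicon_hosts(lines):
    """Count favicon.ico requests per host that sent no referer and no user-agent."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines in some other format
        if (entry["path"] == "/favicon.ico"
                and entry["referer"] in BLANK
                and entry["agent"] in BLANK):
            counts[entry["host"]] += 1
    return counts
```

Passing an open log file to `count_anonymous_favicon_hosts` and looking at `.most_common(10)` on the result would show which hosts are the most persistent.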

blend27 wrote:
Testing whether their IP range is blocked at the .htaccess level?


That is a possibility that I had thought of. So I allowed access to favicon for "all" to see what they would do then, but nothing changed. The requests for only favicon continued. So I changed it back to deny when both referer and user agent are missing.
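To spell out the rule in the paragraph above, here is a minimal sketch of the decision only, not my actual server configuration. The function names and the `always_allowed` list are hypothetical; the real exemptions depend on the site.

```python
def is_blank(value):
    """Treat a missing header, an empty string, or the log placeholder '-' as blank."""
    return value is None or value.strip() in ("", "-")


def should_deny(path, referer, user_agent, always_allowed=("/robots.txt",)):
    """Return True when the request should get a 403.

    Rule sketched here: deny when BOTH referer and user-agent are missing,
    unless the path is on the (hypothetical) always-allowed list.
    """
    if path in always_allowed:
        return False
    return is_blank(referer) and is_blank(user_agent)
```

With this rule a toolbar that sends a user-agent but no referer still gets favicon, while a request carrying neither header is refused.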

dstiles wrote:
There is (was?) at least one sort-of-SE that collects icons to display on a links page - never went to any pages, just assumed favicon was in the root (which is why I moved mine out of the root).


I remember that - iconsurf. com - but I haven't seen it on any of my sites in a long time. Not sure why you have a problem with it; when I first spotted it I just denied it in robots.txt and it obeyed.

keyplyr wrote:
I allow blank referrer for favicon, robots.txt, and several other info type flat files.


I allow blank referer for robots.txt, and .shtml files (my error pages). I used to also allow .ico, but removed it when the current rush from amazonaws started.

So I am still flummoxed as to why they want it. You'd think that when denied access to favicon, they'd try to get a page or another file - but they don't. They just come back a bit later and ask for favicon again.

I guess I'll never know why - but thought it was worthwhile asking here as there are members who seem to know a lot about this sort of thing.

Pfui

6:13 am on Aug 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



FWIW re AWS/amazonaws.com:

amazonaws.com plays host to wide variety of bad bots
[webmasterworld.com...]

Long thread short: 403 hits to all files but robots.txt
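A rough sketch of that policy, assuming the server has a reverse-DNS hostname for the visitor (matching on known IP ranges would work the same way). The function name and the `blocked_suffix` parameter are made up for illustration, and 200 simply stands in for "handle the request normally".

```python
def status_for_request(remote_host, path, blocked_suffix=".amazonaws.com"):
    """Return 403 for any file except robots.txt when the host is on the blocked network."""
    host = remote_host.lower().rstrip(".")
    from_blocked_network = (
        host == blocked_suffix.lstrip(".") or host.endswith(blocked_suffix)
    )
    if from_blocked_network and path != "/robots.txt":
        return 403
    return 200  # placeholder for normal handling
```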

Mokita

6:36 am on Aug 10, 2010 (gmt 0)

10+ Year Member



Pfui wrote:
Long thread short: 403 hits to all files but robots.txt


Thanks Pfui - am doing that already with all known amazonaws IPs (except that I also allow them .shtml which are my error pages).

I was just hoping someone would know (or could make an educated guess) why favicon is the file of choice. This thread was precipitated when I saw the same thing occurring from a Proxad France IP - looks like it is contagious <sigh>.

enigma1

12:31 pm on Aug 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



why favicon is the file of choice

I believe it is because what goes on behind a typical favicon.ico request is invisible to browsers. Another thing is that some browsers will retrieve it regardless of what the HTML says. It may also be used for fast spamming of server logs, because the file is tiny and the request bypasses the typical application filters that exist in the main scripts.