Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Googlebot Has Ears
and they seem to like good music
Samizdata




msg:4438642
 6:37 am on Apr 9, 2012 (gmt 0)

For many years I have done web radio broadcasts for small personal projects.

My current station streams an archive loop 24/7 and I do regular live shows. Most of the content is me jamming with other musicians over the internet.

For the past three years I have used a particular Icecast streaming service, linking to it from my website using the server IP, port number and mountpoint as the URL.

In the past month Googlebot has started tuning in.

Over ten hours of listening (500+ MB) according to the stats.

This has never happened before. I am not aware of any changes in the streaming service and I can't do any bot control as I have very limited server access.

What interests me is Googlebot's apparent change of behaviour.

It seems to be copying my music.

...

 

incrediBILL




msg:4438645
 6:57 am on Apr 9, 2012 (gmt 0)

Is it the normal Googlebot UA doing this?

Got an exact IP and UA you can post?

Considering they just recently launched Google Music, and are also trying to identify original content authors in other media, it wouldn't surprise me if they are profiling music files as well and possibly coming up with some type of music search like Shazam does.

Samizdata




msg:4438779
 4:01 pm on Apr 9, 2012 (gmt 0)

Is it the normal Googlebot UA doing this?

Not sure yet - it's a third-party service with limited stats.

Got an exact IP and UA you can post?

66.249.66.36 - "listened" for over two hours
66.249.66.114 - "listened" for over two hours
66.249.66.202 - "listened" for almost two hours
66.249.72.151 - "listened" for ninety minutes

A few more in the same range didn't like my music quite so much.

All this is in the last month, I've never seen it before.
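
For anyone wanting to confirm that addresses like these really belong to Google, a reverse DNS lookup followed by a forward lookup is the check Google itself documents for Googlebot. The following is only a minimal sketch of that idea: the function name and the injectable lookup parameters are my own, and the accepted hostname suffixes are an assumption based on Google's published guidance.

```python
import socket

# Assumed suffixes for genuine Googlebot reverse-DNS hostnames.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a Google hostname
    and that hostname resolves forward to the same IP.

    The two lookup callables are hypothetical hooks so the logic can be
    exercised without network access; by default real DNS is used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        # No PTR record or lookup failure: cannot be verified.
        return False

    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # The forward lookup must include the original address,
        # otherwise the PTR record could have been spoofed.
        return ip in forward_lookup(host)
    except OSError:
        return False


# Usage (performs live DNS queries):
#   for ip in ("66.249.66.36", "66.249.66.114", "66.249.66.202", "66.249.72.151"):
#       print(ip, is_verified_googlebot(ip))
```

The forward lookup matters because anyone who controls the reverse zone for their own address block can publish a PTR record that claims to be a Google host.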

music files

It is actually a continuous stream running 24/7 that is being accessed.

Fortunately I am not charged for bandwidth.

...

Samizdata




msg:4438825
 6:44 pm on Apr 9, 2012 (gmt 0)

After a little more digging I have a culprit:

SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)

As I recall Googlebot-Mobile has several UAs but only this one seems to listen in.
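
To pick this variant out of a log automatically, the bot token can be pulled from the "(compatible; ...)" part of the string. This is only a rough sketch: the regular expression and the function name are illustrative assumptions, not an official parsing rule, and a matching UA string proves nothing by itself since it is trivially forged.

```python
import re

# Matches the "(compatible; Googlebot/2.1;" or "(compatible; Googlebot-Mobile/2.1;" token.
BOT_TOKEN = re.compile(r"\(compatible;\s*(Googlebot(?:-Mobile)?)/([\d.]+);", re.IGNORECASE)


def googlebot_token(user_agent):
    """Return (bot_name, version) if the UA carries a Googlebot token, else None."""
    match = BOT_TOKEN.search(user_agent)
    if match is None:
        return None
    return match.group(1), match.group(2)


SAMSUNG_UA = (
    "SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 "
    "UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 "
    "(compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
)

if __name__ == "__main__":
    # The first space-separated field is the handset the bot claims to be.
    print(SAMSUNG_UA.split(" ", 1)[0], googlebot_token(SAMSUNG_UA))
```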

Interestingly, it appears to "listen" for about ten minutes then reconnect immediately.

One conclusion might be that it is misconfigured, as no other bots seem to do it.

Another (less likely) is that I have a robotic fan.

...

dstiles




msg:4438842
 7:44 pm on Apr 9, 2012 (gmt 0)

I would ask how G got the URL in the first place.

My guess would be someone with a) G toolbar; b) gmail; c) chrome; d) android; e) logged in to G.

That is assuming, of course, that the URL is not in the SERPs.

keyplyr




msg:4438848
 7:54 pm on Apr 9, 2012 (gmt 0)


I suggest calling an off-page script to serve the "listen" link so bots don't trip it and set a timer that the (human) user needs to refresh to continue to listen. I serve an audio stream this way and have never had an issue.

Samizdata




msg:4438882
 8:39 pm on Apr 9, 2012 (gmt 0)

I would ask how G got the URL in the first place

As stated above, it is linked from my website, and has been for years.

As you suggest, there are probably a few other links out there too.

The point is that this activity is very recent.

And it only comes from the one UA - standard Googlebot, other Googlebot-Mobile variations, Bingbot and the rest must all know about the URL, but none of them ever listen in (they presumably detect it as a continuous audio stream in the same way iTunes does).

Only the Samsung variant appears to have headphones.

...

Samizdata




msg:4438912
 9:10 pm on Apr 9, 2012 (gmt 0)

I suggest calling an off-page script to serve the "listen" link so bots don't trip it and set a timer that the (human) user needs to refresh to continue to listen. I serve an audio stream this way and have never had an issue.

Thanks keyplyr, one problem is that the stream is not served from my own site.

Another is that the stream URL is available from other websites.

No other bot has ever actually loaded the stream before, though.

...

lucy24




msg:4438915
 9:14 pm on Apr 9, 2012 (gmt 0)

Inevitable follow-up question: A while back, Image Search added the "looks like" option. The one where you drag in a picture of a pet rat dozing in a bakery bag, and it brings up pictures of (a) assorted close-ups of critters with eyes, and (b) pictures that are dark in the middle and white all around. (There was a (c), but I forget.)

Are we about to get a Google Music Search where you drag in something and it spits out audio files that "sound like" your specimen?

Samizdata




msg:4438934
 10:36 pm on Apr 9, 2012 (gmt 0)

Inevitable follow-up question

I don't know the answer Lucy, but I do sound like a rat in a bakery bag.

Congratulations on completing a year on WebmistressWorld.

Over 3,000 posts too - impressive.

...

keyplyr




msg:4438935
 10:39 pm on Apr 9, 2012 (gmt 0)



Thanks keyplyr, one problem is that the stream is not served from my own site.

Doesn't need to be. Serve the *link* through a script. If you also wish to set a timer that needs to be refreshed to continue listening, then put it all in a child window and control the life of the window with the timer script.

Examples of all this stuff you can find on the web.
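
One way to read "serve the link through a script" is a small server-side handler on your own site that hands out the stream address only to clients that do not identify as bots. The sketch below is one possible interpretation, not necessarily the setup described above: the stream URL, the bot pattern and the handler name are all hypothetical, and it uses a plain WSGI callable rather than any particular framework.

```python
import re

# Hypothetical stream address (server, port and mountpoint are placeholders).
STREAM_URL = "http://stream.example.com:8000/live"

# Crude, assumed pattern for self-identified crawlers.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def listen_app(environ, start_response):
    """WSGI app: redirect apparent humans to the stream, refuse apparent bots."""
    user_agent = environ.get("HTTP_USER_AGENT", "")
    if not user_agent or BOT_PATTERN.search(user_agent):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Stream link is not available to automated clients.\n"]
    # no-store keeps the redirect target from being cached.
    start_response("302 Found", [("Location", STREAM_URL),
                                 ("Cache-Control", "no-store")])
    return [b""]


# Usage (runs a local test server):
#   from wsgiref.simple_server import make_server
#   make_server("", 8080, listen_app).serve_forever()
```

This only hides the link on pages you control. As noted earlier in the thread, it does nothing for a bot that already knows the direct stream URL or finds it on another site.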

incrediBILL




msg:4438991
 2:09 am on Apr 10, 2012 (gmt 0)

Another possibility is that someone working in the search division is actually listening to your music (there's one in every crowd), using their own VPN that uses the Googlebot UAs and IPs.

Probably not, but anything is possible. ;)

Worst case, you're on the Googlebot playlist.

blend27




msg:4439310
 8:17 pm on Apr 10, 2012 (gmt 0)

Let them listen to the Chrome adverts for 2 hours ;)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved