MySpace News bot

improper identification

         

MaxM

1:19 pm on May 17, 2007 (gmt 0)




They didn't bother to change their user agent to something meaningful. It's still the library default:

Jakarta Commons-HttpClient/3.0-rc2 from 216.178.35.203, among other addresses.

I tried to send them a quick message via their web form, but that just threw an error. Oh well, let it eat 403s then...

Fixing it takes just one line :)


client.getParams().setParameter("http.useragent","MySpace News (http://news.myspace.com)");

incrediBILL

5:53 pm on May 17, 2007 (gmt 0)




I concur.

We should start petitioning these companies to fix their UAs.

It's one lousy line of code, left out by one lazy or incompetent programmer. That signals a quality problem serious enough that the bot shouldn't be allowed to crawl in the first place.

Every bot should provide the following:

1) an actual UA identifying the source

2) a link to a page with more information about what they want and which parts of robots.txt they honor

3) a form to contact them about bugs and/or crawl removal if the bot fails to stop via robots.txt

Without those, it should just be an automatic block, party over, done.
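
For anyone who wants to automate that, here is a rough sketch of the block rule in Java. The class name, the method names, and the list of default library UA prefixes are all made up for illustration, and it assumes you only run it on requests you have already decided are from a bot, since ordinary browsers don't put a URL in their UA either. It can only check points 1 and 2 from the UA string itself; whether the info page and contact form actually exist still needs a human.

```java
import java.util.List;
import java.util.Locale;

public class UserAgentPolicy {
    // Hypothetical, incomplete list of UA prefixes that HTTP libraries send by default.
    private static final List<String> GENERIC_PREFIXES = List.of(
            "jakarta commons-httpclient", "java/", "libwww-perl", "python-urllib");

    // True if the UA is just an unmodified library default.
    static boolean isGeneric(String ua) {
        String lower = ua.toLowerCase(Locale.ROOT);
        for (String prefix : GENERIC_PREFIXES) {
            if (lower.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // True if the UA carries something that looks like a link to an info page.
    static boolean hasInfoUrl(String ua) {
        String lower = ua.toLowerCase(Locale.ROOT);
        return lower.contains("http://") || lower.contains("https://");
    }

    // Status code to answer a bot request with: 403 unless the UA identifies itself.
    static int statusFor(String ua) {
        if (ua == null || ua.trim().isEmpty()) {
            return 403;
        }
        return (isGeneric(ua) || !hasInfoUrl(ua)) ? 403 : 200;
    }

    public static void main(String[] args) {
        System.out.println(statusFor("Jakarta Commons-HttpClient/3.0-rc2"));     // 403
        System.out.println(statusFor("MySpace News (http://news.myspace.com)")); // 200
    }
}
```

In practice most of us would do this in the server config rather than in application code, but the logic is the same: default library UA or no info link means 403.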

jdMorgan

1:32 pm on May 22, 2007 (gmt 0)




Yes, please give us an info page, too!

client.getParams().setParameter("http.useragent","Mozilla/4.0 (compatible; MySpace News Bot; http://news.myspace.com/newsbot)");

Jim