Forum Moderators: Ocean10000

MySpace News bot

improper identification

     
1:19 pm on May 17, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 12, 2004
posts:45
votes: 0


They didn't bother to change their user agent to something meaningful.

Jakarta Commons-HttpClient/3.0-rc2 from 216.178.35.203 amongst others.

I tried to send them a quick message via their web form, but that just threw an error. Oh well, let it eat 403s then...
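For anyone who wants to do the same, here is a minimal sketch of the decision behind those 403s, assuming the block is keyed on the library's default user agent prefix. The class and method names are hypothetical, and treating a missing user agent as blocked is my own assumption, not something the bot's behavior requires.

```java
import java.util.Locale;

public class UserAgentFilter {
    // Default library UA prefix seen in this thread, lower-cased for comparison.
    private static final String DEFAULT_LIBRARY_PREFIX = "jakarta commons-httpclient";

    /**
     * Returns the HTTP status to send: 403 for a missing or
     * default-library user agent, 200 otherwise.
     * Blocking an empty UA is an assumption of this sketch.
     */
    public static int statusFor(String userAgent) {
        if (userAgent == null || userAgent.trim().isEmpty()) {
            return 403;
        }
        String normalized = userAgent.trim().toLowerCase(Locale.ROOT);
        return normalized.startsWith(DEFAULT_LIBRARY_PREFIX) ? 403 : 200;
    }

    public static void main(String[] args) {
        System.out.println(statusFor("Jakarta Commons-HttpClient/3.0-rc2"));     // prints 403
        System.out.println(statusFor("MySpace News (http://news.myspace.com)")); // prints 200
    }
}
```

In practice the same check usually lives in the web server configuration rather than in application code; the logic is the same either way.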

All it takes on their end is one line :)


client.getParams().setParameter("http.useragent","MySpace News (http://news.myspace.com)");

5:53 pm on May 17, 2007 (gmt 0)

System Operator from US 

incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


I concur.

We should start to petition these companies to fix their UAs.

It's one lousy line of code left out by one lazy or incompetent programmer, and that signals a quality problem serious enough that the bot shouldn't be allowed to crawl in the first place.

Every one of these bots should provide the following:

1) an actual UA identifying the source

2) a link to a page with more information about what they want, and what parts of robots.txt they honor

3) a form to contact them about bugs and/or crawl removal if the bot fails to stop via robots.txt

Without those, it should just be an automatic block, party over, done.
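As a rough illustration of how the first two points could be checked automatically, here is a sketch that looks for an info URL inside the user agent string. The class name, method names, and regular expression are all hypothetical; the third point (a working contact form) cannot be verified from the UA alone and still needs a human to follow the link.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BotUaCheck {
    // Matches an http(s) URL, stopping at whitespace, ')' or ';'.
    private static final Pattern INFO_URL = Pattern.compile("https?://[^\\s);]+");

    /** Returns the first URL found in the user agent, or null if there is none. */
    public static String infoUrl(String userAgent) {
        if (userAgent == null) {
            return null;
        }
        Matcher m = INFO_URL.matcher(userAgent);
        return m.find() ? m.group() : null;
    }

    /** True when the user agent carries a URL pointing at more information. */
    public static boolean identifiesItself(String userAgent) {
        return infoUrl(userAgent) != null;
    }

    public static void main(String[] args) {
        System.out.println(identifiesItself("Jakarta Commons-HttpClient/3.0-rc2"));     // prints false
        System.out.println(infoUrl("MySpace News (http://news.myspace.com)"));          // prints http://news.myspace.com
    }
}
```

A UA that fails this check is a candidate for the automatic block; one that passes still only proves the bot claims an info page, not that the page says anything useful.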

1:32 pm on May 22, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Yes, please give us an info page, too!

client.getParams().setParameter("http.useragent","Mozilla/4.0 (compatible; MySpace News Bot; http://news.myspace.com/newsbot)");

Jim