

Apache servers querying websites as browsers

Can a Linux box query websites as if it was a browser?


a_chameleon

10:43 am on Aug 11, 2007 (gmt 0)

10+ Year Member



Problem: Our servers are trying to access other web servers that host audio content, in order to download or otherwise access it.

These other servers are sometimes strict (worried about duplication, rebroadcasting, etc.) and will only serve the podcasts to browsers that send a valid User-Agent header in their GET requests. In other words, they won't release the podcast to another "server" unless that server is "coming in" as a regular browser.

Question: Is there a Linux application that will allow our Linux boxes to access websites as if they were really Internet Explorer browsers? Maybe even an older version of IE (for simplicity), like 6 or 5.1?

TIA for any ideas or answers!

[edited by: encyclo at 6:03 pm (utc) on Aug. 11, 2007]
[edit reason] fixed formatting [/edit]

jdMorgan

5:56 pm on Aug 11, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If those other servers are 'strict' then they may have a reason to be so...

There are many "libraries" of HTTP access functions available in Perl, Java, PHP, etc., some of which can be modified to spoof valid browser user-agents. However, these kinds of programmatic requests are still very easy to detect, since the 'user session context' is missing. The other sites' Webmasters will easily spot your requests for their media files occurring without any requests for the pages that are normally used to access the media on their site. Also, since all of these requests will be coming from your server's IP address, it will be a simple matter to block that IP address if they wish to stop you from fetching their files.
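To illustrate why a spoofed user-agent doesn't hide the missing session context, here is a minimal sketch of the kind of check a Webmaster might run over their own access log: flag any IP address that fetches a media file without first having fetched a page. It deliberately ignores the User-Agent header, since that is trivially faked. The `Request` record, the `is_page`/`is_media` rules, and the file extensions are all hypothetical assumptions for illustration; a real check would parse the actual Apache log format and consider time windows.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumption: these extensions stand in for "media files" on the site.
MEDIA_EXTENSIONS = (".mp3", ".m4a", ".ogg")
# Assumption: these paths stand in for the pages that normally link to the media.
PAGE_EXTENSIONS = (".html", ".htm", ".php")


@dataclass(frozen=True)
class Request:
    """Hypothetical, simplified access-log entry (the User-Agent is ignored on purpose)."""
    ip: str
    path: str


def is_media(path: str) -> bool:
    return path.lower().endswith(MEDIA_EXTENSIONS)


def is_page(path: str) -> bool:
    return path.endswith("/") or path.lower().endswith(PAGE_EXTENSIONS)


def find_contextless_media_fetchers(requests: Iterable[Request]) -> set[str]:
    """Return IPs that requested a media file before ever requesting a page."""
    seen_page: set[str] = set()
    suspicious: set[str] = set()
    for req in requests:  # requests are assumed to be in chronological order
        if is_page(req.path):
            seen_page.add(req.ip)
        elif is_media(req.path) and req.ip not in seen_page:
            suspicious.add(req.ip)
    return suspicious


if __name__ == "__main__":
    log = [
        Request("203.0.113.5", "/podcasts/"),       # a browser visits the page first...
        Request("203.0.113.5", "/audio/ep1.mp3"),   # ...then fetches the audio
        Request("198.51.100.7", "/audio/ep1.mp3"),  # a script goes straight to the file
        Request("198.51.100.7", "/audio/ep2.mp3"),
    ]
    print(sorted(find_contextless_media_fetchers(log)))  # ['198.51.100.7']
```

Once an address like that shows up repeatedly, blocking it at the server is a one-line change, which is the unreliability described in the next paragraph.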

If you wish to "hotlink" or download other people's media files, be aware that they will take steps to block your access, and that your site will therefore appear to be very unreliable as a result. If you wish to access other sites' media in a reliable fashion, then I suggest entering into a contractual agreement with them to do so with their explicit permission. Doing otherwise will either make your site look broken much of the time, or land you in court...

This post is meant to be factual and informative, not accusatory.

Jim