Screen Scraping.

         

NickCoons

2:50 pm on May 24, 2006 (gmt 0)

10+ Year Member



One of my clients has taken on a new client that is passing 5,000 accounts on to them. The way this works is that my client logs into their client's website, enters one of the 5,000 account numbers, retrieves the data, and manually enters this into their own database, then repeats. Obviously this is extremely time-consuming. We've asked them for a data feed, but they have no other method of providing the account information to us except manually. But we did ask their permission to screen-scrape, and they're fine with that.
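For anyone following along, the manual routine described above (look up an account, read the data, store it, repeat) maps onto a fairly small loop. Below is a rough Python sketch, not the actual scraper: the page layout (a table of label/value cell pairs) and the names `fetch_page` and `save_record` are assumptions made up for illustration, since the real site's markup and the client's database aren't shown here.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text content of every <td> cell in a page."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_td:
            self.cells[-1] += data


def parse_account_page(html):
    """Turn a page of label/value cells into a dict.

    ASSUMPTION: cells alternate label, value, label, value...
    The real site's layout may be completely different.
    """
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    cells = [cell.strip() for cell in parser.cells]
    return dict(zip(cells[0::2], cells[1::2]))


def scrape_accounts(account_numbers, fetch_page, save_record):
    """Fetch, parse, and store each account in turn.

    fetch_page(number) -> HTML string   (hypothetical helper)
    save_record(number, record)         (hypothetical helper)
    """
    for number in account_numbers:
        html = fetch_page(number)
        save_record(number, parse_account_page(html))
```

Passing the fetch and save steps in as functions keeps the loop testable against saved copies of pages before it is pointed at the live site.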

However, I've run into a technical problem. If you visit the website with IE, it works fine. If you visit it with any other browser, it returns a 403 Forbidden. So this means that when I write my screen-scraper, I have to make it think that I'm IE. I went into Firefox and spoofed my user-agent to make the site think I was running IE on XP, but I still received the 403 error.

I was under the impression that the only way a server could tell which browser I was using was to use the user-agent, but somehow it knows that I'm not using IE even though the user-agent says so. Any ideas?
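One thing that may be worth checking: the server sees every request header, not just the user-agent, so a spoofed Firefox can still differ from real IE in headers such as Accept or Accept-Language. A minimal Python sketch for comparing two captured header sets follows. The header values are illustrative examples, not captured from the site in question, and `diff_headers` is a hypothetical helper name.

```python
def diff_headers(reference, candidate):
    """Report how `candidate` headers differ from `reference` headers.

    Header names are compared case-insensitively. Each entry in the
    result is (status, reference_value, candidate_value).
    """
    ref = {name.lower(): value for name, value in reference.items()}
    cand = {name.lower(): value for name, value in candidate.items()}
    report = {}
    for name, value in ref.items():
        if name not in cand:
            report[name] = ("missing", value, None)
        elif cand[name] != value:
            report[name] = ("differs", value, cand[name])
    for name, value in cand.items():
        if name not in ref:
            report[name] = ("extra", None, value)
    return report


# Illustrative values only (ASSUMPTION): capture the real ones with a
# proxy or packet sniffer while browsing the site in each browser.
ie_headers = {
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
    "Accept": "*/*",
    "Accept-Language": "en-us",
}
spoofed_firefox_headers = {
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
    "Accept": "text/html,application/xhtml+xml,*/*;q=0.5",
    "Accept-Language": "en-us,en;q=0.5",
    "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7",
}

for name, details in sorted(diff_headers(ie_headers, spoofed_firefox_headers).items()):
    print(name, details)
```

If the two sets match and the 403 persists, the check is probably happening somewhere other than the request headers.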

StupidScript

6:03 pm on May 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Maybe the site uses an ActiveX control or some other MSIE-friendly method of authentication? Since the user-agent is so easily spoofed it would be remarkable if they relied on that for anything.
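To show how little effort the spoofing takes from a script, here is a small Python sketch using the standard library's `urllib.request`. The URL is a placeholder and `build_request` is a hypothetical helper. The request is only built, not sent.

```python
import urllib.request

# User-agent string reported by IE 6 on Windows XP SP2.
IE6_XP_USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"


def build_request(url, user_agent=IE6_XP_USER_AGENT):
    """Build (but do not send) a GET request carrying the given user-agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL (ASSUMPTION): substitute the real login or lookup page.
request = build_request("http://example.com/accounts")
print(request.get_header("User-agent"))
```

Sending it would be a call to `urllib.request.urlopen(request)`. A 403 there raises `urllib.error.HTTPError`, whose body can be read for any hint about what the server objected to.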

Can't they do a dump of all records in the db associated with your client and send it to you as a .CSV file or something?

NickCoons

4:34 am on May 25, 2006 (gmt 0)

10+ Year Member



<Maybe the site uses an ActiveX control or some other MSIE-friendly method of authentication? Since the user-agent is so easily spoofed it would be remarkable if they relied on that for anything.>

I'm not really sure why they're limiting use to IE. Once in the site, it doesn't look like it uses anything IE-specific. The only time I've seen sites require a specific browser like IE is when the site is incompatible with other browsers. Normally, using the UA for this is fine, because anyone who puts in the effort to spoof the UA simply ends up in a site that doesn't work in their browser, and someone savvy enough to spoof probably already knows that.

<Can't they do a dump of all records in the db associated with your client and send it to you as a .CSV file or something?>

I'm sure they *can* do that, but they won't. My client's client is a very large company, and they have no intention of changing anything for my client's benefit.

Habtom

5:20 am on May 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry, I don't have any idea why it is happening. I'd like to hear the reason when you solve it.

Could someone tell me more about screen scraping?

tnx,
Hab