What's happening is that the content I get back is missing data, compared with the same request made by a browser. I think the first few sites where I ran into this were using ASP, and I know the site I'm working on now is ASP.NET (the pages are .aspx).
How is this done?
If anyone is curious what my project is: I'm an affiliate for a company, I really don't like the way they display their product information, and I want to redisplay it in the style of my website. They have 1000+ products I need to port over, and I can't submit a POST request properly because some form data is missing.
Most webmasters take a rather dim view of this and consider it stealing content they created. I assume the website you are trying to copy from with your PHP script has some anti-scraping code in place to stop you from copying the pages in an automated fashion.
Also, a great number of websites check the type of browser making the request and format the resulting HTML specifically for that browser. They may omit or add certain items in the HTML based on that browser.
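To illustrate the browser check, here is a rough sketch (in Python rather than PHP, purely for illustration) of a request that carries browser-like headers. The User-Agent string, the helper name build_browser_like_request and the example URL are all made up for this example; whether any particular site actually keys its output on these headers is an assumption you would have to test.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical browser-like header values; copy the real ones from
# whatever your own browser sends.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.8",
}


def build_browser_like_request(url, form_data=None):
    """Return a urllib Request that carries browser-like headers.

    If form_data (a dict) is given, it is URL-encoded and sent as a
    POST body; otherwise the request is a plain GET.
    """
    body = urlencode(form_data).encode("ascii") if form_data else None
    return Request(url, data=body, headers=BROWSER_HEADERS)


# Building the request does not touch the network; passing it to
# urllib.request.urlopen() would actually send it.
request = build_browser_like_request("http://example.com/products.aspx")
```

If the response still differs from what the browser shows, the difference is probably not in the headers alone.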
My intentions are pretty benign. I pretty much want users to stay on my site instead of leaving and searching for other affiliates (we can't change the price of products, just how we present them and the copy). I also think the time I spent on my design has paid off and that it looks more professional than the company's site.
Another thing is that some people are spooked by buying from an MLM company. In this case I'm only trying to retail the products. I have no interest in having a gazillion people underneath me; it's a lot fewer headaches. So I'm trying to distance myself by giving my product website its own unique style and copy.
I guess no matter how benign it's still the same "evil" deed.
What's missing from the HTML form, as far as I know, is two hidden variables. But these aren't used in the request I'm making, and I've added them anyway with blank values (... "&name1=&name2=&"...), which is what my browser sent. I suspect there's some data missing from another hidden field that holds between 6 and 10 kB of data.
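For what it's worth, on ASP.NET Web Forms pages a large hidden field like that is typically __VIEWSTATE, and the server generally expects it to be posted back unchanged. Whether that is the field in question here is only a guess. A minimal sketch (Python, for illustration) of collecting every hidden input from a fetched page and echoing it back in the POST body follows; the names extract_hidden_fields and build_post_body and the sample markup are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            # A missing value attribute is treated as an empty string.
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(page_html):
    """Return a dict of all hidden form fields found in page_html."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    return parser.fields


def build_post_body(page_html, visible_fields):
    """Merge the page's hidden fields with the fields you fill in yourself."""
    fields = extract_hidden_fields(page_html)
    fields.update(visible_fields)
    return urlencode(fields).encode("ascii")


# Made-up markup standing in for the page fetched with a GET first.
sample_page = (
    '<form method="post">'
    '<input type="hidden" name="__VIEWSTATE" value="dDwtMTIz" />'
    '<input type="hidden" name="name1" value="">'
    '<input type="text" name="q" value="x">'
    "</form>"
)
post_body = build_post_body(sample_page, {"product": "42"})
```

The idea is to GET the form page first, pull the hidden values out of that exact response, and send them back with the POST, rather than hard-coding blank values.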
I'll plug away at this for a bit. Thanks. You got my brain churning :)