

Parsing external HTML

Is it possible?

         

Hester

11:14 am on Jan 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have used PHP to open a file held on our server and parse it, replacing characters in the file before outputting it to the screen. But when I try to do the same with files held on a friendly server related to ours, I get an error that the file or directory doesn't exist. I also get the error if I use a full URL rather than just the local folders. Is it possible to parse an HTML document from another site?

That way you can strip out the bits you don't need and use the information you want. Yes, I know that's what XML is for, but the site doesn't use it yet.
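Once a remote page has been fetched as a string, the "strip out the bits you don't need" step is ordinary string work. A minimal Python sketch (Python only illustrates the steps; the thread itself is about PHP and ASP), assuming the wanted part sits between two known marker strings. The comment markers and the helper names `extract_between` and `clean` are hypothetical:

```python
def extract_between(html, start_marker, end_marker):
    """Return the text between the first start_marker and the next
    end_marker, or None if either marker is missing."""
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        return None
    return html[start:end]


def clean(fragment, replacements):
    """Apply (old, new) string replacements in order, like the
    character replacing done before output."""
    for old, new in replacements:
        fragment = fragment.replace(old, new)
    return fragment


# Usage with a made-up page:
page = "<div>menu</div><!-- content --><p>Today's news</p><!-- /content -->"
wanted = extract_between(page, "<!-- content -->", "<!-- /content -->")
print(clean(wanted, [("<p>", ""), ("</p>", ""), ("'", "&#39;")]))  # Today&#39;s news
```

This only works while the remote site keeps its markup stable; if the markers move, `extract_between` returns None rather than guessing.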

Grumpus

11:49 am on Jan 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can still open the HTML file you want to parse as an XML object and parse it. You won't be able to yoink nodes and automatically do it, but once you've created the XML object, you'll have the code of the page stored in that object and it can be parsed just like anything else.
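Python has no direct equivalent of loading a page into an XML object, and real-world HTML is often not well-formed XML anyway, so this sketch swaps in the standard library's tolerant `html.parser.HTMLParser` (a different technique from the one described above). It assumes, purely for illustration, that the wanted data sits in table cells; `CellCollector` and `table_cells` are hypothetical names:

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; etc. arrive as plain characters
        self.cells = []
        self._buf = None  # text pieces of the cell being read, or None outside a cell

    def _flush(self):
        # Finish the current cell, if one is open.
        if self._buf is not None:
            self.cells.append("".join(self._buf).strip())
            self._buf = None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._flush()  # HTML lets authors omit </td>
            self._buf = []

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._flush()

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)


def table_cells(html):
    """Return the text of all <td> cells in an HTML string."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.cells
```

For example, `table_cells("<tr><td>x<td>y</tr>")` returns `['x', 'y']` even though the closing `</td>` tags are missing, markup a strict XML parser would reject.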

G.

Hester

1:58 pm on Jan 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So I have to convert HTML to XML then back into HTML? Surely when I try to open the file it will still give me the errors.

[edited by: Hester at 4:09 pm (utc) on Jan. 27, 2003]

Hester

4:09 pm on Jan 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've got it to work with a couple of external sites now, but the site I need must have some special server script on it, as it fails every time. It has an .asp filename followed by at least one variable in the address.

Oh well.

stevedob

7:26 am on Jan 28, 2003 (gmt 0)

10+ Year Member



Are you 'urlencode'-ing the .asp URL before trying to grab it?
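The distinction behind that question, sketched in Python (`urllib.parse.urlencode` stands in for PHP's `urlencode`; the host name and the extra `q` parameter are hypothetical): encode each parameter value, not the whole address, because encoding the whole URL also escapes the `?`, `=` and `&` that mark out the query string.

```python
from urllib.parse import quote, urlencode

base = "http://www.example.com/filename.asp"  # hypothetical host
params = {"page": "VIEW", "id": 4, "mode": "edit", "q": "a b&c"}  # "q" is a made-up extra value

# Right: urlencode() escapes each name and value, then joins the pairs with = and &.
good = base + "?" + urlencode(params)

# Wrong: quoting the finished URL also escapes ":", "/", "?", "=" and "&",
# so the server no longer sees a query string at all.
bad = quote(good, safe="")

print(good)  # http://www.example.com/filename.asp?page=VIEW&id=4&mode=edit&q=a+b%26c
```

Only values that contain special characters (spaces, `&`, `=`) actually change; `VIEW`, `4` and `edit` pass through untouched.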

Hester

10:12 am on Jan 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, but if I try, it gives another error.

I managed to get another file to work on the same server (but with a different url) but when I try the .asp address, it just says "Unable to open file". I've tried stripping down the url, and the first part works! It's the bit after the .asp that causes an error. (And yes, I've even tried urlencoding just that part of the address!)

It looks like this:

[www....] ... .co.uk/filename.asp?page=VIEW&id=4&mode=edit

I've also tried replacing the & with &amp;, but no luck.
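Why that substitution cannot help, sketched in Python (the host name is a placeholder): `&amp;` is how a literal `&` is written inside HTML markup, such as a link in a page, and never belongs in the URL a script actually requests.

```python
import html
from urllib.parse import parse_qs, urlsplit

url = "http://www.example.com/filename.asp?page=VIEW&id=4&mode=edit"  # hypothetical host

# &amp; is only the markup form of &, e.g. inside <a href="...">.
href = html.escape(url)  # ...?page=VIEW&amp;id=4&amp;mode=edit
assert html.unescape(href) == url  # a browser turns it back into & before requesting

# The request itself needs plain & separators. Sent literally, a server that
# splits the query on & would see "amp;" glued to the following parameter names:
print(href.split("?", 1)[1].split("&"))  # ['page=VIEW', 'amp;id=4', 'amp;mode=edit']

# With plain &, the three intended parameters come through.
print(parse_qs(urlsplit(url).query))  # {'page': ['VIEW'], 'id': ['4'], 'mode': ['edit']}
```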

Are .asp files designed to stop external sites from using them?

I guess it's just not possible.

Grumpus

12:20 pm on Jan 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's how I do it in ASP. You've got to be able to come up with a PHP translation. I've gone from PHP to ASP before and it's not all that difficult. If nothing else, you can at least see the "process".

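<%
' Fetch a URL over HTTP and return the HTML the remote server generates.
Function GetHTML(strPage)
    Dim objXMLHttp
    On Error Resume Next
    Set objXMLHttp = Server.CreateObject("Microsoft.XMLHTTP")
    ' Synchronous GET (third argument False), with an empty user name and password
    objXMLHttp.Open "GET", strPage, False, "", ""
    objXMLHttp.Send
    If Err.Number = 0 Then
        If objXMLHttp.Status = 200 Then
            ' Success: return the page's output
            GetHTML = objXMLHttp.ResponseText
        Else
            GetHTML = "Incorrect URL"
        End If
    Else
        ' The request itself failed: return the error message instead
        GetHTML = Err.Description
    End If
    Set objXMLHttp = Nothing
End Function
%>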

Call your function:

<% myPreParsedFile = GetHTML("http://www.foo.com/filename.asp") %>

Now your "myPreParsedFile" variable contains all the HTML of the remote file. You can display it with <% =myPreParsedFile %> or you can write your parsing routine, then display it afterwards.
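For comparison, a rough Python counterpart of `GetHTML`, with `urllib.request` standing in for `Microsoft.XMLHTTP`. This is not the PHP translation suggested above, just the same process in another language, keeping the ASP function's three outcomes; `get_html` is a hypothetical name:

```python
import urllib.error
import urllib.request


def get_html(url, timeout=10):
    """Return the HTML a remote server generates for url (a sketch of GetHTML).

    Mirrors the ASP function: the response text on success, "Incorrect URL"
    when the server answers with an error status, or the error message when
    the request could not be made at all.
    """
    try:
        # Like objXMLHttp.Open "GET" ... / .Send: a synchronous GET request.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Decode with the charset the server declares, defaulting to UTF-8.
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except urllib.error.HTTPError:
        # urlopen raises HTTPError for 4xx/5xx statuses (the Status <> 200 branch).
        return "Incorrect URL"
    except (OSError, ValueError) as err:
        # Connection failures and timeouts (OSError, which includes URLError)
        # and malformed URLs (ValueError), like the Err.Description branch.
        return str(err)
```

Usage: `my_pre_parsed = get_html("http://www.foo.com/filename.asp")`. One difference from the ASP version: `urlopen` raises an exception for 4xx/5xx statuses instead of returning them, which is why the status check becomes an `except` clause.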

G.

Hester

12:42 pm on Jan 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I already have a script to deal with XML. But I'm not convinced that converting yours, or any script, will work: when it tries to open the file, surely I will get the same errors. Or will I? I think it's time to try other methods; a look through the PHP Manual is due again. Maybe your method of using XML and GetHTML (or a PHP equivalent) is the solution, rather than fopen.

Officially I've given up. But many times I've done that, and gone back to a script for one last go, and made it work.

Catch you later.

Edit: I looked at the 2 XML scripts I have and both use fopen as normal. So my scripts will never work. At least not with the server I need to parse files from.

I am wondering if it is a timing problem with the asp file. Sometimes when surfing the site normally, the page doesn't come up, but the home page instead. The site owners inform me this is a bug they are fixing. So maybe their server doesn't 'create' the file in time for my PHP request to grab it? Only it works without the variables in the URL (ie: grabbing just the template file).
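One way to test the "home page instead" theory, assuming (the thread does not say) that the site falls back by sending an HTTP redirect: HTTP client libraries usually follow redirects silently, so compare the final URL with the one requested. A Python sketch; `fetch_checked` and `was_redirected` are hypothetical helper names:

```python
import urllib.request


def fetch_checked(url, timeout=10):
    """Fetch url and return (final_url, body).

    urlopen follows redirects on its own; resp.url holds the address that
    finally answered, which differs from url if a redirect happened.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        final_url = resp.url
        charset = resp.headers.get_content_charset() or "utf-8"
        body = resp.read().decode(charset, errors="replace")
    return final_url, body


def was_redirected(requested_url, final_url):
    """True if the server sent the request somewhere other than requested_url."""
    return final_url != requested_url
```

If the two URLs match but the body is still the home page, the fallback (if any) is happening inside the server script rather than through a redirect.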

Grumpus

1:05 pm on Jan 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What my script does is it creates an XML Object. It doesn't open the file, per se, but rather loads it and dumps the contents into the object.

I believe the problem with your fopen is that it tries to open the file as it is, whereas my routine actually has the server execute the file and puts the results into your variable. Yours would work fine on a static page, but it's never going to work where the server needs to process something. (At least that's the way I imagine it.)

The KEY is in these lines:

Set objXMLHttp = Server.CreateObject("Microsoft.XMLHTTP")
objXMLHttp.Open "GET", strPage, False, "", ""
objXMLHttp.Send

GetHTML = objXMLHttp.ResponseText

The rest is just error checking stuff to keep it all clean.

G.