Validating client-supplied URLs

I'm having trouble with my host intercepting(?) HTTP Status codes

KBS Gladiator

8:36 pm on Jul 10, 2006 (gmt 0)

10+ Year Member



First off, I apologize if this issue has already been discussed elsewhere in these forums, but I tried multiple searches on the topic and came up blank.

OK, here's what I'm trying to do:

We have a form where the client supplies a URL, among other things. The URL is really the point of the whole form, so I want to validate the submitted URL to ensure that it 1) is syntactically correct and 2) really exists. 1) is not a problem - easily done.
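For readers who have not done step 1) yet, here is a minimal sketch of a syntactic check. It is written in Python purely for illustration (the thread itself is about PHP), the function name `looks_like_http_url` is made up for this example, and it assumes only http and https URLs are acceptable. It checks the shape of the URL and nothing else; it says nothing about whether the URL exists.

```python
from urllib.parse import urlsplit


def looks_like_http_url(candidate: str) -> bool:
    """Return True if candidate has an http(s) scheme and a host name.

    Hypothetical helper: a purely syntactic check that makes no network request.
    """
    try:
        parts = urlsplit(candidate.strip())
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. an unclosed IPv6 bracket.
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)
```

A stricter form could also restrict the characters allowed in the host name, but that is a policy choice rather than something the original post specifies.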

What I've tried for 2):

Multiple PHP scripts that retrieve the header information returned by a request for the URL, etc. I've now even tried cURL/PHP techniques.

The problem:

When the (syntactically correct) URL clearly does not exist (e.g. pasting it into the address bar of a browser returns the correct 4xx or 5xx HTTP status codes - e.g. Server not Found, Document not found, etc.), the host of my website is returning a "200 OK" (or sometimes some 3xx codes followed by 200 OK) pointing to some error splash page: "Document/Server not found". This is precisely what I don't want - I want the 4xx or 5xx status code from the header of the requested URL so that we can prompt the client to correct their URL submission. I have contacted our host but, as yet, no solution is forthcoming.

I know this is a common problem, but I have yet to find a solution. Anyone have any ideas?

jatar_k

10:02 pm on Jul 10, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



regex should get the first one, though URL validation can be ugly; I was looking at some examples and they can be nuts

maybe look at this one
[zend.com...]

maybe take a look at fsockopen [php.net] for the second part, the example there looks like it may be just right.

you could look at get_headers [php.net] as well, though the whole URL Functions [php.net] portion of the manual is interesting
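As a rough illustration of what to do with the header lines once you have them: when redirects are followed, get_headers returns the status line of every hop in a single list, so a "3xx followed by 200 OK" sequence is visible to the script. The sketch below is in Python rather than PHP, the names `status_chain` and `classify` are invented for this example, and the decision to flag a 200 that arrives after a redirect is an assumption about what the form should treat as suspicious, not something the manual prescribes.

```python
import re

# Matches status lines such as "HTTP/1.1 404 Not Found" or "HTTP/2 200".
STATUS_LINE = re.compile(r"^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b")


def status_chain(header_lines):
    """Return every status code found in the header lines, in order."""
    codes = []
    for line in header_lines:
        match = STATUS_LINE.match(line)
        if match:
            codes.append(int(match.group(1)))
    return codes


def classify(header_lines):
    """Label a response by its final status code and by how it got there."""
    codes = status_chain(header_lines)
    if not codes:
        return "no-response"
    final = codes[-1]
    if final >= 400:
        return "error"
    if len(codes) > 1:
        # A success reached only after redirects deserves a second look,
        # since an error splash page can arrive this way.
        return "ok-after-redirect"
    return "ok" if 200 <= final < 300 else "other"


example = [
    "HTTP/1.1 302 Found",
    "Location: http://host.example/notfound.html",
    "HTTP/1.1 200 OK",
    "Content-Type: text/html",
]
result = classify(example)  # "ok-after-redirect"
```

This only interprets headers that were actually received. If something between the script and the target rewrites the response, the chain will look clean no matter how it is parsed.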

KBS Gladiator

10:58 pm on Jul 10, 2006 (gmt 0)

10+ Year Member



Thanks Jatar_k, but this doesn't address the problem I am posting about. I'm way beyond your recommendations, although your posted links will be very useful to readers tackling these issues. My problem is with part 2): does the URL exist?

I'll repeat the guts of the problem (whether I use fsockopen(), get_headers(), cURL/PHP or whatever):

When the (syntactically correct) URL clearly does not exist (e.g. pasting it into the address bar of a browser returns the correct 4xx or 5xx HTTP status codes - e.g. Server not Found, Document not found, etc.), the host of my website is returning a "200 OK" (or sometimes some 3xx codes followed by 200 OK) pointing to some error splash page: "Document/Server not found". This is precisely what I don't want - I want the 4xx or 5xx status code from the header of the requested URL so that we can prompt the client to correct their URL submission.

So basically, my host is hijacking some 4xx and 5xx header responses and returning a false 200 OK to my scripts. Usually these are for URLs like [thisdoesntexist.com...] (i.e. no path or document).
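One possible explanation for the bare-domain case, and this is only a guess from the symptoms, is that the DNS resolver the server uses answers for host names that do not exist and points them at a splash-page server. If that is what is happening, it can be detected by resolving a name that is guaranteed not to exist (anything under the reserved `.invalid` top-level domain) and comparing the answer with the addresses returned for the submitted host. The sketch below is in Python for illustration; `resolve` and `host_really_resolves` are hypothetical names, and the resolver is passed in as a parameter so the logic can be exercised without a network.

```python
import socket
import uuid


def resolve(hostname):
    """Return the set of IP addresses for hostname, or an empty set on failure."""
    try:
        infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    except (socket.gaierror, UnicodeError):
        return set()
    return {info[4][0] for info in infos}


def host_really_resolves(hostname, resolver=resolve):
    """Return False if hostname does not resolve, or resolves only to the
    addresses that the resolver also hands out for a nonexistent name."""
    # A random label under .invalid should never resolve on an honest resolver.
    probe = f"{uuid.uuid4().hex}.invalid"
    catch_all = resolver(probe)
    addresses = resolver(hostname)
    if not addresses:
        return False
    # If every address is also a catch-all address, assume the answer is fake.
    return not addresses <= catch_all
```

If the probe name does resolve, that is also concrete evidence to show the host. This check does not cover the other possibility, a proxy rewriting the HTTP response itself, which would need a different test.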

jatar_k

11:00 pm on Jul 10, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



then your best option, if you are sure it is your host, is to talk to your host

or get a new one

sorry