Forum Moderators: coopster

Validate URL

Simple function won't work


Birdman

9:27 pm on Sep 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello, this is driving me crazy. I'm just trying to make sure that a URL exists. Here is the code:


function valid_url($str)
{
    if ($fp = @fopen($str, "r")) {
        fclose($fp); // close the handle we only opened for the test
        return 1;
    }
    return 0;
}
if (!valid_url($url)) {
    $error .= "<li>The URL, <em>$url</em>, does not exist.</li>";
}

I have tried using file(), as well. Any help is appreciated!

Birdman

Birdman

9:34 pm on Sep 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, you won't believe this! It's because of VeriSign's new redirect policy. It's returning a page no matter what URL you request.

bcolflesh

9:35 pm on Sep 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is allow_url_fopen enabled on your server?

If you have PEAR, it has a builtin function for this purpose:

dickmann.homeunix.org/pear/phpdoc/PEAR/Validate/0.1.1/Validate/Validate.html#url

bcolflesh

9:38 pm on Sep 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Well, you won't believe this! It's because of VeriSign's new redirect policy. It's returning a page no matter what URL you request.

Aaah - another annoying consequence of their actions - you'll have to retrieve the server headers and parse them.

Birdman

9:41 pm on Sep 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



bcolflesh, allow_url_fopen is set to 1. I'm not sure if that is correct, but I know it was working before. I'm sure it's VeriSign that is causing it, because I ran a separate test with file() and it returned the VeriSign page (with a bad URL).

Birdman

9:43 pm on Sep 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, I was thinking about searching for a common string in their page and using if(!strpos()), or something like that.

Can you give me an example of what you are suggesting?

Birdman

bcolflesh

9:47 pm on Sep 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You're right - until they back down, the server header seems to remain constant for bad requests:

HTTP/1.1 302 Moved Temporarily
Date: Tue, 23 Sep 2003 21:44:38 GMT
Location: [sitefinder.verisign.com...]
Content-Type: text/plain
Transfer-Encoding: Chunked
Connection: Close

You can parse the Location line and assume(?) a bad URL.
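For reference, that approach could be sketched like this (the sitefinder hostname comes from the Location header quoted above; the helper names are illustrative, not a definitive implementation):

```php
<?php
// Sketch: fetch just the response headers for a host, then look for
// VeriSign's wildcard redirect in the Location line. Names are illustrative.
function fetch_headers($host)
{
    $fp = @fsockopen($host, 80, $errno, $errstr, 30);
    if (!$fp) {
        return false;                  // connection failed outright
    }
    // Connection: close so feof() sees the end of the response
    fputs($fp, "GET / HTTP/1.1\r\nHost: $host\r\nConnection: close\r\n\r\n");
    $headers = "";
    while (!feof($fp)) {
        $line = fgets($fp, 1024);
        if (trim($line) == "") {       // blank line ends the header block
            break;
        }
        $headers .= $line;
    }
    fclose($fp);
    return $headers;
}

// True if the headers contain the 302 redirect to Site Finder shown above
function is_sitefinder_redirect($headers)
{
    foreach (explode("\n", $headers) as $line) {
        if (strpos($line, "Location:") === 0
            && strpos($line, "sitefinder.verisign.com") !== false) {
            return true;
        }
    }
    return false;
}
?>
```

A bad domain would then be one where fetch_headers() fails entirely, or where is_sitefinder_redirect() returns true.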

Birdman

9:51 pm on Sep 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Appreciate it!

I wonder what this will mean to the engines...smells like trouble.

Birdman

bcolflesh

9:52 pm on Sep 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Use fsockopen:
us3.php.net/manual/en/function.fsockopen.php

Then fgets:
us3.php.net/manual/en/function.fgets.php

Examples at the bottom of each page.

Birdman

7:39 pm on Sep 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just for the record, this is how I fixed it:


function valid_url($str)
{
    if ($lines = @file($str)) {
        $test = implode("\r", $lines);
        // strpos() can return 0, which is falsy, so compare against false
        if (strpos($test, "<title>VeriSign ¦") !== false) {
            return 0;
        }
        return 1;
    }
    return 0;
}

If anyone has a more elegant solution, I'm all ears ;)

Birdman

ps. That's the beauty of dynamic pages...predictability!

bcolflesh

8:05 pm on Sep 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's cool as long as you don't anticipate any pages that match that title, like:

newsfactor.com/perl/story/7866.html

Birdman

8:21 pm on Sep 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I couldn't figure out how to get remote file headers, so I went that way. Could you give me an example of parsing the headers?

bcolflesh

10:23 pm on Sep 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's an example to retrieve them:

<?php
// Example host is a placeholder
$fp = fsockopen("www.heruhguuih.com", 80, $errno, $errstr, 30);
if ($fp) {
    // Connection: close so feof() sees the end of the response
    fputs($fp, "GET / HTTP/1.1\r\nHost: www.heruhguuih.com\r\nConnection: close\r\n\r\n");
    while (!feof($fp)) {
        echo nl2br(fgets($fp, 1024));
    }
    fclose($fp);
}
?>

wkitty42

10:52 pm on Sep 24, 2003 (gmt 0)

10+ Year Member



bcolflesh,

Wouldn't HEAD instead of GET be a better option to retrieve just the headers?

bcolflesh

11:38 pm on Sep 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



HEAD won't retrieve Location in this case.

wkitty42

12:51 am on Sep 25, 2003 (gmt 0)

10+ Year Member



Ahhh... and that's what we specifically need to check... gotcha...

daisho

1:15 am on Sep 25, 2003 (gmt 0)

10+ Year Member



You can use cURL if you have it.
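For example, a cURL sketch along the lines of the header check above (using GET rather than HEAD, per the earlier posts; the function name is illustrative, and this assumes PHP was built with the curl extension):

```php
<?php
// Sketch: use the curl extension to grab headers plus body, then apply
// the same Site Finder test as before. Requires the curl extension.
function valid_url_curl($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, true);          // include headers in the result
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return it instead of echoing
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $response = curl_exec($ch);
    curl_close($ch);
    if ($response === false) {
        return 0;                                    // DNS or connect failure
    }
    // The wildcard redirect means the domain doesn't really exist
    if (strpos($response, "sitefinder.verisign.com") !== false) {
        return 0;
    }
    return 1;
}
?>
```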

Timotheos

9:23 pm on Sep 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So Birdman did you make the news [apnews.myway.com]?

Birdman

11:09 am on Sep 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



:) No, I never got interviewed, but I feel Mr. Fitzpatrick's pain. I spent a good hour or two trying to figure out why something that had been working was suddenly broken.

By the way, thanks for the examples. I haven't tried them yet, but will.

Birdman