
Perl Server Side CGI Scripting Forum

    
Perl Code: I want to only spider html pages with my site search, but how?
chopin2256




msg:437415
 10:01 pm on Nov 23, 2005 (gmt 0)

I am using a site search that spiders my site. However, the spider indexes ALL web pages. The spider follows a Perl configuration, and I got the Perl code to index only .html/.htm pages plus pages with no extension. Here is an example of a working Perl script:


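test_url => sub {
    my $url = shift;                        # URL object passed in by the spider
    return 1 if $url->path =~ /\.html?$/;   # .html or .htm
    # any other path containing a dot has a non-HTML extension,
    # so only extensionless paths (e.g. / or /about) get through:
    return 1 unless $url->path =~ /\./;
    return 0;
},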

Great, it works! I can also index only the files with no extension by using this code:


test_url => sub {
    my $url = shift;
    # accept only paths with no dot at all (no file extension):
    return 1 unless $url->path =~ /\./;
    return 0;
},
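Both working rules can be checked outside the spider. A minimal sketch, assuming the spider passes `test_url` an object with a `path` method (as `$url->path` in the snippets implies); `FakeURL` is a hypothetical stand-in for that object, and the sub names are made up for the comparison:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the URL object the spider passes to test_url;
# it provides only the path() method the callbacks use.
package FakeURL;
sub new  { my ($class, $path) = @_; return bless { path => $path }, $class }
sub path { my ($self) = @_; return $self->{path} }

package main;

# First working rule: .html/.htm pages plus extensionless paths.
sub html_or_extensionless {
    my $url = shift;
    return 1 if $url->path =~ /\.html?$/;   # .html or .htm
    return 1 unless $url->path =~ /\./;     # no dot at all
    return 0;
}

# Second working rule: extensionless paths only.
sub extensionless_only {
    my $url = shift;
    return 1 unless $url->path =~ /\./;     # no dot at all
    return 0;
}

for my $path ('/', '/about', '/docs/', '/page.html', '/logo.gif', '/v1.2/') {
    my $url = FakeURL->new($path);
    printf "%-11s rule1=%s rule2=%s\n", $path,
        html_or_extensionless($url) ? 'fetch' : 'skip',
        extensionless_only($url)    ? 'fetch' : 'skip';
}
```

A directory whose name contains a dot, such as `/v1.2/`, is skipped by both rules, because the dot test looks at the whole path rather than only its last segment.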

However, for some reason I can't get the script to spider only .html files. I tried this:


test_url => sub {
    my $url = shift;
    return 1 if $url->path =~ /\.html?$/;   # .html or .htm
    return 0;
},

But when I use this script, the spider indexes ALL the garbage pages. Can anyone help me figure out what is wrong with the Perl code above? I would like to use a Perl script to spider ONLY html files, but the logical approach isn't working for some reason.
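One way to narrow this down is to run the `.html`-only rule by itself. A minimal sketch, again using a hypothetical `FakeURL` stand-in for the spider's URL object: the rule rejects directory-style URLs such as `/` and `/docs/` as well as images and scripts. If the spider applies `test_url` to every URL, including the start page, this rule should make it fetch fewer pages, not more, which may mean the edited callback is not the one actually running.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the URL object the spider passes to test_url.
package FakeURL;
sub new  { my ($class, $path) = @_; return bless { path => $path }, $class }
sub path { my ($self) = @_; return $self->{path} }

package main;

# The .html-only rule from the question.
sub html_only {
    my $url = shift;
    return 1 if $url->path =~ /\.html?$/;   # .html or .htm
    return 0;
}

for my $path ('/', '/docs/', '/about', '/page.html', '/logo.gif') {
    printf "%-11s %s\n", $path,
        html_only(FakeURL->new($path)) ? 'fetch' : 'skip';
}
```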

 

wruppert




msg:437416
 2:34 am on Nov 24, 2005 (gmt 0)

That is not a working script; it is a snippet from one. The pattern match looks fine, so you probably need to look at the rest of the program.
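Following that advice, one thing worth ruling out is that the spider never loads the edited rule at all, for example because the configuration file has a syntax error or because the spider reads a different file. Perl's `do FILE` distinguishes these cases: it returns undef and sets `$@` on a compile error, or returns undef and sets `$!` if the file cannot be read. A minimal sketch, assuming the configuration is a Perl file that fills an array `@servers` with hashes holding a `test_url` entry; the file name `spider_config.pl` and the `@servers` layout are assumptions, so substitute what your spider actually uses:

```perl
use strict;
use warnings;

# Hypothetical default; pass the path of the config file your spider really loads.
my $config_file = shift(@ARGV) // './spider_config.pl';

# Assumed: the config file fills @main::servers with hashes holding a test_url sub.
our @servers;

# Load the config the way a Perl program would, and report why it fails.
sub check_config {
    my ($file) = @_;
    @servers = ();                          # drop anything left from an earlier load
    my $result = do $file;
    unless ($result) {
        die "Cannot compile $file: $@\n" if $@;
        die "Cannot read $file: $!\n"    unless defined $result;
        die "$file did not return a true value\n";
    }
    die "$file defined no entries in \@servers\n" unless @servers;
    for my $server (@servers) {
        die "test_url is missing or not a code ref\n"
            unless ref($server->{test_url}) eq 'CODE';
    }
    return scalar @servers;
}

if (-e $config_file) {
    my $count = check_config($config_file);
    print "$config_file: $count entries, each with a test_url callback\n";
} else {
    warn "$config_file not found\n";
}
```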

chopin2256




msg:437417
 2:43 am on Nov 24, 2005 (gmt 0)

I swear to you, it doesn't work.

Something is not right with the snippet of code.
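Even when the `.html`-only rule is applied, it can leave the spider unable to reach pages that are linked only from directory URLs like `/docs/`. One option is to keep two separate decisions: which URLs may be fetched so their links can be followed, and which fetched pages get indexed. Whether a given spider offers a separate hook for the second decision depends on the software (checking the response's `Content-Type` for `text/html` is another common test); the subs `should_fetch` and `should_index` below are hypothetical and take plain path strings. A minimal sketch:

```perl
use strict;
use warnings;

# Hypothetical fetch decision: may the spider request this path and follow its links?
sub should_fetch {
    my ($path) = @_;
    return 1 if $path =~ /\.html?$/;   # .html or .htm
    return 1 unless $path =~ /\./;     # no dot: /, /docs/, /about
    return 0;                          # other extensions: images, scripts, ...
}

# Hypothetical index decision: should a fetched page be added to the index?
sub should_index {
    my ($path) = @_;
    return $path =~ /\.html?$/ ? 1 : 0;   # only .html or .htm
}

for my $path ('/', '/docs/', '/docs/intro.html', '/logo.gif') {
    printf "%-17s fetch=%d index=%d\n",
        $path, should_fetch($path), should_index($path);
}
```

With this split, `/docs/` is fetched so the spider can find `/docs/intro.html`, but only the `.html` page ends up in the index.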

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved