Script to write file

         

toolman

11:01 pm on Jun 25, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is there a script that will crawl a directory and then write the URLs to a .txt file?

Brett_Tabke

10:08 am on Jun 26, 2001 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



So basically, you are looking for a file directory reader that dumps the paths to a text file? I have something like that somewhere...

sugarkane

11:01 am on Jun 26, 2001 (gmt 0)


Or do you mean a directory as in ODP?

toolman

4:08 pm on Jun 26, 2001 (gmt 0)


Yes, BT. It writes the URLs to a file.

Xoc

6:27 pm on Jun 28, 2001 (gmt 0)


Apache or IIS?

toolman

7:07 pm on Jun 28, 2001 (gmt 0)


Apache. What I'm after is a text file of URLs for every page on a domain, so I can feed that into a batch submission routine.

sugarkane

8:13 pm on Jun 28, 2001 (gmt 0)


I use:

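#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

my $domain="http://www.foobar.com";
my $root_path="/path/to/domain/root/";
my @files;

# Walk the tree under $root_path, calling foo() for every entry found
find(\&foo,$root_path);

# Write one URL per line
open(my $fp,'>','output.txt') or die "Cannot open output.txt: $!";
foreach my $i (@files) {
    print $fp "$domain/$i\n";
}
close($fp);

exit;

sub foo {
    my $foo=$File::Find::name;
    if ($foo=~/html$/) {
        # Strip the leading root path; \Q...\E keeps its characters literal
        $foo=~s/^\Q$root_path\E//;
        push @files,$foo;
    }
}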

(Note - this crawls down the directory tree as well)

toolman

8:35 pm on Jun 28, 2001 (gmt 0)


Thanks, SK... it's not returning every file in a directory, though?

sugarkane

8:52 pm on Jun 28, 2001 (gmt 0)


It should return all html files - you can either hack the regex below or remove the whole 'if' statement if you want it to include other file types.

if ($foo=~/html$/) {
    $foo=~s/^\Q$root_path\E//;
    push @files,$foo;
}
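
One caveat if the 'if' is dropped entirely: File::Find calls the callback for directories as well as files, so directory names would end up in the list too. A minimal sketch of a variant that keeps every regular file, whatever its extension. The helper name list_files and its sorted, relative-path return value are assumptions made for illustration, not part of the script above.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return every regular file under $root_path,
# as paths relative to $root_path, sorted for stable output.
sub list_files {
    my ($root_path) = @_;
    my @files;
    find(sub {
        # find() has chdir'd into the current directory, so test $_ here
        return unless -f $_;
        my $name = $File::Find::name;
        # Strip the root path (and one separating slash) literally
        $name =~ s/^\Q$root_path\E\/?//;
        push @files, $name;
    }, $root_path);
    my @sorted = sort @files;
    return @sorted;
}
```

Calling list_files($root_path) in place of the find/foo pair would then give the list to write out.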

toolman

8:59 pm on Jun 28, 2001 (gmt 0)


Oh, I see now. How would I add the extension .HTML (all caps) in there too? I have a bunch of machine-generated pages and they are all HTML and not html.

sugarkane

9:21 pm on Jun 28, 2001 (gmt 0)


Change if ($foo=~/html$/) to if ($foo=~/html$/i)

If you wanted to include, say, .htm files as well you could change it to

if ($foo=~/html$|htm$/i)
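
A pattern like this can be tried against a few names before running it over a whole site. A minimal sketch: the sample filenames are made up, and the helper is_html_page with its tighter pattern /\.html?$/i is an assumed alternative, not part of the script above. It requires a literal dot, so a name that merely ends in "html" (such as "nothtml") is skipped, whereas the patterns above would match it.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $name ends in .htm or .html, in any case.
sub is_html_page {
    my ($name) = @_;
    return $name =~ /\.html?$/i ? 1 : 0;
}

# Made-up sample names to try the pattern against
for my $name ('index.html', 'PAGE.HTML', 'old.htm', 'nothtml', 'logo.gif') {
    printf "%-12s %s\n", $name, is_html_page($name) ? 'match' : 'skip';
}
```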

toolman

1:45 am on Jun 29, 2001 (gmt 0)


Thanks, SK and BT... I got it working now. Over 300 URLs in one site. We wouldn't want to do that by hand, would we?