
Find URLs on web server

How to find URLs that need updating...

     
4:43 pm on Nov 5, 2002 (gmt 0)

New User



I inherited a fairly large university site and need a way to locate every page on the site that links to a given URL. (When a URL changes, I find myself just guessing which pages might link to it.)

We use the Google University search and I have tried that, but it does not seem to be effective for this purpose.

I have tried searching the web for a solution, but no luck.

Thank you to anyone who can help!

5:11 pm on Nov 5, 2002 (gmt 0)

Full Member



If you have Linux, try this:

[I assume your URL is '/somepath/myurl.html', and your web server root path is '/home/httpdocs']

rgrep -x 'html' -rl '/somepath/myurl.html' /home/httpdocs > yourresultfile.txt

rgrep -x 'htm' -rl '/somepath/myurl.html' /home/httpdocs >> yourresultfile.txt
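
yourresultfile.txt should end up with one matching filename per line, so if you want a quick count of how many pages you'll have to edit:

wc -l yourresultfile.txt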

lazy way.. :)

cminblues

7:17 pm on Nov 5, 2002 (gmt 0)

New User



I meant to mention - we have a UNIX server... (I even tried the Linux version of the command, but it didn't like "rgrep".)

Unfortunately I am not the server admin. I know enough UNIX to make my job easier, but that is it...

Thanks again.

8:05 pm on Nov 5, 2002 (gmt 0)

Preferred Member



man grep
or
man rgrep

(myself, I don't know what rgrep is...)

If all else fails, you can copy the whole site to your local hard drive, use Windoze's search capabilities to find the old URLs, and then update them back in the source.

9:02 pm on Nov 5, 2002 (gmt 0)

Full Member



If you don't have rgrep installed, the above command won't work with plain grep, because grep lacks the 'x' feature [-> searching only files with a given extension].

Of course, you can use 'grep' instead of 'rgrep' and omit the 'x' arg, but then you'll pay a big CPU/RAM/disk-I/O cost, because grep will scan ALL the files in the given directory.
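
[Side note: if the box happens to have a recent GNU grep, it may support --include, which does the extension filtering on its own - something like this, but don't count on that option being there on an older UNIX install:]

grep -rl --include='*.html' --include='*.htm' '/somepath/myurl.html' /home/httpdocs > yourresultfile.txt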

Here is another, correct way to do it with grep:


find /home/httpdocs -name '*.html' > tmp.txt
find /home/httpdocs -name '*.htm' >> tmp.txt
grep -lr '/somepath/myurl.html' `cat tmp.txt` > yourresultfile.txt

Note that you don't need root privileges for doing this.
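
If the site is big enough, the backticks can also run into the shell's limit on command-line length. Piping the file list through xargs should avoid that (and the temp file), assuming none of the filenames contain spaces:

find /home/httpdocs \( -name '*.html' -o -name '*.htm' \) -print | xargs grep -l '/somepath/myurl.html' > yourresultfile.txt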

cminblues

11:19 pm on Nov 5, 2002 (gmt 0)

Preferred Member



Try a
grep /the/link/you/want/to/find `find . -iname '*html'`
to search all *.html pages in the current directory and below.
Works for me ...
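And if you also want to see where on each page the link appears, -n makes grep print the matching line numbers too:

grep -n /the/link/you/want/to/find `find . -iname '*html'`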
Regards,
R.
8:15 pm on Nov 8, 2002 (gmt 0)

New User



Thank you for the responses - I did try some of those, but they didn't quite work... Today a kind soul at the college wrote a shell script that does what I need it to do (thank goodness we have a computer science program...).
Thanks again!
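
For anyone who finds this thread later: I don't have the exact script to post, but the idea boils down to something like this (a rough sketch only - the name and the docroot path are just examples):

#!/bin/sh
# findlinks.sh - sketch: list every .htm/.html page under the
# docroot that contains a given URL string.
URL="$1"
DOCROOT="${2:-/home/httpdocs}"

if [ -z "$URL" ]; then
    echo "Usage: $0 '/somepath/myurl.html' [docroot]" >&2
    exit 1
fi

# -F treats the URL as a plain string instead of a regular expression;
# an older find without '-exec ... +' could use the xargs form above instead.
find "$DOCROOT" \( -name '*.html' -o -name '*.htm' \) \
    -exec grep -Fl "$URL" {} +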