
Linux, Unix, and *nix like Operating Systems Forum

Find URLs on web server
How to find URLs that need updating...
gibsonRB75
msg:914588 - 4:43 pm on Nov 5, 2002 (gmt 0)

I inherited a fairly large university site and need a way to locate URLs that are linked from pages anywhere on the site. (When a URL changes, I find myself just guessing which pages might link to it.)

We use the Google University search and I have tried using that, but it does not seem effective for this purpose.

I have tried searching the web for a solution, but no luck.

Thank you to anyone who can help!

 

cminblues
msg:914589 - 5:11 pm on Nov 5, 2002 (gmt 0)

If you have Linux, try this:

[I assume your URL is '/somepath/myurl.html', and your web server root path is '/home/httpdocs']

# recursively (-r) search only files with the 'html' extension (-x),
# printing just the names (-l) of the files that contain the URL
rgrep -x 'html' -rl '/somepath/myurl.html' /home/httpdocs > yourresultfile.txt

# same again for the 'htm' extension, appending to the result file
rgrep -x 'htm' -rl '/somepath/myurl.html' /home/httpdocs >> yourresultfile.txt

lazy way.. :)

cminblues

gibsonRB75
msg:914590 - 7:17 pm on Nov 5, 2002 (gmt 0)

I meant to mention - we have a UNIX server... (I even tried the Linux version of the command, but it didn't like "rgrep".)

Unfortunately, I am not the server admin - I know enough Unix to make my job easier, but that's it...

Thanks again.

Slade
msg:914591 - 8:05 pm on Nov 5, 2002 (gmt 0)

man grep
or
man rgrep

(myself, I don't know what rgrep is...)

If all else fails, you can copy the whole site to your local hard drive, use Windoze's search capabilities to find the old URLs, and then just update them back in the source.

cminblues
msg:914592 - 9:02 pm on Nov 5, 2002 (gmt 0)

If you don't have rgrep installed, the above command is useless with plain grep, because grep lacks the 'x' feature [-> searching only files with a given extension].

Of course, you can use 'grep' instead of 'rgrep' and omit the 'x' arg, but in that case you'll pay a big CPU/RAM/disk-I/O cost, because grep will scan ALL the files in the given directory.

Here is another, correct way to do it with grep:

# collect every .html and .htm file under the document root
find /home/httpdocs -name '*.html' > tmp.txt
find /home/httpdocs -name '*.htm' >> tmp.txt

# print the names (-l) of the files that contain the URL
grep -l '/somepath/myurl.html' `cat tmp.txt` > yourresultfile.txt

Note that you don't need root privileges for doing this.
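
If the list in tmp.txt ever grows big enough that the shell complains about the command line being too long, the same idea works without the temp file - a sketch, with the same assumed paths as above:

# pipe the file names straight into grep via xargs
find /home/httpdocs \( -name '*.html' -o -name '*.htm' \) -print \
 | xargs grep -l '/somepath/myurl.html' > yourresultfile.txt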

cminblues

Romeo
msg:914593 - 11:19 pm on Nov 5, 2002 (gmt 0)

Try a
grep '/the/link/you/want/to/find' `find . -iname '*html'`
to search all files whose names end in 'html' (case-insensitively) in the current directory and below.
Works for me ...
Regards,
R.

gibsonRB75
msg:914594 - 8:15 pm on Nov 8, 2002 (gmt 0)

Thank you for the responses - I did try some of those, and they didn't quite work... Today a kind soul at the college wrote a shell script that does what I need (thank goodness we have a computer science program...). A sketch of that kind of script follows below.
Thanks again!
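
A minimal sketch of what such a script might look like (hypothetical - it just wraps the find/grep approach from earlier in the thread, and assumes the document root is /home/httpdocs):

#!/bin/sh
# findurl.sh - list every .html/.htm file under the docroot that links to a given URL
# usage: ./findurl.sh '/somepath/myurl.html'
[ $# -eq 1 ] || { echo "usage: $0 URL" >&2; exit 1; }

# -F treats the URL as a fixed string, so dots aren't regex wildcards;
# -l prints only the names of the matching files
find /home/httpdocs \( -name '*.html' -o -name '*.htm' \) \
    -exec grep -Fl "$1" {} \;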
