Forum Library, Charter, Moderators: bakedjake

Linux, Unix, and *nix like Operating Systems Forum

Find URLS on web server
How to find URLs that need updating...

 4:43 pm on Nov 5, 2002 (gmt 0)

I inherited a fairly large university site and need a way to find every page on the site that links to a given URL. (When a URL changes, I find myself just guessing which pages might link to it.)

We use the Google University search and I have tried using that, but it does not seem to be effective for this purpose.

I have tried searching the web for a solution, but no luck.

Thank you to anyone who can help!



 5:11 pm on Nov 5, 2002 (gmt 0)

If you have linux, try this:

[I assume your URL is '/somepath/myurl.html', and your web server root path is '/home/httpdocs']

rgrep -x 'html' -rl '/somepath/myurl.html' /home/httpdocs > yourresultfile.txt

rgrep -x 'htm' -rl '/somepath/myurl.html' /home/httpdocs >> yourresultfile.txt

lazy way.. :)
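
For anyone without rgrep, GNU grep can do a similar extension-limited recursive search on its own. This is only a sketch, assuming the installed grep is a GNU grep that supports --include (a traditional UNIX grep may not). The function name find_links_gnu is made up for illustration, and the URL and root path are the same example values as above.

```shell
# Sketch only: assumes GNU grep with --include support.
# find_links_gnu is a hypothetical helper name, not a standard command.
find_links_gnu() {
    url=$1     # the link text to look for, e.g. /somepath/myurl.html
    root=$2    # the web server root, e.g. /home/httpdocs

    # -r  recurse into subdirectories
    # -l  print only the names of files that contain a match
    # -F  treat the URL as a fixed string, so '.' is not a regex wildcard
    # --include limits the search to files whose names match the pattern
    grep -r -l -F --include='*.html' --include='*.htm' -- "$url" "$root"
}

# Example call (same paths as in the post above):
#   find_links_gnu '/somepath/myurl.html' /home/httpdocs > yourresultfile.txt
```

Both extensions are handled in one pass, so there is no need to append to the result file with a second command.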



 7:17 pm on Nov 5, 2002 (gmt 0)

I meant to mention - we have a UNIX server... (I even tried the linux version of the command, but it didn't like "rgrep").

Unfortunately I am not the server admin - I know enough unix to make my job easier, but that is it...

Thanks again.


 8:05 pm on Nov 5, 2002 (gmt 0)

man grep
man rgrep

(myself, I don't know what rgrep is...)

If all else fails, you can copy the whole site to your local hard drive and use Windoze' search capabilities to find the old urls, then just update them back in the source.


 9:02 pm on Nov 5, 2002 (gmt 0)

If you don't have rgrep installed, the above command won't work with plain grep, because grep's -x does something different (it matches whole lines) and has nothing to do with rgrep's 'x' feature [-> searching only for files with this extension].

Of course, you can use 'grep' instead of 'rgrep', omitting the 'x' arg (if your grep supports -r), but in this case you'll pay a big CPU, RAM and disk I/O cost, because grep will scan ALL the files under the given directory.

Here is another way that works with plain grep:

find /home/httpdocs -name '*.html' > tmp.txt
find /home/httpdocs -name '*.htm' >> tmp.txt
grep -l '/somepath/myurl.html' `cat tmp.txt` > yourresultfile.txt

Note that you don't need root privileges for doing this.
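
The find-then-grep idea above can also be done in one step, without the temporary file. The sketch below assumes a find that supports the POSIX "-exec ... {} +" form. The function name find_links is hypothetical, and the paths are the same example values used in this thread. Letting find hand the file names to grep directly avoids two problems with the backtick version: file names containing spaces, and "argument list too long" errors on a large site.

```shell
# Sketch only: find_links is a hypothetical helper name.
# Assumes find supports "-exec ... {} +" (POSIX); older systems may need
# "-exec ... {} \;" instead, which is slower but equivalent.
find_links() {
    url=$1     # the link text to look for, e.g. /somepath/myurl.html
    root=$2    # the web server root, e.g. /home/httpdocs

    # Select regular files ending in .html or .htm, then let find pass
    # them to grep in batches.
    # -F  treat the URL as a fixed string (no regex surprises from '.')
    # -l  print only the names of files that contain it
    find "$root" -type f \( -name '*.html' -o -name '*.htm' \) \
        -exec grep -F -l -- "$url" {} +
}

# Example call:
#   find_links '/somepath/myurl.html' /home/httpdocs > yourresultfile.txt
```

As with the original commands, this only reads files, so it needs read access to the document root and nothing more.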



 11:19 pm on Nov 5, 2002 (gmt 0)

Try a
grep /the/link/you/want/to/find `find . -iname '*html'`
to search all *.html pages in the current directory and below.
Works for me ...
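
If your find has no -iname (it is not available everywhere), the same case-insensitive match can be spelled out with bracket patterns. The sketch below also prints the matching lines with line numbers, which helps when you go on to edit the pages. The function name show_link_lines is made up for illustration. Note that the pattern here is deliberately narrower than '*html': it matches only .htm and .html in any letter case, so names such as .shtml are skipped.

```shell
# Sketch only: show_link_lines is a hypothetical helper name.
show_link_lines() {
    url=$1     # the link text to look for, e.g. /the/link/you/want/to/find
    root=$2    # directory to search, e.g. . for the current directory

    # Bracket patterns stand in for -iname: .htm / .html in any case.
    # -n  prefix each matching line with its line number
    # -F  treat the URL as a fixed string
    # /dev/null is an extra, empty input file; it makes grep print the
    # file name even when find passes it only a single real file.
    find "$root" -type f \
        \( -name '*.[hH][tT][mM]' -o -name '*.[hH][tT][mM][lL]' \) \
        -exec grep -n -F -- "$url" /dev/null {} +
}

# Example call:
#   show_link_lines /the/link/you/want/to/find .
```

Each output line has the form file:line-number:matching-text.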


 8:15 pm on Nov 8, 2002 (gmt 0)

Thank you for the responses - I did try some of those, and they didn't quite work... today a kind soul at the college wrote a shell script that does what I need it to do (thank goodness we have a computer science program...)
Thanks again!

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved