
Linux, Unix, and *nix like Operating Systems Forum

    
Find URLs on web server
How to find URLs that need updating...
gibsonRB75

Msg#: 314 posted 4:43 pm on Nov 5, 2002 (gmt 0)

I inherited a fairly large university site and need a way to locate URLs that are linked from pages anywhere on the site. (When a URL changes, I find myself just guessing at which pages might link to it.)

We use the Google University search and I have tried using that - but it does not seem to be effective for this purpose.

I have tried searching the web for a solution, but no luck.

Thank you to anyone who can help!

 

cminblues

Msg#: 314 posted 5:11 pm on Nov 5, 2002 (gmt 0)

If you have Linux, try this:

[I assume your URL is '/somepath/myurl.html', and your web server root path is '/home/httpdocs']

rgrep -x 'html' -rl '/somepath/myurl.html' /home/httpdocs > yourresultfile.txt

rgrep -x 'htm' -rl '/somepath/myurl.html' /home/httpdocs >> yourresultfile.txt

lazy way.. :)

cminblues

gibsonRB75

Msg#: 314 posted 7:17 pm on Nov 5, 2002 (gmt 0)

I meant to mention - we have a UNIX server... (I even tried the Linux version of the command, but it didn't like "rgrep".)

Unfortunately I am not the server admin - I know enough UNIX to make my job easier, but that is it...

Thanks again.

Slade

Msg#: 314 posted 8:05 pm on Nov 5, 2002 (gmt 0)

man grep
or
man rgrep

(myself, I don't know what rgrep is...)

If all else fails, you can copy the whole site to your local hard drive, use Windoze's search capabilities to find the old URLs, then just update them back in the source.
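For the copy step, if you have shell access, scp can pull the whole tree down in one go (the user, host, and paths here are only placeholders):

scp -r youruser@yourserver:/home/httpdocs ./site-copy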

cminblues

Msg#: 314 posted 9:02 pm on Nov 5, 2002 (gmt 0)

If you don't have rgrep installed, the above command is useless with plain grep, because grep lacks the 'x' feature [-> searching only files with the given extension].

Of course, you can use 'grep' instead of 'rgrep', omitting the 'x' arg, but in that case you'll pay a big CPU/RAM/disk-I/O cost, because grep will scan ALL the files in the given directory.
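[Side note: recent GNU greps have an '--include' option that does the extension filtering in one command. I don't know whether your UNIX box has GNU grep, so treat this as something to check rather than a given:

grep -rl --include='*.html' --include='*.htm' '/somepath/myurl.html' /home/httpdocs > yourresultfile.txt]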

Here is another, correct way, with grep:

# collect every .html and .htm file under the docroot
find /home/httpdocs -name '*.html' > tmp.txt
find /home/httpdocs -name '*.htm' >> tmp.txt
# print only the names of the files that contain the old URL
grep -l '/somepath/myurl.html' `cat tmp.txt` > yourresultfile.txt

Note that you don't need root privileges to do this.
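One more note: on a really big site, the `cat tmp.txt` backtick expansion can exceed the shell's argument-length limit. Piping the file list through xargs avoids that (assuming file names without spaces):

find /home/httpdocs \( -name '*.html' -o -name '*.htm' \) -print | xargs grep -l '/somepath/myurl.html' > yourresultfile.txt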

cminblues

Romeo

Msg#: 314 posted 11:19 pm on Nov 5, 2002 (gmt 0)

Try a
grep '/the/link/you/want/to/find' `find . -iname '*html'`
to search all *.html pages in the current directory and below.
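One caveat: '*html' won't match plain .htm files. If the site has those too, a pattern like '*.htm*' should catch both:

grep '/the/link/you/want/to/find' `find . -iname '*.htm*'`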
Works for me ...
Regards,
R.

gibsonRB75

Msg#: 314 posted 8:15 pm on Nov 8, 2002 (gmt 0)

Thank you for the responses - I did try some of those, but they didn't quite work... Today a kind soul at the college wrote a shell script that does what I need (thank goodness we have a computer science program...).
Thanks again!
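For anyone who finds this thread later: the script itself wasn't posted here, but a minimal version along the lines of the one-liners above might look like this (the URL and docroot below are placeholders, not the actual script):

#!/bin/sh
# findlinks.sh - list every page under a docroot that links to a given URL
# usage: ./findlinks.sh '/somepath/myurl.html' /home/httpdocs
# (assumes file names without spaces)
url="$1"
root="$2"
# fgrep treats the URL as a fixed string, so the dots stay literal
find "$root" \( -name '*.html' -o -name '*.htm' \) -print \
    | xargs fgrep -l "$url"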
