


OMG! 100 links all wrong...

find and change a string in multiple files

4:01 pm on Sep 5, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:June 26, 2003
posts:141
votes: 0


I don't know if this is possible but...

I made a website with about 100 pages (give or take 10), and it's actually a sub-site for a department in a large company, so I had some guidelines to follow. Basically, I had a header to repeat on every page. I got the preliminary header set up by the company's webmaster, and it had all the links and everything in place. Now, for some strange reason, the WM changed a link from www.mysite.com/page.html to www.mysite.com/about/page.html
(these are example names, but it's the same situation). Is there a way I could write a script to go through the entire web folder of my account on the Unix system and change ANY occurrence of www.mysite.com/page.html to the correct one? This is spread across 3 levels of directories and about 100 files total. There is one "root" directory (not the root of the system) which I can start from; everything in it is my own, including the pages I need to change.

I would assume I could do something like this, from my limited experience of Unix OSes, but I'm not sure how to go about it.

Any starters? (I don't have a root login... but I could hack *cough cough* get it, if need be.)

5:04 pm on Sept 5, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 14, 2002
posts:1192
votes: 0


Who owns the files? If you do, no need for root.

This is just to get you started; in any case, your syntax will probably be different because I use tcsh rather than bash.

As someone posted in response to a vaguely similar query: if you are doing things to files in a hierarchy, find is your friend.

A first cut (remember, your syntax will vary) is along the lines of:


# Loop over every .html file under starting_directory
# (tcsh syntax: the loop variable is named without the $ on the foreach line)
foreach file ( `find starting_directory -name "*.html" -print` )
    cp $file $file.sav                            # keep a backup copy
    sed 's/bad url/good url/g' $file.sav > $file  # g = replace every occurrence on a line
end

Where starting_directory is the top level directory of your html hierarchy. Most modern versions of find will not need the -print (but it does no harm).
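With the example URLs from the first post, the sed line might look like this (untested; I've used | as the delimiter so the slashes in the URLs don't collide with sed's usual /, and escaped the dots so they match literally):

sed 's|www\.mysite\.com/page\.html|www.mysite.com/about/page.html|g' $file.sav > $file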

Just to start you experimenting, I have not tested this and am in no way responsible for anything should you be unwise enough to implement this ;)

Remember to clean up the file.sav's after you are finished and have checked that everything is OK; another find will happily do it for you, as sketched below.
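For example (same caveats, untested):

find starting_directory -name "*.html.sav" -exec rm {} \;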

9:40 pm on Sept 5, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:June 26, 2003
posts:141
votes: 0


So it is possible to set up a script, then... interesting. Does anyone know of a good source to read about implementing this in bash? Like I said before, I've never set up a script on Unix at all; I've had some experience with the OS, but not enough to do this without reading up on it :)

thanks
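For reference, a bash rendering of the tcsh loop above (an untested sketch with the same caveats; starting_directory and the URLs are placeholders as before, and it assumes no spaces in the file names):

#!/bin/bash
# bash version of the earlier tcsh loop
for file in $(find starting_directory -name "*.html"); do
    cp "$file" "$file.sav"                            # keep a backup copy
    sed 's/bad url/good url/g' "$file.sav" > "$file"  # rewrite from the backup
done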

10:23 pm on Sept 5, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:June 9, 2002
posts:41
votes: 0


Now, for some strange reason the WM changed a link from www.mysite.com/page.html to www.mysite.com/about/page.html

One quick way to fix it is to add a symbolic link from /about/page.html to the existing /page.html.

# cd [document root]
# mkdir about
# cd about
# ln -s ../page.html page.html
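You can verify it from that same about directory; ls -l prints a symlink's target, so the output should end with -> ../page.html:

# ls -l page.html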

1:15 pm on Sept 9, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:June 26, 2003
posts:141
votes: 0


This can be done without having root? I don't want to hack the account if I don't have to.

11:23 pm on Sept 9, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 11, 2003
posts:442
votes: 0


Symlinks are great for that type of setup; however, would duplicate content be a potential issue here?

If search engines can find both pages, they would appear to be dupes.
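If so, one way around it would be a server-side redirect instead of a symlink, so the old URL answers with a 301 and only the new one gets indexed. A minimal sketch, assuming Apache with mod_alias and .htaccess overrides enabled, and assuming the page has already been made available at the new location (via the symlink above, or by moving the file):

# .htaccess in the document root
Redirect permanent /page.html http://www.mysite.com/about/page.html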