Mirroring a Website

         

Shawn Steele

6:40 pm on Jan 24, 2004 (gmt 0)

10+ Year Member



Hi,

Can someone tell me how I can dynamically mirror the folder www.mainsite.com/little to www.little.com, so that it always picks up fresh content from www.mainsite.com/little?

Thanks,
Shawn

claus

6:54 pm on Jan 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you run Apache (your choice of forum suggests that you do), it's quite easy using mod_rewrite.

Look for "Dynamic Mirror" in Engelschall's rewrite guide:
[engelschall.com...]

Shawn Steele

7:17 pm on Jan 24, 2004 (gmt 0)

10+ Year Member



So, going by the guide, it says to use:

_____________________
RewriteEngine on
RewriteBase /~quux/
RewriteRule ^hotsheet/(.*)$ [tstimpreso.com...] [P]

RewriteEngine on
RewriteBase /~quux/
RewriteRule ^usa-news\.html$ [quux-corp.com...] [P]

______________________

So, pardon my stupidity, but would I do something like this:

_____________________
RewriteEngine on
RewriteBase /~little/
RewriteRule ^little/(.*)$ [my-main-site.com...] [P]

RewriteEngine on
RewriteBase /~little/
RewriteRule ^little\$ [little-site.com...] [P]
______________________

It's probably way off, but I have no idea what I should or should not change, nor on which server's .htaccess file this should go.

Thanks

claus

7:37 pm on Jan 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Okay, as I understand you, you want the entire "little-site.com/" to be a mirror of "mainsite.com/little/". In that case you should use the upper example. You will not need the RewriteBase, as it's the whole domain on "little-site.com" you want to be a mirror, not a sub-folder.

I think this will do it. Put it in the root .htaccess file of "little-site.com":

---------------------------------------------

RewriteEngine on
RewriteRule ^(.*)$ http://www.mainsite.com/little/$1 [P]

---------------------------------------------

(Note the colon in "http://" and the single space between "^(.*)$" and "http". There should be no other spaces between "http" and "$1".)
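To sidestep the copy-and-paste problems that come up in this thread (a missing colon, a missing space), you could write the file from a shell prompt instead of an editor. This is a minimal sketch assuming shell access to the little-site.com account; `write_htaccess` is a hypothetical helper name, and it overwrites any existing .htaccess in the directory you give it.

```shell
#!/bin/sh
# Hypothetical helper: writes the proxy rule into DIR/.htaccess as plain
# ASCII text, so no editor or forum can mangle "http://" or drop the
# space after "^(.*)$". Overwrites an existing DIR/.htaccess.
write_htaccess() {
  dir=$1
  # The quoted 'EOF' stops the shell from expanding "$1" inside the rule.
  cat > "$dir/.htaccess" <<'EOF'
RewriteEngine on
RewriteRule ^(.*)$ http://www.mainsite.com/little/$1 [P]
EOF
}
```

For example, `write_htaccess /home/little/public_html` (the document root path is an assumption; use whatever your host calls it).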

Shawn Steele

7:50 pm on Jan 24, 2004 (gmt 0)

10+ Year Member



Ok,

I put the following in the root .htaccess file of little-site.com, and it either gave me an index of the files on that server or it said page not found.


RewriteEngine on
RewriteRule ^(.*)$http://www.main-site.com/little/$1 [P]

Did I do something wrong?

Thanks,
Shawn

claus

7:55 pm on Jan 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In the above example you should have a space between the "$" and the "http".

If you saw a file listing from the other site, it seems it worked; the reason you saw a file listing might be that the other site does not have an "index.html" file in the "/little/" folder. Is that so?

Edit: I misread your post. The missing space between the first dollar sign and the "http" is probably the reason for the error. I just tested a similar rewrite rule and it worked nicely for me.

Shawn Steele

8:06 pm on Jan 24, 2004 (gmt 0)

10+ Year Member



No, the file listing was from the little-site.com server, and when I added a space before the http it came up as page not found. I erased all the files on the little-site.com server so I could tell which server it was pulling from, and there is a complete site in main-site.com/little with an .htaccess file containing a DirectoryIndex set to index.htm. Any ideas?

claus

8:46 pm on Jan 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, these things are hard to troubleshoot as web servers can be configured in so many ways. I suppose you have checked for spelling already?

Try adding "Options +FollowSymLinks". You could also test with a name-brand site like Yahoo, Google, MSN or whatever to see if it works before you try it on the other site:

------------------------------------

Options +FollowSymLinks
RewriteEngine on
RewriteRule ^(.*)$ http://www.(some domain).com/$1 [P]

------------------------------------

I'm not sure the added "symlinks" line will make a difference, as I would expect the server to give you a "500 Internal Server Error" instead of a "404 Page not found" if you were missing it, but try it anyway.

Other error sources could be:
Did you create and store the ".htaccess" file in ASCII text format?
Are you sure you did not give any other name than ".htaccess"?
Did you upload the file as ASCII?
Did you upload to the root folder?
Did you type a file name after "little-site.com/" in the browser address bar (e.g. "little-site.com/blue.html") that does not exist in the "main-site.com/little/" folder?
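Some of the checks above can be scripted if you have shell access. A rough sketch: `check_htaccess` is a hypothetical helper that flags a wrong file name, carriage returns (typical of a file saved on Windows or an old Mac, or uploaded in binary instead of ASCII mode), and any bytes that are not plain ASCII text, such as a UTF-8 byte-order mark. It does not check the folder or the URL questions.

```shell
#!/bin/sh
# Hypothetical helper: rough checks for the .htaccess problems listed above.
# Prints one line per problem found; returns 1 if anything looks wrong.
check_htaccess() {
  f=$1
  status=0
  if [ ! -f "$f" ]; then
    echo "no such file: $f"
    return 1
  fi
  # The file must be called exactly ".htaccess" (not "htaccess",
  # ".htaccess.txt" and so on).
  if [ "$(basename "$f")" != ".htaccess" ]; then
    echo "wrong file name: $(basename "$f")"
    status=1
  fi
  # Carriage returns usually mean Windows/Mac line endings, e.g. a file
  # uploaded in binary rather than ASCII mode.
  if grep -q "$(printf '\r')" "$f"; then
    echo "carriage returns found (non-Unix line endings)"
    status=1
  fi
  # Any byte outside printable ASCII and whitespace (a UTF-8 byte-order
  # mark, word-processor formatting, ...) means it is not plain ASCII text.
  if LC_ALL=C grep -q '[^[:print:][:space:]]' "$f"; then
    echo "non-ASCII or control bytes found"
    status=1
  fi
  return $status
}
```

Usage: `check_htaccess /home/little/public_html/.htaccess` (path assumed); no output and exit status 0 means none of these problems were found.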

I hope some of this will help. If it's none of these, I really don't have a clue. It works nicely for me, exactly as I wrote it in msg #4.

Shawn Steele

8:39 am on Jan 25, 2004 (gmt 0)

10+ Year Member



OK, I tried everything and nothing seemed to work, so I logged in over ssh and tried to symlink the sites with the following command:

ln -s /home/mainsite/public_html/little /home/little/public_html

and now it just lists a directory of files, one of which is 'little', but when I click on it, it says Forbidden. Why is this so hard!
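A likely explanation for the listing: when the last argument to `ln -s` is an existing directory, `ln` creates the link inside that directory rather than replacing it, so the command above made /home/little/public_html/little. The "Forbidden" is plausibly the web server refusing to follow the symlink (Options FollowSymLinks, or SymLinksIfOwnerMatch when the two sites belong to different users) or a permissions issue, but that depends on the host. The sketch below reproduces the effect in a scratch directory (ROOT stands in for /home; all paths are assumptions) and then shows the alternative of making public_html itself the link.

```shell
#!/bin/sh
# Sketch in a throwaway directory; ROOT stands in for /home.
set -e
ROOT=$(mktemp -d)
mkdir -p "$ROOT/mainsite/public_html/little" "$ROOT/little/public_html"
echo '<p>hello</p>' > "$ROOT/mainsite/public_html/little/index.htm"

# What the original command did: public_html already exists and is a
# directory, so ln puts the new link INSIDE it, named "little".
ln -s "$ROOT/mainsite/public_html/little" "$ROOT/little/public_html"
ls -l "$ROOT/little/public_html"

# Alternative: remove that nested link, move the old docroot aside,
# then make public_html itself a symlink to the main site's folder.
rm "$ROOT/little/public_html/little"
mv "$ROOT/little/public_html" "$ROOT/little/public_html.old"
ln -s "$ROOT/mainsite/public_html/little" "$ROOT/little/public_html"
ls -l "$ROOT/little"
```

On a real shared host the server may still refuse to follow a link into another account's home directory, so this only removes the nesting problem.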

claus

2:35 pm on Jan 25, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Unfortunately I really don't know what's going on in this case. It must be something in your server configuration that prevents the proxy flag ("[P]") from working. Perhaps others have a clue?
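One thing worth ruling out: the [P] flag does not proxy anything by itself. mod_rewrite hands the request to mod_proxy, so the rule only works if mod_proxy is loaded, and on Apache 2.x proxy_http_module is needed as well for http:// targets. On Apache 2.2 or later, `apachectl -M` lists the loaded modules; `has_proxy_modules` below is a hypothetical helper that reads such a list on stdin.

```shell
#!/bin/sh
# Hypothetical helper: reads a module list (e.g. "apachectl -M" output)
# on stdin and reports whether the modules needed by [P] are present.
has_proxy_modules() {
  list=$(cat)
  for m in proxy_module proxy_http_module; do
    # -w: match the whole word, so "proxy_module" is not found inside
    # some other module name by accident.
    if ! printf '%s\n' "$list" | grep -qw "$m"; then
      echo "missing: $m"
      return 1
    fi
  done
  echo "proxy modules loaded"
}
```

Usage: `apachectl -M 2>/dev/null | has_proxy_modules`. On shared hosting you may not be able to run `apachectl` at all, in which case asking the host whether mod_proxy is enabled is the quicker route.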

jamie

7:37 pm on Jan 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi Shawn,

This is a different approach, but you could use wget and rsync, run regularly as a cron job.

wget is a command-line downloader (it speaks HTTP and FTP) which follows all the links you tell it to and downloads the pages and images to wherever you specify. This can include complete directory hierarchies, HTML, images, the lot.

If, like me, you have other files which aren't directly linked to (e.g. wget can't follow a JavaScript popup window), then you could run rsync on those and copy them over to the new server. rsync is a mirroring tool, mostly used for backups.

The beauty of both programmes is that they can copy over only newer files (wget with its timestamping/mirror option, rsync by default), so once you have the bulk copied over, there will be little extra copying to do on a nightly basis.

Both programmes should be run from a cron job (sorry, I don't know about Windows/IIS options).

Both programmes have LOTS of options, but a good starting place would be the official wget site [gnu.org] and this good rsync tutorial [mikerubel.org].

I have only just started doing this myself, so I'm afraid I can't help much with the exact config, but there will be many others here who can if you need it :-)

Good luck
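A minimal sketch of the wget + rsync approach described above, meant to be run from cron. The URL, ssh user, host and paths are all hypothetical placeholders, and the rsync step assumes ssh access to the main server with key-based login (cron cannot type a password).

```shell
#!/bin/sh
# Sketch only: URL, host, user and paths below are hypothetical.
SRC_URL="http://www.mainsite.com/little/"
SRC_RSYNC="mainsite@www.mainsite.com:/home/mainsite/public_html/little/"
DEST="/home/little/public_html"

mirror_with_wget() {
  # --mirror turns on recursion plus timestamping (-N), so only pages
  # newer than the local copy are fetched again; --no-parent stays
  # inside /little/, and --no-host-directories plus --cut-dirs=1 drop
  # the "www.mainsite.com/little/" prefix from the saved paths.
  wget --mirror --no-parent --no-host-directories --cut-dirs=1 \
       --directory-prefix="$DEST" "$SRC_URL"
}

mirror_with_rsync() {
  # Picks up files wget cannot discover by following links (e.g. pages
  # opened only from a JavaScript popup). rsync skips files whose size
  # and modification time already match, so nightly runs copy little.
  rsync -az -e ssh "$SRC_RSYNC" "$DEST/"
}

# Only do the work when called as "mirror-little.sh run", so the file
# can be inspected or sourced without touching the network.
if [ "${1:-}" = "run" ]; then
  mirror_with_wget
  mirror_with_rsync
fi
```

A crontab line such as `0 3 * * * /home/little/bin/mirror-little.sh run` (script location assumed) would refresh the copy every night at 03:00. If both sites live on the same machine, as the /home paths earlier in the thread suggest, rsync from the local path alone would copy the whole folder without needing wget.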