Does anyone know of a simple way to copy an entire website from one server to another using ASP.NET? I need something that can start at the root of the site and grab everything under it.
I have FTP information for the site I want to copy from, but either I'm misunderstanding the documentation or the built-in classes only let me copy one file at a time.
I could also use a web scraping utility, but I'm hesitant to ask my client to pay for it. If there's a free one anyone knows of, that would be a hugely appreciated alternative.
(And before anyone asks, I'm not doing anything shady. My client has an old site on a hosting service whose WYSIWYG she likes very much. I want to set her up so she can continue using that as a "development server", and import the content to her new site.)
Thanks!
g.
Here's the code for anyone who's interested. It could be improved through the use of regular expressions, but it was kind of quick and dirty.
using System;
using System.Collections;
using System.IO;
using System.Net;
using System.Text;

// The members below belong in the page's code-behind class.

// filenames already requested, so nothing is fetched twice
ArrayList scraped;

protected void Page_Load(object sender, EventArgs e)
{
    scraped = new ArrayList();
}

protected void btnSubmit_ServerClick(object sender, EventArgs e)
{
    scrapePage("index.html");
    results.Visible = true;
}

private void scrapePage(String filename)
{
    // keep track of what we've done
    scraped.Add(filename);

    WebClient objWebClient = new WebClient();
    UTF8Encoding objUTF8 = new UTF8Encoding();
    String page = "";
    Byte[] bytes;
    try
    {
        bytes = objWebClient.DownloadData("http://example.com/" + filename);
        page = objUTF8.GetString(bytes);
    }
    catch (Exception)
    {
        // skip anything that can't be downloaded
        return;
    }

    if (filename.IndexOf(".htm") > 0)
    {
        // make everything lowercase so we can find what we need
        // (this lowercases the links too, so it assumes the server ignores case in URLs)
        String pageCopy = page.ToLower();
        String[] keys = { "href=", "src=", "src = " };
        String[] links = pageCopy.Split(keys, StringSplitOptions.RemoveEmptyEntries);

        // get rid of html up to the first link
        if (links.Length > 0) links[0] = "";

        for (int i = 1; i < links.Length; i++)
        {
            // a link ends at the first space or at the '>' that closes its tag
            int linkEnd = links[i].IndexOfAny(new char[] { ' ', '>' });
            if (linkEnd > 0)
            {
                // shorten these to include only the link itself, without quotes
                links[i] = links[i].Substring(0, linkEnd).Replace("\"", "");
            }
            else
            {
                // if there is no space or '>' in the section, this section is junk - erase it
                links[i] = "";
            }
        }

        // call this function for each non-blank link we've produced
        foreach (String link in links)
        {
            if (link != "")
            {
                if (!scraped.Contains(link)) scrapePage(link);
            }
        }
    }

    try
    {
        // save the file under the site root, creating subfolders as needed
        String target = MapPath("/") + filename;
        Directory.CreateDirectory(Path.GetDirectoryName(target));
        File.WriteAllBytes(target, bytes);
    }
    catch
    {
        return;
    }
}
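As a rough sketch of the regular-expression improvement mentioned above (this is not the original poster's code), the helper below pulls href and src values out of a page without lowercasing it, so filenames keep their case. The class name LinkExtractor, the method name Extract, and the pattern itself are assumptions made for illustration; the pattern covers double-quoted, single-quoted, and unquoted attribute values, not every HTML edge case.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original post.
public static class LinkExtractor
{
    // Matches href=... or src=..., allowing spaces around '=' and a value
    // in double quotes, single quotes, or no quotes at all.
    private static readonly Regex LinkPattern = new Regex(
        @"\b(?:href|src)\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^\s>""']+))",
        RegexOptions.IgnoreCase);

    // Returns each distinct link in the order it first appears.
    public static List<string> Extract(string html)
    {
        var links = new List<string>();
        foreach (Match m in LinkPattern.Matches(html))
        {
            string url = m.Groups["url"].Value.Trim();
            if (url.Length > 0 && !links.Contains(url))
            {
                links.Add(url);
            }
        }
        return links;
    }
}
```

Inside scrapePage, looping over LinkExtractor.Extract(page) could stand in for the Split-and-trim loop; the rest of the method would stay the same.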
For example, create these two files and put them in the same directory:
New text file, save as data.ftp:
================================
OPEN ftp.example.com
USER myusername mypassword
LCD C:\Inetpub\wwwroot\
CD mywebsitefolder
BINARY
PUT myfile.html
QUIT
New text file, save as transfer.bat
===================================
ftp -n -s:data.ftp
Now run the batch file to upload each file named in a PUT command in data.ftp.
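The script above uploads a single file, so copying a whole folder this way means one PUT line per file. Here is a minimal C# sketch that builds such a script from a list of file names. FtpScriptBuilder and its parameters are hypothetical names, and the sketch assumes every file sits directly in the local folder (no subfolders) and that no name contains spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the original post.
public static class FtpScriptBuilder
{
    // Builds the text of a data.ftp script with one PUT line per file name.
    public static string Build(string host, string user, string password,
                               string localDir, string remoteDir,
                               IEnumerable<string> fileNames)
    {
        var script = new StringBuilder();
        script.AppendLine("OPEN " + host);
        script.AppendLine("USER " + user + " " + password);
        script.AppendLine("LCD " + localDir);
        script.AppendLine("CD " + remoteDir);
        script.AppendLine("BINARY");
        foreach (string name in fileNames)
        {
            script.AppendLine("PUT " + name);
        }
        script.AppendLine("QUIT");
        return script.ToString();
    }
}
```

The returned string could be saved as data.ftp (for example with File.WriteAllText) before running transfer.bat.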