Problems changing from a htm to html layout

Redesigned site changed with fresh bot but reverted to old site after update

         

cheemo

2:40 pm on Nov 4, 2002 (gmt 0)

10+ Year Member



My old site was listed as [example.com...] and all the pages within the site were .htm. I spent a couple of months creating a new site and changed all the extensions to .html.

As soon as I uploaded the new site about 2 weeks ago, Google listed my new index page (.html) and also everything that was 1 level deep in the SERPs, but after the update all the .html pages are gone and only the old .htm pages exist.

I didn't want to delete the old pages because they still bring in traffic, and I was scared that if I deleted them and added the new pages my site might disappear completely. Obviously this wasn't the right move.

What should I do? Thanks.

Longhaired Genius

3:48 pm on Nov 4, 2002 (gmt 0)

10+ Year Member



Now that you've rebuilt your site you should keep it neat and tidy. Delete the old .htm pages and set up a custom error page to direct your visitors to where you want them to go.

Full instructions can be found here:

[developers.evrsoft.com...]

andreasfriedrich

4:19 pm on Nov 4, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't think the 404 approach is that good.

You should send a 404 code if the resource pointed to by a certain URI no longer exists on your site. If, however, the resource still exists (whether it has been modified or renamed does not matter), you should tell UAs that the resource can now be found at the new URI by sending a 301 status code.

Imagine a site about three celebrities. After restructuring your website you only keep the resources about celeb #1 and celeb #2. You would return the following status codes:

resource | old site | new site | status
celeb #1 | acar.htm | aca.html | 301
celeb #2 | than.htm | tha.html | 301
celeb #3 | jmcc.htm | (none)   | 404

This way you keep the PR of acar.htm and than.htm. UAs following a link to those pages will be redirected to the new URI. But even more important you don't put two totally unknown resources into the web. Instead you tell concerned parties that the resources formerly known as acar.htm and than.htm are now known as aca.html and tha.html respectively.
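
A minimal sketch of how that table might translate into an .htaccess file, assuming Apache with mod_alias and a host that allows .htaccess overrides. The hostname www.example.com is a placeholder, not the poster's real domain.

```apache
# celeb #1 and celeb #2: the resource still exists under a new name, so send a 301
Redirect permanent /acar.htm http://www.example.com/aca.html
Redirect permanent /than.htm http://www.example.com/tha.html
# celeb #3: no rule needed; once jmcc.htm is deleted, the server answers 404 by default
```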

Andreas

john316

4:39 pm on Nov 4, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This should work. Add it to your .htaccess file.

As long as all of your file names are the same and you just changed the extension, you should avoid any further issues and you can delete the old stuff.

RedirectMatch permanent ^(.*)\.htm$ $1.html

P.S. This topic should be moved somewhere else, not Google News, I think.

cheemo

6:24 pm on Nov 4, 2002 (gmt 0)

10+ Year Member



Thanks for the responses. I actually posted in the Google forum because the only reason I am concerned with this issue is the trouble Google is having adjusting to the change.

Also, as ridiculous as it is, this site is on a very budget host (yet very reliable, and it has met my needs) and I can't even set up custom 404 pages on the server.

I am trying to find a new host, but it is really hard to determine which hosts are reliable with all the junk out there (sticky me if you have any recommendations). In the meantime, would it be a bad idea to use a JavaScript redirect or something like that?

cheemo

9:36 pm on Nov 4, 2002 (gmt 0)

10+ Year Member



Here is my big question to the google gurus:

My site used to be [example.com...] and I changed it to [example.com...], and I figured since browsers default to index.html before it tries index.htm when requesting a URL, the Google spider would also. I figured it would hit the index.html and crawl from there. Why wouldn't that be the case?

Could it be that the freshbot did just that, and then during the update Googlebot used info it dug up before I updated my site, and perhaps during the next update it will have my new site? Just really worried about being delisted here.

bobriggs

10:00 pm on Nov 4, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



and I figured since browsers default to index.html before it tries index.htm when requesting a url the google spider would also.

No, no.

The page could be default.htm(l), home.htm(l), anything.cgi, etc. It's set up by the server software, and if you're on Apache, you can change it in .htaccess also.

Unless your external (and internal) links actually specify index.htm or index.html or something else, the browser, or Googlebot for that matter, will request / (as in GET /).

It's up to the server to decide which page to serve up from there.
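
For illustration, on Apache the file served for a request to / is normally chosen by the DirectoryIndex directive from mod_dir. A sketch, assuming the host allows this override in .htaccess:

```apache
# Try index.html first; fall back to index.htm if index.html is missing
DirectoryIndex index.html index.htm
```

With this in place a request for / is answered with index.html when it exists. The client never asks for either file name itself.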

andreasfriedrich

10:28 pm on Nov 4, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I figured it would hit the index.html and crawl from there why wouldn't that be the case?

A robot is just some stupid kind of software.

While you may know that a certain page exists the robot does not.

While you may remember the single pages of your website by theme, the robot does not. You will therefore think of a page as the same thing although you changed its name. The robot will not. It will only see the new URI.

Think of it like this: When you think about friends, family and close neighbors you picture them in your mind. Bigger communities refer to them by their name and index them by name, social security number, etc.

A robot knows a resource only by name (URI). If you change the name, the thing the name points to is a new thing to the robot unless it uses some intelligent algo to try to determine whether it is the same thing under a different name.

Andreas

WebGuerrilla

12:18 am on Nov 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




What you need to do is to set up 301 redirects for all the .htm files so that they will redirect to the .html files.

Since Google has already indexed the site, it will return and keep requesting the .htm files since those are the pages that are already in the db.

It will also pick up the new .html pages when it grabs your home page, but when all the numbers are crunched, the .html pages will more than likely be dropped because they are exact duplicates of the pages already in Google's db.

You need to tell Googlebot that you've changed the file names. A 301 will accomplish that.