Forum Moderators: phranque

External websites linking to dead pages or 404 content?

Should we ignore this?



7:17 am on Jun 26, 2010 (gmt 0)

5+ Year Member

Through Google Webmaster Tools, we discovered a huge number of external websites linking to pages on our site that return 404, or using wrong URLs.

We use the .html extension for our pages, but these external websites use the otherwise correct URL with .htm instead of .html.

Should we ignore this problem?
How can we fix this?

We have a dynamic website.


6:47 pm on Jun 26, 2010 (gmt 0)

WebmasterWorld Senior Member rocknbil is a WebmasterWorld Top Contributor of All Time 10+ Year Member

Ignore it? No. Someone comes to your site and gets a 404, and **poof** goes one more opportunity.


<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^(.+)\.htm$ /$1.html [R=301,L]
</IfModule>

This should redirect any request for .htm to .html; if the .html page doesn't exist, let your 404 display. Windows servers will have an equivalent.


3:22 am on Jun 27, 2010 (gmt 0)

WebmasterWorld Senior Member lammert is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

It's not only valuable visitors who may go away due to 404 links; the links may also carry link juice, which can help your site's position in the search engines. Using a 301 redirect as rocknbil proposes will direct this link juice to rankable pages.


5:52 am on Jun 27, 2010 (gmt 0)

5+ Year Member

Hello rocknbil,
We already have this rule in place. It sends those requests to the right page, so we do not lose that important visitor.
But in many places there are unwanted links like:



7:03 pm on Jun 27, 2010 (gmt 0)

WebmasterWorld Senior Member rocknbil is a WebmasterWorld Top Contributor of All Time 10+ Year Member

Well . . that's different, I'd say.

So those are the links TO your site or coming FROM sites?

If it's TO your site I'd most certainly let those 404; if they don't exist, that is what a 404 is . . . for. :-) If those are the sites the link comes FROM, I'd deny the request.

That is, unless this is some form of attack strategy I'm unaware of; if that were the case, I'd have nothing to offer.


7:17 pm on Jun 27, 2010 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

To fix both problems, I'd suggest:

RewriteEngine on
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^(.+)\.htm$ http://www.example.com/$1.html [R=301,L]

This routine takes the requested URL-path, removes ".htm", adds ".html", and prepends the document root filepath, essentially converting the requested URL-path to a filepath. It then goes and checks to see if that filepath resolves to a physically-existing .html file. If so it does the redirect, and if not, then it does nothing and the request will get a 404 response.

So legitimate requests where the only error is "htm" versus "html" get fixed-up, while the requests for viagra.htm and asdfgh.htm get a 404.

This only works if the .html files exist as physical "static" files. If you are rewriting .html requests to a page-generation script instead, then that script will need to be modified to do essentially the same thing: Check the database to see if a page can be generated after changing ".htm" to ".html" and if so redirect. Otherwise return a 404 response header and a 404 page.

I dumped the <IfModule> container, since its only practical function would be to allow the rule to fail silently if mod_rewrite is not loaded.
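For the dynamic-site case mentioned above, here is a minimal sketch of the decision the page-generation script would have to make. It is written in Python purely for illustration; the thread does not say what the site actually runs on. The names resolve_request, page_exists and KNOWN_PAGES are hypothetical, and the set stands in for whatever database lookup the real script performs.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for the database of pages the script can generate.
KNOWN_PAGES = {"/widgets/blue.html", "/about.html"}


def page_exists(path: str) -> bool:
    """Assumed lookup: True if a page can be generated for this URL-path."""
    return path in KNOWN_PAGES


def resolve_request(
    path: str, exists: Callable[[str], bool] = page_exists
) -> Tuple[int, Optional[str]]:
    """Return (status, location) for a requested URL-path.

    200: the page exists as requested.
    301: the request ended in .htm and the matching .html page exists;
         location holds the corrected path.
    404: nothing can be generated for this path.
    """
    if exists(path):
        return 200, None
    if path.endswith(".htm"):
        candidate = path + "l"  # ".htm" -> ".html"
        if exists(candidate):
            return 301, candidate
    return 404, None
```

With the sample set above, a request for /widgets/blue.htm would be redirected to /widgets/blue.html, while /viagra.htm falls through to a 404. A real script would also prepend the canonical hostname to the Location value, as the RewriteRule does.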



6:01 am on Jun 28, 2010 (gmt 0)

5+ Year Member

The question is why those other sites created all those wrong links. It might not have been an accident.

Make sure there is not a hidden website embedded within your site that is serving up content, or a 404 page with a link to content (such as pages selling drugs), in response to the 404s caused by someone following those bad links.

In other words, if someone placed lots of links around saying that your site has pages selling drugs, it is possible that your site is indeed serving pages like that, which are served only under specific circumstances.

One way to check is to go to those pages and follow the links to your site. If you do that, use high browser and PC security while you do it.
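A rough way to script that check, as a sketch only: request the same bad URL once plainly and once with a Referer header that mimics arriving from the linking site, then compare the two responses. The function names, the 20% size tolerance, and the idea of keying on the Referer header are all assumptions made for illustration; hidden content can be triggered by other things (user agent, IP address, cookies), so a clean result here proves nothing on its own.

```python
import urllib.error
import urllib.request


def fetch(url, headers=None):
    """Return (status, body) for url without raising on 4xx/5xx responses."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        return error.code, error.read()


def looks_conditional(plain, disguised, tolerance=0.2):
    """Flag a pair of (status, body) results that differ suspiciously.

    True when the status codes differ, or when the body sizes differ by
    more than `tolerance` as a fraction of the larger body.
    """
    plain_status, plain_body = plain
    disguised_status, disguised_body = disguised
    if plain_status != disguised_status:
        return True
    longest = max(len(plain_body), len(disguised_body), 1)
    return abs(len(plain_body) - len(disguised_body)) / longest > tolerance
```

Usage would be along the lines of looks_conditional(fetch(bad_url), fetch(bad_url, {"Referer": linking_page_url})), where both URLs come from the Webmaster Tools report. Fetching with a script rather than a browser also avoids rendering whatever the page might contain.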
