Website Technology Issues Forum

External websites linking to dead pages or 404 content?
Should we ignore this?
Future
msg:4159732
7:17 am on Jun 26, 2010 (gmt 0)

Recently, via Google Webmaster Tools, we discovered a huge number of websites linking to 404 pages and/or wrong URLs.

We use the .html extension for our pages, but these external websites use the correct URL with .htm instead of .html.

Should we ignore this problem, and if not, how can we fix it?

We have a dynamic website.

rocknbil
msg:4159914
6:47 pm on Jun 26, 2010 (gmt 0)

Ignore it? No. If someone comes to your site and gets a 404, **poof**, there goes one more opportunity.

Apache?

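# In a .htaccess file in the document root
<IfModule mod_rewrite.c>
RewriteEngine On
# 301-redirect any URL ending in .htm to the same URL ending in .html
RewriteRule ^(.+)\.htm$ /$1.html [R=301,L]
</IfModule>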

This should 301-redirect any request for a .htm URL to the .html version; if that .html page doesn't exist, your 404 page will display. Windows servers will have an equivalent.
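
To see exactly what that pattern matches, here is a small Python sketch applying the same regular expression to a URL-path, assuming .htaccess context (where the path Apache matches against has no leading slash). `redirect_target` is a hypothetical name, and this only illustrates the mapping; Apache's own regex engine is what actually runs the rule.

```python
import re
from typing import Optional

# Same pattern as the RewriteRule above (.htaccess context: no leading slash).
HTM_PATTERN = re.compile(r"^(.+)\.htm$")


def redirect_target(url_path: str) -> Optional[str]:
    """Return the 301 target for a .htm request, or None if the rule doesn't match."""
    match = HTM_PATTERN.match(url_path)
    if match is None:
        return None  # e.g. .html or .php requests pass through untouched
    # Mirrors the substitution /$1.html
    return "/" + match.group(1) + ".html"
```

Note that the rule never checks whether the .html file exists: a request for cheap-viagra.htm is still redirected, and the 404 then comes from cheap-viagra.html.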

lammert
msg:4160038
3:22 am on Jun 27, 2010 (gmt 0)

It's not only valuable visitors who may go away due to 404 links; the links may also carry link juice that can help your site's position in the search engines. Using a 301 redirect, as rocknbil proposes, will pass this link juice to rankable pages.

Future
msg:4160067
5:52 am on Jun 27, 2010 (gmt 0)

Hello rocknbil,
We have this rule in place; it redirects properly, so we don't lose that important visitor.
But many of the links are unwanted ones, like:

example.com/adfasdfasdfsadf/
example.com/cheap-viagra/
etc.

rocknbil
msg:4160313
7:03 pm on Jun 27, 2010 (gmt 0)

Well . . that's different, I'd say.

So are those the links TO your site, or the sites the links come FROM?

If they're links TO your site, I'd most certainly let those 404; if the pages don't exist, that is what a 404 is . . . for. :-) If those are the sites the links come FROM, I'd deny the request.

If this is some form of attack strategy I'm unaware of, I'd have nothing to offer.
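
In Apache, denying by referring site is usually done by testing %{HTTP_REFERER} in a RewriteCond. The decision itself is sketched below in Python; `deny_by_referer` and the blocklist are hypothetical, and keep in mind that the Referer header is sent by the client, so it can be missing or forged.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def deny_by_referer(referer: Optional[str], blocked_hosts: Iterable[str]) -> bool:
    """Return True if the request should be denied (e.g. with a 403)."""
    if not referer:
        return False  # no Referer header: nothing to judge by
    host = (urlsplit(referer).hostname or "").lower()
    blocked = {h.lower() for h in blocked_hosts}
    # Match a blocked host itself or any of its subdomains.
    return any(host == b or host.endswith("." + b) for b in blocked)
```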

jdMorgan
msg:4160319
7:17 pm on Jun 27, 2010 (gmt 0)

To fix both problems, I'd suggest:

RewriteEngine on
# Redirect .htm to .html only if the .html file physically exists
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^(.+)\.htm$ http://www.example.com/$1.html [R=301,L]

This routine takes the requested URL-path, removes ".htm", adds ".html", and prepends the document root filepath, essentially converting the requested URL-path to a filepath. It then goes and checks to see if that filepath resolves to a physically-existing .html file. If so it does the redirect, and if not, then it does nothing and the request will get a 404 response.

So legitimate requests where the only error is "htm" versus "html" get fixed-up, while the requests for viagra.htm and asdfgh.htm get a 404.
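
The order of operations in that RewriteCond/RewriteRule pair can be mimicked in Python to make it explicit. This is a sketch, not how Apache works internally: `htm_redirect` is a hypothetical name, the document root is passed in as a parameter, the URL-path is assumed to be already normalized and in .htaccess form (no leading slash), and http://www.example.com stands in for your host as in the rule above.

```python
import os
import re
from typing import Optional

HTM_PATTERN = re.compile(r"^(.+)\.htm$")


def htm_redirect(document_root: str, url_path: str,
                 host: str = "http://www.example.com") -> Optional[str]:
    """Return the redirect URL, or None if the rule does not fire."""
    match = HTM_PATTERN.match(url_path)
    if match is None:
        return None
    # Equivalent of: RewriteCond %{DOCUMENT_ROOT}/$1.html -f
    candidate = os.path.join(document_root, match.group(1) + ".html")
    if not os.path.isfile(candidate):
        return None  # no redirect; the request falls through to the 404
    # Equivalent of: RewriteRule ... http://www.example.com/$1.html [R=301,L]
    return f"{host}/{match.group(1)}.html"
```

Returning None means the rule simply doesn't fire, so the request continues as usual and ends in a 404 if nothing else serves it.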

This only works if the .html files exist as physical "static" files. If you are instead rewriting .html requests to a page-generation script, that script will need to be modified to do essentially the same thing: check the database to see whether a page can be generated after changing ".htm" to ".html", and if so, redirect. Otherwise, return a 404 response header and a 404 page.
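
For the dynamic case, a minimal Python sketch of that script-side check, assuming a hypothetical `page_exists` callback that stands in for the database lookup. It returns a status code and response headers rather than targeting any particular framework.

```python
from typing import Callable, Dict, Tuple


def respond(url_path: str, page_exists: Callable[[str], bool],
            host: str = "http://www.example.com") -> Tuple[int, Dict[str, str]]:
    """Decide status and headers; page_exists is a hypothetical DB lookup."""
    path = url_path.lstrip("/")
    if path.endswith(".htm"):
        fixed = path[:-len(".htm")] + ".html"
        if page_exists(fixed):
            # The only error was .htm vs .html: redirect permanently.
            return 301, {"Location": f"{host}/{fixed}"}
        # No page can be generated: send a real 404 status with the 404 page.
        return 404, {}
    if page_exists(path):
        return 200, {}
    return 404, {}
```

Returning a genuine 404 status (rather than a 200 with an error page) is what lets search engines drop junk URLs like /cheap-viagra/.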

I dumped the <IfModule> container, since its only practical function would be to allow the rule to fail silently if mod_rewrite is not loaded.

Jim

SteveWh
msg:4160473
6:01 am on Jun 28, 2010 (gmt 0)

The question is why the other site(s) created all those wrong links. It might not have been an accident.

Make sure there is not a hidden website embedded within your site that serves content, or a 404 page containing a link to content (such as pages selling drugs), in response to the 404s caused by someone following those bad links.

In other words, if someone placed lots of links around saying that your site has pages selling drugs, it is possible that your site is indeed serving pages like that, but only under specific circumstances.

One way to check is to go to those pages and follow the links to your site. If you do that, use strict browser and PC security settings while you do it.
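
Part of that check can be automated once you've saved the HTML your server returns for one of those bad URLs. Below is a rough Python sketch using the standard-library HTMLParser to list links pointing off your own domain; `foreign_links` is a hypothetical name and example.com stands in for your domain. It only inspects `<a href>` tags in the HTML you captured, so content cloaked for particular visitors won't show up unless you fetch the page the way those visitors would (for example, arriving via the bad link).

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urlsplit


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def foreign_links(html: str, own_host: str) -> List[str]:
    """Return links whose host is neither own_host nor one of its subdomains."""
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    own = own_host.lower()
    result = []
    for href in parser.hrefs:
        host = (urlsplit(href).hostname or "").lower()
        # Relative links have no host and are treated as your own.
        if host and host != own and not host.endswith("." + own):
            result.append(href)
    return result
```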
