Increase in not found errors message in webmaster tools

     

helenp

10:44 am on Jan 8, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi, my site has fallen (English section), and in the last 14 days I have received this message twice:
http://www.example.com/: Increase in not found errors

My site is 10 years old and I often delete pages. I don't return any special error for them, as this happens often,
so now I have thousands of not found errors. Many of the pages are years old, and Google has still not dropped them. What should I do with them?
Can this affect my ranking?
Thanks

[edited by: Robert_Charlton at 11:20 am (utc) on Jan 8, 2013]
[edit reason] examplified domain [/edit]

helenp

11:53 am on Jan 9, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In GWT, on the "crawl error" page you can select each error and by type of error. When you check the links by clicking them, what additional information does it give you about where the links are "Linked From" (that's the name of the tab)?

It's likely the errors are links created by crawl mistakes but it would be interesting to know if they are all from your site or if there are any external links creating them.

I explained this in the post just before yours: they are all mine. However, as I said, there may have been an external site before.

Str82u

11:55 am on Jan 9, 2013 (gmt 0)



@g1smd - that made me smile - you must run a tight ship when at the helm.

That's not to say anyone must avoid pages named "index", but leaving the filename out of links has advantages for A/B testing and security.

@helenp - understood, sorry about not catching that in the page sooner.

helenp

12:07 pm on Jan 9, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You should do: href="/svenska/"

The leading slash is required.

The link should not mention the index file filename.

Your DirectoryIndex directive should take care of delivering the correct content.


Never heard of DirectoryIndex; I've been reading up a bit. So if I get you right, I should tell Apache not to serve /index.htm but just /?

Also, which is best for fixing the / problem: absolute or relative links?
I can easily change links like /sales/index.htm using find and replace, however I'm afraid Dreamweaver will keep writing them that way and I will forget to change them.

g1smd

12:36 pm on Jan 9, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



DirectoryIndex index.htm ensures that when a user requests example.com/folder/ - the canonical URL for a folder or for the index page in a folder - the server uses the content of the index.htm file in that folder to fulfill that request, without telling the user what that file is actually called.

URLs are a reference system used "out there" on the web. Filenames are a reference system used "here" inside the server. The two are not at all the same thing, merely related by the actions of the server software and how it is configured.

The DirectorySlash directive is also relevant. When user requests example.com/folder for folder name that exists, the server will send a redirect telling the browser to make a new request for example.com/folder/ instead.

The site should link to href="/folder/" each time.

Make sure you install a local copy of Apache so you can fully test your site as http://localhost/ or http://127.0.0.1/ too.
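
A minimal sketch of the two mod_dir directives described in this post, as they might appear in the root .htaccess file. It assumes the host permits these directives in .htaccess and that the index file really is named index.htm; both are assumptions to check against your own setup.

```apache
# Serve index.htm for requests that end in a slash, such as /folder/.
# The filename never appears in the URL the visitor sees.
DirectoryIndex index.htm

# Redirect /folder to /folder/ when "folder" is a real directory.
# On is the default, so this line only makes the behaviour explicit.
DirectorySlash On
```

With this in place, links written as href="/folder/" are answered directly, without the extra redirect that href="/folder" would cause.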

helenp

1:34 pm on Jan 9, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




DirectoryIndex index.htm ensures that when a user requests example.com/folder/ - the canonical URL for a folder or for the index page in a folder - the server uses the content of the index.htm file in that folder to fulfill that request, without telling the user what that file is actually called.

The DirectorySlash directive is also relevant. When user requests example.com/folder for folder name that exists, the server will send a redirect telling the browser to make a new request for example.com/folder/ instead.

Thanks.
That's what the server already does.
I have tested mysite/espanol
and mysite/espanol/
and in both cases I get my site with / at the end and without index.htm being mentioned.
However, if I go to my site's homepage from an inner page, I do get index.htm, because that is what I have in the link.

DodgeThis

1:51 pm on Jan 9, 2013 (gmt 0)



Change your Home links to href="/".

g1smd

1:54 pm on Jan 9, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Once all the internal links point to the correct URLs, the next step would be to redirect requests for index URLs to the URL ending with a slash.

If you're not able to do that for a while, as an intermediate step, adding the correct rel="canonical" tag to each index page would be a good idea.

Tight ship is the only way to do things. :) Never let search engines guess what you meant.

Str82u

6:49 pm on Jan 9, 2013 (gmt 0)



@helenp
That's what the server already does.
I have tested mysite/espanol
and mysite/espanol/
and in both cases I get my site with / at the end and without index.htm being mentioned.
Correct - by default a server is usually set up to look for pages named "home", "default" and "index" when a user clicks into a directory, including your home directory (public_html).

As g1smd mentions, You'll want to make sure you use the trailing slashes "/" to avoid server redirects; the server will add it anytime you leave it off of a "/folder/directory" link to push the user inside the "/folder/directory/" - then it looks for the default index page to serve content.

Later on, if you feel fancy or paranoid, you can use DirectoryIndex in htaccess to tell the server that the index page has a different name of your choosing. For example, "/folder/name/example.html" would be the index page if you put DirectoryIndex example.html in the htaccess file in the folder named "/name/".

helenp

8:12 pm on Jan 9, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks all, I will start work on this as soon as I have finished implementing PayPal. Phew, nearly there :)
Hopefully tomorrow.

So do I understand you right: if I add this to htaccess
DirectoryIndex index.html
that will take you to the index page in the root and to each folder's index page?
mysite.com/
mysite.com/swedish/

I will have to take a good look at this, and I will come back and also see whether anything changes in Webmaster Tools.

Str82u

8:32 pm on Jan 9, 2013 (gmt 0)



So do I understand you right: if I add this to htaccess
DirectoryIndex index.html
that will take you to the index page in the root and to each folder's index page?
mysite.com/
mysite.com/swedish/
That is correct, but you do not need to do this because your server is already doing it by default... there's no reason for you to go to the effort or add new files unless you plan on using something much different for the name of the index page to be served to users. This works per folder as well.
If DirectoryIndex widgets.html were in the htaccess for the folder/directory "mysite.com/swedish/", you would be telling the server to show the page named "widgets.html" to users as the default index page, even if there is a file in the same directory named "index.html".

helenp

8:50 pm on Jan 9, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That is correct, but you do not need to do this because your server is already doing it by default... there's no reason for you to go to the effort or add new files unless you plan on using something much different for the name of the index page to be served to users. This works per folder as well.


Oh yes, I know, but I meant it as a redirect: if I type mysite.com/index.htm or click on a link with that URL, it would redirect me to mysite.com/
OK then, thanks. It sounded too easy lol

lucy24

10:48 pm on Jan 9, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



You'll want to make sure you use the trailing slashes "/" to avoid server redirects; the server will add it anytime you leave it off of a "/folder/directory" link to push the user inside the "/folder/directory/" - then it looks for the default index page to serve content.

To split hairs: The server* will issue this redirect (yes, it is a full-blown 301 redirect, whereas "index.whatever" is a rewrite) IF and ONLY IF mod_dir is at the default "DirectorySlash On" setting.


* Apache 2.x if you want to split even more hairs.

g1smd

11:36 pm on Jan 9, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



DirectoryIndex finds the right internal index file when URL with trailing slash is requested.

When URL with index filename included is requested the server will serve the index file as duplicate content unless you add a RewriteRule that tells the browser to make a new request for the right URL - the one ending with a slash.

lucy24

1:03 am on Jan 10, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



DirectoryIndex finds the right internal index file when URL with trailing slash is requested.

Where "finds the right internal index file" is short for:

... searches the list of names given in the DirectoryIndex directive, checks them against the contents of the requested directory, and uses the first one it finds.

;)

And if it doesn't find one, it proceeds to Option B: check whether auto-indexing (Indexes option) is enabled for the current directory, either explicitly or by inheritance.

Details like this can matter if you're on shared hosting, because the host may or may not divulge what the config-file settings are.
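
The lookup order described above might be made explicit like this. The filenames are illustrative assumptions, not the site's actual files, and Options -Indexes is shown on the assumption that you would rather return an error than an automatic file listing when no index file exists.

```apache
# Names are tried left to right; the first one that exists in the
# requested directory is served.
DirectoryIndex index.html index.htm index.php

# If none of them exists, mod_autoindex would build a file listing
# when the Indexes option is enabled. Switching it off makes the
# server answer 403 Forbidden instead.
Options -Indexes
```

On shared hosting the Options line only works if the host allows that override in .htaccess; if it does not, the server is likely to answer every request with a 500 error until the line is removed.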

helenp

7:09 am on Jan 10, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When URL with index filename included is requested the server will serve the index file as duplicate
content unless you add a RewriteRule that tells the browser to make a new request for the right URL -
the one ending with a slash.


A kind member sent me this by private mail, and I think it's what you are describing:

# REDIRECT ROOT htm INDEX PAGE
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.htm\ HTTP/
RewriteRule ^index\.htm$ [mysite.com] [R=301,L]

Would this apply to all my index files?

helenp

7:18 am on Jan 10, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't feel comfortable; the number of 404 errors has increased by a few hundred.

It looks like all the 404 errors start with one of these patterns:
hacienda_nagueles_apartments_marbella.htm/sales/
hacienda_nagueles_apartments_marbella.htm/svenska/
hacienda_nagueles_apartments_marbella.htm/espanol/
hacienda_nagueles_apartments_marbella.htm/maps/

g1smd

8:19 am on Jan 10, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



As the code comment says, it applies only to the root index.

# REDIRECT ROOT htm INDEX PAGE


The one you need has been posted multiple times in the Apache forum so far this year.

It looks like all the 404 errors start with one of these patterns:
hacienda_nagueles_apartments_marbella.htm/sales/
hacienda_nagueles_apartments_marbella.htm/svenska/
hacienda_nagueles_apartments_marbella.htm/espanol/
hacienda_nagueles_apartments_marbella.htm/maps/


That usually happens when hacienda_nagueles_apartments_marbella.htm links out to href="folder/" instead of href="/folder/".

helenp

9:08 am on Jan 10, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That usually happens when hacienda_nagueles_apartments_marbella.htm links out to href="folder/" instead of href="/folder/".


Thanks,

[edited by: helenp at 10:07 am (utc) on Jan 10, 2013]

helenp

9:36 am on Jan 10, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I found the Apache code, thanks, and it works like a charm on http for all index files.
I suppose I will have to add a similar rule for https, as this one does not work on https.

AddType application/x-httpd-php5 .htm .html
RewriteEngine On
# REDIRECT ROOT htm INDEX PAGE
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.mysite.com/$1? [R=301,L]

.

[edited by: Robert_Charlton at 10:03 am (utc) on Jan 10, 2013]
[edit reason] delinked linked example [/edit]

g1smd

10:13 am on Jan 10, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month




Don't forget to alter the code comment to say "redirect index request in folders and root".


Don't forget to add the standard non-www/www redirect code after the index redirect.

That code is probably in the same thread as the index redirect code you quoted.

helenp

10:33 am on Jan 10, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




Don't forget to alter the code comment to say "redirect index request in folders and root".

Done, thanks. I posted too quickly.


Don't forget to add the standard non-www/www redirect code after the index redirect.

I already had that before thanks.

lucy24

11:07 am on Jan 10, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



[^\ ]* seems a complicated way of saying \S* -- especially when what you really mean is just l? (ell, question mark)

html? = htm|html

Besides, the Rule itself simply says htm$ with no "l" option. In this case the Rule and Condition need to match.

g1smd

11:19 am on Jan 10, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



[^\ ]* makes space for any and every query string parameter of any length - and it gets stripped in the redirect.

If extensions other than .htm are supposed to redirect, then replace the '\.htm' bit with '\.html', '\.php', '\.html?', '\.(html?|php)', '\.(html?|php[345]?)' or whatever you need in both places.
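
As a sketch of the "both places" point, here is the index redirect from earlier in the thread with the extension widened to .htm, .html and .php. The hostname www.example.com is a placeholder, the choice of extensions is an assumption for illustration, and the rule is assumed to live in the root .htaccess file.

```apache
RewriteEngine On

# Redirect index requests in folders and root to the URL ending in a slash.
# The same extension pattern, \.(html?|php), appears in the condition and
# in the rule; if they differ, some requests match one but not the other.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.(html?|php)[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1? [R=301,L]
```

In the target, $1 is the folder path captured by the rule and the trailing ? drops any query string from the redirected URL.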

helenp

11:48 am on Jan 10, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No idea, but it works. I added an equivalent rule below for https, and both work perfectly.
Thanks

g1smd

12:19 pm on Jan 10, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



If you have rules for both http and https in the same file that potentially cover the same path requests, you'll need to add extra conditions to both rulesets otherwise one will run for every request and the other will never run.

helenp

12:33 pm on Jan 10, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you have rules for both http and https in the same file that potentially cover the same path requests, you'll need to add extra conditions to both rulesets otherwise one will run for every request and the other will never run.

I thought it wouldn't be that easy lol.
However, I tested it, going from http to https and the other way round, and as far as I can see it worked perfectly.

I found this:
RewriteCond %{HTTPS} =on
RewriteRule ^(.+)$ - [env=ps:https]
RewriteCond %{HTTPS} !=on
RewriteRule ^(.+)$ - [env=ps:http]

# redirect urls with index.html to folder
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9} /(.*)index.html HTTP/ [NC]
RewriteRule ^.*$ %{ENV:ps}://%{SERVER_NAME}/%1 [R=301,L]

# change // to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9} /(.*)//(.*) HTTP/ [NC]
RewriteRule ^.*$ %{ENV:ps}://%{SERVER_NAME}/%1/%2 [R=301,L]

And this from WebmasterWorld - jdmorgan:
RewriteCond %{SERVER_PORT}s ^(443(s)|[0-9]+s)$
RewriteRule ^(.+)$ - [env=askapache:%2]

# redirect urls with index.html to folder
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9} /([^/]+/)*index.html HTTP/
RewriteRule ^(([^/]+/)*)index.html$ http%{ENV:askapache}://%{HTTP_HOST}/$1 [R=301,L]

# change // to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9} /(.*)//(.*) HTTP/ [NC]
RewriteRule ^.*$ http%{ENV:askapache}://%{HTTP_HOST}/%1/%2 [R=301,L]

g1smd

1:23 pm on Jan 10, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



There are a lot of problems with that code.

You can't just cut and paste chunks of code and hope they all play nice together.

Some of those rules can never work, others are inefficiently coded.

Never use (.*) in the middle or at the beginning of a RegEx pattern. Rule target should always include protocol and hostname.

Rules 1 to 4 and rules 5 to 7 essentially try to do the same thing, but are coded differently.

Rules 3 and 6 need to be merged - as do 4 and 7.
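
One way the http and https variants of the index redirect could be merged into a single rule, offered as a sketch rather than as the code g1smd has in mind. It assumes the file is the root .htaccess, that www.example.com stands in for the real hostname, and that Apache itself handles SSL so that %{HTTPS} reflects the visitor's scheme (behind a proxy or load balancer it may not).

```apache
RewriteEngine On

# Match only client requests that literally contain index.html, so the
# internal DirectoryIndex lookup does not trigger a redirect loop.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.html[^\ ]*\ HTTP/
# %{HTTPS} is "on" or "off". With an "s" appended, the second group
# captures "s" for https and nothing for http.
RewriteCond %{HTTPS}s ^(on(s)|offs)$
# %2 comes from the last condition above; $1 is the folder path.
RewriteRule ^(([^/]+/)*)index\.html$ http%2://www.example.com/$1? [R=301,L]
```

The target names the protocol and hostname, and neither pattern starts with or contains (.*), which is in line with the two pieces of advice given in this post.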

helenp

1:35 pm on Jan 10, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oops,
those were two different approaches above, not meant to be used together.
This is the good link; understanding it, however, is another matter...
[webmasterworld.com...]

lucy24

12:06 am on Jan 11, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



[^\ ]* makes space for any and every query string parameter of any length - and it gets stripped in the redirect.

Ah, got it. But you can still say \S* at a savings of three bytes ;)

TheMadScientist

1:37 pm on Jan 11, 2013 (gmt 0)

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



Wow, it looks like the code's been way overcomplicated for a simple http/https redirect difference ... I would think all we would need are 2 rules, one with an extra condition to check the port:

# Redirect All Non-Port 443 Requests Containing index.htm
# Rule is first using a negative match for efficiency, assuming http is most used
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1? [R=301,L]

# Redirect Requests Containing index.htm on Port 443 to https URLs
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ [example.com...] [R=301,L]

I'm fairly certain only specific underlying operating systems allow for the use of shorthand with mod_rewrite (e.g. \s), which is why there's difficulty finding anyone who posts mod_rewrite expressions online using any of them, even if we do regularly use them in other coding ... There's nothing quite like changing hosts and having an .htaccess file stop working, because the patterns used for matching in mod_rewrite are 'not accepted' by the new server ... It's not much fun, believe me lol
This 93 message thread spans 4 pages: 93
 
