Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 93-message thread spans 4 pages (this is page 2).
Increase in not found errors message in webmaster tools
helenp




msg:4534215
 10:44 am on Jan 8, 2013 (gmt 0)

Hi, my site has dropped in the rankings (English section), and in the last 14 days I got this message twice:
http://www.example.com/: Increase in not found errors

My site is 10 years old, and I often delete pages without returning any special error for them, since this happens so often.
So now I have thousands of not found errors; many of those pages are years old, and Google has still not dropped them. What should I do with them?
Can this affect my ranking?
Thanks

[edited by: Robert_Charlton at 11:20 am (utc) on Jan 8, 2013]
[edit reason] examplified domain [/edit]

 

helenp




msg:4534606
 11:53 am on Jan 9, 2013 (gmt 0)

In GWT, on the "crawl error" page you can view the errors by type and select each one. When you check the links by clicking them, what additional information does it give you about where the links are "Linked From" (that's the name of the tab)?

It's likely the errors are links created by crawl mistakes but it would be interesting to know if they are all from your site or if there are any external links creating them.

I explained this in the post just before yours: they are all mine. However, as I said, there may have been an external site linking to them before.

Str82u




msg:4534607
 11:55 am on Jan 9, 2013 (gmt 0)

@g1smd - that made me smile - you must run a tight ship when at the helm.

That's not to say anyone needs to avoid pages named "index", but leaving the filename out of links has advantages for A/B testing and security.

@helenp - understood, sorry for not catching that on the page sooner.

helenp




msg:4534608
 12:07 pm on Jan 9, 2013 (gmt 0)

You should do: href="/svenska/"

The leading slash is required.

The link should not mention the index file filename.

Your DirectoryIndex directive should take care of delivering the correct content.


I had never heard of DirectoryIndex; I've been reading up on it a bit. So if I understand you right, I should tell Apache not to serve /index.htm but just /?

Also, which is best for fixing the / problem: absolute or relative links?
I can easily change links like /sales/index.htm using find and replace, but I'm afraid Dreamweaver will keep writing them that way and I will forget to change them.
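For the Dreamweaver worry, a one-off cleanup script can do the find-and-replace across saved pages. A minimal Python sketch, assuming links are written root-relative as href="/folder/index.htm" (the helper name strip_index_from_links is hypothetical); relative links such as href="sales/index.htm" are deliberately left alone:

```python
import re

# Hypothetical cleanup helper: turn root-relative links to an index file
# (index.htm or index.html) into the folder URL ending in a slash.
INDEX_HREF = re.compile(r'href="(/(?:[^/"]+/)*)index\.html?"')


def strip_index_from_links(html: str) -> str:
    """Rewrite href="/sales/index.htm" -> href="/sales/" and href="/index.htm" -> href="/"."""
    return INDEX_HREF.sub(lambda m: 'href="' + m.group(1) + '"', html)


if __name__ == "__main__":
    page = '<a href="/sales/index.htm">Sales</a> <a href="/index.htm">Home</a>'
    print(strip_index_from_links(page))
```

Re-running it after editing pages in Dreamweaver keeps the links pointing at /folder/; the server's DirectoryIndex still supplies index.htm internally.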

g1smd




msg:4534615
 12:36 pm on Jan 9, 2013 (gmt 0)

DirectoryIndex index.htm ensures that when a user requests example.com/folder/ - the canonical URL for a folder or for the index page in a folder - the server uses the content of the index.htm file in that folder to fulfill that request, without telling the user what that file is actually called.

URLs are a reference system used "out there" on the web. Filenames are a reference system used "here" inside the server. The two are not at all the same thing, merely related by the actions of the server software and how it is configured.

The DirectorySlash directive is also relevant. When a user requests example.com/folder for a folder name that exists, the server will send a redirect telling the browser to make a new request for example.com/folder/ instead.

The site should link to href="/folder/" each time.
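A rough Python model of the DirectorySlash behaviour described above: a request for an existing folder without the trailing slash gets redirected to the slashed URL, and anything else is left alone. The folder set and the function name are hypothetical, and a real server sends a full Location URL with scheme and host rather than just the path.

```python
from typing import Optional

# Hypothetical model of mod_dir with DirectorySlash On.
# EXISTING_DIRS stands in for the folders that exist on disk (an assumption).
EXISTING_DIRS = {"/", "/svenska/", "/espanol/", "/sales/"}


def directory_slash_redirect(path: str) -> Optional[str]:
    """Return the slash-terminated path the server would 301 to, or None."""
    if not path.endswith("/") and path + "/" in EXISTING_DIRS:
        return path + "/"
    return None


if __name__ == "__main__":
    for p in ["/svenska", "/svenska/", "/contact.htm"]:
        print(p, "->", directory_slash_redirect(p))
```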

Make sure you install a local copy of Apache so you can fully test your site as http://localhost/ or http://127.0.0.1/ too.

helenp




msg:4534642
 1:34 pm on Jan 9, 2013 (gmt 0)


DirectoryIndex index.htm ensures that when a user requests example.com/folder/ - the canonical URL for a folder or for the index page in a folder - the server uses the content of the index.htm file in that folder to fulfill that request, without telling the user what that file is actually called.

The DirectorySlash directive is also relevant. When a user requests example.com/folder for a folder name that exists, the server will send a redirect telling the browser to make a new request for example.com/folder/ instead.

Thanks
That's what the server already does.
I have tested mysite/espanol
and mysite/espanol/
and I get my site with / at the end and without index.htm showing.
However, if I go to my site's homepage from an inner page, then I get index.htm, because that is what I have in the link.

DodgeThis




msg:4534654
 1:51 pm on Jan 9, 2013 (gmt 0)

Change your Home links to href="/".

g1smd




msg:4534655
 1:54 pm on Jan 9, 2013 (gmt 0)

Once all the internal links point to the correct URLs, the next step would be to redirect requests for index URLs to the URL ending with a slash.

If you're not able to do that for a while, then as an intermediate step, adding the correct rel="canonical" tag to each index page would be a good idea.
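To illustrate the intermediate step: each index page carries a canonical tag naming its slash-terminated folder URL, whichever URL it was actually requested under. A small Python sketch that generates that tag; SITE and canonical_tag are hypothetical names, and the tag itself belongs in the head of each index page.

```python
import html
import re

# Assumed canonical scheme + hostname for the site.
SITE = "http://www.example.com"


def canonical_tag(path: str) -> str:
    """Build the rel="canonical" tag for an index page served at `path`."""
    folder = re.sub(r"index\.html?$", "", path)  # /svenska/index.htm -> /svenska/
    return '<link rel="canonical" href="' + html.escape(SITE + folder) + '">'


if __name__ == "__main__":
    print(canonical_tag("/svenska/index.htm"))
```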

Tight ship is the only way to do things. :) Never let search engines guess what you meant.

Str82u




msg:4534738
 6:49 pm on Jan 9, 2013 (gmt 0)

@helenp
That's what the server already does.
I have tested mysite/espanol
and mysite/espanol/
and I get my site with / at the end and without index.htm showing.
Correct. By default a server is usually set up to look for a page named "index" (Apache's built-in default is index.html; many hosts also add names such as "default" or "home") when a user clicks into a directory, including your home directory (public_html).

As g1smd mentions, you'll want to use trailing slashes "/" in links to avoid server redirects; the server adds one any time you leave it off a "/folder/directory" link, to push the user inside "/folder/directory/", and then it looks for the default index page to serve content.

Later on, if you feel fancy or paranoid, you can use DirectoryIndex in .htaccess to tell the server that the index pages have a different name of your choosing. For example, "/folder/name/example.html" could be the index page if you put
DirectoryIndex example.html
in the .htaccess file in the folder named "/name/".
helenp




msg:4534748
 8:12 pm on Jan 9, 2013 (gmt 0)

Thanks all, I will start working on this as soon as I've finished implementing PayPal. Phew, nearly there :)
Hopefully tomorrow.

So, do I understand you right that if I add this to .htaccess
DirectoryIndex index.html
it will take you to the index page at the root and to each folder's index page?
mysite.com/
mysite.com/swedish/

I will have a good look at this, and I will come back and also check for any changes in Webmaster Tools.

Str82u




msg:4534757
 8:32 pm on Jan 9, 2013 (gmt 0)

So do I understand you right if I add this to .htaccess
DirectoryIndex index.html
that will take you to the index page on root and on folders index page?
mysite.com/
mysite.com/swedish/
That is correct, but you do not need to do this because your server is already doing it by default... there's no reason for you to go to the effort or add new files unless you plan on using something quite different for the name of the index page served to users. This works per folder as well.
DirectoryIndex widgets.html
if that were in the htaccess for the folder/directory "mysite.com/swedish/" you are telling the server to show the page named "widgets.html" to users as the default index page even if there is a file in the same directory named "index.html".
helenp




msg:4534764
 8:50 pm on Jan 9, 2013 (gmt 0)

That is correct, but you do not need to do this because your server is already doing it by default... there's no reason for you to go to the effort or add new files unless you plan on using something quite different for the name of the index page served to users. This works per folder as well.


Oh yes, I know, but I meant as a redirect: if I type mysite.com/index.htm or click on a link with that URL, it would redirect me to mysite.com/
OK then, thanks, sounded too easy lol

lucy24




msg:4534807
 10:48 pm on Jan 9, 2013 (gmt 0)

You'll want to make sure you use the trailing slashes "/" to avoid server redirects; the server will add it anytime you leave it off of a "/folder/directory" link to push the user inside the "/folder/directory/" - then it looks for the default index page to serve content.

To split hairs: The server* will issue this redirect (yes, it is a full-blown 301 redirect, whereas "index.whatever" is a rewrite) IF and ONLY IF mod_dir is at the default "DirectorySlash On" setting.


* Apache 2.x if you want to split even more hairs.

g1smd




msg:4534824
 11:36 pm on Jan 9, 2013 (gmt 0)

DirectoryIndex finds the right internal index file when a URL with a trailing slash is requested.

When a URL with the index filename included is requested, the server will serve the index file as duplicate content unless you add a RewriteRule that tells the browser to make a new request for the right URL: the one ending with a slash.

lucy24




msg:4534870
 1:03 am on Jan 10, 2013 (gmt 0)

DirectoryIndex finds the right internal index file when a URL with a trailing slash is requested.

Where "finds the right internal index file" is short for:

... searches the list of names given in the DirectoryIndex directive, checks them against the contents of the requested directory, and uses the first one it finds.

;)

And if it doesn't find one, it proceeds to Option B: check whether auto-indexing (Indexes option) is enabled for the current directory, either explicitly or by inheritance.

Details like this can matter if you're on shared hosting, because the host may or may not divulge what the config-file settings are.
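lucy24's description maps onto a simple loop. In this hypothetical Python model, the DirectoryIndex list, the folder contents and the Indexes setting are all assumed values; passing a one-item list such as ["widgets.html"] mimics the per-folder override Str82u described earlier in the thread.

```python
from typing import Sequence, Set

# Assumed DirectoryIndex list; real hosts configure their own.
DIRECTORY_INDEX = ("index.html", "index.htm", "index.php")


def resolve_directory(files: Set[str], indexes_enabled: bool,
                      directory_index: Sequence[str] = DIRECTORY_INDEX) -> str:
    """Describe what answers a request for /folder/, given the files in that folder."""
    for name in directory_index:      # the first listed name that exists wins
        if name in files:
            return name
    if indexes_enabled:               # "Option B": Options +Indexes allows a generated listing
        return "auto-index listing"
    return "403 Forbidden"            # no index file and listings not allowed


if __name__ == "__main__":
    print(resolve_directory({"index.htm", "villa.htm"}, indexes_enabled=False))
```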

helenp




msg:4534934
 7:09 am on Jan 10, 2013 (gmt 0)

When a URL with the index filename included is requested, the server will serve the index file as duplicate content unless you add a RewriteRule that tells the browser to make a new request for the right URL: the one ending with a slash.


A kind member sent me this by private message, and I think it's what you are saying:

# REDIRECT ROOT htm INDEX PAGE
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.htm\ HTTP/
RewriteRule ^index\.htm$ http://www.mysite.com/ [R=301,L]

Would this apply to all my index files?

helenp




msg:4534936
 7:18 am on Jan 10, 2013 (gmt 0)

I don't feel comfortable: the number of 404 error pages has increased by a few hundred.

It looks like all the 404 errors start with one of these patterns:
hacienda_nagueles_apartments_marbella.htm/sales/
hacienda_nagueles_apartments_marbella.htm/svenska/
hacienda_nagueles_apartments_marbella.htm/espanol/
hacienda_nagueles_apartments_marbella.htm/maps/

g1smd




msg:4534947
 8:19 am on Jan 10, 2013 (gmt 0)

As the code comment says, it applies only to the root index.

# REDIRECT ROOT htm INDEX PAGE

The one you need has been posted multiple times in the Apache forum so far this year.
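A quick way to see why the snippet covers only the root is to test its RewriteCond pattern against a few sample request lines. This uses Python's re as a stand-in for Apache's PCRE (they agree for this simple pattern), and the request lines are made-up examples. Note that it also skips /index.htm when a query string is attached.

```python
import re

# The condition from the "REDIRECT ROOT htm INDEX PAGE" snippet as a Python
# regex; "\ " is an escaped literal space in both engines.
ROOT_ONLY = re.compile(r"^[A-Z]{3,9}\ /index\.htm\ HTTP/")


def root_rule_fires(request_line: str) -> bool:
    """True if the root-only condition matches this request line (THE_REQUEST)."""
    return ROOT_ONLY.match(request_line) is not None


if __name__ == "__main__":
    for line in ["GET /index.htm HTTP/1.1",
                 "GET /sales/index.htm HTTP/1.1",
                 "GET /index.htm?lang=es HTTP/1.1"]:
        print(line, "->", root_rule_fires(line))
```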

It looks like all the 404 errors start with one of these patterns:
hacienda_nagueles_apartments_marbella.htm/sales/
hacienda_nagueles_apartments_marbella.htm/svenska/
hacienda_nagueles_apartments_marbella.htm/espanol/
hacienda_nagueles_apartments_marbella.htm/maps/


That usually happens when hacienda_nagueles_apartments_marbella.htm links out to href="folder/" instead of href="/folder/".
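Browsers and crawlers resolve a relative href against the URL of the page that contains it, which is how these odd paths appear. Python's urllib.parse.urljoin follows the standard resolution rules, so it can show the effect. The assumption here is that Googlebot somehow reached the page at a URL with a trailing slash (for example through a stray link, on a server that still serves the page when extra path info follows the filename); with href="/sales/" the result is correct either way.

```python
from urllib.parse import urljoin

# The page from the 404 report, on the example domain (assumed URL).
PAGE = "http://www.example.com/hacienda_nagueles_apartments_marbella.htm"

if __name__ == "__main__":
    for base in (PAGE, PAGE + "/"):        # normal page URL, then one with a stray trailing slash
        for href in ("sales/", "/sales/"): # relative link vs root-relative link
            print(base, "+", href, "->", urljoin(base, href))
```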
helenp




msg:4534951
 9:08 am on Jan 10, 2013 (gmt 0)

That usually happens when hacienda_nagueles_apartments_marbella.htm links out to href="folder/" instead of href="/folder/".


Thanks,

[edited by: helenp at 10:07 am (utc) on Jan 10, 2013]

helenp




msg:4534959
 9:36 am on Jan 10, 2013 (gmt 0)

I found the Apache code, thanks, and it works like a charm on http for all index files.
I suppose for https I would have to add a similar rule, as it does not work on https:

AddType application/x-httpd-php5 .htm .html
RewriteEngine On
# REDIRECT ROOT htm INDEX PAGE
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.mysite.com/$1? [R=301,L]
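To see what the rule above does, here is a rough Python translation of its two patterns (an illustration, not mod_rewrite itself). THE_REQUEST is the original request line from the browser, while the RewriteRule pattern in .htaccess sees the path without its leading slash. The condition is what prevents a loop: when DirectoryIndex internally maps /sales/ to sales/index.htm, the rule pattern matches, but THE_REQUEST still says /sales/. The function name is hypothetical and www.mysite.com is the poster's placeholder.

```python
import re
from typing import Optional

# Python equivalents of the RewriteCond and RewriteRule patterns above.
COND = re.compile(r"^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ HTTP/")
RULE = re.compile(r"^(([^/]+/)*)index\.htm$")


def index_redirect(the_request: str, path: str) -> Optional[str]:
    """Return the 301 target, or None if the rule would not fire."""
    if COND.match(the_request) is None:
        return None
    m = RULE.match(path)  # path as seen in .htaccess: no leading slash
    if m is None:
        return None
    # Target "http://www.mysite.com/$1?": keep the folder, drop any query string.
    return "http://www.mysite.com/" + m.group(1)


if __name__ == "__main__":
    print(index_redirect("GET /sales/index.htm?ref=x HTTP/1.1", "sales/index.htm"))
```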


[edited by: Robert_Charlton at 10:03 am (utc) on Jan 10, 2013]
[edit reason] delinked linked example [/edit]

g1smd




msg:4534963
 10:13 am on Jan 10, 2013 (gmt 0)


Don't forget to alter the code comment to say "redirect index request in folders and root".


Don't forget to add the standard non-www/www redirect code after the index redirect.

That code is probably in the same thread as the index redirect code you quoted.
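A hedged sketch of why the order presumably matters: if the index redirect runs first and its target already uses the canonical hostname, a request such as example.com/sales/index.htm is fixed with a single 301; the hostname redirect then only catches whatever is left. The Python below models the two rules in file order; the names, the canonical host and the use of paths with a leading slash are assumptions.

```python
import re
from typing import Optional

# Assumptions: canonical scheme/host, and paths given with their leading slash.
CANONICAL = "http://www.example.com"
CANONICAL_HOST = "www.example.com"
INDEX = re.compile(r"^/(([^/]+/)*)index\.htm$")


def apply_rules_in_order(host: str, path: str) -> Optional[str]:
    """Return the first 301 target, mimicking: index redirect, then non-www -> www."""
    m = INDEX.match(path)
    if m:                                    # rule 1: its target already names the www host
        return f"{CANONICAL}/{m.group(1)}"
    if host.lower() != CANONICAL_HOST:       # rule 2: everything else on the wrong host
        return f"{CANONICAL}{path}"
    return None                              # already canonical: no redirect
```

Had the hostname rule come first, example.com/sales/index.htm would go to www.example.com/sales/index.htm and only then to www.example.com/sales/: two hops instead of one.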

helenp




msg:4534964
 10:33 am on Jan 10, 2013 (gmt 0)


Don't forget to alter the code comment to say "redirect index request in folders and root".

Done, thanks; I posted too quickly.


Don't forget to add the standard non-www/www redirect code after the index redirect.

I already had that, thanks.

lucy24




msg:4534972
 11:07 am on Jan 10, 2013 (gmt 0)

[^\ ]* seems a complicated way of saying \S* -- especially when what you really mean is just l? (ell, question mark)

html? = htm|html

Besides, the Rule itself simply says htm$ with no "l" option. In this case the Rule and Condition need to match.
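Both of lucy24's points can be checked with Python's re module, which treats these patterns the same way PCRE does: html? accepts exactly what htm|html accepts, and a condition ending index\.htm[^\ ]* matches index.html while a rule ending index\.htm$ does not, so such a request passes the condition but never redirects. The helper name and sample strings are illustrative.

```python
import re


def fully_matches(pattern: str, text: str) -> bool:
    """True if `pattern` matches the whole of `text`."""
    return re.fullmatch(pattern, text) is not None


# Condition and rule patterns from the posted code.
COND = r"^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ HTTP/"
RULE = r"^(([^/]+/)*)index\.htm$"

if __name__ == "__main__":
    for name in ("index.htm", "index.html", "index.php"):
        print(name, fully_matches(r"index\.html?", name), fully_matches(r"index\.(htm|html)", name))
    print(re.match(COND, "GET /index.html HTTP/1.1") is not None)  # condition matches
    print(re.match(RULE, "index.html") is not None)                # rule does not
```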

g1smd




msg:4534974
 11:19 am on Jan 10, 2013 (gmt 0)

[^\ ]* makes space for any and every query string parameter of any length - and it gets stripped in the redirect.

If extensions other than .htm are supposed to redirect, then replace the '\.htm' bit with '\.html', '\.php', '\.html?', '\.(html?|php)', '\.(html?|php[345]?)' or whatever you need, in both places.
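To check which index filenames each of the suggested extension patterns would cover, a small Python table helps. The sample filenames are made up, and (as advised) the chosen pattern would have to go into both the condition and the rule.

```python
import re
from typing import List

PATTERNS = [r"\.htm", r"\.html", r"\.php", r"\.html?", r"\.(html?|php)", r"\.(html?|php[345]?)"]
SAMPLES = ["index.htm", "index.html", "index.php", "index.php5"]


def covered(ext_pattern: str) -> List[str]:
    """Sample index filenames that "index" + ext_pattern matches in full."""
    return [s for s in SAMPLES if re.fullmatch("index" + ext_pattern, s)]


if __name__ == "__main__":
    for p in PATTERNS:
        print(f"{p:22} {covered(p)}")
```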
helenp




msg:4534978
 11:48 am on Jan 10, 2013 (gmt 0)

No idea, but it works, and I added an identical rule below for https, and both work perfectly.
Thanks

g1smd




msg:4534983
 12:19 pm on Jan 10, 2013 (gmt 0)

If you have rules for both http and https in the same file that potentially cover the same path requests, you'll need to add extra conditions to both rulesets otherwise one will run for every request and the other will never run.
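A simplified Python model of the problem described here, assuming two index redirects with the [L] flag and no extra conditions, one targeting http and one targeting https. Because the first one that matches wins, the https copy is never reached and https requests get bounced to http. Requests are reduced to a port number and a path; all names are hypothetical.

```python
import re
from typing import Callable, List, Optional

INDEX = re.compile(r"^(([^/]+/)*)index\.htm$")
Rule = Callable[[int, str], Optional[str]]


def http_rule(port: int, path: str) -> Optional[str]:
    m = INDEX.match(path)  # no port check: fires on http and https alike
    return f"http://www.example.com/{m.group(1)}" if m else None


def https_rule(port: int, path: str) -> Optional[str]:
    m = INDEX.match(path)  # no port check either
    return f"https://www.example.com/{m.group(1)}" if m else None


def first_redirect(rules: List[Rule], port: int, path: str) -> Optional[str]:
    """Try rules in file order; the first redirect wins, as with [R=301,L]."""
    for rule in rules:
        target = rule(port, path)
        if target is not None:
            return target
    return None
```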

helenp




msg:4534989
 12:33 pm on Jan 10, 2013 (gmt 0)

If you have rules for both http and https in the same file that potentially cover the same path requests, you'll need to add extra conditions to both rulesets otherwise one will run for every request and the other will never run.

I thought it wouldn't be that easy, lol.
However, I tested it, going from http to https and vice versa, and as far as I've seen it worked perfectly.

I found this:
RewriteCond %{HTTPS} =on
RewriteRule ^(.+)$ - [env=ps:https]
RewriteCond %{HTTPS} !=on
RewriteRule ^(.+)$ - [env=ps:http]

# redirect urls with index.html to folder
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9} /(.*)index.html HTTP/ [NC]
RewriteRule ^.*$ %{ENV:ps}://%{SERVER_NAME}/%1 [R=301,L]

# change // to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9} /(.*)//(.*) HTTP/ [NC]
RewriteRule ^.*$ %{ENV:ps}://%{SERVER_NAME}/%1/%2 [R=301,L]

And this from WebmasterWorld (jdmorgan):
RewriteCond %{SERVER_PORT}s ^(443(s)|[0-9]+s)$
RewriteRule ^(.+)$ - [env=askapache:%2]

# redirect urls with index.html to folder
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9} /([^/]+/)*index.html HTTP/
RewriteRule ^(([^/]+/)*)index.html$ http%{ENV:askapache}://%{HTTP_HOST}/$1 [R=301,L]

# change // to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9} /(.*)//(.*) HTTP/ [NC]
RewriteRule ^.*$ http%{ENV:askapache}://%{HTTP_HOST}/%1/%2 [R=301,L]

g1smd




msg:4534996
 1:23 pm on Jan 10, 2013 (gmt 0)

There are a lot of problems with that code.

You can't just cut and paste chunks of code and hope they all play nice together.

Some of those rules can never work, others are inefficiently coded.

Never use (.*) in the middle or at the beginning of a RegEx pattern. Rule target should always include protocol and hostname.

Rules 1 to 4 and rules 5 to 7 essentially try to do the same thing, but are coded differently.

Rules 3 and 6 need to be merged - as do 4 and 7.
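One concrete illustration of the (.*) warning, using the "change // to /" rule: a greedy (.*) in the middle grabs as much as it can, so only the last // is fixed per pass, and a path with two doubled slashes costs two 301s. The Python below models that rule on a bare path (the THE_REQUEST wrapping is dropped for brevity); the one-pass re.sub at the end is a Python comparison, not a mod_rewrite recipe.

```python
import re

DOUBLE = re.compile(r"^/(.*)//(.*)$")   # the posted pattern, reduced to a bare path


def collapse_once(path: str) -> str:
    """One redirect's worth of the posted rule: /%1/%2."""
    m = DOUBLE.match(path)
    return f"/{m.group(1)}/{m.group(2)}" if m else path


def count_redirects(path: str) -> int:
    """How many 301s the posted rule would issue before the path is clean."""
    hops = 0
    while DOUBLE.match(path):
        path = collapse_once(path)
        hops += 1
    return hops


if __name__ == "__main__":
    print(DOUBLE.match("/a//b//c").groups())   # the greedy first group keeps "a//b"
    print(count_redirects("/a//b//c"))
    print(re.sub(r"/{2,}", "/", "/a//b//c"))     # all doubled slashes in one pass
```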

helenp




msg:4535000
 1:35 pm on Jan 10, 2013 (gmt 0)

Oops,
those were 2 different approaches above, not meant to be used together.
This is the good link, though I still have to understand it...
[webmasterworld.com...]

lucy24




msg:4535153
 12:06 am on Jan 11, 2013 (gmt 0)

[^\ ]* makes space for any and every query string parameter of any length - and it gets stripped in the redirect.

Ah, got it. But you can still say \S* at a savings of three bytes ;)
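For request lines like the ones THE_REQUEST holds, where the parts are separated by single spaces, [^\ ]* and \S* behave the same; they would only differ on other whitespace such as a tab, which [^\ ]* accepts and \S* does not. A Python check with made-up request lines (re.ASCII keeps \S to ASCII semantics, closer to PCRE's default):

```python
import re

WITH_CLASS = re.compile(r"^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ HTTP/")
WITH_SHORTHAND = re.compile(r"^[A-Z]+\ /([^/]+/)*index\.htm\S*\ HTTP/", re.ASCII)

SAMPLES = [
    "GET /index.htm HTTP/1.1",
    "GET /sales/index.htm?ref=mail&id=7 HTTP/1.1",
    "HEAD /svenska/index.html HTTP/1.0",
    "GET /sales/ HTTP/1.1",
]

if __name__ == "__main__":
    for line in SAMPLES:
        print(bool(WITH_CLASS.match(line)), bool(WITH_SHORTHAND.match(line)), line)
```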

TheMadScientist




msg:4535275
 1:37 pm on Jan 11, 2013 (gmt 0)

Wow, it looks like the code's been way overcomplicated for a simple http/https redirect difference ... I would think all we need are 2 rules, one with an extra condition to check the port:

# Redirect All Non-Port 443 Requests Containing index.htm
# Rule is first using a negative match for efficiency, assuming http is most used
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1? [R=301,L]

# Redirect Requests Containing index.htm on Port 443 to https URLs
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ https://www.example.com/$1? [R=301,L]
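The same kind of Python model with the extra port condition added: the first rule now ignores port 443, so https requests fall through to the second rule. THE_REQUEST is left out to keep the sketch short (it is still needed in the real rules to avoid the DirectoryIndex loop); the port numbers and the function name are assumptions.

```python
import re
from typing import Optional

INDEX = re.compile(r"^(([^/]+/)*)index\.htm$")


def redirect_index(port: int, path: str) -> Optional[str]:
    """Model of the two rules above, tried in order."""
    m = INDEX.match(path)
    if m is None:
        return None                                   # neither rule matches
    if port != 443:                                   # rule 1: RewriteCond %{SERVER_PORT} !^443$
        return f"http://www.example.com/{m.group(1)}"
    return f"https://www.example.com/{m.group(1)}"   # rule 2: only port 443 gets here
```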

I'm fairly certain only specific underlying operating systems allow the use of shorthand with mod_rewrite (e.g. \s), which is why it's hard to find anyone who posts mod_rewrite expressions online using any of them, even if we do regularly use them in other coding ... There's nothing quite like changing hosts and having an .htaccess file stop working because the patterns used for matching in mod_rewrite are 'not accepted' by the new server ... It's no fun, believe me lol

lucy24




msg:4535399
 2:54 am on Jan 12, 2013 (gmt 0)

The mod_rewrite docs point to pcre dot org, including the PCRE manual [pcre.org]. Heavy going; I don't recommend it for anyone who doesn't already speak RegEx. Under "Generic character types" it lists among other things*:

\d any decimal digit
\D any character that is not a decimal digit
\s any white space character
\S any character that is not a white space character
\w any "word" character
\W any "non-word" character


Nothing in Apache about the OS of the server.


* The "other things" include \h, \v and \n, which have no meaning in mod_rewrite. In fact \h is a sad loss, because in some non-PERL RegEx flavors it means a hexadecimal digit :( That would definitely come in handy sometimes as an alternative to [\da-fA-F]. The \w \d \s set, along with negation by capitalizing, also isn't recognized in POSIX, but that should not worry us here. Especially since they make up for it by having lots of other good stuff ;)
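Python's re module supports the same generic character types, so the list above (and the [\da-fA-F] workaround for hex digits) is easy to try out. One caveat: in Python 3, \d, \s and \w are Unicode-aware unless re.ASCII is passed, which is used here to stay closer to PCRE's default behaviour. The sample strings are illustrative.

```python
import re

# One sample character for each generic type listed above.
SHORTHANDS = {r"\d": "7", r"\D": "x", r"\s": " ", r"\S": "x", r"\w": "_", r"\W": "-"}
HEX = re.compile(r"[\da-fA-F]+", re.ASCII)   # stand-in for a hex-digit shorthand


def is_hex(text: str) -> bool:
    """True if every character of `text` is a hexadecimal digit."""
    return HEX.fullmatch(text) is not None


if __name__ == "__main__":
    for pattern, sample in SHORTHANDS.items():
        print(pattern, repr(sample), bool(re.fullmatch(pattern, sample, re.ASCII)))
    print(is_hex("4534215"), is_hex("0xFF"), is_hex("c0ffee"))
```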


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved