
Google SEO News and Discussion Forum

This 93 message thread spans 4 pages; this is page 3.
Increase in not found errors message in webmaster tools
helenp

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4534213 posted 10:44 am on Jan 8, 2013 (gmt 0)

Hi, my site has fallen in Google (English section), and in the last 14 days I got this message twice:
http://www.example.com/: Increase in not found errors

My site is 10 years old, and I often delete pages. I don't return any special error for them, as this happens often,
so now I have thousands of not found errors. Many of these pages are years old, and Google has still not dropped them. What should I do with them?
Can this affect my ranking?
Thanks

[edited by: Robert_Charlton at 11:20 am (utc) on Jan 8, 2013]
[edit reason] examplified domain [/edit]

 

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4534213 posted 2:54 am on Jan 12, 2013 (gmt 0)

The mod_rewrite docs point to pcre dot org, including the PCRE manual [pcre.org]. Heavy going; I don't recommend it for anyone who doesn't already speak RegEx. Under "Generic character types" it lists among other things*:

\d any decimal digit
\D any character that is not a decimal digit
\s any white space character
\S any character that is not a white space character
\w any "word" character
\W any "non-word" character


Nothing in Apache about the OS of the server.


* The "other things" include \h, \v and \n, which have no meaning in mod_rewrite. In fact \h is a sad loss, because in some non-Perl RegEx flavors it means a hexadecimal digit :( That would definitely come in handy sometimes as an alternative to [\da-fA-F]. The \w \d \s set, along with negation by capitalizing, also isn't recognized in POSIX, but that should not worry us here. Especially since they make up for it by having lots of other good stuff ;)
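One way to check that the shorthand classes are only abbreviations is to compare them with spelled-out character classes. The sketch below uses Python's `re` module as a stand-in for PCRE. That is an assumption: the two flavors agree on these basics for ASCII input, but mod_rewrite itself is not involved, and `PAIRS`, `same_matches` and `is_hex` are names made up for the illustration.

```python
import re

# Shorthand classes next to the explicit sets they abbreviate (ASCII only).
PAIRS = [
    (r"\d", r"[0-9]"),
    (r"\D", r"[^0-9]"),
    (r"\w", r"[A-Za-z0-9_]"),
    (r"\W", r"[^A-Za-z0-9_]"),
]

SAMPLE = "index42.htm?id=7f_A"


def same_matches(short, explicit, text=SAMPLE):
    """Return True when both patterns pick out the same characters."""
    return re.findall(short, text, re.ASCII) == re.findall(explicit, text, re.ASCII)


# There is no hex-digit shorthand here, so the class is spelled out.
HEX_DIGIT = re.compile(r"[\da-fA-F]", re.ASCII)


def is_hex(text):
    """Return True when text is non-empty and made only of hex digits."""
    return bool(text) and all(HEX_DIGIT.fullmatch(ch) for ch in text)


for short, explicit in PAIRS:
    print(short, explicit, same_matches(short, explicit))
```

If a server ever misbehaves with a shorthand class, swapping in the explicit set from the right-hand column is a like-for-like replacement.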

TheMadScientist

WebmasterWorld Senior Member themadscientist us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4534213 posted 3:44 am on Jan 12, 2013 (gmt 0)

I know what's 'available' according to the docs, and I also know the shorthand expressions have caused issues on servers in the past, which is, again, why you'll likely be hard-pressed to find anyone except you recommending their use.

If you don't believe me, surf through some of the threads in the Apache Forum and I'm sure you'll run into issues they've caused or just use the flipping things and if they work for you on your box, great, but they've caused issues previously (especially with matching spaces), I know, because I used to hang out in the Apache Forum here and answer questions people had.

I posted to save people some headaches others have suffered that aren't necessarily detailed in the documentation, because I actually write the stuff and know what a headache it can be, especially when you read the docs and something 'should work' and doesn't for some reason, but people can do whatever they feel like, idgaf.

What I really don't understand though is if you know so much about mod_rewrite, why didn't you post an answer to the jumbled code that obviously didn't work instead of splitting hairs over 3 bytes of nothing?

It would have been way more helpful, seriously...

OR

You could have even just made my suggestion more efficient, with something somewhat like this:

RewriteRule !index\.htm$ - [S=2]

RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ H
RewriteRule .? http://www.example.com/%1? [R=301,L]

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ H
RewriteRule .? https://www.example.com/%1? [R=301,L]

TheMadScientist

WebmasterWorld Senior Member themadscientist us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4534213 posted 5:10 am on Jan 12, 2013 (gmt 0)

I'm done posting mod_rewrite out here in the Google Forum, but I need to make a correction no one else pointed out in my first code example, sorry...

I was coding quickly, helenp, and forgot to remove the trailing slash after HTTP. Removing it allows an 'implicit match' of 'everything else' to cover the S in HTTPS more efficiently than matching it explicitly, since we already checked the port. The corrected version is below.

Sorry for the 'not exactly Google' posts, even though they're topical for this discussion.

# Redirect All Non-Port 443 Requests Containing index.htm
# Rule is first using a negative match for efficiency, assuming http is most used
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ HTTP
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1? [R=301,L]

# Redirect Requests Containing index.htm on Port 443 to https URLs
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ HTTP
RewriteRule ^(([^/]+/)*)index\.htm$ https://www.example.com/$1? [R=301,L]

netmeg

WebmasterWorld Senior Member netmeg us a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 4534213 posted 2:07 pm on Jan 12, 2013 (gmt 0)

For some reason Google just dumped about 500 "not found" entries for URLs that haven't existed for three years on one of my sites. They're from the pre-WordPress version of the site; they don't even have the same URL patterns anymore. The "linked from" info is either blank or from pages that also don't exist anymore. What's up with that?

helenp

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4534213 posted 2:34 pm on Jan 12, 2013 (gmt 0)

For some reason Google just dumped about 500 "not found" entries for URLs that haven't existed for three years on one of my sites. They're from the pre-WordPress version of the site; they don't even have the same URL patterns anymore. The "linked from" info is either blank or from pages that also don't exist anymore. What's up with that?


Sounds like a cached version. Funny, today the count of non-indexed (excluded) pages has increased by 1000, yet the indexing date is the same as yesterday.

g1smd

WebmasterWorld Senior Member g1smd us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 4534213 posted 3:35 pm on Jan 12, 2013 (gmt 0)

That will be the periodic recrawl of all the URLs they have ever known about on your site.

If the URLs no longer exist and they correctly return 404 or 410 status, there is nothing further for you to do. Google will check those URLs several times over a 72 hour period then not return for a very long time.

The dead URLs will drop off the list after 90 days, or you can "mark as fixed" now. Don't mark as "fixed" immediately they appear on the list as Google usually requests the URLs several times over a period of days. If you mark as "fixed" too soon, the URLs may well reappear in the list when they are crawled again.

When a URL no longer exists and all the pages that linked to it are also found not to exist, the URL should finally stop reappearing. This can take years.

netmeg

WebmasterWorld Senior Member netmeg us a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 4534213 posted 10:00 pm on Jan 12, 2013 (gmt 0)

Actually looks like some of em go back as far as the end of November; I just hadn't checked in a while.

Years is right.

helenp

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4534213 posted 10:22 pm on Jan 12, 2013 (gmt 0)

# Redirect All Non-Port 443 Requests Containing index.htm
# Rule is first using a negative match for efficiency, assuming http is most used
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ HTTP
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1? [R=301,L]

# Redirect Requests Containing index.htm on Port 443 to https URLs
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ HTTP
RewriteRule ^(([^/]+/)*)index\.htm$ https://www.example.com/$1? [R=301,L]


Thanks. Sincerely, I don't understand anything about these commands. However, I did the http redirection and it worked; then above I did a similar one for https, and to me it looked like it worked, but I was told it was no good because one rule kicked out the other. From your code I can see it's just the same, one rule for http and the same one for https, except for the port 443 thing, which I don't understand. I suppose port 443 is the one used for https.
I wonder, do hosts always use that port?
RewriteCond %{SERVER_PORT} !^443$

TheMadScientist

WebmasterWorld Senior Member themadscientist us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4534213 posted 10:58 pm on Jan 12, 2013 (gmt 0)

Port 443 is the 'standard' secure port, where port 80 is the standard non-secure port.

* The only time I've personally seen it not on 443 is when someone runs their own box and changes it for some reason, but if it's not 443 for some reason, your host should have that listed in the docs or be able to tell you in about 3 seconds in a chat and changing the number in one condition is easy enough for me to not worry about changing the code for it.

What the rulesets say starts at the left side of the rule and then goes through the conditions from top to bottom then to the right side of the rule.

# Then Read Here
# If the rule (explained below) matches, this condition is checked
# It says 'If the server port is Not 443', check the next condition or
# continue to the 'instructions' on the right-side of the rule if there
# is no other condition below.
# - (!) at the beginning of a pattern indicates 'not'

RewriteCond %{SERVER_PORT} !^443$

# Finish With This
# If the rule (explained below) matches AND the condition above matches,
# this condition (the 2nd one down) is checked.

# It says 'If the original request made by the browser starts with cap A to cap Z*
# one or more times' AND is then followed by a 'space', Followed by a /,
# Followed by 0 or more groups of (any character that's not a / one or more
# times Followed by a /), Followed by index.htm and any character that's not
# a 'space' 0 or more times, Followed by a 'space', Followed by HTTP and
# anything else, check the next condition or continue to the 'instructions' on
# the right-side of the rule if there is no other condition below.

# * Caps A to Z are used so POST, GET, HEAD, PROPFIND, etc. all 'match'
# but improperly formatted requests, such as 'get' (lowercase) don't.

# \ is used to match a literal space in the request, because a space without
# the \ is a 'separating character', so if the \ is not used, the condition
# would 'break' after the A to Z match and that would generate a server error.

# The reason this one is important and matches Only original requests made
# by a bot or browser is that if we matched All requests, the internal ones that
# make it so the information from index.htm is shown for directory requests
# EG http://www.example.com/some-dir/ would also be redirected and that would
# just 'loop in a circle' and cause a server error, because it wouldn't be able
# to 'grab' the index.htm file and serve the info when someone visited the
# 'directory version' of the page, so we make sure we're only sending 'original,
# external requests' to the alternate location ending in a / rather than all
# requests for index.htm

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ HTTP

# Start Reading Here
# The rule (way below) is compared first and has two sides:

# The Left Side (before the space) is checked for a match.
# If it matches and there's one or more conditions, they are checked.
# If the left side of the rule and any/all conditions are met, the
# Right Side of the rule is applied.

# The Right Side of the rule contains the 'instructions' for what the server
# should do when the Left Side of the rule and any/all conditions are met.

# The Left Side of this Rule says:
# 'Match any character, except a /, one or more times,
# followed by a /, 0 or more times as a group, followed by index.htm at
# the end, AND group/store anything up to index.htm as a back-reference ($1),
# AND also group/store the last directory before index.htm (including its /)
# as a back-reference ($2).'

# (Yes, it's just a 'confusing, long way' of saying 'match index.htm at the
# end of the requested location' and 'store all the other stuff up to it
# so we can use it later' if we want to.)

# The Right Side of this rule (applied when the left side matches
# and all conditions are met) says:
# Send the request to http://www.example.com/ followed by the first grouping
# of information we stored ($1), remove any query string (that's what the ? at the
# end of the URL does), send a 'permanent redirect message' to the browser or
# bot making the request [R=301], and stop processing the file [L]

# $ at the end of the Left Side or the end of a condition 'gives a definite end'
# if it's omitted, like in the conditions 'anything or nothing' after the last
# match is 'implicitly' counted as 'a match', so it's mostly good to use it
# unless you know why you don't want to ... In the conditions, I know enough
# has matched to 'verify what we need to' so there's no reason to match
# every single possible character to the end of the line, so 'implicitly matching'
# 'everything else' is fine to do and a bit more efficient than 'matching a bit more'
# just because we could.

RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1? [R=301,L]

### # ###

The second ruleset says exactly the same thing with a couple of exceptions:

The condition of the port Not being 443 is removed, because the first rule already redirected any request ending in index.htm that was not made on Port 443 and then the file stopped processing, so we know that if a request ending in index.htm made it this far, it must be on Port 443.

Since we know it's on Port 443, we know it's an HTTPS request, so the right side of the rule is set to redirect exactly the same way as the first rule, except using https rather than http.

### # ###

These are the 'two together' being explained above.

# Redirect All Non-Port 443 Requests Containing index.htm
# Rule is first using a negative match for efficiency, assuming http is most used
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ HTTP
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1? [R=301,L]

# Redirect Requests Containing index.htm on Port 443 to https URLs
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ HTTP
RewriteRule ^(([^/]+/)*)index\.htm$ https://www.example.com/$1? [R=301,L]
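To make the reading order concrete, here is a minimal sketch that imitates the two rulesets in Python. It is not how Apache evaluates them internally: `redirect_target` is a hypothetical helper, the regexes are the ones above with the `\ ` escapes written as plain spaces, and the path is assumed to arrive without a leading slash, as it does in a per-directory .htaccess.

```python
import re

# Patterns taken from the two rulesets ("\ " becomes a plain space here).
THE_REQUEST = re.compile(r"^[A-Z]+ /([^/]+/)*index\.htm[^ ]* HTTP")
RULE = re.compile(r"^(([^/]+/)*)index\.htm$")


def redirect_target(port, request_line, path):
    """Return the 301 target, or None when neither ruleset applies.

    port         -- stands in for %{SERVER_PORT}
    request_line -- stands in for %{THE_REQUEST}
    path         -- the path the rule's left side is matched against
    """
    rule_match = RULE.search(path)
    if rule_match is None:
        return None  # left side of the rule did not match
    if THE_REQUEST.search(request_line) is None:
        return None  # index.htm was not in the original client request
    # First ruleset handles every port except 443, the second one the rest.
    scheme = "http" if port != 443 else "https"
    return f"{scheme}://www.example.com/{rule_match.group(1)}"


print(redirect_target(80, "GET /file1/file2/index.htm HTTP/1.1", "file1/file2/index.htm"))
print(redirect_target(80, "GET /file1/ HTTP/1.1", "file1/index.htm"))
```

The second call models the internal lookup of index.htm for a directory request: the rule pattern matches, but the request line never mentioned index.htm, so nothing is redirected and no loop occurs.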

### # ###

One thing to keep in mind about mod_rewrite is: There are usually 100 different ways to 'say the same thing', and some are more efficient, some less confusing, and some just preference...

The version below is more efficient than the one above, because I did a 'single check' without storing the same thing multiple times, and if index.htm at the end of the requested location is not matched, I skip over the next two rules. Even though skips can be highly efficient, if you use many of them your attention to detail has to be much higher than usual: if you miscount, or insert a rule in the middle of a 'skip', you can really mess things up. So I don't usually recommend it, even though I code most files that way personally.

So, briefly, what I did in this one is checked to see if the URL requested has index.htm at the end, and if not I skip the next two rules.

If it does, then I used .? to say if 'one thing or nothing' matches, because it's very efficient and I want to get to the condition as fast as I can since I already know the URL ends in index.htm if the rules weren't skipped.

Then they're basically the same, except on the Right Side of the Rule I used %1 rather than $1 to get the info needed to know what page to send someone to from the condition rather than getting it from the rule since I didn't bother to match and store it there.

RewriteRule !index\.htm$ - [S=2]

RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ H
RewriteRule .? http://www.example.com/%1? [R=301,L]

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ H
RewriteRule .? https://www.example.com/%1? [R=301,L]

helenp

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4534213 posted 8:51 pm on Jan 13, 2013 (gmt 0)

TheMadscientist, thanks for the lesson, I will study it and "try" to understand.

The 'not selected' count keeps increasing... however, the date is still 6/1, and there are only 51,751 not selected pages at the moment.

TheMadScientist

WebmasterWorld Senior Member themadscientist us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4534213 posted 6:24 pm on Jan 14, 2013 (gmt 0)

NP helenp ... I know it can be really difficult to understand, so don't give up if you don't 'get it' right away ... There are some tutorials in the Apache Library [webmasterworld.com] you might find helpful too, because they explain quite a few of the 'basics' of mod_rewrite and regular expressions fairly well.

helenp

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4534213 posted 11:21 pm on Jan 14, 2013 (gmt 0)

How bad, this is not the place for this, but to continue on the same thread:
This
RewriteRule !index\.htm$ - [S=2]

RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ H
RewriteRule .? http://www.example.com/%1? [R=301,L]

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ H
RewriteRule .? https://www.example.com/%1? [R=301,L]

makes the file https://www.mysite/file1/file2/index.htm
redirect to https://www.mysite/file1/index.htm
which of course gives a 404 error.
I also have a redirection from non-www to www,
like this:
RewriteCond %{HTTP_HOST} ^mysite\.com
RewriteRule (.*) [mysite.com...] [R=301,L]
As that works, I tried to add https to it the same way, using RewriteCond %{SERVER_PORT} !^443$

TheMadScientist

WebmasterWorld Senior Member themadscientist us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4534213 posted 11:25 pm on Jan 14, 2013 (gmt 0)

Sorry, I forgot to add the 2nd set of () to 'group/store' the matches made by the ([^/]+/)* pattern ... Oops! Sometimes I still have to kick myself with this stuff lol.

Change it to:


RewriteRule !index\.htm$ - [S=2]

RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(([^/]+/)*)index\.htm[^\ ]*\ H
RewriteRule .? http://www.example.com/%1? [R=301,L]

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(([^/]+/)*)index\.htm[^\ ]*\ H
RewriteRule .? https://www.example.com/%1? [R=301,L]
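The effect of the extra parentheses can be seen outside Apache. This sketch uses Python's `re` as a stand-in for PCRE (an assumption, though both keep only the last repetition of a repeated group); `first_group` is a made-up helper that returns what %1 would stand for.

```python
import re

REQUEST = "GET /file1/file2/index.htm HTTP/1.1"

# Original condition: the repeated group is the only capture.
single = re.compile(r"^[A-Z]+ /([^/]+/)*index\.htm[^ ]* H")
# Corrected condition: an outer group wraps the whole repetition.
nested = re.compile(r"^[A-Z]+ /(([^/]+/)*)index\.htm[^ ]* H")


def first_group(pattern, request=REQUEST):
    """Return capture group 1, the value %1 would stand for."""
    match = pattern.search(request)
    return match.group(1) if match else None


print(first_group(single))  # file2/
print(first_group(nested))  # file1/file2/
```

With the single group, only the last directory survives, so the leading directories are lost in the redirect target. The outer group keeps the whole path up to index.htm.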

For the www non-www you need 2 again:
Let's make these a bit more efficient while we're at it...


RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{HTTP_HOST} ^mysite\.com
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]

RewriteCond %{HTTP_HOST} ^mysite\.com
RewriteRule .? https://www.example.com%{REQUEST_URI} [R=301,L]

Let me know if it gives you any more issues.

helenp

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4534213 posted 9:05 am on Jan 15, 2013 (gmt 0)

Let me know if it gives you any more issues.


I think it works now. I got the same problem in Firefox twice, though not every time I think. I tried to clear the cache, and clearing the cache did not work anymore... so I tried in Chrome and Internet Explorer a few times and it looks OK.
I managed to clear the cache in Firefox while closing it, I think, then tried again several times and it looks OK again.
However, it's strange, I'm not sure: I am on LiteSpeed and have never had cached content served when files changed.
Btw, does anybody know how to fix Firefox 17.0.1?

Edited: there is definitely something strange, not sure if it's in Firefox, the server or the code.
I just entered [mysite.com...]
and was redirected to https://www.mysite.com/file/file/index.htm

helenp

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4534213 posted 9:29 am on Jan 15, 2013 (gmt 0)

Puf, I have tried to edit g1smd's code to suit me, but it gives me the same result in Chrome:
an index.htm file at the third level is redirected from http to https:
# Redirect index.html and .htm to folder
RewriteRule !index\.html? - [S=2]
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ http://www.example.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ https://www.example.com/$1 [R=301,L]


# Redirect non-canonical to www
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) https://www.example.com/$1 [R=301,L]

I feel sort of afraid of doing tests while my homepage has fallen in Google.

[edited by: helenp at 9:49 am (utc) on Jan 15, 2013]

g1smd

WebmasterWorld Senior Member g1smd us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 4534213 posted 9:34 am on Jan 15, 2013 (gmt 0)

Conditions apply only to the next rule that follows.

You need the port test in some form or other ("is 443" or "is not 443") before every rule.

Add a blank line after each rule to make the code more readable.

Use example.com in this forum to suppress URL auto-linking.
The post 'edit' button is just below your user name.

helenp

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4534213 posted 9:53 am on Jan 15, 2013 (gmt 0)


Conditions apply only to the next rule that follows.

You need the port test in some form or other ("is 443" or "is not 443") before every rule.

Add a blank line after each rule to make the code more readable.

Use example.com in this forum to suppress URL auto-linking.


Thanks, I will give it a try with !^443$ and ^443$

I know about the linking, but one keeps forgetting in the rush, thanks :)

TheMadScientist

WebmasterWorld Senior Member themadscientist us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4534213 posted 10:54 am on Jan 15, 2013 (gmt 0)

g1smd I tried to sticky you, but your box is full and you've gotta be tired, so I'm gonna say this quietly, because you're always right with mod_rewrite, but you don't need to check the port for both rules when you use a negative match in the first, because when you redirect all occurrences of index.html on all ports except 443 with the first you know you have to be on 443 in the second when index.html matches in the rule, so you don't need the second condition.

I think there's something else going on though, because I didn't notice until after I posted helenp said the redirect ended at /index.htm, but didn't get the middle folder for the 1st ruleset tried, but there's no way to match the 1st directory, miss the 2nd, and match index.htm with the pattern I used and I only used %1 in the new location from the condition, so it should have actually ended up at example.com/folder1/ not example.com/folder1/index.htm.

I'm wondering if there's another redirect somewhere causing a conflict, but it's probably easier for helenp to get it working if only one of us is explaining it at a time, so I'll leave it to the two of you.

helenp if you're reading this, just listen to g1smd ... He'll help you get it figured out I'm sure ;)

helenp

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4534213 posted 11:04 am on Jan 15, 2013 (gmt 0)

g1smd I tried to sticky you, but your box is full and you've gotta be tired, so I'm gonna say this quietly, because you're always right with mod_rewrite, but you don't need to check the port for both rules when you use a negative match in the first, because when you redirect all occurrences of index.html on all ports except 443 with the first you know you have to be on 443 in the second when index.html matches in the rule, so you don't need the second condition.


I have tried to add the port on the second, and the same result

helenp

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4534213 posted 11:07 am on Jan 15, 2013 (gmt 0)

I think there's something else going on though, because I didn't notice until after I posted helenp said the redirect ended at /index.htm, but didn't get the middle folder for the 1st ruleset tried, but there's no way to match the 1st directory, miss the 2nd, and match index.htm with the pattern I used and I only used %1 in the new location from the condition, so it should have actually ended up at example.com/folder1/ not example.com/folder1/index.htm.

I think I wrote that badly, if I remember correctly.
When I entered [example.com...]
I was redirected to https://example.com/file/

[edited by: helenp at 11:12 am (utc) on Jan 15, 2013]

TheMadScientist

WebmasterWorld Senior Member themadscientist us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4534213 posted 11:10 am on Jan 15, 2013 (gmt 0)

That's what it was supposed to do? I'm confused ... I think it was probably working when you said it was, because there's no reason I can see why it shouldn't have if you used the code from the last post I made with the corrections in it.

helenp

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4534213 posted 11:11 am on Jan 15, 2013 (gmt 0)

I'm wondering if there's another redirect somewhere causing a conflict, but it's probably easier for helenp to get it working if only one of us is explaining it at a time, so I'll leave it to the two of you.

There are no other redirects,
only ones like this, commented out as I don't need them on this server:
#RewriteCond %{HTTP:Accept-Encoding} gzip
#RewriteCond %{REQUEST_FILENAME}.jgz -f
#RewriteCond %{HTTP_USER_AGENT} !MSIE\s[56]\.\d+;\sWindows
#RewriteRule (.*)(\.js|\.css)$ $1$2.jgz [L]
then there are only
<Files
<FilesMatch and
Headers
Thanks

helenp

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4534213 posted 11:15 am on Jan 15, 2013 (gmt 0)

That's what it was supposed to do? I'm confused ... I think it was probably working when you said it was, because there's no reason I can see why it shouldn't have if you used the code from the last post I made with the corrections in it.


Sorry, I edited it; I double-checked in the next post.

[edited by: helenp at 11:26 am (utc) on Jan 15, 2013]

helenp

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4534213 posted 11:25 am on Jan 15, 2013 (gmt 0)

Just double-checked using this:
RewriteRule !index\.htm$ - [S=2]

RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(([^/]+/)*)index\.htm[^\ ]*\ H
RewriteRule .? http://www.example.com/%1? [R=301,L]

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(([^/]+/)*)index\.htm[^\ ]*\ H
RewriteRule .? https://www.example.com/%1? [R=301,L]

For the www non-www you need 2 again:
Let's make these a bit more efficient while we're at it...

RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{HTTP_HOST} ^mysite\.com
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]

RewriteCond %{HTTP_HOST} ^mysite\.com
RewriteRule .? https://www.example.com%{REQUEST_URI} [R=301,L]

And yes,
http://example.com/file1/file2/index.htm
is redirected to https://example.com/file1/file2/
which is not correct; there is no need to switch to https.

g1smd

WebmasterWorld Senior Member g1smd us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 4534213 posted 11:27 am on Jan 15, 2013 (gmt 0)

While the second port test might not be strictly required, it is best to add it both for code clarity and should you ever swap the rule order for any reason at some later date, the code should still work.
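The rule-order point can be illustrated with a small model. This is a sketch in Python, not mod_rewrite itself: `first_match`, `IMPLICIT` and `EXPLICIT` are made-up names, and each rule is reduced to a list of port tests plus the scheme it redirects to.

```python
def not_443(port):
    """Model of RewriteCond %{SERVER_PORT} !^443$"""
    return port != 443


def is_443(port):
    """Model of RewriteCond %{SERVER_PORT} ^443$"""
    return port == 443


def first_match(rules, port):
    """Return the scheme of the first rule whose port tests all pass."""
    for port_tests, scheme in rules:
        if all(test(port) for test in port_tests):
            return scheme
    return None


# Implicit version: the https rule relies on the http rule running first.
IMPLICIT = [([not_443], "http"), ([], "https")]
# Explicit version: every rule carries its own port test.
EXPLICIT = [([not_443], "http"), ([is_443], "https")]

for name, rules in (("implicit", IMPLICIT), ("explicit", EXPLICIT)):
    swapped = list(reversed(rules))
    print(name, first_match(rules, 80), first_match(swapped, 80))
```

In the original order both versions send a port 80 request to http. Once the order is swapped, only the explicit version still does; the implicit one sends it to https.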

TheMadScientist

WebmasterWorld Senior Member themadscientist us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4534213 posted 11:29 am on Jan 15, 2013 (gmt 0)

Alright, I'll try this one more time then I'm going to leave it to you and g1smd, because I need some sleep and he's great with mod_rewrite, so I'm sure he'll be able to help you with it if this doesn't work...

Copy and paste the following, then save the file, then before you test it, make sure you empty your browser cache otherwise you'll likely end up with your browser 'remembering' where it's supposed to go from one of the previous tests.

RewriteRule !index\.html?$ - [S=2]

RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(([^/]+/)*)index\.html?[^\ ]*\ H
RewriteRule .? http://www.example.com/%1? [R=301,L]

RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(([^/]+/)*)index\.html?[^\ ]*\ H
RewriteRule .? https://www.example.com/%1? [R=301,L]

RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{HTTP_HOST} ^mysite\.com$
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]

RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{HTTP_HOST} ^mysite\.com$
RewriteRule .? https://www.example.com%{REQUEST_URI} [R=301,L]

* Good point g1smd and probably definitely better to have it in this situation ... I'm going to leave it to the two of you.

TheMadScientist

WebmasterWorld Senior Member themadscientist us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4534213 posted 11:36 am on Jan 15, 2013 (gmt 0)

Oh, and g1, I usually use a negative match for the canonicalization option and check for empty too, but I didn't want to confuse things more than they already have been, and I'm not sure about other subdomains, so I figured I'd leave well enough alone. The other can be added / updated later if helenp decides to sometime in the future.
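For reference, the negative match with the empty check is the `!^(www\.example\.com)?$` form that appears in helenp's edited rules earlier in the thread. The sketch below shows which hosts it would redirect, using Python's `re` as a stand-in for PCRE; `needs_redirect` is a hypothetical helper, and the remark about missing Host headers is my reading of why the empty alternative is there, not something stated in the thread.

```python
import re

# The condition is negated in the rules, so the redirect fires when this
# pattern does NOT match: the host is neither canonical nor empty.
CANONICAL_OR_EMPTY = re.compile(r"^(www\.example\.com)?$")


def needs_redirect(host):
    """Return True when a request for this host would be redirected."""
    return CANONICAL_OR_EMPTY.search(host) is None


for host in ("www.example.com", "example.com", "", "www.example.com:80"):
    print(repr(host), needs_redirect(host))
```

Unlike `^mysite\.com`, this form catches every non-canonical host, including other subdomains and a host with a port appended, while letting through requests that carry no Host header at all.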

helenp

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4534213 posted 6:00 pm on Jan 15, 2013 (gmt 0)

I have not tried the last one yet, but I have tried your previous one and also g1smd's, and strangely, now both work. I thought it was because I fixed Firefox and can now clear the cache; however, on some of the many tries I suddenly got an https again, same as before, and then suddenly it did work. The bug happened when I went from https to http
and typed index.htm in the address bar.
As I said, I am on LiteSpeed, and it looks like it is a server cache thing.

helenp

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4534213 posted 6:06 pm on Jan 15, 2013 (gmt 0)

Now both do it all the time, even if I clear the cache. It happens when I go from https://example.com/file1/file2/, then click on a link to http://example.com/file1/file2/, then type index.htm in the address bar: I get the same file, but with https.
So strange.

Edited: it works, it's only the cache. Thanks so much to both.

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4534213 posted 10:49 pm on Jan 15, 2013 (gmt 0)

RewriteRule {anything here} [S=2]


Use the [S=some-number] flag WITH EXTREME CAUTION.* Include a comment line before the rule, reminding yourself that it's there. And a second comment line, reminding yourself that the comment line is there. (I'm not kidding.) You will need to re-check it every time you've edited your htaccess, to make sure the number is still correct.

RewriteRules normally execute from top to bottom, one after the other, in order, until they meet an [L] or equivalent. The [S] is one of a handful of flags --the others that come to mind are [C] and [N]-- that can potentially interfere with this ordering. Make double sure it's doing what you want and only what you want.
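To see why the number needs re-checking after every edit, here is a toy model of the skip flag in Python. It is a deliberate simplification: conditions and [L] are ignored, and `applied_rules`, `ORIGINAL` and `EDITED` are names invented for the sketch. Only the rule patterns and the S=2 count come from the code posted earlier in the thread.

```python
import re


def applied_rules(rules, path):
    """Return the names of the rules whose left side matches, in order.

    Each rule is (name, pattern, skip). A leading "!" negates the pattern.
    When a rule matches, the next `skip` rules are jumped over, which is
    what the [S=skip] flag asks for.
    """
    applied = []
    index = 0
    while index < len(rules):
        name, pattern, skip = rules[index]
        negate = pattern.startswith("!")
        regex = pattern[1:] if negate else pattern
        found = re.search(regex, path) is not None
        if found != negate:  # the rule's left side matched
            applied.append(name)
            index += skip  # jump over the next `skip` rules
        index += 1
    return applied


ORIGINAL = [
    ("skip-unless-index", r"!index\.htm$", 2),
    ("http-redirect", r".?", 0),
    ("https-redirect", r".?", 0),
    ("later-rule", r".?", 0),
]
# Someone inserts a rule after the skip but forgets to change S=2 to S=3.
EDITED = ORIGINAL[:1] + [("new-rule", r".?", 0)] + ORIGINAL[1:]

print(applied_rules(ORIGINAL, "about.htm"))
print(applied_rules(EDITED, "about.htm"))
```

In the original list, a request for about.htm jumps past both redirects. In the edited list the stale count skips the new rule and the http redirect, but lands on the https redirect, which now runs for a page it was never meant to touch.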


* Unless your name is jdmorgan.

helenp

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4534213 posted 1:58 pm on Jan 19, 2013 (gmt 0)

Thanks all, I really appreciate it. This morning I checked Google Webmaster Tools and saw that the number of excluded pages, instead of dropping, had increased by about 8000 pages.
Then I went to google.com, searched with hl=en and wow, my homepage is back, not on top but at the end of the FIRST page...
Thanks again, everybody.

