This wp htaccess file may have killed my business?

mboydnv

11:54 pm on Sep 13, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



In June of 2013, I installed a new htaccess file that would lock things down a bit while still letting Google through. Two months later, traffic dropped significantly and has never recovered. Two years on, my income is down 80%. Sure, it could be other things: backlinks, thin content, privacy, etc.

I'm working hard now trying to fight back. I was wondering if I might post my htaccess file so one of you talented gurus could spot something harmful (some sort of deny command). I would be most grateful for any insight. Thanks so much.

For WordPress 4.3, PHP 5.4+

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

# PROTECT WPCONFIG
<files wp-config.php>
order allow,deny
deny from all
</files>

# Enable Leverage Browser Caching
<IfModule mod_expires.c>
ExpiresActive On

# Favicon (cannot be renamed)
ExpiresByType image/x-icon "access plus 1 week"

# Media: images, video, audio
ExpiresByType audio/ogg "access plus 1 month"
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType video/mp4 "access plus 1 month"
ExpiresByType video/ogg "access plus 1 month"
ExpiresByType video/webm "access plus 1 month"

# CSS and JavaScript
ExpiresByType application/x-javascript "access plus 1 week"
ExpiresByType text/css "access plus 1 week"
ExpiresByType text/javascript "access plus 1 week"

# Webfonts
ExpiresByType application/vnd.ms-fontobject "access plus 1 month"
ExpiresByType application/x-font-ttf "access plus 1 month"
ExpiresByType application/x-font-woff "access plus 1 month"
ExpiresByType font/opentype "access plus 1 month"
ExpiresByType image/svg+xml "access plus 1 month"

</IfModule>
# End Leverage Browser Caching

# TYPES FIX
AddType text/css .css
AddType text/javascript .js

# Enable GZIP Compression
SetOutputFilter DEFLATE
AddOutputFilterByType DEFLATE text/html text/css text/plain text/xml text/javascript application/x-javascript application/x-httpd-php
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip
Header append Vary User-Agent env=!dont-vary
# End GZIP Compression

# DISABLE DIRECTORY BROWSING
Options All -Indexes

# PROTECT HTACCESS
<files ~ "^.*\.([Hh][Tt][Aa])">
order allow,deny
deny from all
satisfy all
</files>

# DISABLE SITEMAP INDEXING BY GOOGLE AND OTHERS
<IfModule mod_headers.c>
<Files ~ "^(post-sitemap|category-sitemap|sitemap_index)\.xml$">
Header set X-Robots-Tag "noindex"
</Files>
</IfModule>

# FORBID COMMENT SPAMMERS ACCESS TO YOUR wp-comments-post.php FILE
# This is a better approach to blocking Comment Spammers so that you do not
# accidentally block good traffic to your website. You can add additional
# Comment Spammer IP addresses on a case by case basis below.
# Searchable Database of known Comment Spammers http://www.stopforumspam.com/

# BLACKLISTED USER AGENTS
SetEnvIfNoCase User-Agent "Acunetix" keep_out
SetEnvIfNoCase User-Agent "FHscan" keep_out
SetEnvIfNoCase User-Agent "Baiduspider" keep_out
SetEnvIfNoCase User-Agent "Yandex" keep_out
<Limit GET POST PUT>
order allow,deny
allow from all
deny from env=keep_out
</Limit>
# END BLACKLISTED USER AGENTS

<FilesMatch "^(wp-comments-post\.php)">
Order Allow,Deny
Deny from 46.119.35.
Deny from 46.119.45.
Deny from 91.236.74.
Deny from 93.182.147.
Deny from 93.182.187.
Deny from 94.27.72.
Deny from 94.27.75.
Deny from 94.27.76.
Deny from 193.105.210.
Deny from 195.43.128.
Deny from 198.144.105.
Deny from 199.15.234.
deny from 61.129.102.208
deny from 193.109.91.134
deny from 217.219.192.69
deny from 211.60.171.3
deny from 222.183.140.102
deny from 217.173.0.
deny from 217.173.0.200
deny from 195.225.176.87
deny from 70.86.125.242
deny from 209.68.4.105
deny from 72.21.59.66
deny from 82.104.138.50
deny from 70.230.167.254
deny from 208.111.154.
deny from 74.202.66.134
deny from voxel.net
deny from 66.117.6.90
deny from 59.60.126.12
deny from 142.54.184.181
Allow from all
</FilesMatch>

not2easy

2:13 am on Sep 14, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The only part of this htaccess file that is actually related to WP is the first few lines; the rest are things added to do one thing or another. The WP section belongs at the end of the htaccess file, after the www/non-www canonical rewrite business. As for the rest of it, it isn't killing your traffic, but it is not helping much either. Where to start? Blocking one IP at a time, from one file at a time, is an inefficient use of your server's resources. BTW, denying by IP works pretty well if done right, but you can't use "deny" as shown here for a domain name. Mixing cases doesn't help in the various allow/deny settings, and <FilesMatch> doesn't belong in the deny list.
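To illustrate "done right": a site-wide deny (not wrapped in <FilesMatch>) in Apache 2.2 syntax might look like the sketch below. The range shown is a documentation placeholder (203.0.113.0/24), not a real spammer; substitute ranges taken from your own logs.

# Site-wide deny, Apache 2.2 syntax (placeholder range; replace with your own)
order allow,deny
allow from all
deny from 203.0.113.0/24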

Rather than blocking comment spammers one at a time, use a plugin; there are hundreds of plugins that prevent comment spam. This is by no means a complete answer, but I am short on time right now. I suggest that it is not good practice to use copy/paste fixes to deal with some of the tasks you have here.

lucy24

2:36 am on Sep 14, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You left out one essential piece of information: How has crawling changed? What looks different in your logs?

Incidentally, there's no need for any <IfModule> envelope. Either you've got it or you haven't. Leave any envelopes that are inside the ##Wordpress section, but otherwise get rid of them. That is: ditch the envelope, keep its contents.

Some of those "Deny from..." lines are much too narrow. A single offending IP probably means an infected browser; by now they've probably got an entirely different address. Block whole ranges-- and for heaven's sake get them all into numerical order so you can find them again. (I've found by experience that I manage better if ARIN and RIPE ranges are listed separately, but that's me.)

deny from voxel.net

ohmygod don't ever do this! Even a single "Deny from" followed by something other than a numerical address or CIDR range is enough to throw your entire server into lookup mode, which slows things down and makes logs all but unreadable. (Last time it happened to me, it was caused by a comma that had crept into my CIDR lists. I meant that literally. ONE COMMA.)
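As a sketch, a range-based list in numerical order and grouped by registry might look like this. The CIDR sizes here are illustrative guesses, not verified allocations; always confirm the actual range with a whois lookup before blocking it.

order allow,deny
allow from all
# RIPE ranges (illustrative only; verify with whois)
deny from 46.119.0.0/16
deny from 94.27.64.0/18
# ARIN ranges (illustrative only; verify with whois)
deny from 198.144.96.0/20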

tangor

3:17 am on Sep 14, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Some of this looks like a case of: start over from scratch, and be more precise about what you wish to accomplish.

That said, there are no deal-breakers in there for SEs, so something else changed in that two-month period. You might want to look at your logs, or your action logs (you do keep notes on the changes you make to your site, either in a file or an ordinary notebook, correct?).

This .htaccess, by itself, does not explain that kind of traffic drop. Something ELSE happened in August 2013 and that's what you need to look at.

mboydnv

6:31 pm on Sep 15, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thank you for your response, not2easy. I deleted

<FilesMatch "^(wp-comments-post\.php)"> ... 


So you recommend moving the WordPress code:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress


to the very bottom? Thank you so much.

mboydnv

6:34 pm on Sep 15, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi lucy24, I will comb through my logs from back then and see what I can conclude from them. Great idea! I've removed the IP FilesMatch code as stated above.

I don't understand "Incidentally, there's no need for any <IfModule> envelope. Either you've got it or you haven't. Leave any envelopes that are inside the ##Wordpress section, but otherwise get rid of them. That is: ditch the envelope, keep its contents. "

What would be the resulting code? I'm not that great at this, as you can tell.

Thank you so very much!

not2easy

7:52 pm on Sep 15, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I think she meant the multiple use of this kind of "envelope":
<IfModule mod_rewrite.c>
code
code
code
</IfModule>

She's right. It is part of the WP "snippet" that is added by WP when it is installed, and sometimes again when it updates; WP looks for the envelope, so you should leave that one there. The other <IfModule> envelopes (an "envelope" is like a prefix/suffix combination that encloses rules) are not needed. Envelopes are used to make rules conditional on the Apache module being installed on the server and active; that is a leftover from older versions of Apache. Unless you know that your domain is on a server running an antique version of Apache, they are not needed. The envelope used in WP is needed, though; I found that out the hard way.
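For example, the caching section with its envelope removed would read roughly as below. This is only a sketch, and it assumes mod_expires really is available on your host; if it were not, these bare directives would cause a 500 error, which is exactly what the envelope guards against.

# Enable Leverage Browser Caching
ExpiresActive On
ExpiresByType image/x-icon "access plus 1 week"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType text/css "access plus 1 week"
# ...and so on for the remaining types, unchanged
# End Leverage Browser Caching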

The deny rules you have there (other than that one domain name) can be used inside WP, in the Settings panel where you can add IPs to block one at a time. Put them in numerical order, and if you see that a dozen are coming from IPs that vary only at the end, do a whois lookup and block the server they come from. That approach is not nearly as efficient as using Akismet, and if spam is a really bad problem that you can't keep up with, use a captcha plugin.

If you have a long list of IPs, it is better to check whois and get the CIDR because blocking:
deny from 94.153.0.0/17

is a lot more efficient than blocking from
deny from 94.153.12.44
deny from 94.153.0.78
deny from 94.153.55.210
deny from 94.153.48.111
deny from 94.153.91.34
deny from 94.153.118.78
deny from 94.153.65.184
deny from 94.153.88.173

mboydnv

8:15 pm on Sep 15, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Using Apache 2.2.31, PHP 5.4.45, MySQL 5.5.42-cll

---

Should I remove the noindexing of the sitemap? Doesn't Google spider it anyway via the Webmaster Tools section? I have no errors. I just didn't want the .xml to be indexed in Google; of course I want it followed. So you think removing it would help?

I'm also combing through log files from back then to figure out what else happened. Google had a major update Oct 4, 2013, and Hummingbird was also happening... I didn't lose it overnight, but over two months...

This sucks... My site is over 11 years old and was doing great for the last few years... there's something else at play... hmmmm

lucy24

8:34 pm on Sep 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I once did some experimenting with <IfModule>. (This is why people have test sites: if an experiment results in 500 errors across the board, nobody is hurt.) The named module does not need to have anything to do with what's inside the envelope; all that matters is whether you have access to the module or not. ("Have" and "have access to" can be different things, notably on shared hosting, where a module might exist on the server but its directives can't be used in htaccess.)

Take the pattern
<IfModule suchandsuch>
blahblah
otherblahblah
stillmoreblahblah
</IfModule>

If you do have the module, the server will treat this package as if it said
# <IfModule suchandsuch>
blahblah
otherblahblah
stillmoreblahblah
# </IfModule>

If you do not have the module, the package is treated as if it said
# <IfModule suchandsuch>
# blahblah
# otherblahblah
# stillmoreblahblah
# </IfModule>

Unlike <FilesMatch> or <Directory> or similar envelopes, <IfModule> has no effect on execution order.


Should i remove the indexing of the sitemap?

Google will index everything, unless you've explicitly told them not to. They'd index your htaccess if they could. (They can't.) Now, unless you have some exceptionally clever prose in your sitemap or robots.txt, it is hardly likely to show up in searches. But may as well slap on a noindex header.

:: shuffling papers ::

<FilesMatch "\.(js|txt|xml)$">
Header set X-Robots-Tag "noindex"
</FilesMatch>

That's my version. I don't happen to have any .txt files except robots.txt, and I don't want my scripts indexed. (Individual humans are free to snoop, and search engines can crawl, but that's different from outright indexing of content.)

Edit: That's assuming you meant "index the sitemap itself". That's a completely different question from indexing the individual pages named in the sitemap.
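If you want to cover only the sitemap files themselves, rather than all .xml, a narrower variant might look like this (filenames assumed from the snippet earlier in the thread):

<FilesMatch "^(post-sitemap|category-sitemap|sitemap_index)\.xml$">
Header set X-Robots-Tag "noindex"
</FilesMatch>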

not2easy

10:50 pm on Sep 15, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I know of one site that sort of went into oblivion for no discernible reason, until I went through each part of the site and found that, because each directory had its own proprietary .htaccess file, you could access any page that was not in the root directory both with and without the www. I have read many times that www/non-www canonicalization does not matter, but the site recovered within a week of adding the www rewrite to each directory, and that was the only change. That does not mean it is the reason the site recovered, but because of it I don't ignore that factor. I don't see a canonical rewrite in your htaccess; it can't hurt to add one.
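A minimal sketch of such a rewrite, assuming example.com stands in for your domain and that the non-www form is the one you prefer:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteRule (.*) http://example.com/$1 [R=301,L]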

Another thing (not Apache related): if this is a WP site, as it appears to be, you should deal with the way WP generates multiple ways to access the same content. If you know your way around the PHP functions files you can handle it there, but for most of us it is easier to use a good plugin like Yoast SEO to give all versions a canonical tag pointing to the preferred version, index only one version, and submit your sitemap without duplicates. There are many minor things that can tank a site, even with only the best of intentions.

tangor

7:18 am on Sep 16, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This sucks... My site is over 11 years old and was doing great for the last few years... there's something else at play... hmmmm

Nobody ever wants to hear this... but unless your site and its contents (or your products) are evergreen, there is a possibility that "your" time has come and gone and the users have "moved on".

Not saying that is the case, but it is something to consider. Many a site has gone that route over the years, and that cycle will continue. It is merely a fact of life.

Do research on "Hummingbird" and see if that G algo change had anything to do with your site. IIRC that rolled out sometime in August of 2013...

(see: [en.wikipedia.org...] )

mboydnv

7:30 pm on Dec 17, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thank you everyone for your help. I wanted to come back and update everyone who tried to help me last September. The time you gave to my post and your ideas and suggestions meant a lot. I'm always touched by the generosity of strangers. It humbles me.

However I must reach out yet again. Things have become very difficult for me in trying to earn a living with my 10 years + in business. My site traffic has continued to sink. This Christmas is turning out to be a nightmare. I'm trying to stay focused and optimistic that I can recover and continue to support and feed my family.

I've been quite active, turning over every rock. At the end of this post is my current htaccess file for your review.

In the last 3 months I have:

- Moved the site from [www...] to [mydomain.com...]. I hired a Linux professional to assist with all the changes, made additional changes in GWT, and moved on to other things.

- Hired a Google SEO guy, who advised on and created a disavow file and uploaded it.
- Deleted 12 or more posts of thin content (200-300 words) and redirected those URLs to beefed-up content.
- Added 7-8 posts of 1,000 words each, with video and 4 pics with captions, alt tags, etc.
- I continue to blog 1,000-word articles every week. Good content.
- I tickle social media: Facebook, Twitter, Google+, Tumblr, StumbleUpon.
- Went back to old posts on Google+, Pinterest, and Tumblr and updated links from [www...] to https://
- Removed the sidebar from the website. It caused hundreds of repetitive internal links. Most of my customers seem to be on mobile and are one-time customers.
- Added 2 related posts at the bottom of each post, related to the article. Improving stickiness, bounce, etc.
- Updated the Google business listing with the current mailing address. Got verified.
- Updated Post Affiliate Pro tracking software to the current version. Now only using direct linking from affiliates; they can't link with affiliate IDs anymore, so I get clean backlinks from them. Also added this tag to the header:
 <?php
if (isset($_GET['a_aid'])) echo '<meta name="robots" content="noindex">'; // edit by Martin.

wp_head(); /** we hook up in wp_booster @see td_wp_booster_functions::hook_wp_head */
?>

- Deleted duplicate content on blogger.com that had tons of anchor links, set up years ago. All cleaned up.

- Loosened up the robots.txt file to allow crawling of JS files, etc.
- Just recently unchecked the geographic-targeting box in GWT, so traffic targets the world, not just the USA, on the www site (which 301-redirects to https).
- Updated our WordPress theme (Newspaper by tagDiv) to the current version.
- I believe Yoast is configured correctly (I'm holding off on the current version). Re: canonical URLs, there is a canonical tag in the header generated by Yoast; it is on all pages, even though my site has NO duplicate content.

Any suggestions are greatly appreciated. I wish you all good health and happiness this holiday season.

Below is my current htaccess. Is there anything in there that could be causing this grief?

Note: I am also using the WordPress plugins below. Is there a redirect loop being caused that Google doesn't like?
- Redirection: Version 2.4.3 | By John Godley
- Ultimate Noindex Nofollow Tool II Version 1.3 | By Kilian Evang
- WP Category Permalink Version 2.2.8 | By Jordy Meow
- Yoast SEO Version 2.3.5 | By Team Yoast

Thank you so very much,


RewriteCond %{HTTP_USER_AGENT} libwww-perl.*
RewriteRule .* ? [F,L]

Header set Strict-Transport-Security "max-age=31536000" env=HTTPS

RewriteCond %{HTTP_HOST} ^XX\.28\.106\.XX [OR]
RewriteCond %{HTTP_HOST} ^www\.domain\.com
RewriteRule (.*) https://domain.com/$1 [R=301,L]

RewriteCond %{SERVER_PORT} 80
RewriteRule (.*) https://domain.com/$1 [R=301,L]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

# PROTECT WPCONFIG
<files wp-config.php>
order allow,deny
deny from all
</files>

# Enable Leverage Browser Caching
<IfModule mod_expires.c>
ExpiresActive On

# Favicon (cannot be renamed)
ExpiresByType image/x-icon "access plus 1 week"

# Media: images, video, audio
ExpiresByType audio/ogg "access plus 1 month"
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType video/mp4 "access plus 1 month"
ExpiresByType video/ogg "access plus 1 month"
ExpiresByType video/webm "access plus 1 month"

# CSS and JavaScript
ExpiresByType application/x-javascript "access plus 1 week"
ExpiresByType text/css "access plus 1 week"
ExpiresByType text/javascript "access plus 1 week"

# Webfonts
ExpiresByType application/vnd.ms-fontobject "access plus 1 month"
ExpiresByType application/x-font-ttf "access plus 1 month"
ExpiresByType application/x-font-woff "access plus 1 month"
ExpiresByType font/opentype "access plus 1 month"
ExpiresByType image/svg+xml "access plus 1 month"

</IfModule>
# End Leverage Browser Caching

# TYPES FIX
AddType text/css .css
AddType text/javascript .js

# Enable GZIP Compression
SetOutputFilter DEFLATE
AddOutputFilterByType DEFLATE text/html text/css text/plain text/xml text/javascript application/x-javascript application/x-httpd-php
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip
Header append Vary User-Agent env=!dont-vary
# End GZIP Compression

# DISABLE DIRECTORY BROWSING
Options All -Indexes

# PROTECT HTACCESS
<files ~ "^.*\.([Hh][Tt][Aa])">
order allow,deny
deny from all
satisfy all
</files>

# FORBID COMMENT SPAMMERS ACCESS TO YOUR wp-comments-post.php FILE
# This is a better approach to blocking Comment Spammers so that you do not
# accidentally block good traffic to your website. You can add additional
# Comment Spammer IP addresses on a case by case basis below.
# Searchable Database of known Comment Spammers http://www.stopforumspam.com/

# BLACKLISTED USER AGENTS
SetEnvIfNoCase User-Agent "Acunetix" keep_out
SetEnvIfNoCase User-Agent "FHscan" keep_out
SetEnvIfNoCase User-Agent "Baiduspider" keep_out
SetEnvIfNoCase User-Agent "Yandex" keep_out
<Limit GET POST PUT>
order allow,deny
allow from all
deny from env=keep_out
</Limit>
# END BLACKLISTED USER AGENTS

lucy24

9:58 pm on Dec 17, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is that literally the entire htaccess? Your first few RewriteRules-- the ones outside the WP envelope-- will never execute, because you forgot to say RewriteEngine On. (Saying it more than once, with no intervening "off", looks silly but is not harmful, and may be necessary with a CMS.)

More later, probably.

mboydnv

12:06 am on Dec 18, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi Lucy24, that is the entire htaccess. Could you please be so kind as to post the correct code for me? I'm not that savvy with this stuff and at times need to be hit over the head. :P I wish I knew what was bugging Google. =( Thank you so much!

raseone

12:40 am on Dec 18, 2015 (gmt 0)



^^ Yup, the first couple won't work because "RewriteEngine On" needs to happen before any rewrites. I think you only need it once, but as lucy24 said, it should not hurt to add it at the beginning of each rewrite block. For example:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} libwww-perl.*
RewriteRule .* ? [F,L]

Header set Strict-Transport-Security "max-age=31536000" env=HTTPS

RewriteEngine On
RewriteCond %{HTTP_HOST} ^XX\.28\.106\.XX [OR]
RewriteCond %{HTTP_HOST} ^www\.domain\.com
RewriteRule (.*) https://domain.com/$1 [R=301,L]

RewriteCond %{SERVER_PORT} 80
RewriteRule (.*) https://domain.com/$1 [R=301,L]

If the www/non-www issue was ever part of the problem, this might help. There is lots of evidence on both sides of whether or not that's a ranking issue. It's certainly something search engines have been dealing with since the dawn of the internet, so it seems unlikely to be the reason for a drop in rankings.

You should keep these things labeled, and keep a "RewriteEngine On" with each one in case you accidentally delete it while making edits in the future.

Consulting a Linux guru on this was definitely a good idea.

The https thing might actually have more impact. Not entirely sure if you would want 301s for URLs that used to be http and are now https.
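For reference, a common way to force HTTPS without testing the port number is a sketch like the following (example.com is a placeholder; on some proxied hosts you would need to test a forwarded-proto header instead of %{HTTPS}):

RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule (.*) https://example.com/$1 [R=301,L]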

My more jaded advice is that fixing any genuine issues in this area is unlikely to produce a recovery in your Google rank.

mboydnv

1:05 am on Dec 18, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



raseone: Thank you so very much for your code and advice.

Re: "Not entirely sure if you would want 301s for URLs that used to be http and are now https." I believe that was needed at the time, to get recrawled. I'm not re-indexed in Google as [....] Maybe I can remove the redirect? But what happens to links out there pointing to my site as [www....]? I don't want those links to return 404.

RE: "My more jaded advice is that fixing any genuine issues in this area is unlikely to provide a recovery in your google rank."

=(

Indexing HTTPS pages by default
Thursday, December 17, 2015

- [googlewebmastercentral.blogspot.com...]

raseone

2:03 am on Dec 18, 2015 (gmt 0)



The rewrite in the .htaccess file for www to non-www is there specifically to stop that from happening. Any incoming traffic going to a www URL will go to the non-www instead. The links will not be broken.

Thank the other guy for the RewriteEngine clue, though.