mod_rewrite rules don't validate the leading folders

hottrout

4:54 pm on Jun 27, 2011 (gmt 0)

10+ Year Member



For the past several months, Google Webmaster Tools has been showing a growing number of 404 Not Found errors under crawl errors. It is currently at over 2,200 errors, up from 1,200 last week.

The 404s are for URLs on my site that have never existed, and the pages that they were supposedly linked from are no longer available.

The strange bit is that each URL is made up of valid parts of my site, but combined in the wrong order. Let me explain.

One of the invalid URLs looks like this:

mydomain.com/libraries/radio/libraries/Pictures/gamecovers/images.htm

One part of the URL is correct:
mydomain.com/libraries/radio/stations.htm

and the other part is also correct:
mydomain.com/libraries/Pictures/gamecovers/images.htm

Google seems to be detecting parts of each URL and combining them.

I have no idea how these are being created, and I thought it was just a Google glitch. I have requested help on the Google Webmaster forum several times now, to no avail. I would appreciate anyone's help with this, and I have also included the main body of my htaccess file (and apologise in advance).


--------------------


RewriteEngine On
DirectoryIndex index.php
IndexIgnore *

#preserve bandwidth for PHP enabled servers
<ifmodule mod_php4.c>
php_value zlib.output_compression 16386
</ifmodule>

#Send example.com to www.example.com
RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

#Send index.php to root url
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1 [R=301,L]

<Files /error/403.htm>
order allow,deny
allow from all
</Files>
<Limit GET POST>
order allow,deny
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>

ErrorDocument 400 /error/400.htm
ErrorDocument 401 /error/401.htm
ErrorDocument 403 /error/403.htm
ErrorDocument 404 /error/404.htm
ErrorDocument 500 /error/500.htm

############### Set up Caching
Header unset Etag
FileETag None
# Set up caching on favicon for a loong time
<FilesMatch "\.ico$">
Header set Expires "Mon, 20 Apr 2015 23:30:00 GMT"
</FilesMatch>
# Set up caching on files for 4 weeks
<FilesMatch "\.(pdf|mov|wmv|gif|jpg|jpeg|png|swf|js|css)$">
ExpiresDefault A2419200
Header append Cache-Control "public"
</FilesMatch>
# Set up 4 Hour caching on commonly updated files
<FilesMatch "\.(txt|html|htm)$">
ExpiresDefault A14400
Header append Cache-Control "private, proxy-revalidate, must-revalidate"
</FilesMatch>
# Force no caching for dynamic files
<FilesMatch "\.(php)$">
ExpiresDefault A0
Header set Cache-Control "no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform"
Header set Pragma "no-cache"
</FilesMatch>
############### End Caching

## Ban IP Address
#order allow,deny
#deny from 83.140.180.69
#allow from all

## Hotlinking Sites Redirect to picture
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?theb9 [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?vinylcollective\.proboards\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?giantbomb [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?dvd4arab [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?hackingtheplanetagain\.blogspot\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?montada [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?evolution-network\.ws [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?mandaver\.net [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?taringa\.net [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?jocuri-pc\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?iligan\.comoj\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?assistirfilmesgratis\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?zonaforo\.meristation\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?prime-news\.info [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?gamehall\.uol\.com\.br [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?tusjuegospc\.org [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?gratisjuegos\.org [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?thatguywiththeglasses\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?todonintendoroms\.blogspot\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?univers-de-mario\.over-blog\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?baixarjogoscompletos\.net [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?thatguywiththeglasses\.com [NC]
RewriteRule .*\.(jpe?g|gif|bmp|png)$ hotlinkingpicture.gif [NC,L]

RewriteRule ^(index|mainmenu)\.(html?(.*))$ http://www.example.com/ [R=301,NC,L]

RewriteRule ^Libarary(\/\'s|\'s|s|\%27s|27s)/(.*)$ http://www.example.com/Libraries/$2 [R=301,NC,L]

RewriteRule ^Libarary$ http://www.example.com/ [R=301,NC,L]

RewriteRule ^Libraries/Emulation/emulators_summary.htm$ http://www.example.com/emulators/index.php [R=301,NC,L]

RewriteRule ^Libraries/Emulation/NES/ROMs/(.*)(\.zip)$ http://www.example.com/getfile.php?file=roms/Nintendo/NES/USA/$1$2 [R=301,NC,L]

RewriteRule ^(.*)ROMs_summary.htm(.*)$ http://www.example.com/roms/index.php [R=301,NC,L]

RewriteRule ^(.*)NES_roms_summary(.*)$ http://www.example.com/roms/index.php?folder=Nintendo/NES [R=301,NC,L]

RewriteRule ^Libara....*$ http://www.example.com/roms/index.php?folder=Nintendo/NES [R=301,NC,L]

RewriteRule ^Libara...ms_summary.htm$ http://www.example.com/roms/index.php?folder=Nintendo/NES [R=301,NC,L]

RewriteRule ^Libar...*$ http://www.example.com/roms/index.php?folder=Nintendo/NES [R=301,NC,L]

RewriteRule ^Libraries/Emulation/NES/.*$ http://www.example.com/roms/index.php?folder=Nintendo/NES [R=301,NC,L]

RewriteRule ^Libraries/Emulation/snes/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Nintendo/SNES [R=301,NC,L]

RewriteRule ^Libraries/Emulation/nintendo_64/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Nintendo/N64 [R=301,NC,L]

RewriteRule ^Libraries/Emulation/BIOS_Roms/.*$ http://www.example.com/roms/index.php?folder=BIOS-System-Boot [R=301,NC,L]

RewriteRule ^Libraries/Emulation/nintendo_gameboy/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Nintendo/Gameboy-Color [R=301,NC,L]

RewriteRule ^Libraries/Emulation/pc_boot_disks/.*(\.html?|\.zip|\.exe)$ http://www.example.com/roms/index.php?folder=PC/DOS-Boot-Disk [R=301,NC,L]

RewriteRule ^Libraries/Emulation/pc_abandonware_dos/.*$ http://www.example.com/roms/index.php?folder=PC/Abandonware-DOS [R=301,NC,L]

RewriteRule ^Libraries/Emulation/Coleco/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Coleco [R=301,NC,L]

RewriteRule ^Libraries/Emulation/C64/.*(\.html?|\.zip|\.rar|\.txt)$ http://www.example.com/roms/index.php?folder=Commodore/C64 [R=301,NC,L]

RewriteRule ^Libraries/Emulation/commodore_128/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Commodore/128 [R=301,NC,L]

RewriteRule ^Libraries/Emulation/Plus4/.*$ http://www.example.com/roms/index.php?folder=Commodore/Plus4-C16-C116 [R=301,NC,L]

RewriteRule ^Libraries/Emulation/VIC-20/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Commodore/VIC20 [R=301,NC,L]

RewriteRule ^Libraries/Emulation/sam_coupe/.*(\.html?|\.7z)$ http://www.example.com/roms/index.php?folder=MGT/Sam-Coupe [R=301,NC,L]

RewriteRule ^Libraries/Emulation/intellivision/.*(\.html?|\.int)$ http://www.example.com/roms/index.php?folder=Mattel/Intellivision [R=301,NC,L]

RewriteRule ^Libraries/Emulation/fairchild_channel_f/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Fairchild/Channel-F [R=301,NC,L]

RewriteRule ^Libraries/Emulation/Vectrex/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=GCE/Vectrex [R=301,NC,L]

RewriteRule ^Libraries/Emulation/oric_atmos_tangerine_telestrat/.*(\.html?|\.7z)$ http://www.example.com/roms/index.php?folder=Tangerine/Oric-1-Atmos [R=301,NC,L]

RewriteRule ^Libraries/Emulation/texas_instruments_ti-994a/.*(\.html?|\.zip|\.dsk|\.bin)$ http://www.example.com/roms/index.php?folder=Texas-Instruments/TI-99-4A [R=301,NC,L]

RewriteRule ^Libraries/Emulation/watara_supervision/.*(\.html?|\.sz)$ http://www.example.com/roms/index.php?folder=Watara/Supervision [R=301,NC,L]

RewriteRule ^Libraries/Emulation/enterprise_128/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Enterprise/128 [R=301,NC,L]

RewriteRule ^Libraries/Emulation/magnavox_odyssey2/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Magnavox/Odyssey-2 [R=301,NC,L]

RewriteRule ^Libraries/Emulation/sega_megadrive_genesis/.*$ http://www.example.com/roms/index.php?folder=Sega/Megadrive-Genesis [R=301,NC,L]

RewriteRule ^Libraries/Emulation/sega_computer_3000/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Sega/Computer-3000 [R=301,NC,L]

RewriteRule ^Libraries/Emulation/sega_game_1000/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Sega/Game-1000 [R=301,NC,L]

RewriteRule ^Libraries/Emulation/sega_game_gear/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Sega/Game-Gear [R=301,NC,L]

RewriteRule ^Libraries/Emulation/SegaMasterSystem/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Sega/Master-System [R=301,NC,L]

RewriteRule ^Libraries/Emulation/Atari2600/.*$ http://www.example.com/roms/index.php?folder=Atari/2600-VCS [R=301,NC,L]

RewriteRule ^Libraries/Emulation/Atari5200/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Atari/5200 [R=301,NC,L]

RewriteRule ^Libraries/Emulation/Atari7800/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Atari/7800 [R=301,NC,L]

RewriteRule ^Libraries/Emulation/atari_jaguar/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Atari/Jaguar [R=301,NC,L]

RewriteRule ^Libraries/Emulation/atari_lynx/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Atari/Lynx- [R=301,NC,L]

RewriteRule ^Libraries/Emulation/bandai_wonderswan/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Bandai/Wonderswan [R=301,NC,L]

RewriteRule ^Libraries/Emulation/bandai_wonderswan_color/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Bandai/Wonderswan-Color [R=301,NC,L]

RewriteRule ^Libraries/Emulation/TurboGraphics16/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=NEC/TurboGrafx-16-PC-Engine [R=301,NC,L]

RewriteRule ^Libraries/Emulation/nintendo_game_and_watch/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Nintendo/Game-and-Watch [R=301,NC,L]

RewriteRule ^Libraries/Emulation/nintendo_famicom/.*(\.html?|\.zip|\.fds)$ http://www.example.com/roms/index.php?folder=Nintendo/Famicom-Disk-System [R=301,NC,L]

RewriteRule ^Libraries/Emulation/Spectrum/.*(\.html?|\.zip|\.tzx)$ http://www.example.com/roms/index.php?folder=Sinclair/Spectrum [R=301,NC,L]

RewriteRule ^Libraries/Emulation/nintendo_virtual_boy/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Nintendo/Virtual-Boy [R=301,NC,L]

RewriteRule ^Libraries/Emulation/acorn_archimedes/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Acorn/Archimedes [R=301,NC,L]

RewriteRule ^Libraries/Emulation/acorn_bbcmicro/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Acorn/Archimedes [R=301,NC,L]

RewriteRule ^Libraries/Emulation/amstrad_cpc/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Amstrad/CPC [R=301,NC,L]

RewriteRule ^Libraries/Emulation/atari_8bit/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=Atari/8bit [R=301,NC,L]

RewriteRule ^Libraries/Emulation/MSX/.*(\.html?|\.zip)$ http://www.example.com/roms/index.php?folder=MSX [R=301,NC,L]


RewriteRule ^classifieds/.*$ http://www.example.com/index.php [R=301,NC,L]

RewriteRule ^Libraries/Pictures/NESGameCovers/(.*)$ http://www.example.com/game-box-art-covers/index.php?folder=Nintendo/NES [R=301,NC,L]

RewriteRule ^blog/retroblog.html$ http://www.example.com/blog/index.php [R=301,NC,L]

RewriteRule ^blog/2008(.*)$ http://www.example.com/blog/index.php [R=301,NC,L]

RewriteRule ^blog/2009(.*)$ http://www.example.com/blog/index.php [R=301,NC,L]

RewriteRule ^blog/labels(.*)$ http://www.example.com/blog/index.php [R=301,NC,L]


RewriteRule ^example_donations\.(htm)$ http://www.example.com/example_donations.php [R=301,NC,L]

RewriteRule ^(.*)/Retro\sRadio/RetroRadio_Main(.*)$ http://www.example.com/retro_radio/RetroRadio_Main.htm [R=301,NC,L]

RewriteRule ^TOC_Disclaimer.htm$ http://www.example.com/disclaimer.htm [R=301,NC,L]

RewriteRule ^Advertisers.htm$ http://www.example.com/advertising/advertisers.htm [R=301,NC,L]

RewriteRule ^example\sDonations.htm$ http://www.example.com/example_donations.php [R=301,NC,L]

RewriteRule ^Museum/.*$ http://www.example.com/index.php [R=301,NC,L]

[edited by: hottrout at 5:14 pm (utc) on Jun 27, 2011]

g1smd

8:30 pm on Jul 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There's always more you can do.

hottrout

11:17 pm on Jul 13, 2011 (gmt 0)

10+ Year Member



Is there anything specific that I could do?

lucy24

12:35 am on Jul 14, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Basic housekeeping. For example (grabbing one at random):

RewriteRule ^blog/2008(.*)$ http://www.example.com/blog/index.php [R=301,NC,L]
RewriteRule ^blog/2009(.*)$ http://www.example.com/blog/index.php [R=301,NC,L]
RewriteRule ^blog/labels(.*)$ http://www.example.com/blog/index.php [R=301,NC,L]

Can be reduced to

RewriteRule ^blog/(200[89]|labels) http://www.example.com/blog/index.php [R=301,NC,L]

Surely you got the lecture about .*$ somewhere in this thread.

Edit: Oops. That was your first post, not the final version. ### and ### these Forums ;) But look for that kind of thing anyway. By now you know what to look for.
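
A minimal sketch of the `.*$` point, using two rules from the first post. A trailing `.*$` that is not captured simply matches whatever follows, so for ordinary request paths a pattern left unanchored on the right matches the same requests. The shortened forms below are an illustration, not the final rules from this thread, and the targets are left exactly as originally posted.

```apache
# Original forms from the first post:
#   RewriteRule ^classifieds/.*$ http://www.example.com/index.php [R=301,NC,L]
#   RewriteRule ^Museum/.*$      http://www.example.com/index.php [R=301,NC,L]

# Same matches without the trailing .*$ (nothing was captured, so nothing is lost):
RewriteRule ^classifieds/ http://www.example.com/index.php [R=301,NC,L]
RewriteRule ^Museum/ http://www.example.com/index.php [R=301,NC,L]

# Because both rules share a target, they can also be merged with alternation:
# RewriteRule ^(classifieds|Museum)/ http://www.example.com/index.php [R=301,NC,L]
```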

g1smd

12:46 am on Jul 14, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Never redirect to a URL that includes an index filename.

Redirect to the canonical URL that ends with a trailing slash.
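
As a sketch of what that looks like in practice, here are two redirects from the first post with the index filename dropped from the target. This assumes `DirectoryIndex index.php` (as set in the posted file) serves the page at the folder URL. The rewritten rules are illustrative, not the final rules from this thread.

```apache
# Before: the target exposes the index filename
#   RewriteRule ^classifieds/.*$ http://www.example.com/index.php [R=301,NC,L]
#   RewriteRule ^blog/retroblog.html$ http://www.example.com/blog/index.php [R=301,NC,L]

# After: redirect straight to the canonical URL ending in a trailing slash
RewriteRule ^classifieds/ http://www.example.com/ [R=301,NC,L]
RewriteRule ^blog/retroblog\.html$ http://www.example.com/blog/ [R=301,NC,L]
```

With the posted file, a redirect to /index.php is itself caught by the "Send index.php to root url" rule, so the visitor goes through two 301s instead of one. Pointing at the trailing-slash URL directly avoids that chain.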

hottrout

7:59 am on Jul 14, 2011 (gmt 0)

10+ Year Member



I have already done both of these things and am thankful for the direction. That said, the further simplification of the blog redirect to include (200[89]|labels) is a good one. I had not thought of that.

hottrout

11:19 am on Jul 25, 2011 (gmt 0)

10+ Year Member



I just wanted to update you both regarding this htaccess rewrite. Just this morning my domain was raised from PR2 to PR3. We are still a long way off the domain's original PR of 5, but I have to suspect that the rise is down to the corrections and simplifications that you helped me make to the htaccess.

For this I just wanted to say thanks for all of your help and guidance.

However, I am still seeing lots of malformed URLs reported in my Google Webmaster Tools results and cannot find the cause of them. If you find out anything more, g1smd, please do let me know.

hottrout

10:28 am on Aug 31, 2011 (gmt 0)

10+ Year Member



I know this will make a triple post, but I wanted to update this thread. Over the past few days the number of malformed URLs reported by Google Webmaster Tools has started to drop rapidly, from over 9000 for several months to just over 700 today. I fully expect that this will fall to around 10-30 or so.

I post this news to give others hope, and also to show the sort of time that passed between fixing the problems with the htaccess and Google picking up the changes. I guess the point is to have patience.

Thanks again to everyone that helped.