Forum Moderators: phranque
I have removed the hack from my site and reconfigured it to run in pretty much the default mode. I didn't give any thought to the impact that might have on the search engine index, but found out soon enough that a couple of thousand 404 errors resulted from that "minor" change, LOL.
I am thinking that I can use the following technique to tell the search engines that the URLs are gone (about 900 of them in total):
RewriteCond %{REQUEST_URI} ^/nogood.html$ [NC]
RewriteRule .* - [G,L]
Has anyone done this? I am thinking that it will remove them from the index immediately instead of waiting weeks or months for the search engine to drop the bad links.
Appreciate any feedback you might have on this or ideas to solve this problem.
TIA
Your purpose might be better served by creating 301 redirects from the old page names to the new ones [R=301,L]. That has a better chance (better than none, anyway) of crediting the new pages with the PageRank earned by the old ones. (I don't know; someone more knowledgeable can comment on whether that's a certainty.) But it might be more trouble than it's worth to do that for 900 pages unless they were very popular.
...except, having said that... my first attempt, before the [G,L], was to do a 301, and that did not stop search engines from crawling the original page URL. After waiting a couple of months and seeing that the page was still being crawled, I switched to [G], and it stopped them immediately.
[edited by: SteveWh at 10:38 am (utc) on April 22, 2009]
RewriteRule ^nogood\.html$ - [G]
For URLs which *do* have a direct or reasonably-relevant replacement, use a 301, by all means. Even though you mark URLs as "404-missing" or "410-Gone," the search engines will continue to ask for them as long as they find links to them on the Web (and quite a bit longer, too -- sometimes, for years), and you are throwing away all the backlinks to the old URLs and their associated PageRank/Link-popularity if you use a 4xx response.
RewriteRule ^nogood\.html$ http://www.example.com/good.html [R=301,L]
RewriteRule ^widgets-([a-z]+)\.html$ http://www.example.com/new-widgets-$1.html [R=301,L]
RewriteRule ^widgets-(blue|green)\.html$ http://www.example.com/new-widgets-$1.html [R=301,L]
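As a rough sketch of how the two approaches could sit together in one .htaccess file: redirect the URLs that have a replacement, and return 410 only for those that don't. The names old-promo and retired-faq are hypothetical placeholders, not pages from this thread, and the sketch assumes the file lives in the document root.

```apache
RewriteEngine On

# URLs with a relevant replacement: 301 to the new location
RewriteRule ^nogood\.html$ http://www.example.com/good.html [R=301,L]
RewriteRule ^widgets-(blue|green)\.html$ http://www.example.com/new-widgets-$1.html [R=301,L]

# URLs with no replacement at all (hypothetical names): 410 Gone
# [G] ends rule processing for the request, so no [L] is needed with it
RewriteRule ^(old-promo|retired-faq)\.html$ - [G]
```

Grouping the retired names into one alternation keeps the rule count down, though a very long list may be easier to maintain as separate lines.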
Jim
I just checked site: on Google and found about 500 links to my site, and most of them are 404 because they contain links like www.mysite.com/graphics-design/ when the actual URL is www.mysite.com/forumdisplay.php?f=16. Unfortunately, every one of them would need to be looked up to see what the real URL is so that a redirect can be constructed.
This search engine friendly URL rewriting seemed like a good idea at the time, but it's starting to look like it was a bad idea in light of the fact that I am reading now that search engines don't really care about "readable" URLs and the advantage, if one exists, is in the "user friendly" plain English URL.
I don't see a way to "capture" all the broken URLs that are indexed now either, though I think it might be the same data that Google Webmaster Tools is reporting as 404.
So, the real question now is what real advantage do I get from doing manual redirects versus page gone?
TIA for any ideas about this (quick fix would be great, lol).
rewrite instruction isn't working for some reason?
Here is what I am doing per your suggestion. The URL that is in google is:
[mysite.com...]
I added to my .htaccess file:
RewriteRule ^http://www.mysite.com/debating-room/$ [mysite.com...] [R=301,L]
Comes back with 404 like it wasn't even there.
Any ideas?
You don't need a separate RewriteCond. Just put the localized URL-path (i.e. no leading slash) into the RewriteRule itself:
I don't like to repeat myself, but that statement was very specific. Use a localized URL-path --not a protocol plus canonical URL-- in the RewriteRule pattern.
RewriteRule ^debating-room/$ http://www.example.com/forumdisplay.php?f=78 [R=301,L]
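To illustrate the point about the pattern: in .htaccess, the RewriteRule pattern is compared only against the URL-path, with the leading slash stripped, so a pattern starting with http:// can never match. If a rule really does need to depend on the hostname, that test goes in a RewriteCond instead. This is only a sketch, and it assumes the site answers on both example.com and www.example.com.

```apache
RewriteEngine On

# The hostname is tested here, not in the RewriteRule pattern
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com$ [NC]
# The pattern sees only "debating-room/" (no scheme, no host, no leading slash)
RewriteRule ^debating-room/$ http://www.example.com/forumdisplay.php?f=78 [R=301,L]
```

For a single-domain site the RewriteCond is unnecessary and the RewriteRule on its own is enough.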
Believe it or not, I am almost finished redirecting all the 404s on my site. One issue keeps showing up with Googlebot.
For example my redirect is:
RewriteRule ^free-mods/$ forumdisplay.php?f=57 [R=301,L]
But Googlebot is also requesting free-mods without the trailing slash and gets a 404. I have just been adding a second rule for each of these. Is there a way to handle the missing slash automatically, or should I just continue adding another rule each time?
Based on the previous discussion in this thread, I believe you could save yourself a lot of wasted time and frustration by spending some time with a print-out of this document [httpd.apache.org] and the regular-expressions tutorial cited in our Forum Charter. These are fundamental to proper implementation and trouble-free use of mod_rewrite rules. They are basic Webmaster kit.
Jim
RewriteRule ^free-mods/?$ forumdisplay.php?f=57 [R=301,L] would redirect requests either with or without the trailing slash.
However, do check what happens when you request a non-www URL and what happens when you request a www URL.
Yep, you end up on the same sub-domain that you started on - unless you have another rule that then fixes that in a separate redirect.
Both 'not fixing it' and 'fixing it with a second redirect' are non-optimum.
When you redirect you need to include the full target domain name in the redirect:
RewriteRule ^free-mods/?$ http://www.example.com/forumdisplay.php?f=57 [R=301,L]
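One possible way to arrange this so that no request needs two redirects is to put the specific rules first, each with the full canonical hostname in its target, and follow them with a general non-www to www rule for everything else. This is a sketch only, assuming www.example.com is the canonical hostname and the file is the .htaccess in the document root.

```apache
RewriteEngine On

# Specific redirects first: with or without trailing slash, on either
# hostname, the request goes straight to the canonical www URL in one hop
RewriteRule ^free-mods/?$ http://www.example.com/forumdisplay.php?f=57 [R=301,L]

# Everything else requested on the non-www hostname is redirected to www.
# Note: this exact-match condition would not match a Host header
# that carries a port number.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Because the specific rule carries [L] and comes first, a request for example.com/free-mods never reaches the general rule.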