htaccess rewrite problem

duplicate content issue

         

mark_mesh

7:59 pm on Sep 20, 2011 (gmt 0)

10+ Year Member



Hi Guys,

Hopefully somebody can help. I've trawled through many pages via Google and tried countless pieces of code, but I'm unable to achieve what I'm trying to do.

Google Webmaster Tools is showing many duplicate content errors for URLs like these, none of which exist (or at least shouldn't).

/index.php/data/data/content/media.php?g=media-20
/index.php/data/content/media.php?g=media-78
/index.php/data/page2.html
/index.php/data/data/video.html
/index.php/data/banner.swf


What I am trying to do is:
If index.php is requested, strip anything after index.php and 301 redirect to index.php.
(I'm not sure if the redirect is even necessary after the trailing rubbish is stripped off?)

I hope someone can help; it's been driving me nuts for the last two days.

Many thanks in advance.

lucy24

8:26 pm on Sep 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If they don't exist, where is GWT (Google Webmaster Tools) getting its information about content? It should be listing everything as 404. But yes, a redirect is the only way to get everyone on the same page (literally ;)) and dump the duplicate content.

You actually need to do two things. (Obvious analogy: If your child falls down the stairs and breaks a leg, you have to both take them to the hospital and install a barrier. But one is more urgent than the other!)

Step one is a simple redirect:

RewriteRule ^index\.php.+ http://www.example.com/index.php [R=301,L]

which simply means "If there is any stuff whatsoever after 'index.php', redirect to 'index.php' alone." By default, rewrites do not touch the query string, so it will be unchanged. If that part is also garbage and/or nonexistent, change the target to index.php? with final question mark. This strips the entire query.
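
As a rough sketch of the two variants described above (keep the query string, or strip it), assuming the rules live in the .htaccess file at the site root, that mod_rewrite is enabled, and that www.example.com stands in for the real hostname. Only one of the two rules would be used, so the second is commented out.

```apache
RewriteEngine On

# Variant 1: extra path after index.php is dropped; the query string,
# if any, is carried over to the target unchanged.
RewriteRule ^index\.php.+ http://www.example.com/index.php [R=301,L]

# Variant 2: same match, but the trailing "?" on the target
# also discards the query string.
# RewriteRule ^index\.php.+ http://www.example.com/index.php? [R=301,L]
```

With variant 1, a request for /index.php/data/page2.html would be redirected to /index.php. On Apache 2.4 and later, the QSD flag is another way to discard the query string.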

Step two is to find out where these spurious links are coming from. If it's in your power to get rid of them, do so.

mark_mesh

8:57 pm on Sep 20, 2011 (gmt 0)

10+ Year Member



OMG... Thank you so much lucy!

That's got him patched up :). Now I just need to find out what's causing it.

Just had a look through my 404 errors and they are not on there, which is strange, as there are over 30 of the duplicate content URLs that the code has just fixed.

All the examples I gave simply load the main index/home page as normal, but with all the trailing info in the address bar.

Quick question: would this work to redirect a directory to the home page?

RewriteRule ^\DIRECTORY\.+ http://www.example.com/? [R=301,L]

That is, for any request such as http://www.example.com/directory/another-directory/media-display.php

So basically, as long as the URL has that directory in it, redirect to the home page.

Can't thank you enough for the redirect code though.

g1smd

9:53 pm on Sep 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Your query string stripping redirect should be something like this:

RewriteCond %{QUERY_STRING} .
RewriteRule ^index\.php http://www.example.com/$1? [R=301,L]



^\DIRECTORY
matches a digit followed by IRECTORY. It does not match the letter D at all.

^\DIRECTORY\.+
requires there be a period after the folder name. That's the \. in the code. The + means one or more.

So
^\DIRECTORY\.+
would match "3IRECTORY......." where "..." is LITERAL periods.

^index\.php.+ which simply means "If there is any stuff whatsoever after 'index.php'..."
No, not "anything whatsoever". Only more path stuff. It will not match any type of appended query string data.

lucy24

11:25 pm on Sep 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



\DIRECTORY matches a digit followed by IRECTORY. It does not match the letter D at all.

Apache isn't case-sensitive?! In vanilla RegEx \D would match any non-digit. Anyway, I think we're dealing with \ for / typos. Lethal in htaccess but only mildly annoying in message boards ;)
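
If the backslashes were indeed meant to be forward slashes, the directory rule might look something like the sketch below. It assumes the folder sits directly under the site root and is literally named directory (a placeholder taken from the example URL above). The [NC] flag is an optional addition, not something from the thread, and is only needed if the folder can be requested in mixed case, since mod_rewrite patterns are case-sensitive by default.

```apache
# Redirect anything under /directory/ to the home page and drop the query string.
# "directory" is a placeholder; [NC] makes the match case-insensitive.
RewriteRule ^directory/ http://www.example.com/? [R=301,L,NC]
```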

RewriteRule ^index\.php http://www.example.com/$1?

Your cat walked across the keyboard, didn't she?

No, not "anything whatsoever". Only more path stuff. It will not match any type of appended query string data.

Hence the further blahblah about query strings :P

g1smd

11:36 pm on Sep 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yeah, \D is "non-digit"; I overlooked that. The point was supposed to be that the added unwanted escaping changes the meaning to something that was not intended.

And disregard the $1 on the end of the example. It was empty anyway.
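
With the stray $1 removed, the query-stripping redirect might be sketched as follows. This assumes the intent was to keep index.php in the target and to redirect only when a query string is present. The end anchor ($) is an added assumption so that the rule handles only a bare index.php request.

```apache
# Run the rule only when the query string contains at least one character.
# The trailing "?" on the target discards the query string.
RewriteCond %{QUERY_STRING} .
RewriteRule ^index\.php$ http://www.example.com/index.php? [R=301,L]
```

Because the target carries no query string, the condition fails on the redirected request and the rule does not loop.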

mark_mesh

4:11 pm on Sep 21, 2011 (gmt 0)

10+ Year Member



Ah!

So I could simply redirect a whole directory and any file in it with this expression then, if I'm not mistaken... which I often am :)

RewriteRule /?foldername/ http://www.example.com [R=301,L]

Thanks again for your input guys.

g1smd

4:46 pm on Sep 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What is the leading /? supposed to do?

The target URL needs a trailing slash.

mark_mesh

5:13 pm on Sep 21, 2011 (gmt 0)

10+ Year Member



Hey g1,

As you can tell, I'm no professional :(
I put the / in because without it I got an internal 500 error.

However, with the / in, the site loaded fine, all areas of the site seem to function correctly, and the redirect works as required.

Is this not a good method?

edit:

Oh, having a moment! I see what you mean: /? and not just /
I thought ? denoted all content.

Now have:

RewriteRule foldername/ http://www.example.com/ [R=301,L]

Hopefully good.

g1smd

5:55 pm on Sep 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



With no start anchor, all of these will redirect to root.

www.example.com/foldername/
www.example.com/otherfoldername/
www.example.com/some/other/deep/foldername/


Be sure that is really what you want. If not, simply add the ^start anchor to the pattern.

mark_mesh

6:10 pm on Sep 21, 2011 (gmt 0)

10+ Year Member



g1,

I'm confused by the last post.

What I want is for any request for "foldername" to be redirected to http://www.example.com

ie:

http://www.example.com/foldername/videos.php
http://www.example.com/foldername/banner.swf
http://www.example.com/foldername/anotherfolder/media.php

All of the above would redirect to http://www.example.com

That's what I am trying to do and what seems to be happening.
Is the code I have correct for this?

lucy24

7:07 pm on Sep 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Stepping back a couple of posts: ? in the pattern means "the immediately preceding character is optional". If there is no immediately preceding character, 500 sounds like about the right number of errors. :)
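
A small sketch of that point, using the foldername placeholder from the posts above. In the first pattern the ? follows a slash, so it makes that slash optional. In the commented-out pattern the ? has nothing before it to apply to, so the regular expression cannot be compiled and any request that reads the .htaccess file fails with a 500 error.

```apache
# Valid: "/?" means "an optional slash", then the literal text "foldername/".
RewriteRule /?foldername/ http://www.example.com/ [R=301,L]

# Invalid: a pattern cannot begin with "?" because there is nothing to make optional.
# RewriteRule ?foldername/ http://www.example.com/ [R=301,L]
```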

g1's point is that the string "foldername" may occur in other places, such as

www.example.com/otherfoldername/

("foldername" is the last part of a longer name) or

www.example.com/some/other/deep/foldername/

(you've got another "foldername" elsewhere on your site, buried within other directories)

If you're absolutely certain that neither of these will ever occur, the opening anchor ^ will make no functional difference-- except that the server's computer has to check each request all the way through to the end to make sure "foldername" never occurs. With the anchor

^foldername

the computer only has to check the very beginning. Name of top-level requested folder doesn't start with f? Stop right there and move on to next rule. Second letter isn't o? Stop right there, et cetera.

g1smd

7:51 pm on Sep 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Exactly.

foldername/ - means it redirects when found anywhere in the requested path.

^foldername/ - means it redirects only when found at the beginning of the path.
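
Put together, the anchored form of the rule might look like the sketch below, assuming foldername (a placeholder) sits directly under the site root and the rule lives in the root .htaccess file.

```apache
# Matches foldername/videos.php and foldername/anotherfolder/media.php,
# but not otherfoldername/ or some/other/deep/foldername/.
RewriteRule ^foldername/ http://www.example.com/ [R=301,L]
```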

mark_mesh

8:52 pm on Sep 21, 2011 (gmt 0)

10+ Year Member



Crystal clear now, thanks guys.

Lol, 500 errors is a bit harsh, Miss Lucy :)

I am 100% certain that the folder name doesn't and will not exist anywhere else on the server, so it should be good to go now.

thank you both.

g1smd

8:55 pm on Sep 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If the folder to redirect is always in the root then adding the ^start anchor to the pattern is a good idea.

The rule will also run a lot faster as explained above.