

Penalized for use of Alias?

apache, alias

         

pmmenneg

11:23 pm on Sep 26, 2009 (gmt 0)

10+ Year Member



So, I have a site that had a 'listings' page that was located at:

http://www.example.com/listings

With listing categories displayed as follows:

http://www.example.com/listings/looking-for/investment

So, I decided to move the URL to the following:

[listings.example.com...]

As the 'listings' page is still the engine handling the display of listings, I've added an Alias to Apache as follows:

Alias /looking-for /listings

Everything is working perfectly, and I've tried to use SEO best practices by using the 'friendly' urls (/looking-for/investments/etc/etc instead of /?catid=34).

Questions:

1) Am I going to be penalized by Google for implementing an Apache Alias? Any other suggested solutions?

2) Any comments on using listings.example.com vs. www.example.com/listings?

Thanks,

P

jdMorgan

3:54 pm on Sep 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It seems to me that you run the risk of creating 'duplicate content' with this solution, since it is essentially an internal rewrite, and allows at least two URLs to resolve to the same content.

Your best bet would be to 301-redirect from the URL that you do not want to the URL that you do want.
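
As a rough sketch of such a redirect for server config (outside any <Directory> container): the rule below assumes the unwanted URLs live under /listings/ on www.example.com and that the remainder of the path is valid on listings.example.com. That target layout is an assumption, since only the hostname of the new URL is shown above.

```apache
# Sketch only: 301-redirect the unwanted www URL to the listings subdomain.
# ASSUMPTION: /listings/<rest> on www corresponds to /<rest> on listings.example.com.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com
RewriteRule ^/listings/(.*)$ http://listings.example.com/$1 [R=301,L]
```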

Jim

pmmenneg

4:20 pm on Sep 27, 2009 (gmt 0)

10+ Year Member



Hmm, true, the content is available at two locations:

1) http://www.example.com/listings/looking-for/investment

2) [listings.example.com...]

But this is a brand new site, and there has never been a link to content using path #1 above... do you still think it will be found? Just to ensure that #1 isn't found, maybe I could do a 301 redirect from #1 to #2?

jdMorgan

9:52 pm on Sep 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Of course it will be found, and at the worst possible time. Murphy's law dictates that anything that can hurt your rankings and that has not been precluded will either occur by accident or by the action of a malicious competitor or his agent.

My all-time favorite title in our Google search forum is "Duplicate content - Get it right or perish." Take that as you wish... :)

Jim

pmmenneg

11:34 pm on Sep 27, 2009 (gmt 0)

10+ Year Member



OK, well, I'll handle this at the file level via PHP then. I thought I could come up with an elegant server solution, maybe combining the Alias mentioned above with a 301, but I don't want to even risk it.

Thanks!

pmmenneg

7:44 pm on Sep 28, 2009 (gmt 0)

10+ Year Member



Quick question... in the above example, I have Apache set up to have 'listings' as the DirectoryIndex... which I am sure millions of other sites use (DirectoryIndex, not 'listings'). By way of example:

Many sites expose duplicate content at the following two URLs:

http://www.example.com/
http://www.example.com/index.php

Would they not be punished as well? Essentially, I have the exact same issue, right?

Thanks!

paul

jdMorgan

10:39 pm on Sep 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They won't be 'punished' -- Search engine "penalties" are largely imaginary, and reserved for those sites which truly deserve them.

However, what you have is multiple URLs competing with each other for links and ranking, so this means the site is competing against itself -- a waste of potential ranking power. Consider that at least the following issues should be handled with a 301-Moved permanently redirect (note that these URLs all lead to one "page").

example.com/
example.com./
example.com:80/
example.com.:80/
example.com/index.php
example.com./index.php
example.com:80/index.php
example.com.:80/index.php
www.example.com/
www.example.com./
www.example.com:80/
www.example.com.:80/
www.example.com/index.php
www.example.com./index.php
www.example.com:80/index.php
www.example.com.:80/index.php

On many servers, all 16 of these URLs will resolve to the same content, and ranking may suffer up to a sixteen-to-one dilution as a result.

Shall we add a query string? How many bogus/fake parameters would you like? :o

If you allow this to happen, then it will happen -- either by accident or by malicious design.
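
One way to collapse the index.php variants, sketched for server config outside any <Directory> container. It tests THE_REQUEST (the raw request line sent by the client) so that Apache's own internal DirectoryIndex lookup of index.php does not trigger the redirect. Treat it as an illustration under those assumptions, not a drop-in rule.

```apache
# Sketch only: externally redirect direct client requests for /index.php to "/".
RewriteEngine On
# Match only when the client itself asked for /index.php (with or without a query string).
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php[?\ ]
RewriteRule ^/index\.php$ http://www.example.com/ [R=301,L]
```

Any query string on the original request is carried over to the redirect target, so bogus parameters still need to be handled separately.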

Jim

pmmenneg

12:27 am on Sep 29, 2009 (gmt 0)

10+ Year Member



Well, I have a few rewrites working that take care of most of these issues, save for the trailing period:

- non-www is routed to www
- port 80 is stripped out

But I can't seem to get the rewrite for the trailing period to work... I am trying to apply it to both the %{SERVER_NAME} and, in a separate rule, the %{QUERY_STRING}, but no luck... any pointers? I've tried the rules posted here and for some reason they are not working, strangely.

pmmenneg

12:28 am on Sep 29, 2009 (gmt 0)

10+ Year Member



Here is my rule that attempts to strip out non-www and trailing periods... it kills the non-www, but not the periods:

## ensure non-www requests go to www.example.com
RewriteCond %{SERVER_NAME} ^(^[www]\.)?example\.com(\.)?
RewriteRule ^(.*)$ http://www.example.com$1 [L,R=301]

jdMorgan

1:18 pm on Sep 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's not a valid pattern (two "^" characters are not permitted), and SERVER_NAME is something that *you* define and so need not be checked in this code -- You want to check HTTP_HOST instead.

Just do a negated exact-string match:


RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]

That will redirect any requested hostname that is not *exactly* "www.example.com" or blank, and takes care of both FQDN-format hostnames and appended port numbers. (You don't want to redirect blank hostname requests because if you do, you will end up with an infinite redirection loop if the client never sends one -- for example, if it's a true HTTP/1.0 client or a badly-implemented 'bot. This could cause a 'self-inflicted denial of service attack' if not handled properly.)

Jim

pmmenneg

5:01 pm on Sep 29, 2009 (gmt 0)

10+ Year Member



Hi Jim, you've been very helpful, thanks!

Had a follow-up regarding re-writing URLs with querystrings, etc. Was going to post a new thread, but maybe I'll just post it here.

So, for the url
[listings.example.com...]

Essentially, I am redirecting the url via an apache ALIAS to the file listings like so:

http://www.example.com/listings/looking-for/investment

The file listings is then 'reverse engineering' the passed URL to break out the second passed item, 'looking-for', as one value, and then the third value (or values, e.g. /investment/over-100k/angel) as another value. Because the third value can grow, the logic is non-trivial, and of course if I want to change the URL structure, I have to recode the handler for the URL on this page. So, I am looking for a more manageable alternative.

One solution would be to have Apache do a ReWrite so that
[listings.example.com...]

turns into
http://www.example.com/listings?sector=looking-for&type=investment

Then I can do simple get manipulation on the listing page, and can easily accommodate new variables via a simple rewrite rule change.

Problems as I see it:

1) The user, instead of seeing the nicely formatted url when they are at the particular page will see the querystringed url.

2) This does nothing to resolve duplicate content, does it? I mean, the same content is available at both urls... should my rewrite rule have a 301, even if I am, by design, trying to feed the search engines this first formatted url?

Thanks

jdMorgan

11:46 pm on Sep 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> 1) The user, instead of seeing the nicely formatted url when they are at the particular page will see the querystringed url.

They won't see the query string, because you're doing a URL-to-filepath rewrite, not a redirect.

It might be simpler to pass *all* requests to the 'listings' script, and then let it look at Request_URI and get the parameters it needs. While mod_rewrite is handy, there's no reason to 'split' your URL-handling into two complicated pieces. Since PHP can handle complication more easily, keep the mod_rewrite side simple.
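
A minimal sketch of that approach for server config, assuming the script is reachable at the URL-path /listings and that every friendly URL begins with /looking-for/ (both taken from the examples earlier in this thread; adjust to suit). A rule like this would take the place of the Alias. Because it is an internal rewrite, the script still sees the originally requested path in the request URI and can split it however it likes.

```apache
# Sketch only: hand every friendly URL to the single handler script.
# ASSUMPTION: the handler lives at /listings and friendly URLs start with /looking-for/.
RewriteEngine On
RewriteRule ^/looking-for/ /listings [L]
```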

Separately, if any URL that you do not want to accept arrives at your server, either reject it with a 403 or redirect it to the correct URL with a 301.
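
For the 403 option, something along these lines might work in server config. It assumes /listings is the internal script path that clients should never request directly, and it checks THE_REQUEST (the client's raw request line) so that internally rewritten requests are not rejected.

```apache
# Sketch only: refuse direct client requests for the internal script path.
# ASSUMPTION: /listings should only ever be reached via an internal rewrite.
RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /listings[/?\ ]
RewriteRule ^/listings - [F]
```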

Jim

g1smd

1:15 am on Sep 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A rewrite does not make a new URL. A rewrite accepts an external URL request, and then targets an internal filepath.

A redirect tells the browser to make a new request for a new URL.

That's the difference between a redirect and a rewrite.

pmmenneg

2:49 am on Sep 30, 2009 (gmt 0)

10+ Year Member



Ah, got it guys, thanks. One final question, now that I am playing around a bit with rewrites.

Two basic questions:

1) Trying to redirect all traffic with nothing after the url (have tried all of the following and none are working):
RewriteRule ^/$ /goto?this=defaultvalue [L]
RewriteRule ^/[^.]$ /goto?this=defaultvalue [L]

2) Trying to eliminate multiple slashes from working for this config:
RewriteRule ^/([a-zA-Z\-]*)$ /goto?this=$1 [L]

i.e. both of these work, where I'd like only the first to work
example.com/partytime
example.com///partytime

I know these are really basic questions, but I've got most of the main rewrite stuff working, just these 'boundary cases' that are tripping me up.

Thanks again for your help!

g1smd

7:20 am on Sep 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Although you call them redirects, they are both actually rewrites. As they both target internal filepaths, the user will not see a new URL.

You need to change the first one to be a redirect, by adding a domain name to the target (making it a URL instead of an internal filepath) and by changing [L] to instead be [R=301,L].
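
Applied to the first rule in the question, that change might look like this (server config context, with www.example.com assumed as the canonical hostname, as elsewhere in this thread):

```apache
# External 301 redirect: the target is a full URL, so the client makes a new request.
RewriteRule ^/$ http://www.example.com/goto?this=defaultvalue [R=301,L]
```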

For your second problem, you could add a redirect before this rewrite, to redirect incorrect requests to the right form, or you could fix the rewrite to only accept certain formats. I would look at doing a prior redirect.

jdMorgan

6:12 pm on Sep 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> 2) Trying to eliminate multiple slashes from working for this config:
> RewriteRule ^/([a-zA-Z\-]*)$ /goto?this=$1 [L]

This is caused by the 'automatic cleanup' that Apache does before doing most of its internal URL-handling, and therefore you cannot stop that rule from working in and of itself, because it will never 'see' the consecutive slashes.

To recognize and redirect such requests, you need a separate redirect that looks at the actual request line sent by the client. Something like this in your config file:


# Externally redirect to remove multiple slashes at start of URL-path:
RewriteCond %{THE_REQUEST} ^[A-Z]+\ //+([^\ ]*)\ HTTP/
RewriteRule ^/ http://www.example.com/%1 [R=301,L]

That will detect and redirect any requested URL-path that starts with two or more consecutive slashes.

THE_REQUEST is the request line sent by the client, and usually appears as a quoted string in your raw server access log, e.g.

"GET /partytime HTTP/1.1"

Note for other readers: The code in this thread is intended for use in server config files, outside of any <Directory> containers. For use in .htaccess or inside a <Directory> container, it will be necessary to remove the leading slashes from the RewriteRule patterns.
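
As an illustration of that adjustment, here is how the canonical-hostname redirect from earlier in the thread might look in a .htaccess file at the document root. It assumes mod_rewrite is enabled and that the server's AllowOverride setting permits rewrite directives in .htaccess.

```apache
# Sketch only: per-directory (.htaccess) form of the canonical-hostname redirect.
# The leading slash is dropped from the RewriteRule pattern in this context.
RewriteEngine On
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```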

Jim

pmmenneg

8:40 pm on Sep 30, 2009 (gmt 0)

10+ Year Member



Thanks guys, worked like a charm!