Forum Moderators: Robert Charlton & goodroi


Indexed 2 ways by search engines - any problem?

         

Vrindavan

3:06 am on Jul 9, 2010 (gmt 0)

10+ Year Member



If other sites are linking back to you in two ways, as follows,
what would be the result in terms of SEO?

Also, is it easier for SEs to index your page if your link looks like "a"? Or is it just the same?

Thanks for sharing

a)
www . abc . com/h/index.htm

b)
www . abc . com/h/

AnkitMaheshwari

7:19 am on Jul 9, 2010 (gmt 0)

10+ Year Member



Ideally, you should set up a preferred URL in the .htaccess file (using a 301 redirect) so that only the one form of the URL you prefer (a or b) resolves. This concentrates the juice of all incoming links on a single URL.
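As a rough illustration of the decision such a 301 setup makes, here is a minimal Python sketch. It assumes the preferred form is the slash-ended URL (b) and that index.htm and index.html are the only index filenames in use; the function names are hypothetical, and on a real Apache server this logic would live in .htaccess rather than in Python.

```python
import re
from typing import Optional

# Matches a trailing "index.htm" or "index.html" at the end of a URL path.
# Assumption: these are the only index filenames used on the site.
INDEX_FILE = re.compile(r"(^|/)index\.html?$")

def canonical_path(path: str) -> str:
    """Return the preferred (slash-ended) form of a URL path."""
    return INDEX_FILE.sub(r"\1", path)

def redirect_target(path: str) -> Optional[str]:
    """Return the 301 target if the path is not canonical, else None."""
    preferred = canonical_path(path)
    return preferred if preferred != path else None

print(redirect_target("/h/index.htm"))  # /h/
print(redirect_target("/h/"))           # None
```

Requests for /h/index.htm would get a 301 to /h/, while /h/ is served directly, so every link ends up counting toward one URL.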

Robert Charlton

8:22 am on Jul 9, 2010 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



There are several considerations here...

Indexed 2 ways by search engines - any problem?

You don't want any of your pages showing up for more than one URL. If they do, Google will index all variants and treat them as duplicate content.

Since Google doesn't like duplicate content, it will generally display only the URL with the highest PageRank and filter out all the others. It may not be the version that you've chosen to promote.

If different inbound links point to different URLs, the multiple-URL situation will also split the link vote, ultimately costing you rankings... and it may also result in fewer of your site's pages being crawled by Google.
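The split-vote effect is easy to see with simple counting. The sketch below tallies a made-up list of inbound link targets per URL, first as-is and then after folding the index.htm form into the slash-ended form. The link list and the `fold_index` and `tally` helpers are hypothetical, and real link valuation is far more complex than a raw count.

```python
from collections import Counter

# Hypothetical inbound links pointing at two forms of the same page.
inbound = [
    "www.example.com/h/index.htm",
    "www.example.com/h/",
    "www.example.com/h/index.htm",
    "www.example.com/h/",
    "www.example.com/h/",
]

def fold_index(url: str) -> str:
    """Fold a trailing index.htm/index.html into the slash-ended URL."""
    for name in ("index.html", "index.htm"):
        if url.endswith("/" + name):
            return url[: -len(name)]
    return url

def tally(urls, canonicalize=False):
    """Count links per target URL, optionally after canonicalizing."""
    return Counter(fold_index(u) if canonicalize else u for u in urls)

split = tally(inbound)          # votes divided between two URLs
merged = tally(inbound, True)   # all votes land on one URL
print(dict(split))
print(dict(merged))
```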

As Ankit suggests, you need to select a preferred form for your URLs and eliminate the alternatives. The only dependable way to do this is through proper server setup.

If the choice here were limited to (a) or (b), I'd recommend (b), i.e., the version without index.html.

For more on this, take a look at the Hot Topics [webmasterworld.com] section, pinned to the top of the Google SEO forum home page, and look at the section on "Duplicate Content". In particular, this thread applies to your question...

Domain Root vs. index.html [webmasterworld.com] - yet another kind of duplicate

I recommend reading the whole Duplicate Content section, since if this is currently a problem, you will inevitably have other dupe content issues as well.

And in the case of your question, I wouldn't stop just with getting rid of "index.html". I'd also get rid of the extra directory you have... "/h/".

You really want the default URL for your domain to resolve to...

www . abc . com/

That's where inbound links will naturally be directed, and you need to set up your preferred or "canonical" default domain variant to take advantage of that.

[edited by: Robert_Charlton at 8:31 am (utc) on Jul 9, 2010]

Vrindavan

8:30 am on Jul 9, 2010 (gmt 0)

10+ Year Member



>> I'd also get rid of the extra directory you have... "/h/".

Those refer to sub-directories of a domain; they cannot be removed.

Robert Charlton

8:40 am on Jul 9, 2010 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Are you talking about the default page for an entire site that is your own, which is what I'd assumed...?

...or are you simply talking about the default page of a directory on a site, say a space that has been assigned to you?

In part, I ask because if you don't control the server space for the entire site, the suggested "canonical" fixes may be out of your control. In that case, I would link to the directory root with a "/" or with a full URL, dropping "index.html"... and I'd use Google's canonical tag, something I ordinarily wouldn't depend on, but it's liable to be your only choice.
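For reference, the canonical tag mentioned here is a link element with rel="canonical" placed in the page's head section. A minimal Python sketch that builds one for the slash-ended directory URL follows; the helper name `canonical_link_tag` and the example.com URL are illustrative assumptions, and in practice the tag is usually written straight into the HTML template.

```python
from html import escape

def canonical_link_tag(url: str) -> str:
    """Build a rel="canonical" link element for the given preferred URL."""
    # escape() with quote=True makes the URL safe inside a double-quoted attribute.
    return '<link rel="canonical" href="{}">'.format(escape(url, quote=True))

# The page is reachable as /h/ and /h/index.htm; both copies carry the same tag.
tag = canonical_link_tag("http://www.example.com/h/")
print(tag)
```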

Ideally, you should control the domain, and it should be on a server that will enable you to use mod_rewrite (assuming here an Apache server).

In either case, I suggest you get rid of "index.html".

Vrindavan

9:19 am on Jul 9, 2010 (gmt 0)

10+ Year Member



>> Are you talking about a default page for entire site

yes

The problem will happen on:
- the root domain
- plus all the sub-folders of that root domain

If search engines have already indexed all my files using the index.htm version, should I still go ahead and change my own site's linking structure to the version without index.htm? Or just leave it alone? (I have not read the suggested reading yet, so I am not sure why the version without index.htm is preferable.)


The last step, then: how do I make Google treat links from other domains to my site, in either form (with or without index.htm), as referring to the same page?

If Google indexes both versions, can both URLs appear in the search results at the same time? Or will Google always use one version and ignore the other?

Vrindavan

9:46 am on Jul 9, 2010 (gmt 0)

10+ Year Member



If I have more than one domain under my hosting account, do I have to add the following code multiple times in the .htaccess file to cover all the domains?


RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.htm?(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]*/)*)index\.htm?$ http://www.example.com/$1 [R=301,L]


Beware that posting in the forum modifies the ¦ pipe ¦ characters, so you will need to change those back. <--- Sorry, I don't quite understand: which ones need to be modified?


Thanks so much for the help.

g1smd

6:19 pm on Jul 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There are no pipe symbols to modify in that particular code.

If that code is for multiple domains, then you'll need to modify it slightly; otherwise all requests will be redirected to the single domain targeted in the very first rule.
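One way to picture the "slight modification" is to build the redirect target from the host name the visitor actually requested instead of a hard-coded www.example.com, so one rule can serve every domain on the account. In mod_rewrite that usually means putting %{HTTP_HOST} in the substitution in place of the fixed domain. The Python below is only a model of that idea: `redirect_for` is a hypothetical helper, and its pattern (matching index.htm or index.html) is modeled on the RewriteRule above.

```python
import re
from typing import Optional

# Modeled on the RewriteRule pattern: per-directory path, no leading slash.
RULE = re.compile(r"^(([^/]*/)*)index\.html?$")

def redirect_for(host: str, path: str) -> Optional[str]:
    """Return a 301 target on the *requested* host, or None if no redirect."""
    m = RULE.match(path)
    if m is None:
        return None
    # Using the request's own host keeps each domain redirecting to itself,
    # instead of sending every domain to one hard-coded site.
    return "http://{}/{}".format(host, m.group(1))

print(redirect_for("www.example.com", "h/index.htm"))
print(redirect_for("www.example.org", "index.html"))
```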


Link to root and folder URLs ending only with a slash. Don't include the index file filename in the link.

Redirect, as above, from named file to slash-ended URL.

If multiple URLs show the same content, Google prefers the shorter URL. It's not all about PageRank, though that is sometimes also a factor.

Not including your "technology" in the URL (e.g. index.html or index.php) allows you to change the technology behind the website without ever having to change your URLs again.

Likewise for extensionless URLs for your inner content pages: they allow the website technology to change behind the scenes without affecting the URLs used to access that content.
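To show why extensionless URLs decouple the address from the technology, here is a minimal Python sketch of a lookup that maps a public URL such as /apple onto whatever file currently backs it. The `resolve` helper, the extension order, and the file sets are hypothetical; on Apache the equivalent mapping would be an internal rewrite, which needs its own rules.

```python
from typing import Optional, Set

# Assumed preference order when more than one backing file exists.
EXTENSIONS = (".php", ".html", ".htm")

def resolve(url_path: str, files: Set[str]) -> Optional[str]:
    """Map an extensionless URL like '/apple' to an existing backing file."""
    name = url_path.strip("/")
    for ext in EXTENSIONS:
        candidate = name + ext
        if candidate in files:
            return candidate
    return None

# Before the switch the site runs on static .htm files...
old_site = {"apple.htm", "lemon.htm"}
# ...after it runs on PHP, but the public URL /apple never changes.
new_site = {"apple.php", "lemon.php"}

print(resolve("/apple", old_site))
print(resolve("/apple", new_site))
```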

TheMadScientist

6:44 pm on Jul 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.htm?(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]*/)*)index\.htm?$ http://www.example.com/$1 [R=301,L]


Beware that posting in the forum modifies the ¦ pipe ¦ characters, so you will need to change those back. <--- Sorry, I don't quite understand: which ones need to be modified?


RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.html?[^\ ]*\ HTTP/
RewriteRule ^(([^/]*/)*)index\.html?$ http://www.example.com/$1 [R=301,L]

### OR ###

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.[^\ ]*\ HTTP/
RewriteRule ^(([^/]*/)*)index\. http://www.example.com/$1 [R=301,L]


There's no need to modify any | characters in either of the preceding rulesets; they don't contain any...

The first should redirect /index.htm or /index.html to the root of the directory (the-trailing-slash-only/), which is fine if you only use .htm or .html extensions. The second, by leaving the rule unanchored (no $, so anything may follow the dot), will redirect index.anything back to the root of the directory.
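A quick way to check what each RewriteRule pattern catches is to run the same regular expressions against some sample paths. In .htaccess the rule sees the path without its leading slash and without the query string, and the patterns match anywhere unless anchored. Python's re module should behave the same as Apache's PCRE for these simple patterns, but that is an assumption worth re-checking for anything more exotic; the sample paths are made up. One side effect shows up clearly: the htm? in the code posted earlier makes only the final m optional, so it catches index.ht but not index.html.

```python
import re

# RewriteRule patterns from this thread (per-directory, no leading slash).
PATTERNS = {
    "original htm?": r"^(([^/]*/)*)index\.htm?$",
    "first ruleset html?": r"^(([^/]*/)*)index\.html?$",
    "second ruleset catch-all": r"^(([^/]*/)*)index\.",
}

SAMPLES = ["index.htm", "h/index.html", "h/index.php", "h/index.ht", "h/myindex.htm"]

def matches(pattern: str, path: str) -> bool:
    """True if the pattern matches; unanchored patterns may match anywhere."""
    return re.search(pattern, path) is not None

for name, pattern in PATTERNS.items():
    hits = [p for p in SAMPLES if matches(pattern, p)]
    print(f"{name}: {hits}")
```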

I also removed the (\?[^\ ]*) group from the condition because:

1. [^\ ] means 'is not a space', and \? (a literal question mark) is not a space, so a question mark already matches [^\ ]* and the separate group is unnecessary for the condition in this situation. Plus, in the first ruleset, where this could matter more, the rule is anchored at the end, so it already checks whether the htm / html is followed by another character: if any character besides a ? follows the m or l, the rule will not match and the redirect will not happen. The second ruleset is more of a 'catch all' for index.anything (and I mean index.AnYth1NG?could=goHERE), so it may not work for everyone on all sites.

2. The grouping () is also a 'capturing' pattern in mod_rewrite, and it isn't back-referenced anywhere (accessing it would require %2), so it's unnecessary: the pattern [^\ ]* matches up to the space in THE_REQUEST with or without the grouping, and the rule and condition still work without the back-reference.
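Point 1 can be checked mechanically. The sketch below pairs each version of the RewriteCond (matched against made-up THE_REQUEST lines) with the anchored RewriteRule and confirms that dropping the (\?[^\ ]*)? group does not change which requests get redirected. Both conditions use the html? form so only the removed group differs; the request lines and the `rule_path` and `redirects` helpers are hypothetical.

```python
import re

RULE = re.compile(r"^(([^/]*/)*)index\.html?$")
WITH_GROUP = re.compile(r"^[A-Z]{3,9}\ /([^/]*/)*index\.html?(\?[^\ ]*)?\ HTTP/")
WITHOUT_GROUP = re.compile(r"^[A-Z]{3,9}\ /([^/]*/)*index\.html?[^\ ]*\ HTTP/")

def rule_path(request_line: str) -> str:
    """Path the per-directory RewriteRule sees: no leading slash, no query."""
    target = request_line.split(" ")[1]
    return target.split("?", 1)[0].lstrip("/")

def redirects(cond, request_line: str) -> bool:
    """A redirect happens only if both the condition and the rule match."""
    return bool(cond.search(request_line)) and bool(RULE.search(rule_path(request_line)))

REQUESTS = [
    "GET /h/index.htm HTTP/1.1",
    "GET /h/index.html?ref=abc HTTP/1.1",
    "GET /h/index.htmx HTTP/1.1",  # the looser condition matches, the rule does not
    "GET /h/page.htm HTTP/1.1",
]

for line in REQUESTS:
    print(line, redirects(WITH_GROUP, line), redirects(WITHOUT_GROUP, line))
```

The third request shows why the end anchor on the rule matters: without the group the condition alone is looser, but the anchored rule still refuses index.htmx.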

My guess is that my version is a modification of a modification of a jdMorgan original, and for more information on 'exactly what everything does' I recommend the Apache Forum [webmasterworld.com].

NOTE: If you don't really know what you are doing, I highly recommend visiting the Apache Forum here. mod_rewrite is definitely cool and powerful, but it's not a toy: it can totally break your site and have really undesired, unexpected results if you don't know what everything does and don't 'get it right' for your specific site.

g1smd

7:52 pm on Jul 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, there is no "get it 99% right" with mod_rewrite.

The 1% "wrong" can literally put your site out of business.

Vrindavan

12:50 am on Jul 10, 2010 (gmt 0)

10+ Year Member



>> Redirect, as above, from named file to slash-ended URL.

One more question:
which of the following should I use?

a) abc .com or
b) abc .com/

a) abc .com/h or
b) abc .com/h/

>> Not including your "technology" in the URL, e.g. index.html or index.php allows you to change the technology behind the website, without having to change your URLs ever again.

>> They allow the website technology to change behind the scenes without affecting the URLs used to access that content.

I think you mean only files named "index";
it does not work if the file names change, e.g. from
.apple.htm to .apple.php
.lemon.htm to .lemon.php


>> If that code is for multiple domains, then you'll need to modify it slightly,

Can you help me post code that can accommodate multiple domains under one hosting account?

TheMadScientist

1:01 am on Jul 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



a) abc .com or
b) abc .com/

Always / for the root domain.

a) abc .com/h or
b) abc .com/h/

I like no slash personally, but there are those who will say it's incorrect and others who will say it's just a waste of transfer bandwidth, so do what you think is best... If you're not experienced and don't have all your links as server relative (meaning they start with a /, e.g. /path-to/the-file.ext), then you should probably stick with a trailing / for ease of use and to avoid breaking directory-relative links (which do not start with a /, e.g. path-to/the-file.ext) throughout your site.
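The directory-relative link problem can be seen with Python's standard urllib.parse.urljoin, which resolves relative references following the RFC 3986 rules that browsers also use. The page.htm link and the example.com URLs are placeholders.

```python
from urllib.parse import urljoin

# The same directory-relative link, seen from a page at /h versus /h/.
relative_link = "page.htm"
from_no_slash = urljoin("http://www.example.com/h", relative_link)
from_slash = urljoin("http://www.example.com/h/", relative_link)

# A server-relative link resolves identically from either URL.
server_relative = "/h/page.htm"
same_either_way = (
    urljoin("http://www.example.com/h", server_relative)
    == urljoin("http://www.example.com/h/", server_relative)
)

print(from_no_slash)    # http://www.example.com/page.htm
print(from_slash)       # http://www.example.com/h/page.htm
print(same_either_way)  # True
```

So without the trailing slash, a directory-relative link on the /h page points at the site root instead of into /h/, while server-relative links are unaffected.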

I think you mean only files named "index";
it does not work if the file names change, e.g. from
.apple.htm to .apple.php
.lemon.htm to .lemon.php

It takes totally different code to do anything with those pages, and the best place for that discussion is the Apache Forum, because it's specific to your site and to 'what exactly you want it to do'.

Can you help me post code that can accommodate multiple domains under one hosting account?

Again, IMO the best place to find, try, and discuss the modifications for this one is the Apache Forum, because you'll generally get more thorough answers and be able to ask really specific questions if necessary...

Again, mod_rewrite is cool and really powerful, but that comes with a distinct disadvantage: you should know how to use it and exactly what everything does before you install it on your site, because if you install it and break something, or need to make a change later, you need to know how to fix it. IMO, of course.