Forum Moderators: phranque
The page with URL "http://www.example.com/" can also be accessed by using URL "http://www.example.com/index.htm".
Search engines identify unique pages by using URLs. When a single page can be accessed by using any one of multiple URLs, a search engine assumes that there are multiple unique pages. Use a single URL to reference a page to prevent dilution of page relevance. You can prevent dilution by following a standard URL format.
I have tried to address this by adding this to the .htaccess:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html\ HTTP/
RewriteRule ^(.*)index\.html$ http://www.example.com/$1 [R=301,L]
Yet Bing is still saying I have that canonical issue with index.htm.
Here is what my entire htaccess looks like:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html\ HTTP/
RewriteRule ^(.*)index\.html$ http://www.example.com/$1 [R=301,L]
thanks for any tips!
The index redirect, being more specific, must be listed before the non-www-to-www canonical redirect. That will fix the issue.
Your index redirect can also be coded a bit more efficiently: the .* part should be replaced with a better pattern. Luckily, the code has been posted hundreds of times in this forum.
Is the report a 'live' report? I wouldn't think so. I would allow at least a week or more for the status to update.
Thank you for your response. What would you suggest as a better pattern than the ".*"? I have reversed the order of the code:
RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.htm\ HTTP/
RewriteRule ^(.*)index\.htm$ http://www.example.com/$1 [R=301,L]
RewriteCond %{HTTP_HOST} ^example.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
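For what it's worth, the reason the ordering matters can be sketched with a rough Python simulation of the two 301 rules (hypothetical and simplified -- this only models the regex mechanics and the fact that a client follows each 301 with a fresh request, not how Apache processes rules internally):

```python
import re

# Toy model of the two rules. Each returns the redirect target for a
# (host, path) pair, or None if it does not match.
def index_rule(host, path):
    m = re.match(r'^(.*)index\.htm$', path)
    if m:
        # The rule's target names the www host explicitly, so one
        # redirect fixes both the hostname and the filename.
        return ('www.example.com', m.group(1))
    return None

def www_rule(host, path):
    if host == 'example.com':
        return ('www.example.com', path)
    return None

def follow(host, path, rules):
    """Follow 301s the way a client would: after each redirect,
    re-run the whole rule list against the new URL."""
    hops = 0
    changed = True
    while changed:
        changed = False
        for rule in rules:
            target = rule(host, path)
            if target and target != (host, path):
                host, path = target
                hops += 1
                changed = True
                break  # a 301 ends this request; client retries new URL
    return host, path, hops

# Index rule first: one redirect does everything.
print(follow('example.com', 'dir/index.htm', [index_rule, www_rule]))
# www rule first: the same request costs two redirects in a chain.
print(follow('example.com', 'dir/index.htm', [www_rule, index_rule]))
```

With the more specific rule first, a request for http://example.com/dir/index.htm is fixed in a single hop; the other way around, the visitor (and every crawler) is bounced through a two-redirect chain.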
Also, this Bing SEO scan is a new program that you install on your computer, and it then does a real-time scan of your site looking for "SEO" errors. I found out about this app here at WebmasterWorld the other day.
I am assuming that since I can make changes and then rescan my site live, any changes should show up with the next batch of results.
[edited by: Boulder90 at 1:41 am (utc) on Jan. 28, 2010]
RewriteEngine On
#
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm\ HTTP/ [NC]
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [NC,R=301,L]
#
RewriteCond %{HTTP_HOST} example\.com [NC]
RewriteCond %{HTTP_HOST} !=www.example.com
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#
RewriteCond %{HTTP_HOST} !^(www\.example\.com|192\.168\.0\.2)?$
RewriteRule ^ - [F]
If you do have a dedicated IP address, put it into the third rule's RewriteCond with the literal periods escaped as shown above.
I'm not sure why (specifically) you said "Wow," but that ".htm error-report versus .html pattern" issue is a good example of something we repeat fairly often around here: mod_rewrite is utterly unforgiving, and one little typo can cause a rule not to work or effectively knock your server offline -- or worse, it can slowly and quietly eat away at your search rankings through some unexpected and hard-to-detect side-effect. So intense concentration and attention to detail are critical.
Jim
The code will redirect to canonicalize any non-canonical request which *does* include the correct 'base' domain, and reject all other requests unless the request is by IP address or the HTTP Host request header is blank (as it will be for true HTTP/1.0 requests). Since name-based virtual hosts cannot be accessed by true HTTP/1.0 clients (because name-based hosting requires the Host header to work) or by IP address (because the address is shared among many sites), these provisions aren't needed if you don't have a dedicated IP. (Read this carefully; it's a rather complex statement.)
For name-based servers, with the lines I mentioned omitted, the code simply reverts to saying, "If the requested hostname isn't exactly 'www.example.com', then redirect to www.example.com."
Jim
I had a quick question for the readers about whether this is an appropriate robots.txt file:
User-agent: *
Disallow:
Disallow: /forums/index.php?action=help*
Disallow: /forums/index.php?action=search*
Disallow: /forums/index.php?action=login*
Disallow: /forums/index.php?action=register*
Disallow: /forums/index.php?action=admin*
Disallow: /forums/index.php?action=post*
Disallow: /forums/index.php?action=who*
Disallow: /forums/index.php?action=printpage
I've had a big problem with getting my pages in Google's non-supp lately, even with really good content. Not sure what is going on.
One of the often-overlooked factors in this is that each page needs a *unique* title and description, and both must be relevant to that page's contents.
There are hundreds of other on- and off-page ranking factors, of course, but this one seems to get overlooked fairly often.
Note that query string and wild-card robots.txt Disallows are not supported by all search engines. It would be a good idea to check all of the robots that are important to you -- visiting their "Webmaster info" pages, and verifying that they support these extensions to the Standard for Robot Exclusion. You may need to add explicit policy records in your robots.txt file for those that do not support these extensions.
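To make that concrete, here is a hedged sketch of what an explicit fallback record might look like for an engine that doesn't support the extensions. The bot name here is made up for illustration; check each engine's documentation for its real User-agent token and supported syntax:

```
# Hypothetical record for a crawler that ignores wildcards and query
# strings -- it gets a broader, plain-prefix Disallow instead.
User-agent: ExampleLegacyBot
Disallow: /forums/index.php

# Wildcard-aware engines still use the general record.
User-agent: *
Disallow: /forums/index.php?action=search*
```

A crawler is supposed to obey the most specific record matching its User-agent and ignore the rest, so the two records don't conflict.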
Jim
My commercial site is outdoors-related, and it covers national parks and forests. I then break it down into camping, fishing, etc., so each page is not going to be entirely its own thing. For example, one page title would be something like "san juan national forest camping", with a meta description about that, and the next page would have "san juan national forest fishing" as the title and a meta description about that. Another section of my site covers, say, the Roosevelt national forest, with page titles like "roosevelt national forest camping". The titles are unique and the content is different; I add the national forest part because if I just put "camping" as the title, I would have tons of pages with the same title.

Google is killing me, though. It doesn't seem to be a problem for Bing, which has 80% of my images indexed and 80% of my pages in their non-supp. Google? 25 images indexed and 50 pages in the non-supp. Scary. I'm not copying anyone else's content like my competitors do; it's all super unique -- my own writing, research, and site-specific images. Despite releasing over 70 new pages of hard-fought content this past month, Google has added nothing from my site to its non-supps. It's frustrating.
Thx for the tip on the wild-card.
Descriptions should be written in a manner that is as far from boiler-plate as possible. Change the words, change the word order, change the sentence order, change the 'tone' -- especially between pages with similar titles and content. "San Juan National Forest Camping" vs. "Camping facilities in San Juan National Forest" or "Campsites in San Juan National Forest," for example. Explore the "keyword spaces" that you have available to you, and change them up!
Your logs and stats can be quite useful here, as the search phrases of visitors landing on your site can and do vary -- and can inform your titling and description-writing decisions.
To be clear, if you're wondering whether I'm recommending that you manually compose unique titles and descriptions for each and every page, the answer is "Yes." If this sounds like too much, then pick a section of your site that you're having trouble with, and try it on a limited basis. If it works to pop those pages out of limbo, then you can decide whether it's worth doing on a wider scale. If not, then at least you'll know that some other factor is likely more important in keeping those pages from performing.
If you're only waiting a month to evaluate new pages' rankings, that's not long enough. Although G returns results almost instantaneously, that's not the case for indexing and ranking updates. The time required to get a page indexed and ranked will vary according to your current pages' rankings and the nature and effectiveness of your on-site linking strategy, but 30 days is only long enough if you're a top-ranked site.
70 pages of new content per month also raises a flag: Make sure these pages are not "thin," with fewer than six full paragraphs of information -- six is not a magic number here; I'm just trying to delineate what I mean by "thin." If your pages are thin, then either fatten them up a bit (with more useful, unique info), or consider combining multiple smaller pages into fewer larger pages based on region, activity, and type of facility (park, forest, monument, trail, etc.).
Also, do be sure that for any given 'page' of content of your site, it can be reached with one and only one canonical URL, with all variations in protocol, domain, subdomain, URL-path, and query strings 301-redirected to that one unique URL. Otherwise, you have the classic "duplicate content" situation, and your multiple URLs will compete with each other for links, traffic, and ranking.
You may read of "duplicate-content penalties" here and in other Webmaster/SEO-related forums. Except in the most egregious cases of intentional content-duplication, there is scant evidence for any actual penalties imposed by search engines, but the self-competing described in the previous paragraph can indeed be self-defeating.
I'll commend the Google Search forum, its library, and the list of threads pinned at its top to you for more information on supplementals, indexing/ranking cycles, and optimization of on- and off-page ranking factors.
Jim
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm\ HTTP/ [NC]
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/\ ]+/)*index\.htm\ HTTP/ [NC]
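To see what the extra escaped space in the character class buys you, here's a quick check of the two patterns in Python (the regex syntax is the same; THE_REQUEST is just the raw request line, so a malformed or hostile request can contain literal spaces in the URL-path position):

```python
import re

# The two RewriteCond patterns above, verbatim. In the "loose" version,
# [^/] also matches a space, so the path segments can swallow spaces;
# in the "strict" version, [^/\ ] stops at the first space, which means
# the trailing "\ HTTP/" really does mark the end of the URL-path.
loose  = re.compile(r'^[A-Z]+\ /([^/]+/)*index\.htm\ HTTP/')
strict = re.compile(r'^[A-Z]+\ /([^/\ ]+/)*index\.htm\ HTTP/')

malformed = 'GET /dir one/index.htm HTTP/1.1'   # space inside the path
normal    = 'GET /dir/sub/index.htm HTTP/1.1'   # well-formed request

print(bool(loose.match(malformed)))   # True  -- loose pattern is fooled
print(bool(strict.match(malformed)))  # False -- strict pattern is not
print(bool(strict.match(normal)))     # True  -- normal requests still match
```

So the stricter pattern matches exactly the same legitimate requests while refusing to match past a space, which is the kind of hard-to-detect side-effect mentioned above.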