
Sitemaps, Meta Data, and robots.txt Forum

    
Meta tag Rel="canonical" & double content
How to use this?
textex




msg:4455164
 9:44 pm on May 18, 2012 (gmt 0)

I've read that you should put this tag on duplicate pages you don't want indexed. If Bing is indexing the https and non-www versions of certain pages, should I add this tag to the http://www.example.com/ page, since the others don't exist? (I don't know how or why they are being picked up.)

 

g1smd




msg:4455183
 12:11 am on May 19, 2012 (gmt 0)

No. Redirect the other versions to the canonical form. Make sure that for any unwanted request, the user arrives at the correct URL for that content after exactly one redirect, not after a chain of several redirects.
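To make "exactly one redirect" concrete, here is a minimal Python sketch that walks a redirect chain and counts the hops. It runs against a hypothetical in-memory table of responses (the `responses` dicts and URLs are made up for illustration) rather than a live server, so the difference between one 301 and a chain is easy to see.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical response table: URL -> (status code, Location header or None).
Response = Tuple[int, Optional[str]]
REDIRECT_CODES = (301, 302, 303, 307, 308)


def redirect_chain(start: str, responses: Dict[str, Response], max_hops: int = 10) -> List[str]:
    """Return every URL visited from `start` until a non-redirect response."""
    visited = [start]
    url = start
    for _ in range(max_hops):
        status, location = responses[url]
        if status not in REDIRECT_CODES or location is None:
            return visited
        url = location
        visited.append(url)
    raise RuntimeError("too many redirects (possible loop)")


def hop_count(start: str, responses: Dict[str, Response]) -> int:
    """Number of redirects a client must follow to reach the final URL."""
    return len(redirect_chain(start, responses)) - 1


# Good: the unwanted URL goes straight to the canonical one.
single = {
    "https://example.com/page": (301, "http://www.example.com/page"),
    "http://www.example.com/page": (200, None),
}

# Bad: scheme and host are fixed by two separate rules, so the client hops twice.
chained = {
    "https://example.com/page": (301, "http://example.com/page"),
    "http://example.com/page": (301, "http://www.example.com/page"),
    "http://www.example.com/page": (200, None),
}

print(hop_count("https://example.com/page", single))   # 1
print(hop_count("https://example.com/page", chained))  # 2
```

The chained case is the usual result of one rule fixing the protocol and a later rule fixing the hostname; the fix is to have the first rule that fires redirect straight to the fully canonical URL.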

textex




msg:4455205
 3:14 am on May 19, 2012 (gmt 0)

How do I redirect them if they do not exist?

lucy24




msg:4455258
 9:10 am on May 19, 2012 (gmt 0)

Exactly the same as if they did exist. The server neither knows nor cares, unless you have explicitly asked it to check whether a file exists. (That's the notorious !-f and !-d pair, "not an existing file" and "not an existing directory", found in every boilerplate htaccess ever distributed.)

Now, we're really supposed to make you either figure it out for yourself or find one of the 62,000 earlier threads, but...

# Send any hostname other than exactly www.example.com (or an empty Host) to the canonical host
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

:: fingers crossed ::

This goes after all other redirects in your htaccess. Why? Because all those others will already say http://www.example.com as part of their targets. This final generic redirect is only to pick up the leftovers.

^(www\.example\.com)?$

means "exactly www.example.com or exactly nothing". The "nothing" is to allow for HTTP/1.0 requests, which may arrive without a Host header and so leave %{HTTP_HOST} empty. The leading ! means "if the requested domain is not exactly" et cetera.
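If you want to see which hostnames that condition catches, Python's `re` module treats this particular pattern the same way mod_rewrite does (anchored, case-sensitive, escaped dots). This is only a sketch for experimenting; the hostnames in the loop are illustrative.

```python
import re

# Same pattern as the RewriteCond: "exactly www.example.com or exactly nothing".
CANONICAL_HOST = re.compile(r"^(www\.example\.com)?$")


def host_triggers_redirect(host: str) -> bool:
    """Mirror the leading "!": redirect only when the host does NOT match."""
    return CANONICAL_HOST.search(host) is None


for host in ["www.example.com", "", "example.com", "www.example.com.au"]:
    print(repr(host), host_triggers_redirect(host))
# 'www.example.com' False
# '' False
# 'example.com' True
# 'www.example.com.au' True
```

The `$` anchor is what stops a lookalike such as www.example.com.au from slipping through, and escaping the dots keeps `.` from matching any character.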

g1smd




msg:4455375
 5:26 pm on May 19, 2012 (gmt 0)

Redirects don't look at files on the server. A RewriteRule configured as a redirect merely looks at what URL was requested by the user or bot and sends a response back to that user to suggest they make a new request for a different URL.
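As a loose analogy (a toy WSGI app, not how Apache works internally), the sketch below decides on a redirect purely from the request's Host header and path. Nothing in it asks whether the path maps to a real file, so a page that never existed is redirected exactly like one that does. `CANONICAL` and the sample path are placeholders.

```python
from typing import Callable, Dict, Iterable

CANONICAL = "www.example.com"  # placeholder for your preferred hostname


def redirect_app(environ: Dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI app that decides on a redirect from the request alone.

    There is no filesystem lookup: PATH_INFO is copied into the target as-is.
    """
    host = environ.get("HTTP_HOST", "")
    path = environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")
    if host not in ("", CANONICAL):
        target = f"http://{CANONICAL}{path}"
        if query:
            target += "?" + query
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"canonical host\n"]


captured: Dict = {}


def fake_start_response(status: str, headers: list) -> None:
    """Stand-in for a real server: just record what the app sent."""
    captured["status"] = status
    captured["headers"] = dict(headers)


redirect_app({"HTTP_HOST": "example.com", "PATH_INFO": "/never-existed.html"}, fake_start_response)
print(captured["status"], captured["headers"]["Location"])
# 301 Moved Permanently http://www.example.com/never-existed.html
```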

textex




msg:4455379
 6:12 pm on May 19, 2012 (gmt 0)

I have the code lucy24 provided in my htaccess, but Bing is not recognizing it. That's the problem...

g1smd




msg:4455389
 7:28 pm on May 19, 2012 (gmt 0)

How do you know that Bing doesn't recognise it?

What do your server logs say?

textex




msg:4455393
 7:42 pm on May 19, 2012 (gmt 0)

Because Bing shows the https and non-www versions indexed in Webmaster Tools.

g1smd




msg:4455398
 8:15 pm on May 19, 2012 (gmt 0)

If they are anything like Google, it can take three months or more for the data to catch up with reality.

[edited by: g1smd at 8:30 pm (utc) on May 19, 2012]

textex




msg:4455402
 8:28 pm on May 19, 2012 (gmt 0)

It's frustrating, because these htaccess files have been in place for years.

g1smd




msg:4455403
 8:31 pm on May 19, 2012 (gmt 0)

You should check the headers very carefully for errors for a variety of URL requests.
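One way to look at the raw headers for a single URL is Python's standard `http.client`, which never follows redirects on its own, so you see exactly what the server answered. This is a minimal sketch; the function name `head_status` is made up for illustration.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def head_status(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request and return (status, Location header or None).

    No redirect is followed: a 301 is reported as a 301, with its target.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Calling it for http://example.com/, https://www.example.com/ and http://www.example.com/ should give a 301 pointing at the canonical URL for the first two and a 200 for the last. If a server answers HEAD differently from GET, changing the method to "GET" is a one-word edit.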

textex




msg:4455405
 8:40 pm on May 19, 2012 (gmt 0)

Is there a way to run through the entire site and check headers of all pages instead of checking one at a time?

g1smd




msg:4455407
 9:11 pm on May 19, 2012 (gmt 0)

Xenu's Link Sleuth might be useful (Windows).
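A link checker typically requests only URLs it finds linked, and nothing normally links to the https or non-www variants. A small script can build those variants for each path and judge whatever status and Location come back. The sketch below assumes the canonical form discussed in this thread (http://www.example.com); the path list (for example, taken from your sitemap) and the header fetcher you pair it with are up to you.

```python
from itertools import product
from typing import List, Optional
from urllib.parse import urlsplit, urlunsplit

CANONICAL_SCHEME = "http"           # canonical form used in this thread
CANONICAL_HOST = "www.example.com"  # placeholder hostname
SCHEMES = ["http", "https"]
HOSTS = ["www.example.com", "example.com"]


def variants(path: str) -> List[str]:
    """Every scheme/host combination a bot might request for one path."""
    return [f"{scheme}://{host}{path}" for scheme, host in product(SCHEMES, HOSTS)]


def canonical_for(url: str) -> str:
    """The URL this request should end up at after one redirect."""
    parts = urlsplit(url)
    return urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, parts.path or "/", parts.query, ""))


def verdict(url: str, status: int, location: Optional[str]) -> str:
    """200 for the canonical URL, a single 301 to it for everything else."""
    canonical = canonical_for(url)
    if url == canonical:
        return "ok" if status == 200 else f"expected 200, got {status}"
    if status != 301:
        return f"expected 301, got {status}"
    if location != canonical:
        return f"301 goes to {location}, expected {canonical}"
    return "ok"


for url in variants("/"):
    print(url)
print(verdict("https://www.example.com/", 200, None))
# expected 301, got 200
```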

textex




msg:4455411
 9:21 pm on May 19, 2012 (gmt 0)

I just ran my home pages through an HTTP header checker, and my htaccess is not working properly: I am getting a 200 response for https://www.mysite.com. g1smd, I tried stickying you to see if I could contract you to help me fix this, but your box is full.

lucy24




msg:4455453
 1:59 am on May 20, 2012 (gmt 0)

Oh, wait. https versus http is a different issue. It isn't covered by %{HTTP_HOST}. For this you need still another line: one looking at the protocol.
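The gap is easy to see if you model both decisions side by side. In this Python sketch (placeholder host, canonical scheme assumed to be http as in this thread), a host-only check lets https://www.example.com/ through because its hostname is already correct; checking scheme and host together catches it and still redirects in a single hop. In mod_rewrite the protocol is typically tested through the %{HTTPS} server variable, but treat the sketch as a model of the logic, not as Apache configuration.

```python
from typing import Optional

CANONICAL_SCHEME = "http"           # the canonical form discussed in this thread
CANONICAL_HOST = "www.example.com"  # placeholder hostname


def host_only_target(host: str, path: str) -> Optional[str]:
    """The original rule's logic: looks at the hostname, never at the protocol."""
    if host in ("", CANONICAL_HOST):
        return None
    return f"http://{CANONICAL_HOST}{path}"


def redirect_target(scheme: str, host: str, path: str) -> Optional[str]:
    """Return the single 301 target, or None if the request is already canonical."""
    host_ok = host in ("", CANONICAL_HOST)  # empty host: HTTP/1.0 request without Host
    scheme_ok = scheme == CANONICAL_SCHEME
    if host_ok and scheme_ok:
        return None
    # Fix scheme and host together so the client needs only one redirect.
    return f"{CANONICAL_SCHEME}://{CANONICAL_HOST}{path}"


print(host_only_target("www.example.com", "/"))          # None: https slips through
print(redirect_target("https", "www.example.com", "/"))  # http://www.example.com/
print(redirect_target("https", "example.com", "/a"))     # http://www.example.com/a
print(redirect_target("http", "www.example.com", "/a"))  # None
```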

Please be assured that nobody (not even Bing; no, not even Google!) can ignore htaccess. You can look in your raw logs and see them getting a 301 or 403. But they don't instantly remove something from their index just because they can't get to it.
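For checking the raw logs, here is a minimal Python sketch that reads lines in Apache's "combined" log format and tallies status codes for requests whose User-Agent contains "bingbot". The sample lines, including their documentation-range IP addresses, are invented for illustration. Note that the default combined format does not record the requested hostname, so unless your LogFormat adds it you cannot tell example.com requests from www.example.com requests in the log.

```python
import re
from collections import Counter
from typing import Iterable

# Apache "combined" format: ip ident user [time] "request" status bytes "referer" "agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def status_counts(lines: Iterable[str], agent_substring: str = "bingbot") -> Counter:
    """Count (status, request line) pairs for requests whose User-Agent matches."""
    counts: Counter = Counter()
    needle = agent_substring.lower()
    for line in lines:
        m = COMBINED.match(line)
        if m and needle in m.group("agent").lower():
            counts[(m.group("status"), m.group("request"))] += 1
    return counts


# Hypothetical log lines (made-up IPs and times).
sample = [
    '192.0.2.10 - - [19/May/2012:20:15:00 +0000] "GET /page.html HTTP/1.1" 301 245 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '198.51.100.7 - - [19/May/2012:20:16:00 +0000] "GET /page.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
for (status, request), n in status_counts(sample).items():
    print(status, request, n)
```

To run it on a real log, pass an open file object (for example `open("access.log")`) instead of `sample`.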

g1's mailbox is always full. Been full for years ;)

© Webmaster World 1996-2014 all rights reserved