Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 48-message thread spans 2 pages; this is page 1.
301 Redirect is NOT removing pages from index
zerillos

5+ Year Member
Msg#: 4375906 posted 12:47 pm on Oct 18, 2011 (gmt 0)

I just 301 redirected a bunch of pages I wanted out of G's index to the home page. G spidered them, but did not remove the URLs from the index. Instead, now when I check the instant preview of the redirected URLs, they display the content of the home page.

I don't think this is the intended behavior of 301 redirects. Am I wrong? I thought G was supposed to remove the URLs from the index.

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 4375906 posted 6:53 pm on Oct 18, 2011 (gmt 0)

They will, but it will take a few weeks.

Imagine you were a searcher and you saw those URLs in the search results last week. If the URLs were missing this week, you might think Google was broken. So, the URLs stay listed for a short while and then they're gone.

MikeNoLastName

WebmasterWorld Senior Member, 10+ Year Member
Msg#: 4375906 posted 7:50 pm on Dec 12, 2011 (gmt 0)

We have the same problem, except they were redirected over a year ago, went away, and now they've been back for months (with the content of the home page, just like the other poster). And we're getting a serious duplication penalty from it because they are pointing to our home page.
Checked Fetch as Googlebot. Checked logs. Googlebot keeps crawling them twice a day, gets 301 codes, and still leaves them in the index.
Google is just plain broken, and we're paying for it!

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 4375906 posted 8:03 pm on Dec 12, 2011 (gmt 0)

> And we're getting a serious duplication penalty from it because they are pointing to our home page.

Google warns about mass redirecting old pages to the root home page. It's rarely a good idea.

Content_ed

5+ Year Member
Msg#: 4375906 posted 8:17 pm on Dec 12, 2011 (gmt 0)

Maybe if you drop the 301's or point them all somewhere else it will help. If you're sure they're hurting rather than helping, you could be better off with a good 404 page.

bwnbwn

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 5+ Year Member
Msg#: 4375906 posted 8:25 pm on Dec 12, 2011 (gmt 0)

Content_ed is correct. If the page is not replaced, redirecting back to the root is a bad idea and one that should be avoided. It is better that it goes to a custom 404 page so G gets the idea that it is gone. 404 them and be done with the pages.

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 4375906 posted 9:43 pm on Dec 12, 2011 (gmt 0)

Using a 404 is good.

For "Gone" content, returning a 410 response is an even more clear signal.
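A minimal .htaccess sketch of both responses follows. It assumes Apache with mod_alias and mod_rewrite available. /old-page.htm and /old-dir/ are hypothetical paths standing in for removed content:

```apache
# Hypothetical paths: /old-page.htm and /old-dir/ stand in for removed content.

# mod_alias: "Redirect gone" answers 410 Gone; no target URL is given.
Redirect gone /old-page.htm

# mod_rewrite alternative: the [G] flag also answers 410 Gone.
# In .htaccess the pattern is matched without the leading slash.
RewriteEngine On
RewriteRule ^old-dir/ - [G]
```

A plain 404 needs no rule. Once the file is deleted, and assuming nothing else rewrites the URL, Apache answers 404 by itself.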

bwnbwn

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 5+ Year Member
Msg#: 4375906 posted 9:54 pm on Dec 12, 2011 (gmt 0)

g1smd, correct. I just like to send the user to a custom 404 page so there might be a possible sale or lead from it. I guess I could build a custom 410 page as well; I really never thought about that.

A question, not trying to hijack the thread, as it does go with the flow: who here has built a custom 410 page to direct visitors arriving from indexed links until the URL has been removed from the search engines' index?

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 4375906 posted 10:01 pm on Dec 12, 2011 (gmt 0)

What do you mean by the words "send the user to a..."?

bwnbwn

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 5+ Year Member
Msg#: 4375906 posted 10:25 pm on Dec 12, 2011 (gmt 0)

Sorry, by "user" I mean a visitor who has come into the site from an indexed link that I either changed or deleted, or whose content became outdated.

lucy24

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, Top Contributors Of The Month
Msg#: 4375906 posted 12:11 am on Dec 13, 2011 (gmt 0)

I use the same physical page for 404 and 410. In general, humans don't care whether the page used to exist or not. They only care that it isn't there now. So they need the same information either way.

You do not need to think about a 410 page unless you're explicitly returning 410s, because this response never happens automatically. But the Apache-default 410 is even scarier than the default 404, so don't let your humans land there!
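Here is a sketch of that setup in .htaccess. It assumes Apache and a hypothetical custom page at /404.htm:

```apache
# One physical page serves both errors; the status line still says 404 or 410.
ErrorDocument 404 /404.htm
ErrorDocument 410 /404.htm

# Keep the target a local path. Given a full http:// URL instead, Apache
# sends the client a redirect to that page, and the 404/410 status is lost.
```

This is the difference between a real custom error page and a soft 404. Visitors see the same friendly content either way, but crawlers still receive the true status code.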

MikeNoLastName

WebmasterWorld Senior Member, 10+ Year Member
Msg#: 4375906 posted 2:59 am on Dec 13, 2011 (gmt 0)

>Google warns about mass redirecting old pages to the root home page. It's rarely a good idea. <

REALLY? I must have missed that one or misunderstood it. Now they tell me :). If you find a link to the actual G mention I would love to read it in context.
It's really only TWO pages I'm moving, but very key ones.
I'm thinking of doing as you suggest, or at least something close. The problem is that the old pages are the old home page (index.htm) and the original primary page abc.htm on the old domain www.domaina.com. The rest of the old domain is staying for now. I am 301 redirecting them to www.domainb.com, the new home page AND primary page.
(umm, I just thought of something really stupid, should I be 301ing them to "http://www.domainb.com" or "http://www.domainb.com/index.htm"? or doesn't it matter?)
We have a ton of old, very good links to both of the original pages sending a lot of good PR, possibly more than to the new one. I'm thinking of, at least temporarily, restoring the old www.domaina.com index.htm page, and 301 redirecting the old abc.htm to it until at least abc.htm goes away permanently and domainb.com returns to some semblance of normal. Then either I can try 301ing the domaina/index.htm to domainb.com root or other page on domainb again to see if it "takes" this time (there is no corresponding new page other than the root, but I could pick another if that is better). Granted, that will create a chain, which G also warns against. Or I can keep domaina.com/index.htm as a "gateway" page which simply says "our new home page is:" and a single clickable outlink to domainb.com. Again not advised, but... how else can you keep most of the old PR when moving just PART of the domain to a new root? Any other suggestions that keep PR flowing?
What if you meta NOINDEX, FOLLOW the "gateway" page (in this case the restored domaina/index.htm)? Will it pass on most of its PR and not be viewed as a gateway page? Or does it need to be indexed to pass on PR?
Thanks very much, everyone, for any advice! I'm really hesitant to make any move because a lot is at stake and I don't want to cause an even longer-term problem.

aakk9999

WebmasterWorld Administrator, 5+ Year Member
Msg#: 4375906 posted 4:26 am on Dec 13, 2011 (gmt 0)

> Google warns about mass redirecting old pages to the root home page. It's rarely a good idea.
>
> REALLY? I must have missed that one or misunderstood it. Now they tell me :). If you find a link to the actual G mention I would love to read it in context.


There is a whole recent thread about it here [webmasterworld.com...]

> should I be 301ing them to "http://www.domainb.com" or "http://www.domainb.com/index.htm"? or doesn't it matter?

To "http://www.domainb.com/" (note the final slash). It matters. If your domainb index.htm page redirects to "http://www.domainb.com/" (as it should), then if you redirect the domaina page to index.htm on domainb, you will have a chain redirect, which is not good.

As for the rest of your questions on redirecting two pages from the old domain to the new one: if there really are just two pages, and the home page really is the best target for both, then I would redirect them both to the domainb home page (provided the domainb home page is relevant to abc.htm and to index.htm).

Alternatively, if the domainb home page is not relevant to abc.htm, you could create a new page on domainb, redirect abc.htm to it, and redirect the domaina index page to the domainb home page (again, only if it is relevant, i.e. the best replacement).

I would certainly NOT go through "temporarily restoring" the domaina index page, redirecting the other page to it, and waiting for abc.htm to "go away", because it will not go away. Google will keep periodically requesting abc.htm, and the moment you remove the redirect, you lose the PR juice.

You must not think of PR juice as something you "transfer", after which the transfer is done. You must think of it as a "flow": the moment you cut the pipe, it stops flowing.

Just redirect these two pages from the domaina to the best page on domainb and leave redirect in place for years to come.
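Here is a minimal sketch of that advice for the old domain's .htaccess. It assumes mod_alias, with example.com standing in for domaina and example.net for domainb:

```apache
# .htaccess on the old domain (example.com = domaina, example.net = domainb).
# Each rule points straight at the final canonical URL, trailing slash included,
# so every request takes a single hop with no redirect chain.
Redirect 301 /index.htm http://www.example.net/
Redirect 301 /abc.htm http://www.example.net/
```

The "flow" only lasts as long as the redirect keeps answering, so these lines are meant to stay in the file permanently.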

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 4375906 posted 8:25 am on Dec 13, 2011 (gmt 0)

Good answer. Yes. Canonical URL for root home page is www.example.com/ with trailing slash. Redirect requests for named index files to that.

MikeNoLastName

WebmasterWorld Senior Member, 10+ Year Member
Msg#: 4375906 posted 10:04 am on Dec 13, 2011 (gmt 0)

Hi aakk9999,

>If your domainb index.htm page redirects to "http://www.domainb.com/" (as it should)<

No, it has gone to www.domainb.com for years now, since G started this whole canonical bug fix thing (REALLY, content developers shouldn't HAVE to worry about this stuff), not www.domainb.com/, just like EVERY other authoritative site from Google and Amazon to eBay and CNN. Years ago G said to "PICK ONE" of your choice and stick with it, without specifying one or the other. We surveyed the others, went that way, and have stuck with it. Odd how they all resolve to ---.com when you type in ---.com/, but Google ALWAYS lists them in search as ---.com/. Who are the WRONG ones here?

>"as it should"?<
What is your rationale for this?

>To "http://www.domainb.com/" (note the final slash). It matters. <

That's what we had for the last year and a half. Obviously no longer works. I changed it (as a test) to no-slash a couple days ago. Giving it time to settle before trying something else.

>waiting for abc.htm to "go away" as it will not go away. Google will be periodically requesting abc.htm and the moment you remove redirect, you lose PR juice. <

Sorry, I stated that poorly. You are right there. By "go away" I did not mean to ever remove the 301 redirect, but rather that abc.htm would hopefully be removed from the index (where it is now as a copy of domainb's home page) because G FINALLY realized it had been 301 redirected to domaina/index.htm (rather than the root of domainb which it is currently indexed as, as a copy of it) and domainb would be free of domaina's negative dupe penalty influence hopefully.

>Just redirect these two pages from the domaina to the best page on domainb and leave redirect in place for years to come. <

That's precisely the problem, unfortunately. Did that. Both have been redirected to the home page on domainb, as I said, for years. It worked fine for a long while, but now they are back in the index, and we've changed nothing there in over a year (until Panda hit, and in the last week we noticed them in the index as "phantoms"). Phantoms don't show up in site:example.com, but if you search for "domaina.com", "www.domaina.com" or "domaina.com/abc.htm", they show up with an excerpt from the domainb.com home page but with "domaina.com", "www.domaina.com" or "domaina.com/abc.htm" as their URL. And the linked-to page exhibits the typical symptoms of a duplication penalty. Other normal 301-redirected pages do NOT do this. At worst, if you search on the URL of a normal 301-redirected page, it shows up entirely as the correctly canonicalized URL of the redirected-to page.

I read the discussion in the linked thread, and it sounds like no firm decision one way or the other was agreed upon there.

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 4375906 posted 12:13 pm on Dec 13, 2011 (gmt 0)

www.example.com is the hostname with no page specified.

The root page can be expressed as
/ or /index.html (or /index.php etc) and / (the shortest) is best.

Put it all together and you get
www.example.com/

The fact that browsers will do
"GET / HTTP/1.1" when users click a link to href="http://www.example.com" is the browser being helpful.

You can shorten that step by linking to
href="http://www.example.com/" instead.

Putting it another way; from a deep page within your site, you would link to
href="/" to link back to the root. Linking to href="" would not work.



Do redirect named index file requests to strip the name:

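# Redirect index requests
# THE_REQUEST is the client's original request line (e.g. "GET /dir/index.html HTTP/1.1"),
# so the server's own internal DirectoryIndex lookups do not trigger this redirect.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(php|html?)
RewriteRule ^(([^/]+/)*)index\.(php|html?) http://www.example.com/$1 [R=301,L]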


This index redirect code goes BEFORE any www/non-www canonicalisation redirect code.
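Here is a sketch of that ordering combined with a hostname rule. It assumes Apache mod_rewrite in the site's root .htaccess and www.example.com as the canonical host. The hostname condition shown is one common form, not the only one:

```apache
RewriteEngine On

# 1. Strip named index files first. The target already carries the canonical
#    host, so example.com/index.html reaches www.example.com/ in one hop.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(php|html?)
RewriteRule ^(([^/]+/)*)index\.(php|html?) http://www.example.com/$1 [R=301,L]

# 2. Then canonicalise the hostname: anything other than exactly
#    www.example.com (or an empty Host header) goes to www.example.com.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

If the two blocks were swapped, example.com/index.html would first be sent to www.example.com/index.html and only then to www.example.com/. That is the two-hop chain this ordering avoids.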

aakk9999

WebmasterWorld Administrator, 5+ Year Member
Msg#: 4375906 posted 4:33 pm on Dec 13, 2011 (gmt 0)

> Read the discussion on the linked thread and it sounds like no firm decision one way or the other was agreed upon there.

Hmm, I thought it was pretty clear that mass redirecting to the home page is a bad idea. But that is the subject of the other thread.

> Phantoms don't show up in site:example.com, but if you search for "domaina.com", "www.domaina.com" or "domaina.com/abc.htm", they show up with an excerpt from the domainb.com home page but with "domaina.com", "www.domaina.com" or "domaina.com/abc.htm" as their URL.


I have seen cases like this, but usually only in the few weeks immediately following the introduction of a redirect. Are you sure that your redirect was in place all the time, was technically sound, and that Google does not occasionally get a 200 OK result?

In any case, I do not think you would get a "penalty" because of one duplicate page. But if your redirect is not always technically sound, I *could* see domainb's ranking being impacted if the PR stops flowing (you said you have some strong links to the domaina page you are redirecting).

What I would do is:
- check the domaina logs to see whether a 301 is returned every time the page is requested.
- as a failsafe, add a canonical link element to the page you are redirecting, pointing to the home page of domainb. If your redirect works, the canonical link element will never be seen. But if you have occasional technical problems with the redirect, then at least Google has a fallback: it sees a canonical link element on the page pointing to the *same* URL you were/are redirecting to.

tedster

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 4375906 posted 5:57 pm on Dec 13, 2011 (gmt 0)

> If you find a link to the actual G mention I would love to read it in context.

There's way too much FUD coming from the webmaster community (and NOT from Google) about 404 and 301 status. It's really straightforward and the issue IS decided.

Here's both a quote and a link to a Google blog post from Susan Moskwa that leaves no doubt whatsoever. Note that the article is labeled "Webmaster level: Beginner/Intermediate". It's not rocket science; in my mind it's simply telling the honest truth about the requested URL.

A soft 404 is when a web server returns a response code other than 404 (or 410) for a URL that doesn't exist. A common example is when a site owner wants to return a pretty 404 page with helpful information for his users, and thinks that in order to serve content to users he has to return a 200 response code. Not so! You can return a 404 response code while serving whatever content you want. Another example is when a site redirects any unknown URLs to their homepage instead of returning 404s. Both of these cases can have negative effects on our understanding and indexing of your site, so we recommend making sure your server returns the proper response codes for nonexistent content. Keep in mind that just because a page says "404 Not Found," doesn't mean it's actually returning a 404 HTTP response code—use the Fetch as Googlebot feature in Webmaster Tools to double-check.

Do 404s hurt my site? [googlewebmastercentral.blogspot.com]


The whole article is worth a full read.

Now I appreciate that you are only redirecting some URLs to the home page, rather than ALL unknown URLs. But ask yourself if the home page offers a real and honest substitute for the removed content. Odds are it doesn't, and if it doesn't then don't redirect.

However, the old URLs will eventually fall out of Google's public results. 301s tend to get processed slowly - partly because there are a lot of trust issues in the area of redirects.

GodLikeLotus

10+ Year Member
Msg#: 4375906 posted 6:02 pm on Dec 13, 2011 (gmt 0)

How do we redirect everything going to domain.com/index.htm to domain.com/ ?

We have a redirect from non-www to www, shown below:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^domain\.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]

Tried this, but it doesn't seem to work:

redirect 301 /index.htm http://www.domain.com/

Please can someone help? Our sitemap.xml is showing both domain.com and domain.com/index.htm, i.e. 2 home pages.

[edited by: Robert_Charlton at 11:02 pm (utc) on Dec 18, 2011]
[edit reason] delinked example domains [/edit]

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 4375906 posted 6:05 pm on Dec 13, 2011 (gmt 0)

I already posted the right code several posts back. #:4397378

GodLikeLotus

10+ Year Member
Msg#: 4375906 posted 6:53 pm on Dec 13, 2011 (gmt 0)

g1smd - Thank you, it works. However, I have an XML sitemap generator which would not let me log in. Its URL and folder ended with index.php, so I removed the two references to php from your redirect code, leaving only the html part, and now the XML sitemap generator works.

The redirect is working; however, the sitemap is still displaying domain.com and domain.com/index.htm, even though we have just created new XML and HTML sitemaps.

I do have links on the pages to index.htm. Is this the problem?
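For reference, here is a sketch of g1smd's rule with the php alternative taken out, as described above. It assumes the site's own index pages are only .htm or .html files:

```apache
# Only .htm/.html index files are redirected; requests for index.php
# (such as the sitemap generator's login page) now pass through untouched.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?
RewriteRule ^(([^/]+/)*)index\.html? http://www.example.com/$1 [R=301,L]
```

$1 still holds the directory path. Removing the (php|html?) group does not renumber the groups that come before it.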

GodLikeLotus

10+ Year Member
Msg#: 4375906 posted 7:00 pm on Dec 13, 2011 (gmt 0)

Working properly now. Thanks again g1smd

Now if we can only get Google to stop ranking us behind Bing Feeds for our own work, then things might just get better.

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 4375906 posted 7:21 pm on Dec 13, 2011 (gmt 0)

> I do have links on the pages to index.htm is this the problem?

You must link to the URLs that you want users to see and use. URLs are defined by links.

enigma1

WebmasterWorld Senior Member, 5+ Year Member
Msg#: 4375906 posted 2:05 pm on Dec 14, 2011 (gmt 0)

> We have the same problem, except they were redirected over a year ago, went away, and now they've been back for months (with the content of the home page, just like the other poster). And we're getting a serious duplication penalty from it because they are pointing to our home page.

If you still see them indexed with your domain, it means you're not doing a 301 redirect. Or there are cases where the server response doesn't return the proper headers. Many of these problems are because of incorrect use of server or application scripts.

It's a classic mistake to do hard-coded redirects with the server scripts and not cover all cases. For example, consider this:
redirect 301 /somepage.php http://www.example.com/some_page.php
It may look fine if you request exactly somepage.php. The 301 redirect header will show as
http://www.example.com/some_page.php

Now, what if we modify the request a bit:
somepage.php?something
The 301 destination sent by your server may show as:
http://www.example.com/some_page.php?something
That URL was generated by your own server, and it shouldn't have been. There are many other pitfalls.
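One way to avoid that is to use mod_rewrite instead of mod_alias. The sketch below uses enigma1's hypothetical file names and assumes the rule sits in the site's root .htaccess:

```apache
RewriteEngine On
# RewriteRule sees only the path, never the query string, so this one rule
# catches both somepage.php and somepage.php?something.
# The trailing "?" on the target discards the incoming query string
# (on Apache 2.4 the [QSD] flag does the same job).
RewriteRule ^somepage\.php$ http://www.example.com/some_page.php? [R=301,L]
```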

> Hmm, I thought it is pretty clear that mass redirecting to home page is a bad idea

If you do not know where and how to redirect, the best thing is not to redirect at all and to leave the default server responses.

> A soft 404 is when a web server returns a response code other than 404 (or 410) for a URL that doesn't exist. A common example is when a site owner wants to return a pretty 404 page with helpful information for his users, and thinks that in order to serve content to users he has to return a 200 response code.

Is that right? It "leaves no doubt whatsoever", so here is an example in this forum of how to recreate it:
[webmasterworld.com...]
It returns a 200 status with a "profile not available" message. That's a soft 404. Will it hurt the site? Well, that depends. If other websites start exposing these URLs, you will start seeing soft 404s in GWT, because the content looks like a 404 page, the spiders do not have parts of the content in their index, and the page looks like a template. Also, the fact that an invalid value is populated and replicated in the template is worrisome.

MikeNoLastName

WebmasterWorld Senior Member, 10+ Year Member
Msg#: 4375906 posted 10:54 pm on Dec 14, 2011 (gmt 0)

g1smd,
I was thinking about this all last night and think I figured it out just before coming in here and you hit the nail on the head in your statement as far as what I was just thinking too! This is only my theory so far, but I think it is right.

The 301'd page(s) on domaina which suddenly started showing up in the index again are the most powerfully backlinked pages across the two domains. Without them, the home page of domainb is relatively unlinked (to make matters worse, there were still a lot of internal pages linking to domaina/abc.htm which had not been changed yet; not good, but not a big priority when things were working right).

The home page on domainb is the primary menu and redistributes PR to the top-level category pages.
When domaina/abc.htm stopped redirecting to domainb, it gave a similar appearance of a duplication penalty on domainb, because there was no longer any obvious primary page. All the PR was strictly from outside to random internal pages, and that is how they showed up in the site:domainb search: random!

OK, I see your point, so I changed the redirects back to "www.domainb.com/" from the "www.domainb.com" I had changed them to yesterday.

>You must link to the URLs that you want users to see and use.<

But: if we are redirecting to www.domainb.com/ (rather than www.domainb.com), and our long-term adopted standard canonical is www.domainb.com (not www.domainb.com/), is it still correct for consistency to link to "home" from internal pages using "http://www.domainb.com" and to use <link rel="canonical" href="http://www.domainb.com/"> in the domainb.com/index.htm file as recommended in the rel canonical video?

I'm not real clear on ReWrite code syntax. In your code:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(php|html?)
RewriteRule ^(([^/]+/)*)index\.(php|html?) http://www.example.com/$1 [R=301,L]

which I assume goes in .htaccess on domaina, is "example.com" supposed to be domaina.com or domainb.com?
If it is domainb.com, then this is supposed to do the actual redirection of the home page all by itself?
OTOH If it is supposed to be domaina.com, then what do we 301 redirect to domainb later in the .htaccess, since there will be no implied index.html left to redirect?
Currently I am simply doing:

ReWriteCond %{HTTP_HOST} !www.domaina.com
ReWriteRule (.*) http://www.domaina.com/$1 [R=301,L]

Redirect 301 /index.html http://www.domainb.com/
Redirect 301 /index.htm http://www.domainb.com/
Redirect 301 /abc.htm http://www.domainb.com/
(and a bunch of other 301s)

If I were to "Redirect 301 / http://www.domainb.com/" that redirects EVERYTHING, no?

Perhaps this is why domaina.com and domaina.com/ are getting indexed, even though everything redirects browsers?

BTW, I found the same code as you posted above elsewhere, written the same but with *'s in place of the +'s. Are you sure this is correct?

Why can't search engines be more like browsers? This all works fine as it is with browsers. :)

aakk9999:
No, the last prior change date on the .htaccess was a little over a year ago. Everything seemed normal until sometime around September, when we started noticing it. Perhaps it has been a problem for longer than that and we just didn't notice it pre-Panda. Also, there are no 200s on "GET /". The bot comes about 1-2 times per day. The bot does ring twice half the time, though, and gets two different byte sizes (305 & 312), which I presume is it requesting domaina.com and then being redirected to www.domaina.com.

tedster:
Thank you for the reference. I read it all. I agree there is way too much conjecture by the community on things, and argument over the interpretation; I take it all with a grain of salt unless I hear it straight from the horse's mouth. I misinterpreted the term MASS when g1smd first posted it; apparently it does not apply to only 2 or 3 appropriate old pages.

>But ask yourself if the home page offers a real and honest substitute for the removed content. Odds are it doesn't, and if it doesn't then don't redirect. <

In our case the home page on the new domain IS in fact the exact same content as the old home page and the other page we're trying to redirect, combined (plus a bit more). The site has been physically spread across three domains for over 15 years (for low-tech, cheapo server load-balancing reasons since our naive origins; content producers shouldn't NEED to be SEOs, webmasters AND programmers just to get their word out). We're simply trying to combine them all onto one domain now. As we go along, we copy the old page over, often rename it more appropriately, and 301 redirect it to its new location. Then we change all the internal links we can find. No mass redirection of 404s is involved. We DO have a 404.htm page that comes up on 404s, as designated in the .htaccess, and it simply has a clickable link to our home page if visitors want to go there.

On the other hand, I could see the logic where one might WANT to 301 an old page which has been removed rather than 404 since a 404 page/return code, I'm guessing, doesn't pass page rank from possibly tons of former backlinks (perhaps it should?). In our case we pretty much don't delete ANYTHING. It's all clearly dated, so if someone gets to it they can consider the origination date and decide if it is still applicable to their own situation.

Thanks ALL for all your patience and assistance!

[edited by: Robert_Charlton at 10:51 pm (utc) on Dec 18, 2011]
[edit reason] delinked examples and fixed formatting [/edit]

lucy24

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, Top Contributors Of The Month
Msg#: 4375906 posted 12:33 am on Dec 15, 2011 (gmt 0)

> I'm not real clear on ReWrite code syntax. In your code:
>
> RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(php|html?)
> RewriteRule ^(([^/]+/)*)index\.(php|html?) http://www.example.com/$1 [R=301,L]
>
> which I assume goes in .htaccess on domaina, is "example.com" supposed to be domaina.com or domainb.com?

If it is in "target" position, then it represents whatever domain you are directing to. Look at your own post and you will see why "domaina" and "domainb" can't be used. If you need to distinguish, you can call them "example.com" and "example.net". The object is simply to keep from auto-linking so your examples don't become unreadable. So from here on, translate as necessary.

(Mod note: I've cleaned up the "domaina" "domainb" auto-links in the post above. Lucy is correct, though, and it's best to use example.com and example.net as she suggests.)

Here, the domain name isn't relevant because that's not what the code is doing. It's about eliminating the duplicate content between "blahblah/" and "blahblah/index.html".

THE_REQUEST means that you're looking at the original request that came in from "outside". Careful! The word "outside" includes links from within the site itself. Even if the new page is right next door in the same directory, the request has to go all the way out of the server and wait on the virtual doorstep while Apache does its stuff. THE_REQUEST also includes any redirects, but excludes rewrites and also site-internal business such as auto-indexing.

Here, we are redirecting all outside requests for "blahblah/index.html" (et cetera) to plain "blahblah/". It has to be constrained to outside requests, because your Apache installation itself will translate all directory requests (anything ending in /) to include that directory's actual index file, whatever it may be called, or possibly an auto-index. But the user won't see this filename, just the "blahblah/" part.

> If it is domainb.com, then this is supposed to do the actual redirection of the home page all by itself?

Not only the top-level home page. All index pages of all directories. That's what the ([^/]+/)* means: the request might include one or more subdirectories. The form (php|html?) is only necessary if you use all three extensions for index pages on your site; in your actual htaccess, use only the extensions that actually occur. But, again, I think you've misunderstood the purpose of this bit of code.

> currently I am doing simply:
>
> ReWriteCond %{HTTP_HOST} !www.example.com
> ReWriteRule (.*) http://www.example.com/$1 [R=301,L]

You need to be more exact:

!^(www\.example\.com)?$

This rule does double duty, as it also deals with non-www redirects. Or with-www if you want to turn it around and use the non-www version by preference. It means "If the requested domain is not exactly www.example.com or exactly nothing", where the "nothing" part covers HTTP/1.0 requests, then send the user over to www.example.com.

> Redirect 301 /index.html http://www.example.com/
> Redirect 301 /index.htm http://www.example.com/
> Redirect 301 /abc.htm http://www.example.com/
> (and a bunch other 301s)
>
> If I were to "Redirect 301 / http://www.example.com/" that redirects EVERYTHING, no?

Yes, but note that you can only do this if your two domains live on different servers so the htaccess or config file will only see one of them. Otherwise, it would also redirect the requests that already say www.example.com so you'd get stuck in infinite redirects. That's one reason you need mod_rewrite rather than mod_alias (Redirect by that name).
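Here is a minimal sketch of the mod_rewrite version. It assumes both hostnames are served from the same .htaccess, with example.com as the old domain and example.net as the new one:

```apache
RewriteEngine On
# Act only on requests addressed to the old hostname (with or without www),
# so requests that already ask for www.example.net are not redirected again.
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com$ [NC]
# Send every path across unchanged; in .htaccess, $1 has no leading slash.
RewriteRule (.*) http://www.example.net/$1 [R=301,L]
```

To move only part of the old site, the RewriteRule pattern would name just those paths instead of (.*).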

> Perhaps this is why example.com and example.com/ are getting indexed, even though everything redirects browsers?

That can't be right. The slash at the end of the domain name is added by the browser itself before the request ever goes out into the world. The variable is example.com/ versus example.com/index.html

> BTW, I found the same code as you posted above, elsewhere, written the same but with *'s in place of the +'s are you sure this is correct?

* and + have different meanings. The asterisk means "zero or more". The plus means "one or more". In the case of directories, it's the difference between including the top-level directory (asterisk) and including only deeper directories (plus).

[edited by: Robert_Charlton at 10:56 pm (utc) on Dec 18, 2011]

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 4375906 posted 12:47 am on Dec 15, 2011 (gmt 0)

Redirecting a few old URLs to a new page where content is a good match is a good idea.

Redirecting to the home page and mass redirects are not a good idea.

MikeNoLastName

Msg#: 4375906 posted 9:56 pm on Dec 15, 2011 (gmt 0)

Okay, all this sounds way more confusing than I think it needs to be, and this thread has gone from practice to philosophy and back again. I think you are all very generously trying to solve a problem I DON'T feel I have and missing the one I DO have. As far as I can tell, with the code I have now, and apparently with what my server is already doing based on its configuration, ALL browser requests to example.com, example.com/, www.example.com/, example.com/index.htm, etc. are correctly resolving to www.example.com. So I don't see any need to change anything to fix that.
The problem I AM having (and the original topic of this thread) is Google not removing 301 redirected pages from the index. Originally I thought it was JUST the redirected home page, but from what I can see now, it is not. Maybe this is not unusual. Perhaps I don't have a problem at all. So if others can try it out and let me know, it will tell me whether or not I'm off on a wild goose chase.
Simply pick a URL that you have 301 redirected and type the old URL into the Google search field. Say you have example.com/abc.htm, which has been 301 redirected to example.net/xyz.htm. Type "example.com/abc.htm" into the Google search field. As indicated below, it seems to be more prevalent with the OLDEST redirects and with sites that have been unexpectedly pandalized.
Do you get back a search result listing that looks like the following, rather than the expected NOTHING?

Title of Page Xyz
www.example.com/abc.htm
Description snippet from page Xyz

Of course, once you click on it, it goes to example.net/xyz.htm because it has been redirected. Now type "example.net/xyz.htm" into the Google search field, and you get the expected:

Title of Page Xyz
www.example.net/xyz.htm
Description snippet from page Xyz

Are these being perceived by G as duplicate content just because once upon a time I did a 301 redirect? They also appear NOT to be passing PR on to the destination page.

It seems to be MOST prevalent with redirects that have been in place for more than a couple of months, but NOT all of them. I have yet to find a pattern. The more recent ones (added in the last month) to the same .htaccess files ALL appear to be working correctly (or have not been picked up yet, but are NEVER malfunctioning).

It also SEEMS to be more prevalent with URLs that have been 301 redirected to ANOTHER domain.

In my case, example.com and example.net are both on the same physical Apache server, but use different IP addresses. Other 301'd URLs in the same .htaccess that go to the SAME domain seem to work as they should and do not show up in the index, except, as mentioned above, with older URLs.

If others are seeing this, have you been heavily Panda-lized for no other apparent reason, especially on these particular pages, as we have?

enigma1

WebmasterWorld Senior Member 5+ Year Member
Msg#: 4375906 posted 10:46 am on Dec 16, 2011 (gmt 0)

If you have a page example.com/abc.htm that outputs a 301 redirect header, then it's not normal to see the page listed when you search in Google. Assuming the bot is aware of the 301, you should get nothing about these pages.

Otherwise, the fact that the page remains indexed for a long time implies the bot may have found a loophole in the way you did the redirect.

So, you said that if you access it as:
www.example.com/abc.htm
it will redirect. If you access it as:
example.com/abc.htm
www.example.com/abc.htm?p=1
...etc., do all these combinations redirect with a 301 header to the exact page you want? Do they redirect to:
www.example.net/xyz.htm

You should use a tool that shows the actual response headers; don't rely only on what the browser does.
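Here is one way to make every variant land on exactly one target. The sketch assumes example.com's .htaccess with mod_rewrite. The trailing "?" on the target drops any incoming query string; without it, mod_rewrite passes ?p=1 through to the new URL:

```apache
# Sketch: every variant of the old URL goes to one exact target.
RewriteEngine On
# Match the old host with or without "www." (case-insensitive).
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com$ [NC]
# The "?" at the end of the target discards the original query string,
# so /abc.htm?p=1 also lands on http://www.example.net/xyz.htm
RewriteRule ^abc\.htm$ http://www.example.net/xyz.htm? [R=301,L]
```

After a change like this, request each variant with a tool that shows raw headers without following redirects (for example, curl -I "http://example.com/abc.htm?p=1"). Each one should return a single 301 with Location: http://www.example.net/xyz.htm.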

And I have not seen inconsistencies in the Google index once the bot fetches the 301 redirects. When you search Google for the old site with:
site:www.example.com
how many pages do you see? Also, have you set up the old site in GWT (Google Webmaster Tools) and checked for errors? Are there any? And you don't block the bot on the old site via robots.txt, right?

Marketing Guy

WebmasterWorld Senior Member 10+ Year Member
Msg#: 4375906 posted 11:36 am on Dec 16, 2011 (gmt 0)

Google warns about mass redirecting old pages to the root home page. It's rarely a good idea.


Interestingly, yesterday I found an example of a fairly large site (300k pages) 301ing all of its content to another large site (3 million pages). All the redirects go to a single top-level page.

They're big brands owned by the same parent company. I spoke to a mate who works for the company, and they're just consolidating their web properties as part of an internal reshuffle, so fair play on that part.

The rankings of the smaller site (now redundant) weren't that great, but as a result of the consolidated link weight, the larger site is ranking across the board for loads of terms. I'm monitoring 5 groups of 200-300 keywords for a client in the market and the large site has basically grabbed top 3 positions for 90% of the keywords overnight.

It's quite a large-scale consolidation; it will be interesting to see how it pans out. Personally, I can't see their rankings staying like that (they're essentially all search result pages, similar to a meta job board targeting every job title / location variation with results pages; different industry though, that's just an example).

Still early days: Google hasn't removed most of the old site's URLs yet. I've been running daily reports to see what happens with it.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved