Google SEO News and Discussion Forum

Why Does Google Treat "www" & "no-www" As Different?
Canonical Question
Simsi




msg:3094365
 7:58 pm on Sep 23, 2006 (gmt 0)

...why does Google want to treat the "www" and non-"www" versions of a website as different sites? Isn't it pretty obvious that they are one site?

Or am I missing something?

 

hutcheson




msg:3098330
 10:58 pm on Sep 26, 2006 (gmt 0)

>This debate will get hotter unless something that is wrong is rectified by Google.

That's OK. The forum can stand it.

But what you call a "wrong" happens to be the W3C definition, and I very much doubt if any W3C member cares how hot a debate forum gets. And after all, W3C tends to assume universal technical ability. (Which is valid from their perspective -- anyone they deal with, can either acquire it or rent it.)

And, when the silicon chips are down, Google is going to follow the W3C.

AlgorithmGuy




msg:3098347
 11:18 pm on Sep 26, 2006 (gmt 0)

>> That's OK. The forum can stand it.

But what you call a "wrong" happens to be the W3C definition, and I very much doubt if any W3C member cares how hot a debate forum gets. And after all, W3C tends to assume universal technical ability. (Which is valid from their perspective -- anyone they deal with, can either acquire it or rent it.)

And, when the silicon chips are down, Google is going to follow the W3C. <<

We webmasters are resourceful. We created you, and we created Google, and we will mold the W3C if we have to.

Without us, you cannot exist. The elevated podium you stand on, pontificating DMOZ rhetoric, will be removed from under you.

Webmasters are creative geniuses. The quintessential purveyors of good content and vendors of originality. We ooze fresh ideas. We populate the web. We give you an excuse to exist.

DMOZ and some of its editors are a bit stale, don't you think?

I know DMOZ is undergoing a drastic, radical change. Metamorphosis is not just a living-creature thing. I wonder what horrors will beset us once the appallingly inadequate change is complete.

God have mercy on our souls. ;)

[edited by: AlgorithmGuy at 11:39 pm (utc) on Sep. 26, 2006]

g1smd




msg:3098424
 12:59 am on Sep 27, 2006 (gmt 0)

>> having www.example.com and example.com hand out the same information isn't intentionally cheating, just choose one to list and forget the other as long as it remains the same content. <<

Ah, but take this forum for example.

In the time between Google looking at www.webmasterworld.com/google/3094363.htm and then coming back for webmasterworld.com/google/3094363.htm several posts may have been made - and so the pages are NOT the same when they are compared.

And, they are not suddenly going to change things so that after they request a www URL that they immediately request the other URL without the www, because that would double the amount of work they need to do, and double the amount of bandwidth used on your site: and Google already eats enough of that as it is when you look at large sites.

[edited by: g1smd at 1:14 am (utc) on Sep. 27, 2006]

g1smd




msg:3098427
 1:05 am on Sep 27, 2006 (gmt 0)

>> www.mysite.com/cat_widget.cfm and if I change it to www.mysite.com/cat_widget.html <<

You can call your files whatever you like, and so even after they become static pages, no longer delivered by a ColdFusion script, you can still use the old names ending in .cfm.

The server will not care. Browsers will not care. Bots will not care. As long as the content is served with a MIME type of "text/html" they will still be happy. You can set that up with a one line instruction in the server configuration file, or in the .htaccess file for the site.
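For example, assuming an Apache server (an assumption here; the exact one-liner depends on the server software), that one line in .htaccess would look something like:

# Serve the old .cfm filenames as ordinary HTML now that they are static pages
AddType text/html .cfm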

I am aware of a site where all the filenames still end in .asp - as they have for several years now - but the whole site runs as a PHP script on an Apache webserver.

"Cool URIs never change"

WolfLover




msg:3098644
 6:12 am on Sep 27, 2006 (gmt 0)

>> You can call your files whatever you like, and so even after they become static pages, no longer delivered by a ColdFusion script, you can still use the old names ending in .cfm. <<

g1smd, thank you, I was hoping that was possible but I was not sure, so thank you very much for the information.

I have had a fellow webmaster take a quick look at my site, which I am very thankful for. This webmaster advised that I do have the canonical issues. I asked my host today to do the 301 redirect for me as I do not have access to do that unfortunately. This is why I wanted to build my own site and had hoped to be able to do so without losing what serps I still have left.

Now, if I can ask this one question again and if I've missed the answer somewhere, please accept my apologies.

Let's say that my main issue is the canonical issue and it will be fixed shortly. Does anyone believe that this could be the only reason I took a hit of losing about 60% of my traffic as of September 15?

Does anyone here have the experience of having many pages go supplemental, then fixing the canonical issue, and your pages come back? The part I'm not understanding here is if a page goes supplemental and everyone is saying that it will be a long time before any bots even look at it again, how would I ever get my pages out of the supplemental mess?

BigDave




msg:3098673
 6:39 am on Sep 27, 2006 (gmt 0)

>> We webmasters are resourceful. We created you, and we created Google, and we will mold the W3C if we have to. <<

No, webmasters did not create DMOZ, they corrupted it. DMOZ was envisioned by the architects of the web, and staffed by users. It was not intended for webmasters to be the ones volunteering, and a lot of those that volunteer have nothing more than a personal site.

And no again on webmasters creating Google. Google's popularity had little to do with webmaster recommendations, other than the Linux search appearing on Slashdot. The majority of Google's growth can be attributed to AltaVista screwing up, and to person-to-person communication.

As for the W3C, how many standards committees have you sat on? Do you even know how to get on them, and then how to get them to listen to you?

Even if you get on the committee, the W3C doesn't even control the spec. The spec is based on a 20-year-old RFC, so you have the challenge of trying to "fix" something that has been ingrained in all the software for the last 20 years, and isn't broken.

What you should do is write an RFC defining the standard for setting up servers, because that is where the breakdown is.

Simsi




msg:3098694
 7:32 am on Sep 27, 2006 (gmt 0)

>> In the time between Google looking at www.webmasterworld.com/google/3094363.htm and then coming back for webmasterworld.com/google/3094363.htm several posts may have been made - and so the pages are NOT the same when they are compared. <<

I can see why that might give Google a problem determining if the two are serving different content or not in the short term, though once the thread ultimately rests this will be rectified on a future crawl. Plus I'd be highly surprised if Google's algo cannot recognise pages containing user comments and allow for this anyway.

But the example is surely another reason why they should assume the two variations are one and the same by default. Because that's all that's wrong IMO - the default setting has flipped over on its head, which is potentially screwing up any webmaster/author/hobbyist who doesn't use WebmasterWorld :)

I say "potentially", because I'm not sure we know that the duplicate content penalty that canonicalisation incurs is in fact of any major consequence.

Regarding this being a server issue, I disagree. The server was here long before Google introduced us to canonical problems. And as far as I'm aware, Yahoo and MSN don't do this - do they? That said, I believe Google (quite rightly) is its own boss and we should not expect a free pass. It's their call IMO. But they could still make life easier for the average Joe, and that's what this thread was (supposed to be) about.

[edited by: Simsi at 7:50 am (utc) on Sep. 27, 2006]

TerrCan123




msg:3098723
 8:45 am on Sep 27, 2006 (gmt 0)

WolfLover, you can tell Google which version of your site to set as the default, www or non-www. It is one of the tools in Google Webmaster Tools, along with Sitemaps.

AlgorithmGuy




msg:3098747
 9:20 am on Sep 27, 2006 (gmt 0)

>> No, webmasters did not create DMOZ, they corrupted it. DMOZ was envisioned by the architects of the web, and staffed by users. It was not intended for webmasters to be the ones volunteering, and a lot of those that volunteer have nothing more than a personal site.

And no again on webmasters creating Google. Google's popularity had little to do with webmaster recommendations, other than the Linux search appearing on Slashdot. The majority of Google's growth can be attributed to AltaVista screwing up, and to person-to-person communication. <<

BigDave,

Sometimes, just sometimes, being too pedantic and sticking rigidly to the rule book gives you tunnel vision rather than the overall picture. There are fields to the left and an ocean to the right, maybe. A broader vision is better.

What I mentioned stands. Let the DMOZ editor put me right.

>> Regarding this being a server issue, I disagree. The server was here long before Google introduced us to canonical problems. And as far as I'm aware, Yahoo and MSN don't do this - do they? That said, I believe Google (quite rightly) is its own boss and we should not expect a free pass. It's their call IMO. But they could still make life easier for the average Joe, and that's what this thread was (supposed to be) about. <<

Simsi,

Yes, the servers were around, but unfortunately a server is just software. Don't picture a server as some imposing computer box. Server software writers invariably have little or no understanding of search technology, or of how what they create gets sold on to customers by registrars and hosts.

A software writer simply gets it to deliver requests, etc. And it really is a case of servers not keeping up with change. In fact there is still very bad server software out there. A product is only as good as its writer. I've tried at least three, and Apache is the best by far. To make the point: Apache is many times better.

The abilities of these servers vary very widely. A search engine simply cannot cater to every server.

Even Microsoft's server software was the ultimate crap server at one point.

Your home computer can be made into a server containing websites on a shared IP within 15 minutes. It is possible within 5 minutes for a nimble-fingered operator.

Overall, I agree that Google is responsible for the information it gathers. The way it does its job is to gather what you have and then say: like it or lump it.

And yes, within 5 minutes your computer can be made into a server. Another 5 minutes to sort out a few canonical issues, 5 more to secure things up, and you have a good server.

One of the things that surprises me is successful websites on cheap hosting packages. Nothing but potential harm can come from such hosting.

[edited by: AlgorithmGuy at 9:30 am (utc) on Sep. 27, 2006]

AlgorithmGuy




msg:3098777
 9:55 am on Sep 27, 2006 (gmt 0)

Some forward thinking is good and a website saver.

1.
When purchasing a domain, make sure the registrar provides ways for you to control your domain through as many features as possible, in particular the A (ANAME) records. Some very good providers allow the domain to resolve to wherever you want, and they are efficient about it.

2.
As a precautionary measure, set yourself up so that your home computer can be made into a server at will. If you are on a static, dedicated IP you won't need a DNS update. If you have a dynamic IP, many good dynamic-DNS providers have efficient services to assist.

If at any stage you suspect that the host of your website is messing up, or that they are useless with canonical issues and have lots of downtime, simply re-pointing your A (ANAME) records at your home computer can save your website.

If your website uses secure pages, complex scripting, a database, etc., then a suitable alternative host is your only option for the change.

Hosting companies are notorious for serving websites with little or no search technology behind the service.

Indeed, jumping from one bad host to another is not a good idea until some research has been done on the new host. Until you find a suitable host, your home computer can serve your web pages.

No 301 or any other redirect needs to be done from the bad host, since the connection was severed at source in your A (ANAME) records.

3.
There is an enormous range of problems to look out for with hosts. Look at the headers for all the variations of error links that are possible - a dot before the slash, a missing slash, etc., as I mentioned in previous posts. Install a downtime recorder on your website. It is a very useful tool, since it can tell you if your host is cheating you by putting too many websites on one IP.

You cannot afford not to know these things. All could be well one minute, and the next day a new website sharing your IP catapults in popularity, straining the IP and crawler access. That website may have become a super-site with thousands of hits a day, plus downloads, uploads, chat rooms, etc. Your website will suffer. Crawlers may not get a look-in sometimes. You will be at a disadvantage against a competitor on a cleaner server.

What logs don't show is more important than what they do.

Your host may simply saturate the IP by the sheer number of websites.

A questionable website may be listed on the same server without you knowing, and Google demotes all sites on that IP.

The host's only interest is to make money.

I personally moved 3 websites because the logs showed that downtime was too frequent. The last entry before the downtime was Googlebot. That meant the host was causing the site problems too often and the site was not being crawled naturally.

God have mercy on a website hosted with a host that itself has canonical issues. Too many cowboy hosting companies exist, with outdated equipment and software and a total lack of search technology. The result is all too often a tanked website.


[edited by: AlgorithmGuy at 10:25 am (utc) on Sep. 27, 2006]

theBear




msg:3099195
 4:02 pm on Sep 27, 2006 (gmt 0)

Simsi,

"I say "potentially", because I'm not sure we know that the duplicate content penalty that canonicalisation incurs is in fact of any major consequence."

You are correct, there is disagreement over the actual impact. ISTR an old post about duplicate content problems being weighted by the number of times they occur, with that determining how long they affect a page. Things have probably changed since then. However, it is too much of a coincidence that when you find a site having ranking problems you also find several to many examples of duplicated content.

Back not all that long ago, a certain Matt Cutts had somewhat of a duplicate content issue dealing with Bacon something. ISTR an incident with Google's AdSense home page as well.

BTW, I understand your original question. I just don't see an answer that would universally work.

AlgorithmGuy,

Until they invent a computer that can execute an infinite loop in a finite amount of time, automatic canonicalization of the web is not likely a possibility. Even if you discount the fact that it would probably result in a lot of SERPs pointing to pages that only exist in the Google cache.

BigDave,

I've never tried to move a standards body; however, in a past existence I have moved the IRS ;). Like you, I also expect that they will not see a problem, because from a "standards" view there isn't one.

Flexibility does have its downside at times.

I also wonder if the hosting folks would read an RFC directed at them. The first question they would have is: What's an RFC?

Please make note of the weasel words.

Cheers,
theBear

[edited by: theBear at 4:09 pm (utc) on Sep. 27, 2006]

Simsi




msg:3099200
 4:05 pm on Sep 27, 2006 (gmt 0)

>> I've never tried to move a standards body; however, in a past existence I have moved the IRS <<

Intriguing ;)

BigDave




msg:3099245
 4:41 pm on Sep 27, 2006 (gmt 0)

>> The server was here long before Google introduced us to canonical problems. <<

Not really. Google may not have used that term, but there have been canonical issues all along. Do a search on "non-www" on WebmasterWorld. Then go to the oldest thread in the Google archives, from 2002. In that thread, you will find a reference to even older threads that are no longer in the archives.

There are articles on the web that people wrote in the 90s about not linking to both /index.html and /, but only linking to / .

Google only called it a canonical issue publicly recently. And it only started causing problems when Google started trying to weed out the massive amounts of duplicate content.

Yeah, Google could probably handle it better, but defending bad server setups is not going to get you change.

<added>By the way, in the old days, your DNS records did not automatically send www and non-www to the same IP address, and servers were not set up to automatically serve both.</added>

[edited by: BigDave at 4:44 pm (utc) on Sep. 27, 2006]

theBear




msg:3099275
 5:06 pm on Sep 27, 2006 (gmt 0)

"<added>By the way, in the old days, your DNS records did not automatically send www and non-www to the same IP address, and servers were not set up to automatically serve both.</added>"

I'm afraid that some of the default server setup templates have even added to the mess :( .
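To illustrate (a minimal sketch with placeholder hostnames and paths, not anyone's actual template): a typical default Apache virtual host answers for both hostnames from the same document root, which is exactly the setup that exposes the www vs non-www duplication.

<VirtualHost *:80>
    # One vhost answering for both hostnames, so both serve identical content
    ServerName www.example.com
    ServerAlias example.com
    DocumentRoot /var/www/example
</VirtualHost>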

g1smd




msg:3099483
 7:24 pm on Sep 27, 2006 (gmt 0)

>> Does anyone here have the experience of having many pages go supplemental, then fixing the canonical issue, and your pages come back? <<

Yes. Many times. I think I first posted about this some two years ago.

If you read those posts, you'll soon realise that back then we did not realise that, once you set up the correct redirects, Google continues to show the Supplemental Results for those redirected URLs for another year after the fix is put in place. So, back then, there was much annoyance directed at Supplemental Results that would not go away.

Nowadays, with a www and non-www canonical problem, your measure of success rests solely in seeing how many www pages are indexed, and how many URL-only www pages turn into fully indexed pages.

Don't look at the non-www count; you can't control it in any way. Google drops them after a year.

WolfLover




msg:3099595
 9:01 pm on Sep 27, 2006 (gmt 0)

>> No 301 or any other redirect needs to be done from the bad host, since the connection was severed at source in your A (ANAME) records. <<

AG, are you saying that if you get your Aname records changed to www.example.com then it will fix all the canonical issues such as the /default.cfm or /index.html, etc?

I called my registrar and they said that since I am hosting my site with someone else the hosting provider is the one with access to the A Records. I have just put in a request to my host to change this.

Is this the correct way and will it fix the canonical issue anyway?

Does it hurt to have the A Records changed AND do a 301 Redirect as well?

g1smd




msg:3099608
 9:11 pm on Sep 27, 2006 (gmt 0)

It will only fix the www and non-www problem. I still prefer the 301 redirect, because while unwanted URLs still appear in the SERPs they still deliver visitors to the real site (via the redirect).

Separate fixes are required for all the other problems that can occur - but all the fixes are easy.
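For reference, on an Apache server with .htaccess access the usual site-wide non-www to www 301 looks something like this (a minimal sketch; example.com is a placeholder, so adapt and test it before relying on it):

RewriteEngine On
# Redirect any request for the bare domain to the www hostname, keeping the path
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

The [L] flag stops further rule processing, and the 301 status tells Google the move is permanent.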

WolfLover




msg:3099633
 9:29 pm on Sep 27, 2006 (gmt 0)

>> It will only fix the www and non-www problem. I still prefer the 301 redirect, because while unwanted URLs still appear in the SERPs they still deliver visitors to the real site (via the redirect).
Separate fixes are required for all the other problems that can occur - but all the fixes are easy. <<

g1smd, I did the following searches on my site based on a thread I read that you posted to back in May.

These are my results. I am new to this type of situation and am trying to learn and understand so I can fix this mess I am in.

site:www.domain.com - 1,080 results. The first 29 are not supplemental, but everything else appears to be.

site:domain.com -inurl:www - 799 results. The first 11 are not supplemental.

site:domain.com - 1,290 results. The first 41 are not supplemental.

Now that I've done this, what exactly do these results tell me, aside from the fact that most of my pages are supplemental and that both www and non-www pages are indexed and supplemental?

I have had my host do a 301 redirect as of yesterday.

I had the following:

www.domain.com
domain.com
domain.com/index.cfm
domain.com/default.cfm

They changed the domain.com and /index.cfm to redirect to www.domain.com/

They said that because all my home page buttons, links, etc. point to /default.cfm, it cannot be redirected. I have asked them to change my /default.cfm links to the full URL instead. I'm not sure if they can or are willing to do this, or whether it has to be done on every page or can be done with just one entry that fixes all pages. As I've said previously, I do not have access to every single file.
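For what it's worth, on an Apache server with .htaccess access (which WolfLover says she does not have, so this is only a sketch with placeholder names), the usual way to redirect /default.cfm without breaking the internal links that point to it is:

RewriteEngine On
# Only act when the visitor explicitly requested /default.cfm; testing THE_REQUEST
# stops the rule firing on the internal DirectoryIndex subrequest for "/", which
# would otherwise cause a redirect loop.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /default\.cfm [NC]
RewriteRule ^default\.cfm$ http://www.example.com/ [R=301,L]

Internal links to /default.cfm keep working - they simply arrive at / via the 301.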

By the way, is it normal for an ecommerce host not to give access to the .htaccess files, etc., so that I can change them myself? I never realized this was an issue until now, or I would have changed hosts long ago.

Or, is this is normal for ecommerce hosts that provide the template, etc.?

g1smd




msg:3099658
 9:48 pm on Sep 27, 2006 (gmt 0)

Look back at your search results again. Don't just count how many are Supplemental. Look at which ones are www URLs and which are non-www.


Now that you have added a site-wide non-www to www 301 redirect on your site, what should happen in the following weeks is that more and more of the www URLs will become fully indexed (i.e. show a title and a snippet). Fewer of the www URLs will appear as URL-only. Those www URLs that are already Supplemental may take longer to lose that status, but they eventually will.

Any non-www URLs that show as normal results, or as URL-only, will drop out of the index. The Supplemental non-www URLs will hang around for a year, but the redirect will deliver visitors to the correct URL on the site anyway. Their cache date will remain frozen during that time.

Some normal non-www URLs that disappear from the SERPs soon, may well reappear as Supplemental Results in a month or so. If they do, then that is NOT a problem. At that point they are not treated as duplicate content, and when they are clicked, the redirect will get the user to the correct page of the site anyway. Their cache will also be frozen. They will hang about for about a year and then drop out of the index.

If you edit any pages of your site, they may well show up as normal results when you search for current content, but some will also show up as Supplemental Results when you search for words that used to be on that page but are no longer there. That is, for words on a page about current events, a search for "September" may show the URL as a normal result, but a search for "July" or "April" may well show that SAME URL as a Supplemental Result, and show the word "July" or "April" in the snippet, even though that word is no longer shown in the Google cache of that page, and is no longer on the real page on your server.

That effect has long fooled many people (including me for a while). You cannot control that action. It just "is". That's how it works.

Your measure of success from here on in, is simply in how many www URLs show up as normal results, NOT in counting non-www supplementals or "historical" www supplemental results.

[edited by: g1smd at 10:02 pm (utc) on Sep. 27, 2006]

Simsi




msg:3099674
 9:56 pm on Sep 27, 2006 (gmt 0)

>> The Supplemental non-www URLs will hang around for a year, but the redirect will deliver visitors to the correct URL on the site. <<

So when I do a "site:www.widgets.com" I get 2,090 results. When I do a "site:widgets.com" I get 2,100 results, BUT in the latter results the URLs also all start with "www" (as they do in the former). I put the 301 redirect in about 6 months ago.

Is that anything to worry about or are the latter actually fixed but in the "year out" period mentioned above?

(Incidentally, every result in both sets is "supplemental". I guess that IS something to worry about ;)) God, why is life so frigging complicated (visualise emoticon with eyes raised to the sky). Must. Resist. Cigarettes.

[edited by: Simsi at 10:01 pm (utc) on Sep. 27, 2006]

g1smd




msg:3099697
 10:12 pm on Sep 27, 2006 (gmt 0)

site:www.domain.com - should show only www results. You want as many of these as you can to be fully indexed and not supplemental.

site:domain.com -inurl:www - should show only non-www results, but sometimes it also shows "historical" www supplemental results too (see above for what I mean by "historical" - that's the May vs. September example).

site:domain.com - shows both www and non-www URLs and it can be difficult to untangle exactly what you are looking at.

site:www.domain.com -inurl:www - this is a "feature", but it usually shows only pages with a problem!

That site:www.domain.com -inurl:www search logically says "show me all pages from www.domain.com that do NOT have a www in the URL". This returns ZERO results from the normal index, but does proceed to return some www URLs only from the Supplemental Index.

It is an interesting test. I used it last week on a big site to find the last 20 problem URLs. Ten were old URLs that just return 404 and Google is hanging on to them for the one year. The other ten were pages that had duplicate meta descriptions, which are now all fixed.

One more thing. When doing the site search, do it with 100 results per page to get a better view. Additionally, always try it both with and without &filter=0 on the end of the Google search URL. That, too, can be very enlightening.

[edited by: g1smd at 10:14 pm (utc) on Sep. 27, 2006]

Simsi




msg:3099699
 10:14 pm on Sep 27, 2006 (gmt 0)

Thanks g1 :-)

photopassjapan




msg:3099701
 10:17 pm on Sep 27, 2006 (gmt 0)

Might be a silly question but...

What does it mean when...

site:www.example.com returns 33 results out of the 1,670 ( rest is "omitted" but only one is supplemental )...

site:example.com returns the same 33 only out of 1,840... not one is supplemental...

and site:example.com -inurl:www returns only 2 out of 850, rest omitted, and ALL are supplemental?

( I've just issued the redirects so I think I understand the difference between the total number of pages found but... what's with this third result? ) Did I get something wrong? :)

g1smd




msg:3099705
 10:21 pm on Sep 27, 2006 (gmt 0)

You posted while I was typing...

It is important to try searching both with and without &filter=0 parameter.

Without it Google removes from view some pages that it thinks are duplicates.

Adding the parameter, puts them back in view for you.

Also, use the 100 results per page option (the &num=100 add-on) to see what is going on more clearly.

You use them like this:
http://www.google.com/search?num=100&filter=0&q=site:www.domain.com

g1smd




msg:3099721
 10:35 pm on Sep 27, 2006 (gmt 0)

>> site:www.example.com returns 33 results out of the 1,670 ( rest is "omitted" but only one is supplemental )... <<

You have 33 www URLs that are good enough for the main index and fly on their own. You have more URLs than that indexed, an unknown number of which are NOT supplemental, but suffer from some sort of duplicate content penalty: most likely too similar title tags or meta descriptions. Using just the 100 results per page search, first, will show you some trends. Later on, adding the &filter=0 parameter will show you a better view.

>> site:example.com returns the same 33 only out of 1,840... not one is supplemental... <<

Ditto; as above.

Additionally, the extra number in this result points to (1,840 minus 1,670) about 170 URLs being indexed as non-www too. Those have to go! The redirect will make them supplemental, then drop out in a year.

>> site:example.com -inurl:www returns only 2 out of 850, rest omitted, and ALL are supplemental? <<

This search is going to reveal mostly supplemental results when you "show omitted" or add &filter=0. These will be a mixture of duplicate content (non-www vs. www) supplemental results and "historical" supplemental results as per the "May" vs. "September" example above.

In all cases, opening up to 100 results per page using &num=100 and adding the &filter=0 is going to give a much clearer view of what is going on.

theBear




msg:3099727
 10:50 pm on Sep 27, 2006 (gmt 0)

You also need to see if other subdomains are resolving as well.

The one that I've seen frequently resolve is the mail subdomain. You need to look at all DNS records pointing to your site in some manner.

The other big gotcha is a web server running on another port, such as 443 (https), which is common on sites that sell things.

A list of "common" port assignments (yep right sure) can be had here:

[iana.org...]

Then there is the ole IP address as a valid server alias problem.

If there is one thing that the internet is, it is flexible.

WolfLover




msg:3099742
 11:13 pm on Sep 27, 2006 (gmt 0)

>> site:www.domain.com -inurl:www - this is a "feature", but it usually shows only pages with a problem! <<

g1smd, first of all, please allow me to thank you for your help here. You are very much appreciated, not only by me, but I'm sure by many others who read and do not post.
This is why I come here, as there are so many nice people willing to help others. Again, thank you.

As you suggested, I did the site: searches for my site again. When I do the site:www.example.com search I get the 1,080 results, and I previously reported 29 results not being supplemental. That was because I saw the top 29 results and then it started showing supplementals; I checked a couple more results pages and all showed supplementals. However, when I went to look at all 10 pages (100 results per page), I see that the last 3 and a half pages are NOT supplemental results.

Also, when I do the site:example.com -inurl:www search I get 799 results. Mostly they are the example.com pages without the www; however, there are a lot of www results (as you said, maybe historical, but these are product pages with no dates on them, so I have no idea whether they are considered historical or not). My other issue is that this is also showing the https://www results. I am not understanding why the https results are coming up. Sometimes it is showing the https on pages that are not even product pages, but information or content pages.

I asked my host about redirecting the https pages to the full URL pages without the S, but they said that would mess up my secure pages, which are needed to protect people's credit card information.

Does the httpS issue also look like duplicate content to Google?

theBear




msg:3099743
 11:16 pm on Sep 27, 2006 (gmt 0)

"Does the httpS issue also look like duplicate content to Google? "

Yes, it is the web server on port 443 responding.

g1smd




msg:3099745
 11:23 pm on Sep 27, 2006 (gmt 0)

Duplicate Content is "the same content at a different URL" (even if the other URL is just one character different) so, yes, https is duplicate content.

If the pages that really need to be https are in just one particular folder, or all have some particular standard URL format, then you could use .htaccess to selectively force a redirect, or serve a 404, for all those other URLs that should not be indexed as https.
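A minimal .htaccess sketch of that idea, assuming Apache, that http and https share the same .htaccess, and a /secure/ folder for the pages that genuinely need https (all placeholders - adapt and test carefully):

RewriteEngine On
# Requests arriving on port 443 (https) for anything outside the secure area
# get bounced back to the plain http version of the same URL.
RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{REQUEST_URI} !^/secure/
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]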

Additionally, you could serve a different robots.txt file for port 443 stuff - there was example code posted on WebmasterWorld just a few months ago for this. That would keep the bots out of the https pages, except those that you DO want indexed.
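The robots.txt trick looks roughly like this (a hedged sketch, not the exact code from that earlier thread; the robots_ssl.txt filename is an assumption):

RewriteEngine On
# Hand a separate, more restrictive robots file to requests arriving on port 443
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]

robots_ssl.txt would then carry a blanket "User-agent: *" / "Disallow: /" so the https copies never get crawled.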

If the worst comes to the worst, you could modify the script so that every page tests which URL was actually requested, and then simply add a <meta name="robots" content="noindex"> tag to all those that are not supposed to be indexed. It would work, but it would waste some internal PageRank within the site.

AlgorithmGuy




msg:3099750
 11:29 pm on Sep 27, 2006 (gmt 0)

>> AG, are you saying that if you get your Aname records changed to www.example.com then it will fix all the canonical issues such as the /default.cfm or /index.html, etc?

I called my registrar and they said that since I am hosting my site with someone else the hosting provider is the one with access to the A Records. I have just put in a request to my host to change this.

Is this the correct way and will it fix the canonical issue anyway?

Does it hurt to have the A Records changed AND do a 301 Redirect as well? <<

You can KILL the problem of www and non-www through the A (ANAME) records, at source.

It becomes pointless and a useless exercise to do a 301 on your server, since the DNS records will bypass the server. Not a single crawler or agent will visit the server with a request for the resolved domain.

Globally, all DNS records will obey your instructions. It will then be impossible for an agent to request the wrong domain, since the DNS records dictate to crawlers and agents.

[edited by: AlgorithmGuy at 11:42 pm (utc) on Sep. 27, 2006]

g1smd




msg:3099755
 11:34 pm on Sep 27, 2006 (gmt 0)

OK. So with your proposed fix, what HTTP server response code do you get for domain.com/ and for www.domain.com/ then?
