
Google SEO News and Discussion Forum

Everything just went supplemental - Removal tool too drastic?
internetheaven




msg:3107198
 6:58 pm on Oct 3, 2006 (gmt 0)

The entire site just went supplemental. Over 90,000 pages.

I'm considering using the removal tool and then starting again. I can't think of a better way to do this. Supplementals can stay in the index for over a year, and if more than 50% of my pages in the Google index are supplemental, surely that will destroy any chance of ranking new pages well?

 

tucj7




msg:3113426
 10:33 pm on Oct 8, 2006 (gmt 0)

Somehow, there seems to be more to it. I'm not the most experienced SEO guy around, but checking the progress of my site through Google every day has got me bothered!

My pages all have unique meta tags and titles. Content is different because every page returns a different list of products for price comparisons.

If certain niche sites get to stay indexed preferentially, surely price comparisons are in this category as they show product names, short descriptions and prices.

So, after trying and trying, somehow only 6 URLs are non-supplemental, and there is no www/non-www issue, as no one (including spiders) has ever visited the non-www version. Also, in Sitemaps I specified to use only the www version.

There seems to be something sinister going on, but because of my inexperience I can't get a handle on it.

tedster




msg:3113477
 11:16 pm on Oct 8, 2006 (gmt 0)

Sometimes there may be "less" to it. The site: operator has recently been giving odd results - so basing ANY action only on what you see there right now makes little sense to me.

What does make sense is something that is "less" sophisticated, "less" concerned, and more basic -- your server logs. If you are getting search traffic from Google, more or less at the pace you are used to and for the searches and URLs that you've seen historically, then why be concerned with the current site: search results? The reason I use them is to investigate a sign of a problem in the actual SERPs or in the actual traffic Google sends to a given URL.

So how does your current Google traffic look, compared to your site's historical situation? That's the place I would begin. Otherwise, in trying to "fix" something, you may trash a situation that has actually begun working for you. It may just be working at a low level and still need a bit more TLC.

Here's a recent quote attributed to Matt Cutts - some real food for thought, I'd say:

having supplemental results these days is not such a bad thing. In your case, I think it just reflects a lack of PageRank/links. We've got your home page in the main index, but if you look at your site ... you'll see not a ton of links ... So I think your site is fine ... it's just a matter of we have to select a smaller number of documents for the web index. If more people were linking to your site, for example, I'd expect more of your pages to be in the main web index.

tucj7




msg:3113482
 11:28 pm on Oct 8, 2006 (gmt 0)

The only traffic I receive is for the 6 pages that are not supplemental. That's the problem. This is indicated in my logs.

Where my competitor is getting decent traffic for products people are searching for, I am getting nothing.

Google spiders all pages. At the moment it's spidering, and in the last couple of days it has spidered hundreds. I have over 6000 products listed and each has its own page. But Google still only shows 166 pages in the results (this has been the case for 3 months now), so I'm not sure the site: operator is the issue.

jexx




msg:3113582
 2:08 am on Oct 9, 2006 (gmt 0)

I have observed some of the effects mentioned in this thread (and I reserve judgement on my conclusions drawn from site:).

This is what I have noticed about supplementals:

*) keeping content on the pages updated (changed) is keeping pages from going supplemental

*) using unique and relevant titles (not sure about meta tags) is also helpful

*) ensuring that only a _single_ URL points to each page (i.e. prevent duplicate content)

*) sufficient internal linking structure is offsetting the need for many backlinks (as suggested by MC) to keep pages from going supplemental

Obviously these are speculative, and I believe that combinations of the above cause the behavior that I am seeing.

joeduck




msg:3113642
 3:39 am on Oct 9, 2006 (gmt 0)

Tedster, that's really an interesting quote - are you sure it's from Matt?

It certainly makes sense that a lack of inbound links would give Google reason to make a page supplemental, although we've got tons of IBLs over many years and have had problems with going supp several times.

joeduck




msg:3113644
 3:42 am on Oct 9, 2006 (gmt 0)

Hey here it is: [mattcutts.com...]

Robert Charlton




msg:3113777
 8:01 am on Oct 9, 2006 (gmt 0)

The site: operator has recently been giving odd results - so basing ANY action only on what you see there right now makes little sense to me.

It's now not just the site: operator. I'm seeing a significant number of URL-only results, without titles or caches, in SERPs for some of the more competitive searches I monitor. These generally start at about page 3, and include URLs from some fairly well-known sites. Anybody else?

tucj7




msg:3113848
 9:44 am on Oct 9, 2006 (gmt 0)

Related to this, can someone explain why Google spiders every page on a site (now, around 10 000 pages - proven in logs) and chooses to index only 166 of them - supplemental or not?

g1smd




msg:3113904
 11:02 am on Oct 9, 2006 (gmt 0)

>> using unqiue and relevant titles (not sure about meta tags) is also helpful <<

Yes, the meta description is as important as the title. It should also be unique per-page.

DaveN




msg:3114024
 1:29 pm on Oct 9, 2006 (gmt 0)

They were pushing a new binary; the site: command is showing plenty of results for me now.

That's according to Matt Cutts...

DaveN

Bennie




msg:3114038
 1:37 pm on Oct 9, 2006 (gmt 0)

Good call g1smd, couldn't agree more.

[Glad to hear you're actually working too btw, all these supplemental and DC posts had me a little worried... You must not sleep very much ;-P]

/starts arrogant personal rant, rant/

I said something very similar post-Big Daddy over at SEW and was shot down by one particular WebmasterWorld senior. It was like banging my head against a brick wall.

The biggest thing to me was 'signals of quality' setting crawl depth and frequency. Also, a lot of this ties heavily back into establishing a trusted domain (read: avoiding the mythical sandbox). Most people stuck in the imaginary sandbox are too lazy to do it right - or are deluding themselves into thinking they are not spammers or 'thin affiliates' offering no quality (or signals of quality) to the index.

If you're not thinking like a spammer (read: search quality engineer), DON'T BOTHER COMMENTING POST-UPDATE as you will HAVE NO CLUE WHAT HAPPENED OR WHY.

Working both sides of the fence really helps. Making spam and avoiding filters you helped create shows you exactly where the filters lie. Try it - you might even make some money :)

Personally I think more white hat know-it-alls (you know who you are) should learn a little more about the world they work in. Play a little more on the dark side and learn some more about the algo before touting this and that as if they work for Google.

Big Daddy certainly stirred the pot well, and it's nice to see the fallout *finally* being discussed in a constructive manner.

/end of little wanky rant/

g1smd




msg:3114103
 2:22 pm on Oct 9, 2006 (gmt 0)

Heh, DaveN: I saw Matt's comment on your blog earlier, and almost linked to it from here...

Strange that no comment was made here, or on his own blog, or over at TW, etc.

DaveN




msg:3114135
 2:48 pm on Oct 9, 2006 (gmt 0)

I guess I'm just lucky to be in Matt's RSS reader .. lol

DaveN

tucj7




msg:3114226
 3:57 pm on Oct 9, 2006 (gmt 0)

So, I think I may have been an idiot... possibly.

Do you think the Last-Modified date makes a huge difference for supplemental/non-supplemental?

My site has some hectic regex rewrites that aren't supported by Apache 1, so I had to do a PHP rewrite workaround, and in turn I was not returning Last-Modified headers.

Do you think this is why I was supplementalised?
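
For what it's worth, a minimal sketch of sending Last-Modified (and honouring If-Modified-Since) from a PHP front controller could look like this - the file name here is just a placeholder, not anyone's actual setup:

<?php
// Minimal sketch: emit Last-Modified and answer conditional requests with
// a 304 so crawlers can revalidate. $sourceFile is a placeholder.
$sourceFile   = 'products/widget-123.html';
$lastModified = filemtime($sourceFile);

header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');

if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &&
    strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $lastModified) {
    header('HTTP/1.1 304 Not Modified');   // nothing changed since the last crawl
    exit;
}

readfile($sourceFile);
?>

Whether that header alone keeps pages out of the supplemental index is anybody's guess, but it at least lets crawlers revalidate cheaply.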

texasville




msg:3114270
 4:38 pm on Oct 9, 2006 (gmt 0)

Tedster-
Matt's quote-
>>>>having supplemental results these days is not such a bad thing. In your case, I think it just reflects a lack of PageRank/links. We've got your home page in the main index, but if you look at your site ... you'll see not a ton of links ... So I think your site is fine ... it's just a matter of we have to select a smaller number of documents for the web index. If more people were linking to your site, for example, I'd expect more of your pages to be in the main web index.<<<<<

This is what I have been saying for a couple of months now. It is going to happen to everybody. If a URL on your site has no or very low IBLs from OUTSIDE your site, it is going to end up supplemental.
Get used to it. If your site has 90,000 URLs (pages) and say... 10 of them have quality IBLs, then the other 89,990 are going supplemental. Sooner or later. I'm convinced.

Bondings




msg:3114288
 4:58 pm on Oct 9, 2006 (gmt 0)

Get used to it. If your site has 90,000 URLs (pages) and say... 10 of them have quality IBLs, then the other 89,990 are going supplemental. Sooner or later. I'm convinced.

That would be a disaster, both for website owners and (more importantly) for actual users. A very big part of searches is about in-depth questions/problems, and forums (and sometimes mailing lists) are usually the best results for those queries. And those pages are rarely, if ever, linked from other websites. Getting rid of them would make me switch search engines.

wiseapple




msg:3114289
 4:59 pm on Oct 9, 2006 (gmt 0)

>>> *) ensuring that only a _single_ URL points to each page (i.e. prevent duplicate content) <<<

This is an interesting point. Does anyone else think this is true?

g1smd




msg:3114325
 5:16 pm on Oct 9, 2006 (gmt 0)

>> Does anyone else think this is true? <<

Yes. It is true. Go back and read at least the last 20 000 words that I have written about "duplicate content" issues.

I think maybe you misread what was written. It doesn't mean that each page of your site should have only one incoming link to it. It means that each page of content should only have one URL that can be used to access it. A URL that differs by only a single character is a different URL and is therefore duplicate content if both URLs serve the same content and both return a "200 OK" status.
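
As a rough illustration of that "one URL per page" rule (a sketch only, with placeholder hostnames), a PHP front end could force the canonical host and path with a 301 so the duplicate variants stop answering "200 OK":

<?php
// Minimal sketch: force a single canonical URL per page of content.
// The canonical hostname is a placeholder.
$canonicalHost = 'www.example.com';

$host = $_SERVER['HTTP_HOST'];
$uri  = $_SERVER['REQUEST_URI'];

// Collapse the "/index.php" duplicate onto the bare directory URL.
$canonicalUri = preg_replace('#^/index\.php$#', '/', $uri);

if ($host !== $canonicalHost || $uri !== $canonicalUri) {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://' . $canonicalHost . $canonicalUri);
    exit;   // the duplicate now answers with a 301, not "200 OK"
}
?>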

g1smd




msg:3114350
 5:28 pm on Oct 9, 2006 (gmt 0)

>> If your site has 90,000 URLs (pages) and say... 10 of them have quality IBLs, then the other 89,990 are going supplemental, sooner or later. I'm convinced. <<

I'm not. I can see a site with 50 000 pages of content, and not one is tagged as Supplemental. This only happened after the other half a million alternative URLs for that same content were blocked from being indexed. A year ago, the indexing of that site was a total mess. Now everything is indexed just fine.

The problems arise when a site has 10 000 pages and there are 50 000 URLs that can be used that will all return valid content of some sort, and do so with HTTP status "200 OK". That is "duplicate content", and that is what screws up your results. Check the previous threads (there are hundreds [google.com] of them) for what you need to do to your site to fix the problem.

FromRocky




msg:3114389
 5:49 pm on Oct 9, 2006 (gmt 0)

>>> I can see a site with 50 000 pages of content, and not one is tagged as Supplemental. <<<

g1smd,
How can you know there isn't a single supplemental page? The most Google will show you is the first 1000 results, if the site has over 1000 pages indexed in Google.

[edited by: FromRocky at 5:50 pm (utc) on Oct. 9, 2006]

g1smd




msg:3114403
 5:55 pm on Oct 9, 2006 (gmt 0)

First, a site:domain.com inurl:filename.php search breaks things down into smaller pieces, so for parts of the site with fewer than 1000 entries you get to see all of them, especially if you also add &num=100&filter=0 to the search URL. It is important to see how many pages get cut off by the "similar pages" filter, hence trying it both with and without the &filter=0 parameter.

Some searches can also be done using site:domain.com -inurl:filename1.php -inurl:filename2.php -inurl:filename3.php type searches too.

Secondly, site:www.domain.com -inurl:www lists Supplemental Results, even if they have a www in the URL.
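
If you want to save some typing, a throwaway PHP sketch along these lines (domain and script names are hypothetical) can print those diagnostic query URLs, with and without the filter parameter:

<?php
// Minimal sketch: print the diagnostic site: queries described above.
// The domain and script names are placeholders.
$domain  = 'www.example.com';
$queries = array(
    "site:$domain inurl:product.php",                        // one section at a time
    "site:$domain -inurl:product.php -inurl:category.php",   // everything else
    "site:$domain -inurl:www",                               // lists Supplemental Results
);

foreach ($queries as $q) {
    foreach (array('&filter=0', '') as $filter) {            // compare both counts
        echo 'http://www.google.com/search?q=' . urlencode($q)
           . '&num=100' . $filter . "\n";
    }
}
?>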

FromRocky




msg:3114432
 6:14 pm on Oct 9, 2006 (gmt 0)

g1smd,
I have tried to find a site that is listed in Google with its first 1000 pages indexed normally, without supplementals, but no luck. Could you or anyone here direct me to such a site?

tedster




msg:3114433
 6:15 pm on Oct 9, 2006 (gmt 0)

If your site has 90,000 URLs (pages) and say... 10 of them have quality IBLs, then the other 89,990 are going supplemental. Sooner or later. I'm convinced.

I seriously doubt that will happen, and if it did, then regular search results would just show lots more "Supplemental Result" tags. The Supplemental Index is not Outer Mongolia.

What I see as already important -- and when the PR of the urls involved is low, it gets more important -- is the click distance from the nearest landing page for an ibl.

tucj7




msg:3114477
 6:41 pm on Oct 9, 2006 (gmt 0)

so, any thought on my Last-Modified issue? :-)

jexx




msg:3114506
 7:06 pm on Oct 9, 2006 (gmt 0)

so, any thought on my Last-Modified issue? :-)

I'm not explicitly setting Last-Modified headers in Apache 1.3, and from my experience just updating your content (spiderable text on the page) is 1) getting freshbot to come back more frequently and 2) keeping pages from going supplemental. Even pages with few or no IBLs that get their content updated periodically (I wouldn't say frequently) seem to be OK, while others (that sometimes do have more IBLs and unique titles/tags, but no recent changes) go supplemental.

Regarding inbound links... has anyone else seen examples where extensive internal linking (e.g. links that appear on every page) to deep pages that lack external IBLs has any effect? (Although non-dupe URLs, unique titles, and updated content, as I stated before, could have an overwhelming effect in preventing supplementals even without IBLs.)

tedster




msg:3114534
 7:23 pm on Oct 9, 2006 (gmt 0)

Let's not turn this thread into a generic discussion of the entire Google algorithm. The topic was "Everything just went Supplemental" and now that situation is gone. As DaveN said above:

They were pushing a new binary; the site: command is showing plenty of results for me now. That's according to Matt Cutts.

If there's anything left to discuss that's on topic, it's still welcome in the thread.

jexx




msg:3114543
 7:26 pm on Oct 9, 2006 (gmt 0)

>> If your site has 90,000 URLs (pages) and say... 10 of them have quality IBLs, then the other 89,990 are going supplemental, sooner or later. I'm convinced. <<

I'm not. I can see a site with 50 000 pages of content, and not one is tagged as Supplemental.

I agree with g1smd... I'm not seeing that behavior either. However, it does ultimately come down to what Google's intention behind supplemental indexing is going to be.

If it's purely to remove aging and non-relevant pages from the main index, then there shouldn't be anything to worry about if you keep your pages from having duplicate URLs and maintain a sufficient amount of relevant, updated content.

However, if their intention truly is to compact the index and only keep pages with enough IBLs (as MC and others seem to imply), then the majority of pages should go supplemental.

As I see it, if Google really wants to keep relevancy in the SERPs then the latter approach is not sustainable, since a search for "widget of specific brand X" or "review of widget Y" would return the main site (or some other page with lots of IBLs in the main index) for the sites of widgets X and Y, with supplemental sublinks to the true page of interest.

Most users would probably click the main link, and thus search result relevancy would be reduced. I don't think this is their intention, so I suspect (and hope) that we can still maintain large numbers of pages in the main index (without a lot of direct IBLs).

g1smd




msg:3114592
 8:23 pm on Oct 9, 2006 (gmt 0)

I think that Google wants a "clean" main index where they store "active" content pages from the main part of the web, and then everything else (pages now gone, URLs now redirecting, and myriad duplicate URLs) gets stuffed into Supplemental. Also in Supplemental are some pages from sites that have little or no trust, but those are a small minority.

The main index represents the "live" web. The Supplemental Index allows you to find content that the site owner has recently taken down, and allows you to recreate the path you took to recently find that information even if the host website has changed or gone. The Supplemental Index takes you back in time, to look at stuff no longer active. It is also a holding area for duplicated information, keeping multiple copies out of the main index.

texasville




msg:3114790
 10:48 pm on Oct 9, 2006 (gmt 0)

>>>>I'm not. I can see a site with 50 000 pages of content, and not one is tagged as Supplemental. <<<<<

Maybe not. Depends on what kind of site it is. For instance a news site might not. Or some kind of authority site. The rest will. It is what Matt Cutts is talking about. It's why low-PR sites I see are now entirely supplemental except for their index pages.
Then again... the supplemental filter may hit it soon.

>>>>What I see as already important -- and when the PR of the urls involved is low, it gets more important -- is the click distance from the nearest landing page for an ibl. <<<<

I see it on sites where the pages are only one click away from the index page.

>>>>Also in Supplemental are some pages from sites that have little or no trust, but those are a small minority. <<<

Trust is also relative. And trust is usually set by ibl's. That's the only thing I see missing in most examples. And again, that's what MC is saying.

If I am off the mark here...I sure wish Adam would step in and set it straight. But it sure looks like what he said to the site owner is exactly what I have been saying.

g1smd




msg:3114796
 11:01 pm on Oct 9, 2006 (gmt 0)

>> Maybe not. Depends on what kind of site it is. For instance a news site might not. <<

It is a forum:
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]

Site was almost all Supplemental when it exposed 750 000 URLs to search engines for 50 000 pages of indexable content. Now that the duplicate content has been deindexed by using noindex tags and redirects, there are 50 000 perfectly listed pages with one URL per page of content - and none are supplemental.
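
As a rough sketch of that kind of cleanup (parameter names are hypothetical, not the site's actual URLs), a PHP template could mark the duplicate views of a page as noindex while the canonical URL stays indexable:

<?php
// Minimal sketch: flag duplicate views of a page (print version, alternate
// sort order) as noindex so only the canonical URL stays in the main index.
// The "print" and "sort" parameters are placeholders.
$isDuplicateView = isset($_GET['print']) || isset($_GET['sort']);
?>
<html>
<head>
<?php if ($isDuplicateView) { ?>
  <meta name="robots" content="noindex,follow">
<?php } ?>
  <title>Example page title</title>
</head>
<body>...</body>
</html>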

[edited by: jatar_k at 5:24 pm (utc) on Oct. 10, 2006]

texasville




msg:3114808
 11:10 pm on Oct 9, 2006 (gmt 0)

g1smd, is that really a good example? I agree with you as far as that can cause supplementals, but it isn't really what I am talking about. WW has a main URL with a PR8, and most subtopics are PR7.
Try looking at smaller sites with a PR of 4 or less.
Look at commercial sites. Particularly sites that are static HTML, that have never had dupe content problems, and that have had 301s in place since day 1. No canonical issues. These are good examples of what I am talking about.
