Google SEO News and Discussion Forum

This 105-message thread spans 4 pages; this is page 3.
Filters exist - the Sandbox doesn't. How to build Trust.
Understanding factors that restore and maintain results

 4:30 am on Oct 6, 2006 (gmt 0)

A lot of discussion tends to focus on the generality of a "Sandbox", but it has long since been debunked as a useful term by Matt Cutts and many senior forum members. So I propose the Sandbox is dead :)

What does exist, are filters.

What opposes those filters are good techniques and "trust" - one good member recently referred to it as "TrustRank".

An understanding of what these main filters are, how Google applies them, and Google's observed behaviour in releasing them would help site owners better manage and refine their organic search techniques.

Maybe our good friends in the community could select a topic or several that they have some solid experience and authority in, and support it with a format that can be easily referenced. The most recent one has been largely contributed to by g1smd. Allow me to paraphrase [ and please correct me ] an example of how I think this would flow:

Duplicate Content Filter - incorrect linking

Applied: when internal links incorrectly point to "/index.htm" or "/default.htm" when they should all point to "/"

Effect: unlikely to be indexed, badly suppressed results, PR applied to wrong or duplicate pages.

Time to restore : 2-3 months from when fix is applied

Evidence: WebmasterWorld webmaster reports

Duplicate Content Filter - Meta Data

Applied: when meta descriptions and titles are too similar

Effect: results show supplemental and generally suppressed

Time to restore : a matter of days, over the next few crawls
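As a rough illustration of how such near-duplicate meta data might be spotted in a self-audit (a hypothetical sketch using Python's difflib; the titles and the 0.9 threshold are made up, and this is not anything Google has published):

```python
from difflib import SequenceMatcher

def too_similar(a, b, threshold=0.9):
    """Flag two title/description strings as near-duplicates."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

titles = [
    "Blue Widgets - Acme Widget Co",
    "Red Widgets - Acme Widget Co",
    "About Us - Acme Widget Co",
]

# Compare every pair of titles and collect the suspicious ones
flagged = [
    (a, b)
    for i, a in enumerate(titles)
    for b in titles[i + 1:]
    if too_similar(a, b)
]
```

On this made-up data, only the two widget titles get flagged; the "About Us" page differs enough to pass.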

How many other filters have you observed, what are their effects, what have you done to fix the problem, and what time to restore have you seen?



 10:25 pm on Oct 8, 2006 (gmt 0)

Tedster said:

Another proposed filter: excessive use of a keyword in internal anchor text.
Effect: domain is depressed on searches for that keyword.
Time to restore: soon after the next few crawls (the first time). But if there is a second "offense", then only a more gradual, stepwise recovery.

I agree with this. I wasn't ranking for a major keyword on my site for years (not above #1000 in Google). I had apparently used the keyword too many times in internal anchor text. Recently I've been so busy that I purposely gutted my keyword density for that term so my rank would go down and I could have some time off. It brought my rank up to #35 for that term within a few weeks--which I didn't want. But it does show it was over-optimized, because removing those extra keyword links brought the rank up.
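For anyone wanting to self-audit along these lines, here is a purely hypothetical sketch (made-up anchor counts and helper names; no published threshold exists) of measuring how much of a site's internal anchor text one phrase takes up:

```python
from collections import Counter

# (anchor text, number of internal links using it) -- made-up figures
internal_anchors = [
    ("blue widgets", 140),
    ("widget pictures", 12),
    ("about us", 8),
]

counts = Counter(dict(internal_anchors))
total = sum(counts.values())

# Share of all internal anchor text taken by each phrase
shares = {phrase: n / total for phrase, n in counts.items()}
dominant = max(shares, key=shares.get)
```

In this fabricated example one phrase carries 87.5% of all internal anchor text, which is the sort of lopsidedness posters here describe as risky.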


 10:27 pm on Oct 8, 2006 (gmt 0)

How is it possible for three Google data centers (using a Google Dance tool) to show my site in the #1 position for a specific search term and yet using my local PC google search I don't show up at all?

From which country are you searching? If you're in France, for example, then you will obviously see different results to those presented by the various Keyword checking tools (many of which are US-based).

As g1smd says, you'll only know for sure by checking certain DCs manually.


 12:00 am on Oct 9, 2006 (gmt 0)


I'm in the US, Wisconsin to be exact.
I have ranked #1 for a particular keyword for several years. I made a very minor change to the site and did a backup prior to making the change. I left the backup file in the root directory. I am also wondering if Googlebot picked up the index copy (named index_copy(1).html) and immediately de-listed my index.html for duplicate content. That would be rather over-aggressive and senseless.

24 hours later I can still see my site in the #1 position using the www, www2, www3 servers in a Google dance tool, but my local PC search (4 individual PCs, all refreshed) shows my site as not listed at all.
My server logs are also showing a reduction in traffic.

Really bizarre...


 12:09 am on Oct 9, 2006 (gmt 0)

Google would NOT have found the extra copy unless you are linking to it from somewhere else within the site.

That is NOT the problem. It is a coincidence.


 12:14 am on Oct 9, 2006 (gmt 0)

Here is a sample of what I'm finding. Keep in mind, I am not showing up in my local search, but I am showing up in the #1 position using a DC quick-check tool (mcdar):

datacenter - position - 1 - 1 - 1 - 1

Doing a tracert on google.com from my PC shows me the following IP, and I am ranking #1 there too... but again, typing google.com into my browser and searching shows nothing for my site.

Doing a ping to google.com resolves to, where I am #1 again... entering www.google.com via IE and again, no show.

I suspect something is refreshing somewhere...


 12:37 am on Oct 9, 2006 (gmt 0)

Try using the gfe-xx names instead of IP addresses to do your test.


 12:45 am on Oct 9, 2006 (gmt 0)

I'm not exactly familiar with that format.
I assume in gfe-xx.google.com that I substitute the xx with a numerical value... any hints where to start? I tried a few random choices with no luck.


 12:48 am on Oct 9, 2006 (gmt 0)



 1:22 am on Oct 9, 2006 (gmt 0)


Thanks, I know I had seen that post recently; now it all makes some sense.

A brief search found that most of the DCs show me at #1. Only gfe-dc is different, so that must correspond to what my browser is currently picking up. Pretty complicated stuff, this Google.


 10:32 am on Oct 9, 2006 (gmt 0)

Using any of the Mozilla, Firefox, or Seamonkey web browsers with the ShowIP extension installed (L4X.org) is also a revelation.


 11:41 am on Oct 9, 2006 (gmt 0)

Okay so, Tedster, and everyone, What's to be considered extensive?
I mean for the use of anchor text in internal linking.
For example our navigation throughout the site is something like this ( only less made up :P ):

Widget picture guide / Widget article guide / Top articles
Search / Forum / About

( ...notice there are no more widgets, but... )
...then at the bottom navigation:

Widgets by color / Widgets by age / Find the widget for you
Editor's picks / About

All links go to a different page by the way. And also...
We aren't selling widgets just showing pictures of them. :P


 3:00 pm on Oct 9, 2006 (gmt 0)

Okay, I have reduced the amount of cross-linking with anchored text between pages on my 2 sites. In one case, it meant removing cross-links on 20 pages. In the other, removing them on 15. Let's see if it makes any difference. I just can't rank for blue widgets, but I rank about 4th for red widgets on my other site. I rank #1 for blue widgets mycity and have been in that position forever.


 3:23 pm on Oct 9, 2006 (gmt 0)


You write the following:

"Duplicate Content Filter - incorrect linking

Applied: when internal links are incorrectly applied to "/index.htm" , "/default.htm" when they should all point to "/"

Effect: Unlikely to be indexed, badly suppressed results , PR applied to wrong or duplicate pages.

Time to restore : 2-3 months from when fix is applied

Evidence: WebmasterWorld webmaster reports "

Let me get this right...
If I use an internal link from a sub-page to return to the home page that is formatted like this: www.mydomain.com/index.html
will that result in a penalty?

Am I to understand the preferred method is to send them back to the root using either "/" or "http://www.mydomain.com/"?



 3:45 pm on Oct 9, 2006 (gmt 0)

Okay so, Tedster, and everyone, What's to be considered extensive? I mean for the use of anchor text in internal linking.

"Extensive" or "excessive"? There's a difference. What's more, the definition of "excessive" is likely to be different for a site that's squeaky-clean in other respects than for a site that has "SEO" stamped all over it. Context counts.

Remember, too, that Google favors the "natural" over the artificial. If you have a travel site of 10,000 pages about Elbonia, it's perfectly natural to have navigation links for "North Elbonia," "South Elbonia," "Elbonia City," and other major topics on every page. But if you've got tiny-text navigation links to the 100 largest towns in Elbonia at the bottom of every page (or, worse yet, tiny-text links to "[townname] hotels" for those 100 towns at the bottom of every page), those pages are likely to fit a spam profile.


 4:05 pm on Oct 9, 2006 (gmt 0)

>> If I use an internal link from a sub-page to return to the home page that is formatted like this: www.mydomain.com/index.html that will result in a penalty? <<

Not a penalty as such but it works like this:

You are sending all your internal PageRank to /index.html and most external sites are voting for / instead. Google also often prefers to list / as the canonical URL. You would be better advised to link to / for the root index page and for any index file in a folder. That is, always omit the actual index file filename from the link itself. To do otherwise is to have "duplicate content" for your index pages.


You can use a redirect to fix this. The index-page check should also force the domain to the www version in the rewrite; it should be domain-insensitive (working for both www and non-www index pages) and should occur before any check for non-www URLs:

RewriteEngine on

# Redirect any request for index.htm or index.html (in the root or any
# folder, on www or non-www) to the folder URL on the www host:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.html? [NC]
RewriteRule ^(([^/]*/)*)index\.html?$ http://www.domain.com/$1 [R=301,L]

# Then force any remaining non-www request onto the www host:
RewriteCond %{HTTP_HOST} ^domain\.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]

First, this forces all index pages, both index.html and index.htm, to "/" for both non-www and www hosts, moving them all onto www. The redirect works for index pages both in the root and in any folder, and the 301 redirect preserves the folder name.

Secondly, for all other pages on non-www, the second 301 redirect forces the domain to www. This second directive is never triggered by index pages, because the first directive will already have converted all of them.
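For anyone who finds the rule order easier to reason about in code, here is a rough Python model of the two-step logic described above (hypothetical helper name and placeholder domain; it sketches the decision order, not actual Apache processing):

```python
import re

CANONICAL_HOST = "www.domain.com"  # placeholder domain from the rules above

def canonical_redirect(host, path):
    """Return the 301 target for a request, or None if no redirect is needed.

    Step 1: strip index.htm / index.html in any folder (host-insensitive),
            redirecting straight to the canonical www host.
    Step 2: force bare domain.com requests onto www.
    """
    # Step 1: index-file check runs first, for both www and non-www hosts
    m = re.match(r"^((?:[^/]*/)*)index\.html?$", path.lstrip("/"))
    if m:
        return f"http://{CANONICAL_HOST}/{m.group(1)}"
    # Step 2: host check only fires for requests that kept their filename
    if host != CANONICAL_HOST:
        return f"http://{CANONICAL_HOST}/{path.lstrip('/')}"
    return None
```

Note that an index-page request on the bare domain goes straight to the canonical folder URL in one hop, so no redirect chain is created.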


 4:48 pm on Oct 9, 2006 (gmt 0)

Um... thanks, europeforvisitors...
Yeah, I meant excessive.
Also, your answer pretty much covers what I was curious about.

So if I use a normal amount of internal links on every page ( intended as navigation, as opposed to half a directory on each ) and I don't touch sensitive ( competitive ) stuff that screams SEO and isn't even relevant to our content... we should be all right...

That's a relief :)


 5:22 pm on Oct 9, 2006 (gmt 0)

"He suggested no more than 5000 pages per week"
What about the large news sites such as CNN and BBC that may do more than 5000 per week?

Well-known and very useful information sites on Google... they don't fit Google's profile of potential spam sites, and if sites like this fall into a filter, they'll have someone permit them back into the results, you'd think.

I believe Matt was referring to lesser profiled sites.

My sites can add pages at this rate; we commonly do not have problems with growth at this level. But I believe that the following factors prevent problems:

1). My sites have been around for years
2). I have more link popularity than most sites my size

Now when I launch a brand-new version of my site, say in a new country with different content, these rules change. The link popularity is still growing at an extremely fast rate due to our PR efforts.

My infrastructure prevents an easy launch of a new site without shipping off a ton of pages at the same time. Thus a growth rate of less than 5k is almost impossible.

I do strongly AGREE with Google on why they do this. Most sites that grow at this rate are spam. I personally believe that less than half a percent of the sites penalized for this shouldn't be.

If Matt's Search Quality Team feels that this percentage is too high, they will release a tool for people to submit their sites, most likely built into the re-inclusion request system.


 10:31 pm on Oct 9, 2006 (gmt 0)

Large site launches

Applied: when too many pages [ more than 5000 per week ] are launched at the same time.

I'd put this in the same pot:

Large Scale Page Amendments

I think you can class large wholesale page changes into the same category, although different in their nature:

Application: changing URLs from underscores [ "_" ] to hyphens [ "-" ]
Effect: Google sees these as large-scale new page additions and flags the pages. Filters applied. Results may be suppressed. Pages may not be indexed, or may be very slow to index.
Evidence: need feedback to support this hunch

Application: changing a site with multiple index paths, e.g. "index.htm", to "/"
Effect: Google puts the site back through a re-indexing process and filters pages as if it were a new site.
Evidence: no evidence yet, but initial reports suggest this might be the process. Needs feedback to support this

How to release the filters

Per the above observation
If Matt's Search Quality Team feels that this percentage is too high, they will release a tool for people to submit their sites, most likely built into the re-inclusion request system.

Wait [ possibly forever ] or send a re-inclusion request.

I don't know how successful this will be for the average webmaster to get Google's attention.

[edited by: Whitey at 10:47 pm (utc) on Oct. 9, 2006]


 10:46 pm on Oct 9, 2006 (gmt 0)

What timescale are you talking about?

For index page changes on a small to medium sized site, I would expect Google to have picked things up and followed them within about six weeks of making the changes to the site.

That's a hunch.


 11:01 pm on Oct 9, 2006 (gmt 0)

What timescale are you talking about?

We're at the point of 6-8 weeks. One site is mostly re-indexed [ estimate 85% ], but the results are suppressed.

The other sites are still only fully indexed down to the 2nd level. Some pages are down to level 3/4. Pages without links to them at level 2 are not being cached. Front pages are PR 5/6.

No pages on any site rank, even for very weak or unique search terms, except, of course, for an exact match.


 1:20 am on Oct 10, 2006 (gmt 0)

I noticed StarryEyed over here [webmasterworld.com...] says her site fully re-indexed and ranked in 3 weeks, although from memory it's less than 1000 pages and no additional pages were added.

Her belief was that increasing the priority in the XML sitemap:


<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

would speed things up. Has anyone else seen this, so that we can better analyse and validate [ or rule out ] the potential for a filter effect on newly updated pages?

How is this XML adjusted to accelerate the crawl through all levels of the site following a change?
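As a sketch of one way such entries might be generated programmatically (hypothetical helper and placeholder URL; the element names follow the sitemap 0.84 schema quoted above):

```python
from datetime import date

def sitemap_entry(loc, priority, changefreq="weekly", lastmod=None):
    """Build one <url> element for a sitemap 0.84 file."""
    lastmod = lastmod or date.today().isoformat()
    return (
        "  <url>\n"
        f"    <loc>{loc}</loc>\n"
        f"    <lastmod>{lastmod}</lastmod>\n"
        f"    <changefreq>{changefreq}</changefreq>\n"
        f"    <priority>{priority:.1f}</priority>\n"
        "  </url>"
    )

# Re-emit a recently changed URL with a higher priority and a fresh lastmod
entry = sitemap_entry("http://www.example.com/", 0.8, lastmod="2006-10-10")
```

The idea being tested in this thread is simply to bump `<priority>` and update `<lastmod>` on the changed pages, then resubmit the file.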

[edited by: Whitey at 1:36 am (utc) on Oct. 10, 2006]


 8:44 am on Oct 10, 2006 (gmt 0)

Yes, all re-indexed and all ranked. Really cool.

In total, 8000 pages are indexed, but it took 1 month to get to this point.

At first it was up and down: Google picked up only a few, sometimes around 1000, then a few days later they were gone. After 2 weeks Google had around 2000, and then a few days later all were gone again.

And so on. I added a few hundred pages every 2 days. Now Google has all 8000 pages indexed. No Supplemental and no Omitted pages.

And all 8000 are really similar, but only to a bot, not to a human. Big navigation of about 30 links, and some pages with over 300 internal links.

best regards


 9:54 am on Oct 10, 2006 (gmt 0)

"/" fix does not in itself appear to trigger a filter

g1smd - looks like your "hunch" is strengthened by Reilly's and StarryEyed's confirmations of around 3-4 weeks to fully index.

I also noticed that these were small to moderate sized sites and that no pages were added.

So this alone does not appear to be grounds for a filter to be triggered, based on these 2 examples.

It would be good if we were able to observe the behaviour on larger sites also.

Adding a "large proportion" of new web pages likely to trigger filter

A filter does appear to have been triggered on Monster88's site, which grew its pages by approx 30% at the same time as the "/" fix was applied. However, it looks unlikely that the "/" fix caused any problems on this small-to-medium site.

See [webmasterworld.com...]

My gut feel is that the Google trigger threshold was affected by the proportion of new pages added to the index, and several senior members seem to have the same view:

See: [webmasterworld.com...]

Having a long web presence gives history and secures TRUST - filter unlikely to be applied for adding pages

IncrediBILL - My site which just got another 30K page injection is, and always has been, monetized, so perhaps it's just status quo of a 10 year old site letting me get away with it.

See: [webmasterworld.com...]

This supports ashear's findings above relating to large site launches where TRUST has been established.

It's also worth reading Caveman's fuller version in the same thread above, in the context of the risk of filters. Here's an extract:

As pertains to this thread:

High Risk
- Adding a large number of new pages, relative to the existing number of pages.
- Adding a large percent of new pages, relative to prior growth rates of the site.
- Adding a large number of thin affiliate pages to the site (threshold for problems lower than for either of the two points above, IMO).
- Adding feeds.

Low Risk
- Adding new pages at a rate generally consistent with the site's history.
- Adding monetization vehicles to existing pages.

- Fixing multiple link formats internally to the homepage by making all internal links consistent.
- Consolidating the non-canonical homepage URLs (e.g., "index.htm") into the selected canonical version of the homepage (e.g., "/"), via 301 redirects (with g1smd's caveat from this thread that you not create redirect chains).

[edited by: Whitey at 10:12 am (utc) on Oct. 10, 2006]


 11:19 am on Oct 10, 2006 (gmt 0)

Can I ask a stupid question?

What kind of site would require 5000+ pages to be added daily?

Without knowing your content (ie: large supplier of small items), it would seem like blackhat SERP-hogging to require and expect that many pages to be indexed.


 11:37 am on Oct 10, 2006 (gmt 0)

"What kind of site would require 5000+ pages to be added daily."
what else -wikipedia-
5000 webtrolls adding daily new pages in wiki with that simple tag


 12:32 pm on Oct 10, 2006 (gmt 0)

Yeah, but if the topics are not valid, then it's still spam, and should be filtered out.


 11:37 pm on Oct 10, 2006 (gmt 0)

I just picked this up from RichTC over at


I'm working on sites in commercial areas that are still in the sand almost 18 months later. Backlinks - good backlinks - unique content, and time are what is required.

How much work? Why is it taking so long, and are you confident that you will release the suppression filters?

No disrespect [ 'cos a lot of us have the same issues ], but that's way too long to be out of action. Were these the only issues?


 12:07 am on Oct 11, 2006 (gmt 0)

Hi all,

Just sticking my beak in on this thread

I agree with tedster re: "I also think there is a "link aging" filter where the effect of some marginally trusted links is only allowed to influence search results gradually". I have certainly found this to be the case.

I firmly believe that different filters apply depending on the sector you're in and how much AdWords demand there is in it. If your site's in a top commercial sector and your prime keywords are in high demand, a new site just isn't going to cut it for some time - not without:

a) a serious amount of specific, unique content relevant to the particular keyword

b) aged backlinks from "valued" sites

c) Strong link popularity from a volume of sites over a long period of time.

It's also possible that Google looks at how quickly other "authority" sites have gained links in the sector as a benchmark. I have also seen even large authority sites (PR8) fall back in the SERPs due to having too many backlinks gained too quickly that are the same on many sites (ie in blocks of 5/6 links), hence Google thinks they are paid-for links and again applies some other filter.

It's all guesswork, but certainly a webmaster in a low-AdWords, non-commercial sector will not be faced with the same filter issues as a webmaster in a high-demand AdWords / commercial sector - that's for sure!


 12:17 am on Oct 11, 2006 (gmt 0)


In addition to your points, I also observe that where a site carries affiliate links, a filter applies depending on the percentage of pages with/without affiliate links.

One site we worked on had about 50,000 pages. They added affiliate links to about 80% of the site, and the positions in the SERPs fell back. They then cut the affiliate links from about 80% of pages down to 20%, and they started recovering in the SERPs.

Conclusion: Google wants unique content that's not all carrying affiliate adverts if it's to rank high in the SERPs within commercial sectors.


 12:24 am on Oct 11, 2006 (gmt 0)

Sounds like a dupe content filter to me. How are you linking to your affiliate?


 12:32 am on Oct 11, 2006 (gmt 0)

Ref demand for AdWords:

Let's say a commercial site about widgets has sectors for red, white and blue widgets, etc.

In addition to "widgets" being a highly sought after keyword, ie loads of adwords it might find that "Red widget", "White Widget" and "Blue Widget" are also high adwords it may even find that "Blue Pink special widgets" also attracts loads of pay pr click interest.

In this situation I find that, obviously, it's going to take a good period to rank for the prime keywords, but the semi-prime ones are still going to take ages to rank and need lots of aged backlinks.

Even if the site has been an authority on widgets for a few years, if it adds a new sector page about "blue widgets" (ie a semi-prime keyword, related but not covered before), some sort of filter kicks in to ensure that the new sector pages won't rush into the top ten until some aging process has run its course.

If you look at the SERPs in high-AdWords areas, you will find that the top ten sites listed will have had that page in Google for some time - ie 18 months plus.

Also, since the new Google infrastructure rolled out, you don't see new sites spring up with high PageRank, even if they are well connected - like PR7, for example. I would say that for a site to get PR6 upwards now, it needs aged backlinks on its side.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved