What does exist are filters.
What opposes those filters are good techniques and "trust" - one good member recently referred to it as "Trustrank".
An understanding of what these main filters are for, how Google applies them and the observed behaviour of Google in releasing them would be a good way for owners to better manage and refine their organic search techniques.
Maybe our good friends in the community could select a topic, or several, in which they have some solid experience and authority, and support it with a format that can be easily referenced. The most recent one has been largely contributed to by g1smd. Allow me to paraphrase [ and please correct me ] an example of how I think this would flow:
Duplicate Content Filter - incorrect linking
Applied: when internal links point to "/index.htm" or "/default.htm" when they should all point to "/"
Effect: Unlikely to be indexed, badly suppressed results, PR applied to wrong or duplicate pages.
Time to restore: 2-3 months from when fix is applied
Evidence: WebmasterWorld webmaster reports
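To make the linking fix concrete, here is a minimal sketch of how one might rewrite internal links so they point at the folder URL instead of the default document. The list of default-document names and the function name canonical_link are my own assumptions for illustration; your server may use different file names.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default-document names; adjust to whatever your server serves for "/".
INDEX_NAMES = {"index.htm", "index.html", "default.htm", "default.html"}

def canonical_link(href: str) -> str:
    """Rewrite a link to a default document so it points at the folder URL."""
    parts = urlsplit(href)
    head, sep, tail = parts.path.rpartition("/")
    if tail.lower() not in INDEX_NAMES:
        return href  # not a default document, leave the link alone
    # "/index.htm" -> "/", "/a/default.htm" -> "/a/", bare "index.htm" -> "./"
    path = head + "/" if sep else "./"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

print(canonical_link("/index.htm"))            # "/"
print(canonical_link("/widgets/default.htm"))  # "/widgets/"
print(canonical_link("/about.htm"))            # unchanged
```

This only cleans up the links you publish. It does not by itself stop the old "/index.htm" URL from resolving, so a redirect on the server side is still a separate job.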
Duplicate Content Filter - Meta Data
Applied: when meta descriptions and titles are too similar
Effect: results show as supplemental and are generally suppressed
Time to restore: A matter of days, depending on the next few crawls
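For anyone wanting to check their own site for this, below is a rough sketch that flags pairs of pages whose titles or meta descriptions are nearly identical. Nobody outside Google knows what "too similar" means in practice, so the 0.9 threshold, the function name similar_pairs and the sample URLs are all assumptions for illustration only.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similar_pairs(pages: dict[str, str], threshold: float = 0.9):
    """Return (url_a, url_b, ratio) for pages whose text is nearly identical.

    `pages` maps a URL to its title or meta description.
    `threshold` is an arbitrary cut-off, not a known Google value.
    """
    hits = []
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        ratio = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
        if ratio >= threshold:
            hits.append((url_a, url_b, round(ratio, 2)))
    return hits

descriptions = {
    "/a.html": "Photos of Tokyo streets",
    "/b.html": "Photos of Tokyo streets",
    "/c.html": "Contact form and email address",
}
for url_a, url_b, ratio in similar_pairs(descriptions):
    print(url_a, url_b, ratio)
```

It compares every pair, so on a site with thousands of pages you would want to run it per section rather than across the whole site.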
How many other filters have you observed, what are their effects, what have you done to fix the problem and what have you seen is the time to restore them?
No it isn't. Do we really need another silly thread redefining that stupid word?
The sandbox is a new site filter, the details of which aren't important here.
The sandbox hasn't the slightest thing to do with this other stuff. Jeez, it's like blaming Nixon for everything.
Since you're so sure, prove it! Not even Google uses that word. Maybe new sites get caught in the "filters" much more than others, making this seem like a new site thing; I know it affects OLD sites as well. Sites that have been online since 1997 too.
The sandbox is a new site filter
Originally, it was intended to describe sites that didn't rank in the results on release
The definition's old hat - agreed. But filter effects still exist.
What we need are facts that folks can use to determine the filters being applied to their sites, with a technical and validation foundation, irrespective of where in the site's lifecycle they are introduced [ new or mature sites ].
Filters on new sites may be one of these; some or many may recur later in the life of a website.
I don't mean to be focusing or arguing on "Words" "Titles" or "Badges" - sorry if it seems that way. Just how to identify and fix the filters - this is my focus - I think folks need to know this.
Any help and involvement would be gratefully received :)
This post sounds familiar. ;-)
Let's break the habit of arguments and lack of substantiated comments [ i mean it nicely ], and focus on the filters.
Put your hands up if you are willing to do some tests on some of these filters over at [webmasterworld.com...]
and hopefully the result can be applied back here
May i suggest anyone with test results could greatly contribute to your fellow WebmasterWorld member colleagues.
And may you become an esteemed member, much loved by your colleagues for ever :)
[edited by: Whitey at 10:55 pm (utc) on Oct. 6, 2006]
so if I link to nasa, MIT, google, Yahoo, the whitehouse, the Senate, I will get good trust? I don't think it is so easy :)
Snore, same goofy nonsense years later.
That stupid term has been a pox on webmaster forums since it "won" and became an arm-waving term here.
Google has lots of filters. One makes it easier to rank pages on established domains than new ones. They got a bunch of other ones too. Get over it.
i have never had a new site that landed in the "sandbox". on the other hand i have experienced drop in rank with established sites due to changes in the algo.
yes google has filters as does yahoo and msn, but google did not invent the sandbox filter...we did.
What about all the blog sites that automatically put the (long) title of a blog into the file name? I know you said "folder name" but wanted to see if you are being precise and sure of your facts. There are blogs that create their articles as folder names too, and I thought Google was so in love with blogs that it would do nothing to hurt any of them.
This, BTW, is what I see and think in comparison with forums and Google's lost love for many of them. Certain forums which produce an individual html page for each posting / reply (rather than a threaded forum) suddenly are unloved by Google (unindexed and PR0), much like a link exchange page is treated. I am not talking about vastly spammed forums (which may attract another kind of filter), as many forums are; I am talking about well managed, spam free forums with unique titles, descriptions, etc. based on the post made by a real person, which used to quite rightly get indexed and appear for fairly long unique search inquiries, now being toast. I have such a forum myself and have seen the same with other forums of this type. My domain is old and trusted and has good PR / rankings on its other (non message) pages also, so it seems Google have a filter that affects the individual messages on the site. FYI, I also only keep messages (archive them) for 12 months, as things change so old information no longer applies, so I don't have an ever growing message base. I perform good housekeeping, e.g. when I get rid of archives (archived namedate.html for the month index and /namedate/* for the messages on that page - these therefore both stay the same for their 12 month archive), I put a "gone" for those in the htaccess file. I even have a custom "page not found" page.
I am convinced Google filters out this forum style's individual html messages despite everything I have done and do, so now I am looking into replacing / changing the forum scripts with something that does "threaded" discussions like this webmasterworld; BTW, it is less appropriate for my forum's subjects to have threaded messages as the threads tend to be very short, but the number of new messages (comparatively / percentage wise) is much greater than say here. I am annoyed because I am, I feel, being forced to change something that works well and is wanted by the users (I set up a separate Invision threaded forum and blog a few months back before this happened and asked registered users of my forums which they preferred, the existing or new, and got a resounding "old") but now find myself having to change and inflict this on the users because Google has cheated on forums and run off with some young blog! Anyway, my anger aside, I am certain that the reason the messages are no longer indexed and / or given PR is a filter, despite the fact my forums date back to 1998.
They have been doing that for many years, so they are adding at the same rate of increase (more or less). Problems arise when all of a sudden your 50 page non-commercial site adds some 127,000 new commercial ones overnight.
There is a fair amount of slack allowed, such as a page a day blogger having some days of 20 pages, but 200 pages becomes suspicious. A site that adds 100 pages a day would have no problems with a day that jumps to 200, because that is still within a range that would not seem suspicious.
You can even turn your page a week site into a 200 page a day site with a few months of working your way up. But you also need to understand that frequency of site updates will also be considered when categorizing your site. I'm not saying that it will be better or worse, just different.
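To put numbers on the idea above, here is a purely illustrative sketch of a "within the usual range" check. It is not anything Google has published: the multiplier of 2, the slack of 20 pages and the function name looks_like_spike are assumptions chosen only so that the examples in this post come out the way described.

```python
from statistics import mean

def looks_like_spike(daily_new_pages: list[int], today: int,
                     multiplier: float = 2.0, slack: int = 20) -> bool:
    """True if today's new-page count is outside the site's usual range.

    `multiplier` and `slack` are made-up numbers for illustration,
    not known values from any search engine.
    """
    baseline = mean(daily_new_pages)
    # Allow either a multiple of the usual rate or a small fixed slack,
    # whichever is larger, so tiny sites are not flagged for a busy day.
    allowed = max(baseline * multiplier, slack)
    return today > allowed

print(looks_like_spike([1] * 30, 20))      # False: page-a-day blog, busy day
print(looks_like_spike([1] * 30, 200))     # True: same blog, 200 pages
print(looks_like_spike([100] * 30, 200))   # False: 100-a-day site doubling
```

Working your way up over a few months fits the same picture: each day's count stays inside the allowed range while the baseline slowly rises.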
I don't know if this is something we have evidence of or if it's just speculation. Hopefully someone can clarify this.
In an earlier discussion on this topic someone suggested you build trust by regularly adding to your site. A huge site that just pops up out there and then never changes is probably less trustworthy.
I'm skeptical of that, because some highly trustworthy sites wouldn't be expected to change very often (e.g., the collected sonnets of William Shakespeare or lists of Catholic saints).
"He suggested no more than 5000 pages per week"
What about the large news sites such as CNN and BBC that may do more than 5000 per week?
They're well known and provide very useful information on Google ... they don't fit Google's profile of potential SPAM sites, and if sites like this fall into a filter, they'll have someone permit them into the results, I'd think.
I believe Matt was referring to lesser profiled sites.
Here's the complete case study... "the story of launching our site" by photopassjapan. I beg for the mods to delete this if its length or non-techie enthusiasm is more of a laughingstock than help.
- We register the domain name. Site isn't ready thus kept offline.
- Google says you shouldn't upload something that isn't complete, right? :P
- Domain gets spidered through a #*$!raper site...
- You can imagine. If there's an absolute 0 for trustrank i think this is it.
- U.S. server, with a single admin page under our domain that self-advertises the software which the host is using as the web interface.
- Under these circumstances... we upload a site... with 7000+ static htmls.
- With duplicate problems. Unique content, unique title on all pages... these are album pages for photos. However meta description is only unique to sections, not each and every HTML file.
- And finally... at first we even mentioned ( why not, although we knew we weren't close to serving this quality )... we mentioned a word that would indicate we would be willing to sell/license. ( we don't, but thought if someone would want us to, why not. Stupid move. )
Let me note that at this point we knew nothing about the fact that we were spidered, that we were uploading a site with potentially dupe content problems. The only gut feeling we had was "Why pose as if the *** was for sale, when they're not, really? Just for that one in a million chance? What are we, artists?"
- Site got indexed within two weeks gradually.
- It was in the index, but didn't come up for relevant searches.
- Okay, that the "primary index" thing doesn't list us, but the "secondary"? I mean for a completely random term that is nowhere else on the net but on our site, and our pages indexed, it still wouldn't show... so...
- Site has no Gtrust... nada... nil... shinyou zero. :(
- It's liked by people who knew about it... for its content but...
- Looking back, we could only get it known by means G won't recognize as indication of trust. Like members only communities, friends, interested people, word of mouth. A few backlinks from here and there, and we were very happy :)
- But... even with adding backlinks, nothing happened.
- We figured, and now i KNOW that the BLs were there, but held back. Not from displaying in the link: operator, but from functioning as a vote for us. We figured... and now this seems to be the case... they were held back for a definite time based on the reasons why G didn't trust our domain.
Once we knew something was wrong:
- Tried to contact Google to at least neutralize its stance.
- Nothing... heh. Would you have guessed?
- Asked around what to do about this PR0 thing. I mean we didn't call it "lack of trust", but having a decided PR0 pre-launch and a cached irrelevant admin page certainly indicated something may be wrong by now :P
- Nothing... either no one had this kind of problem, or sitting it out wasn't enough of a deal to be whining about it. I regret that we did. I mean whine.
- Removed all references, words that we felt discontent with from the start. We thought that if they were there, the site would need to be competing with industries we don't belong to, yet which are heavily over represented on the net. Not a chance to be seen for what we would really like to be seen for... ever.
- Came here and asked around what the causes... or the "other" causes for ranking so bad could be. I mean we're doing something, and in a way no one really did before on the net... a narrow but rather unique way. All pages indexed. What's wrong?
- Duplicate issues, and canonical issues came up.
- Fixed them... ( thanks g1smd again )
- Didn't have high hopes anymore for a single such parameter being taken care of but... they're all necessary to deal with anyway.
So, the cause and effect theory, chronologically:
- Site gets spidered through #*$!raper. ( May )
- Uploading the first version of the site ( 12th of June )
- Noticed admin page in the G cache, PR0. ( same day )
- People visit the site from everywhere but G, and they like it
- Some links already point to it, relevant and trusted alike ( June )
- No BL update, no PR update throughout June, July, Aug... not for us.
- Adding new pages we would have liked to anyways...
- Being intrigued, we check why we don't deserve an update. And notice the first thing on this list in the logs. Our first visitor, and the referring source.
- Ask around what could be done ( July, Aug )
- Removing all references, words, phrases we don't like, don't feel to describe us well ( or make us look like a commercial site / competition to a commercial site... well, whatever. )
- Thus uploading all 7000+ html pages again ;)
- They get indexed =)
- Nothing changes :I
- Adding new pages we would have liked to anyways...
- Adding a sitemap to Google SiteMaps and even Yahoo.
- But nothing really.
- Site is getting a dozen more people. Wow. ;)
- Coming here and ask around. ( Sept )
- Dupe content problem mentioned. Not yet fixed!
- NO other problems. Except of course the facts already mentioned.
- Before it got fixed, we noticed that the first BLs start to show up for link: ( 12th of Sept... exactly 3 months after the site was uploaded... is this a coincidence? I don't think so. )
- On the 15th of Sept everyone had something to say on here though.
- Almost 50 people from G a day -.-
- Getting rid of phpBB because no one visits the site anyway.
- More importantly its pages were all over our results... as opposed to our content... gnn... so off it goes. First disallow in robots.txt ( nothing happened, not even supplemental ) then DELETE button ( ok, now it's supplemental ) with a new index and a redirect. Even later everything deleted. But the 404 page is customised and looks real nice. 404 not 301. ( historic supplemental, i.e. invisible )
- Duplicate content issue fixed gradually.
- Canonical issues discussed... those get fixed right away.
- Pages that are indexed with their new unique stupid meta description... which seems to count more than the 3-4 months spent to make the site and all pages unique... start making the pages show up individually for the site: operator.
- Indexing becomes more and more dependent on site structure... before it was as if GBot was reading it in alphabetical order...
- SERPs start picking us up for terms our site is more relevant with than others, and which are not that $$$ competitive...which is nice, but still nowhere for most relevant things.
- PR update ... er... should I say TBPR update starting from 27th of Sept and still going on.
- TBPR is now 5.
- Some BLs are showing, some don't.
- We didn't notice any change in traffic as of yet, the real PR change must have been ( for us at least ) months ago. Still not that visible, but at least all searches are relevant! I mean some people who arrive from G look through ten, some look through a hundred photos... they seem to like the site.
- Since the meta description issue was solved, we see more and more pages updated in the index by day. First it was only about 40, the subpages. Then slowly but steadily it rose to about a hundred. Within days it was 400.
- Pages we added after Aug do not have a PR. The rest is divided as per site structure.
There was no SINGLE thing that got us out of trouble.
The sandbox effect in my opinion consists of as many parameters as there are factors in ranking... and is fading away gradually, not in a single turn. Thus it's not a filter, rather filters multiplying each other's effect. At least that's what i think.
Relevancy issues, trust, backlinks, age of domain, dupe content issues... they all add up and multiply the length of not ranking at all.
Our experience was that there just wasn't an exact date when this stopped or won't be a date when this will stop being an issue for us... neither of the above mentioned great turns in changes visible to us brought a visible bang in traffic. Yet it has been slowly but steadily crawling up... or at least that's what i can tell by examining these very low numbers :P
The fact that exactly after the third month of the new domain root index being cached i saw BLs updated in the link: operator... the fact that there are 8 times as many pages indexed properly as two weeks ago... the fact that we now have TBPR 5 not 0 for one and a half weeks now...
Neither date shows a turn in the stats.
But the stats do show an increase in general.
Filters(?) we might have been tripping
- Had the site not been new, the link from the #*$!raper ( only one at that time ) may not have hurt it. Don't ask, we know. Checked into this with other sites, older, newer... everything seems to be indicating this is right. It may be wrong though.
- Had the site been online before it was spidered, it wouldn't have been cached with a completely irrelevant page in its place the VERY first time G ever saw it.
- Why this is important: I think that if G spiders you with stuff that is competitive, AND irrelevant to the site, that's an instant flag. Either way... SiteMaps is the proof for me that G has at least a top 20 for each page it considers its advisory board for each URL. Regarding what it is relevant to, regardless of other sites, based only on its content.
- In our case it was either that the new index.html for the domain root, you know, the ONLY page indexed at that time... was replaced with an irrelevant page ( as in the originally spidered "admin panel" was not filtered out, but when it was replaced with another, 0% relevant page, THAT triggered something... regardless that the new page was meant to be the one and only )... or that the page indexed on our domain as the first impression was irrelevant already, compared to the domain name and even the #*$!raper link.
- If the site was not so low on trust, adding the 7000 pages may still have been too much. I don't know.
- If you have unintentionally competitive keywords and phrases on the site, you'll be considered as a "new contestant wannabe". Do a search on some very general phrases...
You'll see the SERPs being cut off at 300 results saying the rest is "very similar" with umpteen million results being filtered out leaving only a bunch of sites. The rest won't show up if you click the omitted results page. Indeed there is a big league. In which you won't compete until you graduate from the university team matches ;)
So there may be different thresholds for different key phrases.
Based on just how many sites of great trust, age, relevancy compete in the area. Or based on G's experiences on how competitive an area is, just by looking at AdWords data?
- Duplicate content may have hurt the indexing and the SERPs alike from the very start. The indexing is now visibly better. Traffic didn't see much of a change as of yet, but who knows, maybe this was an obstacle out of many.
Okay i think i added another meaningless, long post which i won't get any reactions to and feel bad about typing it for an entire hour :P
Lucky for me i have time on my side.
If you couldn't tell that from the looks of our web site, this is made for and by us, not anyone else, thus we don't have anyone but ourselves to report to.
[edited by: tedster at 4:03 pm (utc) on Oct. 8, 2006]
Well, within 24 hours, one of my top listings for that page was gone!
It was an honest mistake, but would googlebot actually find this and consider it duplicate content then slam me that fast?
My site is about 7 years old. My top competitor uses the same keywords and description for every page of his site and has a rock solid position and seems like he can do no wrong.
Does google use certain sites as a keystone? or reference point?
To make this stranger yet, if I search using the Google Dance tool, I still show the index page as #1 across three data centers. Searching Google however shows nothing. Can someone explain why?
You may be right, anyway i don't know.
This thread isn't about my site but the filters.
If you mean this the way that my site had a major factor overruling all else, i.e. the site being new makes our experiences with the filters relatively useless... that's up for question. May be so, i have no idea, but comparing to my previous experiences, i'm not sure if this is the case.
It may look like it though, but...
this isn't my first ever site. ;)
I had other sites, all of which i built to the best of my knowledge, just like this. And never had this happening. The last one was made in 2004 so the only thing in your comment i know i can't agree with is the "pre-sandbox" remark, the rest may very well be so. There were some major changes since then which i guess i should have known about, but i'm not a webmaster. I come here for advice actually :P
The sites never took more than about 40 days to catch up to their average potential before.
All were personal so always in different areas. One even in an area waaaaay worse regarding competition. Only one though, for altogether i like to make sites that offer something people would like to see/know/make use of, but can't find it on the net. Can't stand competition for i don't have the budget, nerves and proper off-site SEO knowledge apart from... common sense.
That's why i know that if you build a good site, with no errors ( webmastering or seo errors that is ), it does rank well within a month or two. And not three to six. Well, of course not for the competitive searches, but for those which it's unique to. The rest... i mean how well it ranks for big time keyphrases, well yeah. It may be more difficult now, more than ever. But i wasn't really aiming for that.
My point was that it may be that our "trustrank", or the 7000+ pages all at once, or the duplicate content issues... or more precisely all this together... kept and is still somewhat keeping us from being seen. Not for competitive searches, but for the rest. Took care of all of them though, one by one.
You know we're talking about Google here, on MSN and Yahoo it does rank better. Even some very neat keywords became top 20. Only that no one uses them on MSN or Yahoo :P
Even we checked only because we didn't get what was wrong.
There are things i didn't account for, for i didn't know about them. We built these pages in accordance with past experiences which worked well before... and out of what i learned since its launch, these few but fearsome things could have been the "filters" resulting in a much slower progress in going public.
I don't know. These are all "maybe"s, as every single theory on filters is, right? This is just our experiences, summed up.
But if the only problem was that it was a new site, well then...
- You can add 7000+ pages at once
- Can be found for relevant searches with a TBPR of 0
- Can get links from others count as a vote for relevancy, with zero trust from G towards you
- ...Unless... it's a startup. Startups can't.
Or...( sounds more accurate to me ):
- You can add 7000+ pages at once and will be subject to attention by the algo that filters spam based on relevancy, trust, link structure, whatnot.
- Can be found for relevant searches with a TBPR of 0 for it doesn't mean it's accurate as your PR, and is only a single factor, perhaps the other factors, your inbound links have already promoted you as the most trustworthy TBPR 0 page for relevant keyphrases... bypassing less relevant, less trustworthy but TBPR 19+ sites.
- Can get links from others count as a vote for relevancy, even with zero trust from G towards you, because that's how you regain the trust...
- ...BUT if you're a new site, you have a huge handicap at all of these. Which may result in a longer period of evaluating the new pages than on a well established site.
I think we can agree on the second being closer to what we experience.
We may not know for sure though.
I guess you meant our BIGGEST problem was that it was a new site. Yeah, and i state it so in my previous post... but... our next biggest were probably... in no particular order... too many pages out of the blue, competitive wording, tbpr0 because only a single low trust link pointed to us when spidered for the first time, a sudden change in relevant words when that page was overwritten with the actual site, and dupe content. These i believe multiply each other's effect.
My previously launched sites didn't have these problems:
You can't get backlinks to vote for you if you have no trust. ( You can but it's slower )
You can't get trust without backlinks. ( You can but it's slower )
You can't get relevant with no trust and no backlinks. ( You can but it's slower )
Your data won't be updated as often as others' inc. trust, BLs, PR... and so on.
I'm not an Alice Cooper fan but this sounds familiar ;)