Does Google Ban or Filter Web Directories?

Forum Moderators: Robert Charlton & goodroi

moftary

1:06 pm on Jul 28, 2005 (gmt 0)

I think the subject is worth a thread of its own. It's only a suspicion so far. Yet I don't see that dmoz, yahoo, or any other major web directory was banned, filtered, or dropped to PR0 as my web directory was. I tried checking in Alexa (powered by Google) and I see some results from my site. Apparently Alexa serves old results from Google, but the weird thing is that Alexa itself has PR0 now. But that's another story!

If you run a web directory, feel free to post your experience here.

webdude

12:11 pm on Jul 29, 2005 (gmt 0)

By the way, I do not run AdSense on this site. It is strictly non-profit and is there as a free service.

zoltan

12:14 pm on Jul 29, 2005 (gmt 0)

Someone mentioned a manual ban. If that were the case, why did it happen on a single day (July 28) across many sites?
In my opinion, a manual ban would be constant (let's say Google employees ban 200-300 sites / day). This is not the case here. Everyone affected complained about July 28 being the day when their sites disappeared from Google.

moftary

12:18 pm on Jul 29, 2005 (gmt 0)

Sorry, I was wrong... actually, it is not googlebot, it is Mediapartners-Google.

Frankly, I had my doubts. That makes more sense now.

Are you sure the problem is with the term "directory"?

Just a theory, refer to msg #45 in this thread.

Briefly, two of my five directories got banned. The only similarity between those two, and at the same time the only difference between them and the three that were not banned, is the term "directory" in the meta keywords tag.

Again, whether my theory is true or not, it doesn't make a difference, since gbot stopped dropping by and so has no way to index the updates. But then again, what is there to lose?
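
For anyone who wants to test the meta keywords theory on their own pages, here is a minimal Python sketch that checks whether a term appears in a page's meta keywords tag. The class and function names are made up for illustration, and the sketch only shows how to look for the term; it says nothing about whether Google actually treats that tag as a signal.

```python
from html.parser import HTMLParser


class MetaKeywordsParser(HTMLParser):
    """Collects the entries of every <meta name="keywords"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "keywords":
            # Keywords are conventionally comma-separated.
            for entry in attr.get("content", "").split(","):
                entry = entry.strip().lower()
                if entry:
                    self.keywords.append(entry)


def keywords_mention(html, term):
    """Return True if `term` occurs inside any meta keywords entry."""
    parser = MetaKeywordsParser()
    parser.feed(html)
    return any(term.lower() in entry for entry in parser.keywords)
```

Running keywords_mention(page_html, "directory") over the banned and unbanned directories would at least confirm whether the tag really is the only difference between them.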

moftary

12:28 pm on Jul 29, 2005 (gmt 0)

Sites with a large number of outbound links in a list format (eg like Directories and Scrapers) - I would have thought that this includes links going through a redirect/cgi bin - G must be smart enough to work that out.

Maybe that's the case.

Sites with content virtually identical to another site - eg Datafeed sites with virtually no unique content, or Newsgroups with no unique content (very very thin pages)

Not my site. My site is simply a web directory, with unique submissions, plus a meta search engine.

ODP clones

Google directory itself is an ODP clone. Also, I don't see excite, alexa or other major portals that clone ODP being banned.

BTW, I have asked this question a hundred times with no answer. Is it just me, or is alexa really PR0?

I don't know why I feel that alexa being PR0 is connected to this thread subject.

[edited by: moftary at 12:34 pm (utc) on July 29, 2005]

lammert

12:31 pm on Jul 29, 2005 (gmt 0)

Someone mentioned a manual ban. If that were the case, why did it happen on a single day (July 28) across many sites?

I was the one who mentioned it. It could also be a semi-automatic ban. Looking at this and other threads:

Banned sites were often large
Banned sites used heavy linking, either because they were directories, or because of reciprocal links, or ODP cloning
Many people report heavy Google spidering just a few days before July 28.
All bans seem to have happened at (almost) the same moment

This could be caused by an off-line spam detection spider, quality spider or whatever you want to call it, which spiders suspected sites, checks each site against internal rules (linking, duplicate content, etc.) and then pushes the BAN button if the site falls below a certain quality level. This is a totally different approach from the current Google SE algorithm, which has tunable parameters that influence a site's position in the SERPS, but not its existence in the index.

moftary

12:37 pm on Jul 29, 2005 (gmt 0)

I think it's an automated ban with a manual exclusion button. I.e. they automatically ban all sites that hit a specific factor, and they exclude some well-known sites (lycos, excite, ODP, yahoo, etc.).

JuniorOptimizer

12:46 pm on Jul 29, 2005 (gmt 0)

" sgsurvey, you mentioned that you're using hidden text, and that you're scraping content from other people and using it yourself. Either one of those could be contributing to your problems. From listening to feedback that the search engineers heard at the last WebmasterWorld pubconference, I have a strong hunch that we're going to be taking a closer look at sites that are just scraper sites, or throwing up a copy of the ODP with no value added. So I wouldn't be surprised to see (for example) sites that are just scraping Google (or possibly other sites) not doing as well over time. "

Here's Googleguy's post, in the interest of accuracy. So we are to believe that GG does not understand what a scraper site is? Why would he have any trouble targeting the correct sites? They have a left-based menu which contains the keywords. Each menu item is a hyperlink to a page which contains "scraped results" which are snippets from websites, usually culled using the Google API. All AdSense Scraper Sites (A.S.S.) have 4 Google Adsense ads directly above the fold.

If Google does not like FindWhat SERPS in their SERPS, then use a FILTER. This would not bring a death penalty to the domain.

So, in reality, it appears that Google went after websites that are not Adsense Scrapers. Why the policy change? What policy? What do you do when you've been executed? Is there life after death? There are more questions than answers in this update. And you know what goes "up" when Google updates? Their revenue.

moftary

12:51 pm on Jul 29, 2005 (gmt 0)

I don't know why GG is not showing up in this thread. Can someone kindly draw his attention to it?

zoltan

12:58 pm on Jul 29, 2005 (gmt 0)

I can only talk from my own experience. Yes, we do have ODP, although it is highly customized with some of our links. But I repeat: ODP is only about 10% of our entire website, it is just an addon.
The rest of the site is user submitted and we do not even post links to members' sites (except a few site sections).
"I have a strong hunch that we're going to be taking a closer look at sites that are just scraper sites, or throwing up a copy of the ODP with no value added."

If what we have done is considered "no value added" then I am lost. We still rank high on yahoo and msn; only google dropped us completely.

And one more thing. I have another similar website from 2001 which we have totally abandoned since we launched the new site in 2002. Even so, this website has ranked #1 for a very competitive term for the last 3 years! No matter what happened, it was not affected by search engine fluctuations at all! And, just to mention, I ftp to this site about once every 4-5 months just to make minor changes... (15 minutes' work).

moftary

1:11 pm on Jul 29, 2005 (gmt 0)

zoltan, when google bans you, they ban your whole site and presumably all your subdomains as well.
That's the case here at least.

JuniorOptimizer

1:14 pm on Jul 29, 2005 (gmt 0)

Yes, your entire domain is found guilty, although un-accused, and all current and future sub-domains, pages, etc. are also found guilty, without trial or recourse.

The Contractor

1:19 pm on Jul 29, 2005 (gmt 0)

Dayo_UK msg #:90

I have not seen any of the sites that have been hit so I can't really comment on why they have. I will give you my long-winded approach on what I would do and not do when building a niche directory. These are my opinions only!

Here are my do's and don'ts :

Do not build a niche directory on a topic that does not interest you or is built solely on high$ keywords. They are too much work to build correctly and if you are not interested in the topic they will become stale or crap, as you really don't care.

Seed the site yourself with sites you value. Do not scrape results from other directories or SE's. Set forth a clear set of guidelines for titles and descriptions and stick to them. Never let the submitter control the titles/descriptions or you end up with titles like keyword keyword keyword keyword and descriptions to match. Screams low quality. Do not let users modify listings or that is all you will spend your time on.

If you use an off-the-shelf script to maintain your directory (which you probably should), make sure you change all paths etc. so as not to identify yourself with that script.

Do not have one template that runs your whole site where the only thing that changes is the call tags for titles, categories etc. Build the categories with at least a descriptive paragraph or so of text on the page which describes the category and the sites/listings a visitor may find in that category.

Have clear submission guidelines. Do not include sites you don't feel comfortable with whether they are paid inclusion or not.

Do not make a reciprocal link directory. It's a dead-end road imho.

Do not use all that ratings crap... the only ones who will use that are the site owners and it further identifies yourself as an automated directory whether you are or not.

Do link out without hiding behind some script/counter. You should not be afraid to link directly to any site that you have included. If you are, then that site shouldn't be in there.

Do build your site for visitors/users. If it's built for users, all the navigation, descriptive text, etc. will be in place for search engines.

Do not build for quantity, but instead for quality. I would rather have a 200 page quality directory that takes 6-months to build than a 50,000 page directory that is generated in 7-minutes with crap. Guess what, so would users...

If you are building a directory with only listings you have a tough road ahead of you to differentiate yourself from what's already been done many times before. I would build out content based upon the topic to supplement the listings.

Do not rent, sell, or buy ROS links.

Do not create a "shell" directory. By this I mean, don't create a bunch of categories that are empty. If you have nothing to put in a category, don't create it. They should be created only as needed.

Do not think of PR - ever! It will come eventually.

Do not include text that mentions PR, search engines, rankings, links, etc. unless that is the topic of your directory.

Do not build a network of related directories that are on the same topic.

Well, that's all I can think of off the top of my head. A directory is nothing more/less than a way to organize content, information, and resources. It's much like your folder structure you create on your own computer system to organize emails, documents etc. If you are one that can't organize your own information, you probably won't do very well categorizing an online directory.

The Contractor

1:26 pm on Jul 29, 2005 (gmt 0)

Is it just me, or is alexa really PR0?

Yes, I fired up the toolbar and Alexa's index page is PR0, although its other pages have PR.

tigertom

1:32 pm on Jul 29, 2005 (gmt 0)

Sorry if this is off-topic:

Does anyone know for certain if a 301 redirect from a banned site would 'taint' the directed-to site?

Not something I think I'd try at the mo' but I'd like to know for certain. Thank you.

JuniorOptimizer

1:34 pm on Jul 29, 2005 (gmt 0)

That's a great guideline list and all. Too bad since it's not coming from Google, it's completely meaningless.

I'd be much more interested in a list like that from Google.

Dayo_UK

1:35 pm on Jul 29, 2005 (gmt 0)

Alexa being PR0 might be due to this:-

[google.com...]

or this - if you are hitting a different dc. Although it does look like all dcs at the mo.

[66.102.9.104...]

Other things are happening at Google at the moment - so I can't be sure if we are talking about a ban or not for the sites being discussed in this thread.

Dust is still in the air.

[edited by: Dayo_UK at 1:37 pm (utc) on July 29, 2005]

The Contractor

1:35 pm on Jul 29, 2005 (gmt 0)

Does anyone know for certain if a 301 redirect from a banned site would 'taint' the directed-to site?

Sure it would. A 301 tells Google or any other search engine that the domain/page has been moved permanently to the redirected site/page. It's no different from 301ing a domain/page that has backlinks/PR: those will transfer too.

Big_Gig

2:14 pm on Jul 29, 2005 (gmt 0)

# Sites with a large number of outbound links in a list format (eg like Directories and Scrapers) - I would have thought that this includes links going through a redirect/cgi bin - G must be smart enough to work that out.
# Sites with content virtually identical to another site - eg Datafeed sites with virtually no unique content, or Newsgroups with no unique content (very very thin pages)
# ODP clones
Ok - some of the side effects of the above - normal directories will get hit (even ones with unique user submitted listings - the user submitting probably does not vary the text too much between directories), as well as sites which have a large number of seemingly outbound links as page content.

Dayo_UK - this is the most logical theory I've seen put forth in this discussion.

Let me put a second vote in for the following:
- large number of scraped/dup outbound links - including links being redirected, or opened up in frames
- sites with content virtually identical to another site
- sites with little modified content, consistent template, only switching out keywords

Let me say that the following are NOT the problem:
- purely directory sites (do a google search and find millions of directories still online)
- recip link directories (again, google it) If recip directories and link pages were being kicked off... practically the entire internet would have disappeared.

Does anyone have additions to, or problems with this list?

What are the pain thresholds for banning one site with these parameters, but not another?

zoltan

4:04 pm on Jul 29, 2005 (gmt 0)

"zoltan, when google bans you, they ban your whole site and presumably all your subdomains as well.
That's the case here at least."

Moftary, I only used the www, no subdomain used for my domain.

Petrocelli

4:08 pm on Jul 29, 2005 (gmt 0)

>> I only used the www, no subdomain used for my domain.

"www" is a subdomain in technical terms ...

Rx Recruiters

4:13 pm on Jul 29, 2005 (gmt 0)

As I have posted on other threads, I too was deleted from the index on July 28. Not a scraper site, original content since 1998, and 400 pages - all now gone.

I guess this signals the end of an era - the other sites in the test searches I run are all corporate mega-sites, spending millions on advertising and SEO. I have seen other people post this, but the era of smaller business sites run by 3 or 4 people (or fewer), of relevant sites prospering, of what the public searches for in organic SERPs (or PPC), is ending.

If I want to go to a corporate mega-site, I just use the company name; if I want to find a unique site, I use a search engine - but now the megasites are ruling the search engines as well.

Good luck to all, I am e-mailing Google with a re-inclusion request, but unless they do a manual site inspection, I think my requests will go unanswered : (

The Contractor

4:34 pm on Jul 29, 2005 (gmt 0)

zoltan stickied me his site, and his problem is not a penalty but his hosting and the script he used. I tried almost a dozen times before I could connect. It will not connect at all without the www. Even after connecting with the www, a server header checker shows it still sends a 302. Just wanted to mention this, as not all site drop problems in Google involve a penalty (I hope you don't mind, zoltan).
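
A server header check like the one described can be sketched in Python with the standard library's http.client, which does not follow redirects, so the raw status code stays visible. The function names below are hypothetical and the host is whatever you pass in; this is only an illustration, not the tool used above.

```python
import http.client

PERMANENT = {301, 308}   # "moved permanently" style redirects
TEMPORARY = {302, 307}   # "found" / "temporary redirect"


def classify_status(status):
    """Label an HTTP status code as a permanent, temporary, or other redirect."""
    if status in PERMANENT:
        return "permanent redirect"
    if status in TEMPORARY:
        return "temporary redirect"
    if 300 <= status < 400:
        return "other 3xx"
    return "not a redirect"


def head_status(host, path="/", timeout=10):
    """Send one HEAD request and return (status, Location header or None)."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling head_status once with the bare domain and once with the www host, then passing each status to classify_status, would show whether the homepage redirect is a 301 or a 302 and where it points.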

webdude

4:44 pm on Jul 29, 2005 (gmt 0)

I am sorry to buck everything that has been commented on thus far, but I have been having no problems with my directory. Of course this directory is part of a much larger site, but it is a directory nonetheless...

1. It has links to it called "mytopic directory" (2 words). It ranks in the first 5 for that key phrase.

2. The title of the page is "mytopic directory and resources."

3. All incoming links have "directory" as the link text.

4. It is very Yahoo like with categories and sub categories. Users submit their sites and are included after a review (to make sure it is on-topic). They choose their own categories and their categories are changed if there is a better category, much like DMOZ.

5. The whole thing runs on a template and database with all metatags, pages/ directories and other info being loaded.

6. I ask for a reciprocal link, but it is not necessary to get into the directory.

7. I link directly out.

Things that do not buck what is being discussed would be...

1. The scripts that run the directory were written by me - no off the shelf stuff.

2. No adsense on the site.

3. No cgi-bin or those type directories.

4. No ratings.

5. I do not rent, sell, or buy ROS links.

webdude

4:48 pm on Jul 29, 2005 (gmt 0)

I forgot to mention that the site is non-profit.

JuniorOptimizer

4:53 pm on Jul 29, 2005 (gmt 0)

So, what happened to the scrapers that were targeted? I've seen a normal amount of referrals today from scrapers.

Has anyone noticed a huge reduction in scrapers?

The Contractor

4:55 pm on Jul 29, 2005 (gmt 0)

After a closer look...besides hosting/script problems...there very well could be a ban on zoltan's site... it's easy to see why...

Andem

4:56 pm on Jul 29, 2005 (gmt 0)

I am totally freaking out, just for the record.

But I am keeping a close eye on my access log (see: tail -n) and I am getting a couple of Google referrals every now and then, about 5 per hour. They come from other Google sites though, like google.de. When I go to the referring page, I am nowhere to be found.

I am also seeing Googlebot looking at some pages but not often. IE:

66.249.65.78 - - [29/Jul/2005:12:41:09 -0400] "GET /forums/forum-6.html HTTP/1.1" 200 79634 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Note that it is right now 12:54:00 on my server so it wasn't long ago since Google looked. I am going to convince myself that this is a good sign so that I can sleep tonight.
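
The log line quoted above appears to be in the Apache combined log format. As a rough sketch, assuming that format, the Python below pulls out the fields and flags Googlebot visits and Google referrals. The pattern and function names are illustrative, and a user-agent string can be spoofed, so a match is a hint rather than proof.

```python
import re
from urllib.parse import urlparse

# Assumed layout: ip ident user [time] "request" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_googlebot(entry):
    """True if the user-agent string claims to be Googlebot."""
    return "googlebot" in entry["agent"].lower()


def is_google_referral(entry):
    """True if the referer host looks like a Google search domain (google.com, google.de, ...)."""
    host = urlparse(entry["referer"]).hostname or ""
    return host.startswith(("www.google.", "google."))
```

Feeding each line of the access log through parse_line and counting the two checks per hour gives the same picture as watching tail, without having to eyeball it.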

I sincerely wish all clean webmasters the best of luck in getting through this. I have never had anything like this happen in the years I've been on google.

zoltan

5:16 pm on Jul 29, 2005 (gmt 0)

The Contractor is talking about a possible Duplicate Content penalty. Yes, we have more than one site, but they are run by different people in different countries. They share the same script and database (with certain targeting), but they are run by different people.
If the problem is indeed duplicate content, the question is: why is the busiest and oldest site the one under penalty?

The Contractor

5:21 pm on Jul 29, 2005 (gmt 0)

You cannot run 7 or 8 duplicate sites and expect them all to rank, can you? I found those simply by looking at all the sites hosted on that IP. They all return errors and are often unreachable, and they all return 302s when redirecting to the homepage you have set up. Google has every reason in the world to ban sites like these. I know that may sound harsh, but why would they want to list 8 sites that all have the same content?
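
To get a rough feel for how alike two pages are, one common textbook approach is to compare overlapping word sequences (shingles) with Jaccard similarity. The Python sketch below is only that: the function names and the shingle size of 3 are arbitrary choices for illustration, and nothing here is a claim about how Google actually detects duplicate content.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences in `text` (lowercased, split on whitespace)."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(text_a, text_b, k=3):
    """Jaccard similarity of the two shingle sets: 0.0 means disjoint, 1.0 means identical."""
    set_a, set_b = shingles(text_a, k), shingles(text_b, k)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)
```

Comparing the visible text of the same category page across the sites on that IP would show how close to 1.0 they come.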

zoltan

5:28 pm on Jul 29, 2005 (gmt 0)

I did not create the sites for Google. I do not expect Google to rank all my sites.
The problem is that the busiest and oldest site is under penalty, not the others that were not even ranked well (or at all) on google.

This 588 message thread spans 20 pages.