I have a very reputable site that is ranked very highly by Google. The site's been around since 1999 and was grandfathered into Yahoo!'s directory. However, it is penalized by Yahoo and not showing up in its index. Even if you type the name of the site into Yahoo search it never shows up.
We do several main things:
game cheats
game walkthroughs
inhouse news (uniquely written by us)
inhouse articles (uniquely written by us)
aggregate of reviews from other sites
product database
Using the Yahoo Site Explorer I can see that only the homepage is in the index, even with 50,000+ inlinks.
I tried submitting the site for review the other day and I got a response stating that:
[edit]It did not meet quality guidelines[/edit]
The email lists the most common problems, which I believe we are not guilty of:
- Cloaking (showing crawlers deceptive content about a site)
We show the exact same content no matter who is on the site, except when someone is logged in they see a member bar.
- Massive domain interlinking
- Use of affiliate programs without the addition of substantial unique content
We do link to one or two other domains, one of our own. One of those domains is a link in the footer of the entire site. And we have links to news posted by that other domain. And in the forums we have links to multiple other sites of ours in the footer. Plus, as a review aggregator we link directly to reviews on many sites (tens of thousands of links). We do not ask for reciprocal links in return for these outbound links. In that capacity we are no different from a site like RottenTomatoes.com (except we focus on games and PC stuff).
- Use of reciprocal link programs (aka "link farms")
We don't participate in link farms, and don't exchange links.
- Hidden text
No hidden text that I can find.
- Excessive keyword repetition
No keywords are repeated purposely for SEO. We do list the keywords where appropriate, but I don't feel they are excessive. Certainly nothing unnatural is being done to repeat any keywords beyond what normal text should contain.
As far as I know we don't have excessive content duplication, and we never quote large chunks of text from other sites in our own inhouse content. We do quote small quotes in our aggregate link pages, but again far fewer than a site like [SNIP].
Anyway in closing we aren't doing anything intentionally wrong, and certainly we don't do anything wrong as per the site quality guidelines that I'm aware of, unless someone can tell me otherwise based on what I've written.
I don't suppose there is a tool somewhere, or else another more obvious way of figuring out why a site would fail a review?
[edited by: martinibuster at 12:41 am (utc) on Jan. 30, 2008]
[edit reason] Removed email quote, and refs to other sites. See TOS. [/edit]
I consider that part of the site more like a directory of reviews on other sites. Does Yahoo look unfavourably on directories? What if the Directory aspect is only one small part of the site as a whole?
I know the statement "but other people do it too" isn't a valid argument in a situation like this, but not only are we one of the original aggregators of such content (since 1999), it is only a fraction of what we do, while other sites that are being indexed by Yahoo rely on aggregated content for more than 50% of their content.
We were never penalized by Yahoo until maybe 2 years ago. I suspect it was because someone pointed their domain to our site by accident and we displayed our pages under that person's domain (we have since added 301 redirects so that no domain but our own can show our content).
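For anyone wanting to check the same thing on their own site, here is a rough sketch of the idea behind that kind of redirect: any request whose Host header isn't the site's own hostname gets a 301 to the same path on the real domain. The hostname `www.example.com`, the `http://` scheme, and the function name are placeholders of mine, not the poster's actual setup, and in practice this is usually done in the web server configuration rather than in application code.

```python
from typing import Optional, Tuple

# Hypothetical canonical hostname; substitute the site's real one.
CANONICAL_HOST = "www.example.com"


def canonical_redirect(host_header: str, path: str) -> Optional[Tuple[int, str]]:
    """Return (301, location) if the request came in under a foreign host, else None.

    Simplified sketch: it strips an optional port and ignores IPv6 literals.
    """
    # Normalise the Host header: drop any ":port" suffix and compare case-insensitively.
    host = host_header.split(":", 1)[0].strip().lower()
    if host == CANONICAL_HOST:
        return None  # already on our own domain, serve the page normally
    # Foreign domain pointed at our server: send a permanent redirect to the same path.
    return 301, f"http://{CANONICAL_HOST}{path}"
```

A request for `/cheats?id=1` arriving under someone else's domain would be answered with a permanent redirect to the same path on the canonical host, so crawlers should stop seeing the content under the foreign name.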
We do link to one or two other domains, one of our own. One of those domains is a link in the footer of the entire site. And we have links to news posted by that other domain. And in the forums we have links to multiple other sites of ours in the footer.
For instance, a site that may be considered thin by some is the kind built in a directory style to exploit taxonomic similarities so that you can build out thousands of pages whose difference in content from one page to the next is superficial. Typical are pages whose differences lie in the names of cities or states swapped in and out of the "original" content, that kind of thing.
Writing a sentence or two of content then switching out the names and model numbers of products does not constitute original content.
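One rough way to test your own pages for this kind of name swapping is to compare them pairwise on overlapping word sequences. The sketch below uses word shingles and Jaccard similarity, which is a common near-duplicate heuristic; it is not a description of how Yahoo actually evaluates pages, and the function names and the shingle size of 3 are my own choices.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word sequences (shingles) in the text, lowercased."""
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)
```

Two pages that differ only in a swapped city or product name will score high even on a short snippet, and closer to 1.0 the longer the shared template is. Pages written independently tend to score near 0.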
However, out of say 1.4 million pages, at least 250,000 would be product pages that carry the aggregate of review links with little other original content. This is not done to intentionally mislead anyone; it's just the page dedicated to the reviews of that product.
So the other 1.1 million pages of content is original? That's great. How many writers did it take to write 1.1 million pages of game and gaming product content? How long is each piece?
Or is this original content basically a template of original content with product names swapped in and out?
...while other sites...
It's a good idea to not look at or compare your site to what others are doing and getting away with. Never. You're either fat or you're not, doesn't matter how much fatter the guy standing next to you is. Judging by comparison can become an exercise in self-deception, a way to justify cutting corners. Don't go there.
[edited by: martinibuster at 8:50 am (utc) on Jan. 30, 2008]
I begin to see the problem you guys are talking about.
I'd like to break down each situation:
1) Out of 1 million pages perhaps as much as 50% is forum content. So if we discount that 50% let's look at the 500,000 other possible pages.
a) We have 43,000 products the majority of which have descriptions written by humans. Those descriptions are not hugely detailed, but are written either by ourselves or by volunteers. Most of it should be free of copying from other sites since we check for that. However, for computer hardware they will appear formulaic since the description focusses around their technical features. Regardless of how cookie cutter the descriptions sound they are either hand written, or are specs that cannot be rewritten to be less robotic. Again, none of it is cookie cutter per se, it is almost certainly hand edited in some way. HOWEVER, each piece can range from only 50 words to something like 250 words. Again, not huge writeups. And certainly there are many products with no writeup at all.
Of the 43,000 products many will have additional pages of content like user reviews (which can take up around 1-10 pages per product depending on the number of reviews), cheats (takes up 1 page each product), tips (1 page per product), walkthroughs (a page of links to text files we host), and etc. Much of this content is user generated, but goes through an approval system run by a handful of human editors. My guess is this accounts for 200,000-300,000 pages of content.
In addition to this we have around 20,000 pages of fully original content (written from scratch or nearly from scratch) in the form of news and articles. There are 30,000 images that we display in articles as thumbnails; clicking on a thumbnail opens the image in its own page showing a 1024x768 version. I think 80% of those images are completely unique to our site since we took them using our camera.
We also have 21,000 user submitted scans of game or movie boxshots, each scan has its own large image page, plus 260,000 game or product screenshots as single images (the images are 640x480 or bigger).
So while I understand that a very small portion of our content is actually written 100% from scratch (I would consider user reviews, cheats, and our own articles to be the case) it does account for a significant amount of the overall content. Not to mention the completely unique images, user submitted artwork, and forum content which is not duplicated elsewhere on the net.
On the other hand, there are in fact a lot of pages that I would consider to be rather thin, since they might not have enough information to be considered a robust page. Just like stubs on a wiki or something.
Or perhaps Y! doesn't like permalinks to large versions of images. If that were the case we'd be in big trouble.
2) Inlinks: do they look shady?
I took a look at 20 pages of inlinks from Site Explorer. Total inlinks are 51,000+. Out of the sample of 20 pages I do notice a lot of repetition by sites that link to us (we do not own these sites nor do we have a link exchange with them, however we probably have links to their reviews within our directory). 2 pages in particular have links from sites that we never link to at all. I do however notice at least 10 links from sites that we do own.
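If the inlink list can be copied out as plain URLs, a quick tally by linking host makes the repetition and the share coming from your own sites easier to see than eyeballing pages. This is only a sketch: it assumes you already have the URLs as a Python list (it does not call any Site Explorer API), and the function names and the `owned_hosts` set are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def inlinks_by_host(urls):
    """Count how many inlinks come from each linking host (case-insensitive)."""
    hosts = (urlparse(u).netloc.lower() for u in urls)
    return Counter(h for h in hosts if h)  # skip entries with no parseable host


def share_from_owned(urls, owned_hosts):
    """Fraction of inlinks whose host is in owned_hosts (a set of lowercase hostnames)."""
    counts = inlinks_by_host(urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    owned = sum(n for host, n in counts.items() if host in owned_hosts)
    return owned / total
```

Calling `inlinks_by_host(sample).most_common(10)` lists the hosts that repeat most often in the sample, and `share_from_owned` gives a rough idea of how much of the link profile is self-generated.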
Independent sites that link to us do so completely naturally. We send them news about our pages and they choose to link or they do not. We do not have any special link exchange farm deal with anybody, and in fact we don't even have a link exchange page (we refuse to trade links of homepages, we only link to direct content such as reviews). However, the industry we're in tends to do a lot of linking of each other's articles. The way we do it is we link to any article of quality that relates to our product pages, regardless of whether they link to us. If this isn't a legitimate use of outbound links I'm not sure what is!
Should I remove links to sites we own from the footer of our pages? Those are there because of the relevance of the links. For instance our main site is about computer hardware reviews and games. One of the sites we link to is a Gaming Social News site, and the other site we link to is a site revolving around overclocking and reviews (this latter site is run by a completely different set of staff that create content completely unique to the site and with no duplication of articles from our main site).
Are we forced to remove such links because Yahoo! considers them link scams? I'm merely trying to link to related sites that we run so we can help our users find other related sites to ours that we own.
3) Comparing to others and cutting corners
It's a good idea to not look at or compare your site to what others are doing and getting away with. Never. You're either fat or you're not, doesn't matter how much fatter the guy standing next to you is. Judging by comparison can become an exercise in self-deception, a way to justify cutting corners. Don't go there.
Yes I understand this point. But I raised my issue not because I want to complain about what people are "getting away with". I don't want to "get away" with anything and I don't think they are getting away with anything either. I believe we have honest, legitimate content. I also believe we have lots of pages that are not quality pages. NONE of those pages are there intentionally to fool the search engines. I believe the good content outweighs the bad content. I also fear that Y! is considering our site unfit due to the kind of site it is, yet clearly out of all the sites of our type we're one of the best quality and we have tons of original, handwritten content that is unique. It's got nothing to do with getting away with anything or cutting any corners. We've probably made mistakes, and I admit there's a lot more to be fixed, but to be completely dropped from the index is a penalty of the highest order.