Forum Moderators: Robert Charlton & goodroi
Anyway, for a few months traffic steadily grew as Google indexed these 120,000 pages, and then it carried on growing as those pages floated up the SERPs a bit. Until 3 days ago we were getting about 1500-2000 hits a day from Google. Not a massive amount I know, but a crapload better than the 100 hits a day we've had for the last 3 days...
Have we been "sandboxed"? If it's of any note, this sudden drop occurred almost exactly 6 months after I submitted the first sitemap to Google.
Personally I think the key problem is that all bar 500 of our pages (if I do a site:www.domain.com search) appear to be in the supplemental index. I don't know if that's what being "sandboxed" means though.
I fear that this is because most of our pages are effectively "empty". They only contain the company name, address, a map, phone/website etc, and an empty graph. Companies that have been rated have < a lot more content >.
Is Google basically seeing these pages as being too "thin" and killing their ranking because of it? Or is it just a sandbox issue, and either way - what should we do!?
Any advice/insight much appreciated!
Many thanks guys (and girls)
[edited by: tedster at 3:00 am (utc) on Mar. 29, 2008]
That said, your guess sounds pretty good. You are describing what Google reps have called "stubs" - and these types of pages are not what Google wants to offer to their end users. The following thread talks about comments made by Adam Lasnik of Google. Even though it's from 2006, nothing has changed in this area except that Google seems to be getting better at spotting stubs and not ranking them:
[webmasterworld.com...]
Any advice/insight much appreciated!
There is a very strange Google flux occurring right now!
Do nothing, wait... if it's ANY consolation, I have 15-year-old sites being hammered and I have no idea why!
Patience is required right now.
Two other things:
1) 3-4 weeks ago we added "noarchive" to all pages. This was after a company asked us to remove libellous comments that users had left, and we felt that the google cache doesn't benefit us, and in the case of content removal we can get stuff offline quicker without it.
Just been reading up on that and I think the consensus is that this SHOULDN'T be a problem...
2) Also about 3-4 weeks ago, we did a tie-in with a price comparison site. They do the prices, we do the ratings for their companies. This meant that overnight about 10,000 links appeared from their site to our site. The links went to a routing page on their site and then 302-redirected to our site. I think this could have triggered alarm bells at google. I've got that site to add rel="nofollow" to all of those links, and we've submitted a reconsideration request to google based on thinking that that might be the problem.
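For anyone following along, the before/after on those links looks roughly like this (the domain and URL pattern are invented for illustration):

```html
<!-- Before: link to a routing page on the comparison site that 302-redirects to us -->
<a href="http://pricesite.example/ourdomain/route?id=123">Widget Co ratings</a>

<!-- After: same link with rel="nofollow", telling Googlebot not to follow it -->
<a href="http://pricesite.example/ourdomain/route?id=123" rel="nofollow">Widget Co ratings</a>
```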
Basically I hope it's anything other than the stub problem! Only about 1% of our pages have any ratings/comments left by customers on them at the moment, but the other 99% of pages bring a vast percentage of our traffic in, and a lot of those people then discover what the site is about and start using it. If we had to get rid of the stubs, or live with them ranking poorly, well... it's not going to help!
that may be a problem... depending on how unique those pages are.
I was under the impression that the "noarchive" tag simply removes the "cache" link from SERPs. I'm fairly sure it shouldn't affect ranking etc... can someone clarify?
That's exactly right - it just means that Google does not serve a cached version of the URL - something I keep in mind whenever I work with a limited time sale price. Here's a quote from our archives, with an interesting extra detail about cloaking and scrutiny:
Googleguy:
This tag does not have any effect on ranking. Be aware that it may open your page to greater scrutiny however (the initial checks we've done show that many people use the noarchive tag to try to cloak etc.). If you're doing something like cloaking, the noarchive tag makes it look more deliberate to us.

NOARCHIVE for Googlebot [webmasterworld.com]
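For reference, the directive under discussion is just a robots meta tag in each page's head (a generic sketch, not the site's actual markup):

```html
<head>
  <!-- Tells search engines not to serve a cached copy of this page;
       it only removes the "Cached" link from the SERP listing -->
  <meta name="robots" content="noarchive">
</head>
```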
and to your most valuable / content loaded / relevant pages?
( with the few ratings you actually DO have )
... whee ...
I suppose you *haven't* checked just yet whether any of those redirects (took over and/or) destroyed your listings?
Perhaps it's not that evident, but it could have broken up the integrity of your site's ratings in the index.
I know it's 2008, but reports of accidental hijacks are still pouring in... the infamous inter-domain temporary redirect is still one of the worst ideas around. Except if they ( the price site ) have these pages ( the 302-redirecting pages the links lead to ) disallowed in robots.txt. But even then there're some risks.
Have those links made direct, but put them in JavaScript...
...instead of nofollowed/302-redirected.
[edited by: Miamacs at 12:34 pm (utc) on Mar. 30, 2008]
Content.
If they have no content, why would you expect them to rank high?
It's not just that the pages are 'thin' - to Google, they will be virtually identical.
If you compare code, you'll find that after the ads, navigation, logos, etc., the percentage of 'unique content' will be very small indeed, so taken site wide, you have a bad attack of sick site syndrome.
Why would Google want to index such pages?
The lesson is, if you want to take advantage of Google as a referrer, you need to give in return.
If Google is important to you, rethink your content policy:
1. Minimise code bloat - avoid excessive repetition of navigation and 'marketing' logos, slogans, etc.
2. Consider deleting 'empty' pages - and merging content to create fewer, better, pages.
3. For the future, avoid creation of further thin pages; go for quality content, not quantities of pages.
4. If you use meta descriptions / keywords - be sure these are also unique
5. Plus titles should be unique, of course ;)
6. Check that your page mass producer is not also chucking out millions of alternate URLs - that alone will guarantee sick site syndrome.
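To illustrate points 4 and 5, each company page's head should carry its own title and description rather than shared boilerplate (the company, sector, and town here are placeholders):

```html
<head>
  <title>Widget Co (Plumbing): OurDomain.com</title>
  <!-- Unique per page: lead with details specific to this company -->
  <meta name="description"
        content="Customer ratings and reviews for Widget Co, a plumbing company in Springfield.">
</head>
```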
Using software to substitute for human publishing appears much quicker and easier - but not only does it often produce the problems I've highlighted (and 46+ more), it often produces pages that are not human-friendly, even if the search engines survive their indigestion.
Check your 'bounce' rate - you may find most visitors are leaving rather than having to click endlessly to find significant content.
[edited by: Quadrille at 1:00 pm (utc) on Mar. 30, 2008]
It's interesting to me that this "purge" only now happened. Has Google tweaked something in the duplicate/thin page detection area? Maybe this is at least part of the current "flux" that many are reporting in the March SERPs Changes [webmasterworld.com] thread. It has been a few days since I saw a Wikipedia stub page in the results.
But it can also be that -- however the heck G's back end actually works -- it's just now sussed out the possible stub problem. Throw in the bit of flux going on and it makes it more than difficult to pin down any one factor.
The nofollow on the 302s might help. However, as these links have already been indexed without the nofollow do we really know how they'll be handled going forward? One thing we do know is that G never forgets; as such I think I'd try to wash these links away. Maybe set up a "/partner" directory for the other site to link to and disallow it in robots.txt.
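That suggestion on the other site's side would be just this in their robots.txt (the directory name is only a suggestion):

```
User-agent: *
Disallow: /partner/
```

Keep in mind robots.txt blocks crawling, not indexing - URLs that were already discovered can linger, which is where manual removal requests come in.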
And, of course, at the same time work through Q's 6 steps to no stubs.
Promote your site in other ways besides depending on Google, and as more users submit reviews and more pages gain real content, Google's opinion will improve.
We're working on it! :)
I suppose you *haven't* checked just yet whether any of those redirects (took over and/or) destroyed your listings?
Luckily they only link to 800 of our listings (but the most important ones of course). A few are showing in google under their site now, unfortunately, but only a handful. I worry the others are in the pipeline though and just not showing yet.
They were already using a "/ourdomain" folder and that is now in their robots.txt - hopefully a combination of that, the rel=nofollow attribute and some manual URL removals will fix this particular problem...
...but even then there're some risks.
Care to elaborate?
After that it's back to worrying about stubs.
Which, to be fair, we will have to ignore for a couple of weeks at least. For all we know everything will fix itself in the meantime now we've (hopefully?) quashed the 302 issue, and as someone else pointed out the algorithm appears to be undergoing changes at the moment.
If they have no content, why would you expect them to rank high?
No expectations, but they WERE ranking ok, and now they're not ranking *at all* :(
Quadrille's points: (not being defensive, just clarifying :))
1) Code itself is pretty streamlined. There is a sidebar of content that appears on *every* page. If I could tell googlebot "don't index this part" I would, but as far as I know I can't, so the solution seems to be to move the sidebar into a robots.txt-blocked iframe?
2) Not sure that's going to be an option unfortunately
3) Future pages only get added once someone submits a review for a company we don't currently have.
4) They are unique, but only in company name: "blah blah blah blah blah **CompanyNameHere** blah blah" - the blah's stay the same on every page...
5) Of course, currently "CompanyName (Sector): OurDomain.com"
6) only one indexable URL per page
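On point 1, the iframe workaround would look something like this (sidebar.html and the /blocked/ path are made-up names):

```html
<!-- Sidebar served from a directory that robots.txt disallows
     (e.g. "Disallow: /blocked/"), so the repeated text isn't
     crawled as part of every company page -->
<iframe src="/blocked/sidebar.html" width="200" height="600"></iframe>
```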
Bounce rate: That's the only upside so far, bounce rate HAS dropped, but nowhere near enough to compensate for the traffic drop.
Luckily they only link to 800 of our listings (but the most important ones of course). A few are showing in google under their site now, unfortunately, but only a handful. I worry the others are in the pipeline though and just not showing yet.
They were already using a "/ourdomain" folder and that is now in their robots.txt - hopefully a combination of that, the rel=nofollow attribute and some manual URL removals will fix this particular problem...
These links have already been found and followed, the linked to pages have been discovered and, as you have seen, the 302 "hijack" effect is starting to take hold. As we simply do not know how Google will treat these links going forward, I again think it's best that you get completely rid of these links and start anew.
i would consider noindexing the thin pages and start an adwords campaign to get the cheapest paid clicks you can to help you get traffic to those pages and to flesh out the site.
adjust your campaigns and (no)indexing according to content.
to get some organic traffic to the thin pages perhaps you could consolidate some of the company information so you have a landing page for each sector. (cf Quadrille #2)
arguably useful and not stubby and providing a way for company names and some keywords to get indexed.
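The noindex-the-stubs idea is also easy to automate; here's a minimal sketch in Python, assuming a hypothetical per-page review count is available when each company page is rendered:

```python
def robots_meta(review_count):
    """Choose the robots meta content for a company page.

    Stub pages (no reviews yet) are kept out of the index but left
    followable, so internal links still get crawled; rated pages
    are indexed normally.
    """
    if review_count == 0:
        return "noindex,follow"
    return "index,follow"


def robots_tag(review_count):
    """Render the tag for the page template."""
    return '<meta name="robots" content="%s">' % robots_meta(review_count)
```

The zero-review cutoff is just an example - it could be raised as real content builds up.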
AussieMike, interesting, especially since your site is much older and a lot "fatter"... how do I determine for sure whether we have a -950 penalty? Why did you get a -950?
Again, thanks for all of the help people!
The small amount of info you have on there can be readily found other places.
You think your site deserves to do well why?
(Not attacking you - just trying to get you think about what you are doing...)
...but even then there're some risks.
Care to elaborate?
These links have already been found and followed, the linked to pages have been discovered and, as you have seen, the 302 "hijack" effect is starting to take hold.
as jim said... robots.txt will only disallow the further crawling of the pages that do the redirects. If they've been discovered already, or if anyone at any given time links to them ( yeah, why not - a simple right-click-copy-URL on the link will do this every time ) they'll get another check.
while linking out thru disallowed 302 redirs might be safer for the source, it's not the best for the target ... in my opinion at least.
...
but it seems you have piling troubles and this might have been the last straw. your best bet is to correct every problem, one at a time.
or abandon the whole site altogether... which might not be an option.