|Directory-like site SEO|
Hi, everybody. This is my first post. I had written a very long story, but then realized that nobody would read it and decided to be more specific. If somebody shows interest, I’ll post the original story later in this thread.
My site is a kind of hierarchical directory/encyclopedia of geo objects. The majority of pages are ‘thin’ (title, a few words and numbers, static image map). Some are ‘rich’ – article, photos, etc. In addition, each geo object has a separate ‘map’ page (title and customized Google map). ‘Rich’ pages make up only 1-2% of the total page count. ‘Rich’ pages are easily accessible (2-4 clicks from the main page). I keep adding new photos and articles on a regular basis (2-3 articles and 20-30 photos per week). Each page has three language versions.
I see no explicit signals that my site is considered to be ‘low quality’. Googlebot has indexed all pages (270,000 total) and re-indexes 3,000-5,000 per day. All pages receive some organic search traffic, including ‘thin’ and ‘map’ pages. I have never experienced sudden changes in traffic. I have never received messages from Google.
But the site is stuck. Though I keep adding new unique content, and it gets indexed, my ‘rich’ pages get negligible traffic. Overall site traffic is just 2,000-3,000 hits per day (40% of it from Google organic search)...
Here are some questions:
1. Provided that everything else is done properly, could killing (nofollow, noindex) my ‘thin’ and ‘map’ pages (and thus reducing the number of indexed pages to 10% of the original amount!) improve overall rankings and draw more traffic to the high-quality pages? Sometimes it seems to me that the pages of my site are competing with one another.
2. Can having 3 language variants of each page and an automatic language redirect on the main page hurt overall rankings?
3. Maybe some other suggestions?
Thanks everybody in advance for advice and suggestions, and please excuse me for my poor English.
Here are a few questions to get started:
|each geo object has a separate ‘map’ page (title and customized Google map). |
Why are the maps on a separate page? It seems to me that it would be more user-friendly to have all the information about a subject together on one page.
|Overall site traffic is just 2,000-3,000 hits per day |
If as you say, you have 270,000 pages, then the vast majority apparently get no visits at all on any given day. Is this correct?
Also, is most of your traffic coming from long-tail searches, or have you targeted any particular search terms?
My 'info' pages contain small, static image maps which link to a separate page. I have created a sophisticated, heavily customized map with many controls, filters, a search box, clustering, etc. When a 'map' page is shown, it is essentially the same map for every object, just positioned differently, showing a marker, and with its own URL and title. Such 'map' pages do get some traffic when users search for 'object name map' or just 'object name'. Sometimes the 'map' page is shown in search results despite the fact that there is a good, rich 'info' page, or both the 'info' and 'map' pages are shown simultaneously in the SERPs. This is simply a bad thing...
Yes, you are correct. Most of the keywords are long-tail. The majority of pages get no visits on any given day. I do not see any pattern here. Sometimes those long-tail keyword pages rank surprisingly high. Sometimes even pages from my domain targeting moderately competitive keywords show up in the top 10. But unfortunately, just sometimes.
But many smaller, younger, simpler sites rank higher for these search terms and, moreover, have higher total traffic. To be specific, my keywords are just city names and local sights names. These keywords are not highly competitive.
I haven't done any in-depth optimization or research on keywords, key phrases, frequencies, etc. I just publish plain natural-language articles, with in-text links to other pages, and photos. Photo titles include keywords too.
|To be specific, my keywords are just city names and local sights names. |
I don't know if you've personally visited all of these places or not. In any case, in order to be really successful, you usually need to provide unique information that isn't available anywhere else on the web. Just having thousands of pages shouldn't be the goal. It's the amount and quality of the information that counts. Oftentimes SEO isn't the main problem.
I completely agree. I visit new places often and regularly add new photos, and several people are working on articles. Those thousands of 'thin' pages were added a long time ago.
But the main question is whether noindexing the 'thin' pages, about 90% of the site, will/could improve the rankings of the other, 'rich' pages. Or, in other words, whether the average site 'quality' will be improved. I believe this question is very interesting, not just for my site but also in general.
If nobody has experience with such experiments, I am inclined to try it at my own risk :)
I tend to noindex or block thin content URLs. I like to think in terms of percentages and if my percentage of thin content significantly outweighs my percentage of rich content (or even approaches the same percentage), it's time to get out the scalpel.
|I tend to noindex or block thin content URLs. |
Netmeg, how do you decide whether to noindex versus block in robots.txt?
Netmeg, thank you for the advice. This is what I am planning to do, but I want to collect some statistics first, to find out what percentage of visits those 'thin' pages generate.
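For anyone wanting to gather the same statistic, here is a minimal Python sketch of the calculation. It assumes you can export a list of visited URLs from your logs or analytics and that you have some way to tell thin pages apart; the `THIN_URLS` set, the sample URLs, and the function name are all hypothetical.

```python
from collections import Counter


def thin_traffic_share(visits, is_thin):
    """Return the fraction of visits that landed on thin pages."""
    counts = Counter()
    for url in visits:
        counts["thin" if is_thin(url) else "rich"] += 1
    total = sum(counts.values())
    # Avoid dividing by zero when the log is empty.
    return counts["thin"] / total if total else 0.0


# Hypothetical data: in practice this would come from server logs
# or an analytics export, and thin pages from the site's database.
THIN_URLS = {"/en/place/101", "/en/place/102", "/en/map/101"}
sample_visits = [
    "/en/place/101",
    "/en/article/old-town",
    "/en/map/101",
    "/en/article/old-town",
]

share = thin_traffic_share(sample_visits, lambda url: url in THIN_URLS)
print(f"{share:.0%} of visits landed on thin pages")
```

If the share turns out to be large, noindexing those pages means giving up that traffic, so it is worth knowing the number before cutting.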
aakk9999, I think it is better to block in robots.txt: less error-prone, and googlebot doesn't even load those pages (performance gain).
One exception is when you want to make pages "noindex, follow".
The second exception is when you simply cannot use robots.txt. This is my case. My thin pages and rich pages have the same style of URLs, so robots.txt won't help. But I will use rel="nofollow" to disallow googlebot access, in addition to noindex.
Yes, there are pluses and minuses to either; I was just wondering what netmeg's thinking behind it is.
Noindexed pages circulate PR; roboted-out pages do not. On the other hand, roboted-out pages save crawl budget (as you pointed out), and if they are linked only from within a URL structure that is itself blocked, Google may not know that many of them exist at all.
For example I would never block category listing pages (they would be noindex) but I may block thin product pages.
|But I will use rel="nofollow" to disallow googlebot access |
This will not disallow googlebot access as there may be other links to that page outside your site (where your "noindex" kicks in). Since you are already noindexing the page, I would remove rel="nofollow" to allow PR to circulate.
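As a rough illustration of the "noindex, follow" approach when thin and rich pages share the same URL style, the decision can be made from page data rather than from the URL. This is only a sketch: the `Page` fields and the `is_thin` rule are assumptions, and a real site would use whatever its own database knows about each page.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical fields; substitute the site's own page data.
    url: str
    has_article: bool
    photo_count: int


def is_thin(page):
    """Assumed rule: a page with no article and no photos is thin."""
    return not page.has_article and page.photo_count == 0


def robots_meta(page):
    """Return the robots meta tag to place in the page's <head>."""
    content = "noindex, follow" if is_thin(page) else "index, follow"
    return f'<meta name="robots" content="{content}">'


thin_page = Page("/en/place/101", has_article=False, photo_count=0)
rich_page = Page("/en/place/202", has_article=True, photo_count=12)

print(robots_meta(thin_page))
print(robots_meta(rich_page))
```

Because the tag says "follow", the links on a thin page can still be crawled, and no rel="nofollow" is needed on links pointing to it.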
If I really and truly want to make sure it stays out of the index, then I NOINDEX. So most thin pages would use that directive.
I would never use NOFOLLOW for anything on one of my own sites (except affiliate links and direct ads)
Mostly I use robots.txt to block crawling of URL parameters or paths to duplicate content, and admin stuff. There's no guarantee anything blocked in robots.txt won't be indexed and even rank, and I hate that fugly snippet Google uses (Bing's is even worse) so - no.
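To check what a given set of robots.txt rules actually blocks, Python's standard `urllib.robotparser` can be used. The sketch below is only an illustration: the rules and URLs are made up, and the standard-library parser does simple path-prefix matching, so it does not cover Google's wildcard extensions. It tests crawl access only, which, as noted above, is not the same as staying out of the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block admin pages and internal search results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, agent="Googlebot"):
    """Return True if the rules above let the agent fetch the URL."""
    return parser.can_fetch(agent, url)


for url in (
    "https://example.com/admin/login",
    "https://example.com/en/place/101",
):
    print(url, "->", "crawlable" if is_crawlable(url) else "blocked")
```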
That's how I do it on my own sites and for clients.
Hmmm, this is interesting!
Saving crawl budget is a good thing, no doubt.
Yes, if a noindex page is crawled, it allows PR to circulate. But is that necessarily good? Let's examine a very simplified example. Which variant is better in terms of the rich page's PR? I guess variant B is better, am I correct?
Here is a link to the picture, since I cannot upload it here:
When you have little content, instead of focusing on a very granular level (say at the town or even district of a town), maybe you could focus at the county level.
I used to have a directory for various institutions here in the US.
If I only had the info for a handful of institutions, I created a state-level page.
If I had more info, then I might have created two or three county level pages.
In cities that had a fair number of those institutions (like three or more), I would create a city-level page.
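The tiering described above could be sketched roughly like this in Python. The city threshold of three comes from the description; the cutoff for a "handful" per state and all of the sample data are assumptions made for illustration.

```python
from collections import Counter


def assign_page_levels(listings, handful=5, city_min=3):
    """Map each (state, county, city) to the page level it is listed on.

    listings holds one (state, county, city) tuple per institution.
    handful is an assumed cutoff; city_min follows the post above.
    """
    per_state = Counter(state for state, _, _ in listings)
    per_city = Counter(listings)
    levels = {}
    for place, count in per_city.items():
        state = place[0]
        if count >= city_min:
            levels[place] = "city"      # enough entries for its own page
        elif per_state[state] <= handful:
            levels[place] = "state"     # too few in the whole state
        else:
            levels[place] = "county"    # group by county instead
    return levels


# Hypothetical sample: 7 listings in one state, 2 in another.
sample = (
    [("TX", "Travis", "Austin")] * 3
    + [("TX", "Harris", "Houston")] * 2
    + [("TX", "Bexar", "San Antonio")] * 2
    + [("VT", "Chittenden", "Burlington")] * 2
)

levels = assign_page_levels(sample)
for place, level in sorted(levels.items()):
    print(place, "->", level)
```

The point of the design is that no page is generated for a place unless there is enough content to justify it.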
The gist of it is, no matter how well it scales, I don't think I would do a global "all cities listing whether they have info or not." I just don't know how fun that is for a visitor to see a big list of cities and then a 0 by each one signifying no entries there.
In the diagrams you provided, I don't believe that there is any significant difference, because in diagram B, the target page is NOT linking out to any other pages. Hence, the way you have drawn it, it is not flowing page rank.
In essence, A and B are the same. In A, any page rank is "lost" because it is flowed to pages that are designated as noindex. In B, it doesn't flow any page rank.
Note that there isn't a page rank "leakage" in figure A. The target page in yellow DOESN'T lose its OWN page rank by leaking out to noindex pages.
It just wastes whatever potential it had to flow page rank to other pages (since it isn't currently linking to any indexable pages).
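The point can be checked with a toy calculation. The sketch below uses the simplified textbook PageRank iteration, PR(p) = (1 - d) + d * sum(PR(q) / outlinks(q)), on a made-up graph; since the original diagrams are not reproduced here, the page names and link structure are assumptions, and this is in no way how Google actually scores pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict of page -> list of linked pages.

    Rank held by pages without outlinks is not redistributed in this
    toy model.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_rank = {page: 1.0 - damping for page in pages}
        for source, targets in links.items():
            for target in targets:
                # Each outlink passes on an equal share of the source's rank.
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Variant A: the target page links out to two noindexed thin pages.
variant_a = {"home": ["target"], "target": ["thin1", "thin2"]}
# Variant B: the target page has no outgoing links at all.
variant_b = {"home": ["target"], "target": []}

rank_a = pagerank(variant_a)
rank_b = pagerank(variant_b)
print("target in A:", round(rank_a["target"], 4))
print("target in B:", round(rank_b["target"], 4))
```

In this model the target's own score depends only on the pages linking to it, so it comes out the same in both variants; what differs is only where the rank it passes on ends up.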