
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Large-Scale Database Sites - How to SEO for 1,000,000+ pages?
jeremymgp




msg:4118923
 5:03 pm on Apr 20, 2010 (gmt 0)

Hello,

Please could WW-ers give their thoughts on onsite and offsite strategies for large-scale sites with 1,000,000+ pages.

The site in mind is an aggregator site, searching huge numbers of classifieds listings from selected sites and putting them into a database.

Onsite SEO:
- Generate the largest possible number of indexable URLs. Competitor sites have huge amounts of content, so the more pages the better, including but not limited to all permutations of country, state, city, post title, company and similar variables.
- Use the simplest possible URL structure with ideally no dynamic URLs
- Make the database both as large as possible, and the site as functional as possible so users can find small numbers of highly targeted results. For aggregator sites it's all about finding the postings you want and getting word-of-mouth referrals.
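
As a rough illustration of the "simplest possible URL structure" point above, here is a minimal Python sketch that turns location values into static-looking paths. The nested country/state/city layout, the function names and the sample data are all assumptions made for the example; the original post does not specify a URL scheme.

```python
import re
from typing import Dict, List


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def location_paths(locations: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Build one clean path per country, state and city (hypothetical scheme)."""
    paths = []
    for country, states in locations.items():
        c = slugify(country)
        paths.append(f"/{c}/")
        for state, cities in states.items():
            s = slugify(state)
            paths.append(f"/{c}/{s}/")
            for city in cities:
                paths.append(f"/{c}/{s}/{slugify(city)}/")
    return paths


if __name__ == "__main__":
    # Sample data is invented purely for illustration.
    sample = {"United States": {"New York": ["New York City", "Buffalo"]}}
    for path in location_paths(sample):
        print(path)
```

Further variables such as post title or company could be added as extra path levels in the same way, although every added level multiplies the page count and the risk of thin pages.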

Offsite SEO:
- Do typical link building strategies apply? It's possible to build links to the front page and selected keyword pages for particularly relevant/profitable keyphrases, but the sheer number of long-tail searches makes me wonder where to start. Competitor sites don't seem to be doing SEO as such at all. Rather, they just concentrate on building a killer database, site usability and word-of-mouth referrals, and links and large numbers of organic search results grow naturally.

Any and all thoughts you have for particular strategies to employ for SEOing a large aggregator site, please write here!

Thanks,
Jeremy

 

tedster




msg:4119241
 6:24 am on Apr 21, 2010 (gmt 0)

On-site, I'd also consider a strong "themed" architecture [webmasterworld.com], with little to no cross-linking between the individual themes. Internal linking should be in the vertical direction.
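
One way to audit this kind of structure is to list the internal links that jump from one theme to another. The sketch below assumes that a page's theme is the first segment of its URL path, which is a simplification introduced for this example and not something stated in the post; the function names and sample links are hypothetical too.

```python
from typing import Dict, List, Optional, Tuple


def theme_of(path: str) -> Optional[str]:
    """Return the first path segment as the theme, or None for the homepage."""
    segments = [s for s in path.strip("/").split("/") if s]
    return segments[0] if segments else None


def cross_theme_links(links: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """List (source, target) pairs whose themes differ; homepage links are ignored."""
    found = []
    for source, targets in links.items():
        for target in targets:
            a, b = theme_of(source), theme_of(target)
            if a is not None and b is not None and a != b:
                found.append((source, target))
    return found


if __name__ == "__main__":
    # Invented link graph: page path -> paths it links to.
    sample = {
        "/": ["/jobs/", "/cars/"],
        "/jobs/": ["/jobs/texas/", "/cars/"],
        "/jobs/texas/": ["/jobs/"],
    }
    print(cross_theme_links(sample))
```

Links up and down within one theme pass the check, so only the sideways links between themes are reported.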

Also avoid the thin or "stump" pages that plague large database sites. You're going to need a lot of backlinks to get Google to index anything near to 1,000,000 URLs anyway, so don't start out with your hopes that big.

For link building, I'd aim to create highly useful, and even novel, content for the main "hub" pages at the top of the themes and sub-themes.

gn_wendy




msg:4119248
 7:06 am on Apr 21, 2010 (gmt 0)

I do SEO for a site even larger. Getting the longtail indexed is my main priority... ranking for shorthead pages and competitive keywords is my secondary initiative. However - I am far more successful at that.

Flowing link juice from the more powerful pages to the lower pages is one of the main issues. I can't tell you what the best way to go is. I have three different "systems" set up and don't know myself which works best. Recently did a relaunch and I am now waiting for that to stabilize. I hope to have some conclusive data within the next three to six months.

One tip I will give you - which I have been able to solidly confirm - is that if you need more than 5 clicks from your main page to the "last" page you won't get those pages in the index. I try never to go further than 4 clicks out. Finding the ideal solution to categorizing and spreading your content to achieve that - if you have that many pages - and following the "no more than 100 links"-rule (in order to not dilute link power too much) is going to be a key challenge.
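
The click-depth and links-per-page limits described above can be checked against a crawl of your own site. This is a minimal sketch that assumes the internal link graph is already available as a dictionary of page to linked pages. The thresholds of 4 clicks and 100 links come from the post; the function names and sample graph are made up for illustration.

```python
from collections import deque
from typing import Dict, List


def click_depths(links: Dict[str, List[str]], home: str) -> Dict[str, int]:
    """Breadth-first search giving the fewest clicks from the homepage to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def pages_too_deep(links: Dict[str, List[str]], home: str, max_clicks: int = 4) -> List[str]:
    """Reachable pages that need more than max_clicks clicks from the homepage."""
    depths = click_depths(links, home)
    return sorted(page for page, depth in depths.items() if depth > max_clicks)


def overlinked_pages(links: Dict[str, List[str]], max_links: int = 100) -> List[str]:
    """Pages that carry more than max_links outgoing internal links."""
    return sorted(page for page, targets in links.items() if len(targets) > max_links)


if __name__ == "__main__":
    # Invented chain of pages, each one click deeper than the last.
    chain = {"/": ["/a"], "/a": ["/b"], "/b": ["/c"], "/c": ["/d"], "/d": ["/e"]}
    print(pages_too_deep(chain, "/"))
```

Pages that are never reached from the homepage do not appear in the depth map at all, so orphaned pages need a separate check.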

pavlovapete




msg:4119829
 4:41 am on Apr 22, 2010 (gmt 0)

Interesting discussion. Please allow me to throw some basic numbers in.

1 homepage, 100 links
- 1 click
100 x 100 = 10,000 pages
- 2 clicks
10,000 x 100 = 1,000,000 pages

1 homepage, 30 links
- 1 click
30 x 30 = 900 pages
- 2 clicks
900 x 30 = 27,000 pages
- 3 clicks
27,000 x 30 = 810,000 pages

1 homepage, 20 links
- 1 click
20 x 20 = 400 pages
- 2 clicks
400 x 20 = 8,000 pages
- 3 clicks
8,000 x 20 = 160,000 pages
- 4 clicks
160,000 x 20 = 3,200,000 pages
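
The same arithmetic can be sketched in a few lines of Python. The sketch assumes a perfectly even tree in which every page links to the same number of new pages, and it counts the homepage as 0 clicks, so its depth numbers may sit one higher than the click labels in the list above. The function names are invented for the example.

```python
def pages_at_click(links_per_page: int, clicks: int) -> int:
    """Pages sitting exactly `clicks` clicks from the homepage in an even tree."""
    return links_per_page ** clicks


def clicks_to_reach(total_pages: int, links_per_page: int) -> int:
    """Smallest depth at which the cumulative page count covers total_pages."""
    if links_per_page < 2:
        raise ValueError("links_per_page must be at least 2")
    depth, reachable = 0, 1  # start with the homepage alone
    while reachable < total_pages:
        depth += 1
        reachable += links_per_page ** depth
    return depth


if __name__ == "__main__":
    for fanout in (100, 30, 20):
        print(fanout, "links per page:", clicks_to_reach(1_000_000, fanout), "clicks")
```

Real sites are never this even, and navigation or "related" links use up part of each page's link budget, so these figures are best read as a lower bound on depth.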

Personally I think 100 links per page is way too many. And in addition to the links to sub-pages we are going to need (probably) an equal number of "related links" which we'll use to pass juice to other pages in the site.

If we use the Theme example, and I understand tedster correctly we won't be cross-linking across the themes.

If we use a range of 20-30 links per page then we are looking at anywhere between 900-160,000 first and second-level "hub" pages (which need sufficient attention to exhibit "highly useful and novel" content).

IMO another relevant issue is "How many users are actually going to be navigating into the content?" 4 clicks is not a lot of effort - but scanning and discarding several hundred links on the way is surely burning a lot of mental calories.
I wonder if search is the primary navigation method used by users to get around, and get to, content in these big sites?

tedster




msg:4119832
 4:45 am on Apr 22, 2010 (gmt 0)

I wonder if search is the primary navigation method used by users to get around


I'd say yes - most definitely. In addition to search traffic from Google or other engines, having an excellent site search function would be a primary asset for you.

walkman




msg:4119834
 4:53 am on Apr 22, 2010 (gmt 0)

I wonder if search is the primary navigation method used by users to get around


Maybe they're just hoping for SE traffic and that's almost it? 1 million pages but probably from a database

Lexur




msg:4119839
 5:18 am on Apr 22, 2010 (gmt 0)

...to have some conclusive data within the next three to six months.


We're back to the AltaVista age.

gn_wendy




msg:4119871
 7:30 am on Apr 22, 2010 (gmt 0)

...to have some conclusive data within the next three to six months.
We're back to the AltaVista age.


The site I work with is HUGE. I use different systems for different subdomains in order to test what works best. We just did a relaunch. For G'-bot to come full circle it takes about 2 months if I have a high crawlrate and longer if not. After that time I can start analyzing the data, which again, there is a lot of ;)
But I'm in it for the long haul, so I don't mind the wait. I have enough on my plate anyway.

Personally I think 100 links per page is way too many.


I agree... the fewer links you can have on a page, and the fewer hubs you can get away with the better. It depends on the structure and what type of content you have. For us I have been limited to (on the second tier) 30 pages. Each of those 30 pages links to between 20 and 200 pages. I wanted to do it very differently, but that doesn't work for the site or "usability". Which brings me to:

How many users are actually going to be navigating into the content?

I wonder if search is the primary navigation method used by users to get around


I would say that the linking structure is for google to find and index pages -- users will then use google to find what they are looking for. At least that is how I optimize the page I work on. Users can search on our page as well, but most people are 1-time visitors with 1 or at the most 4 pageviews.
That said: I do a lot of SEO for the site -- and most of what I do is to position pages better in the rankings. As you may have figured the site lives off of traffic from G'. Since the site is heavily optimized we have decided to keep the site as "clean" as possible. That is we only do things that "would" or "could" make sense for a user so that we have a fallback in case the site is up for manual review. Call me paranoid, but I like to cover all my bases.

@jeremymgp

At the end of the day it comes down to what works best for you. You will need links to the main site. You will also need a lot of links to each category/theme/hub page. You are also going to need a fair amount of deep links to signal to google that the content on the deep pages is also of value.
The thinner you spread your site -- and the larger your site -- the more links you are going to need to the respective areas. This is something that I would keep in mind when creating the structure. I would refrain from going with a structure that people would not find any use in linking to (for example, an HTML-sitemap-type links page). You can have those in between, but I would also have some useful and linkable content in there.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved