Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 40 message thread spans 2 pages.
Adding 25,000 Pages to a 3,600 Page Site - bad idea?

 5:54 pm on Feb 23, 2011 (gmt 0)

Hi Everyone!

Please bear with me as I am somewhat new to the SEO game. I have a quick question about adding some new pages to my site. I run a real estate site, and we recently expanded our coverage areas as our SEO is starting to really come together. The site currently has 3,600 pages indexed in Google covering a full county of cities. Each page on the site is spiderable, including all of the separate listings for each city.

Information about the site: The domain is 4 years old and currently has a nice backlink profile. We rank #1-3 for about 50-60 phrases and each listing on the site we have is currently ranking in the top 5 for thousands of longtail keywords.

Here's my question:

We want to expand the site to include all real estate listings in the whole state. Doing this would add about 25,000 new spider-able listings to the site, and our longtail searches should increase quite dramatically. Is this something I can do all at once, or should I take it a bit at a time? And do you have any recommendations on the number of pages I should add per day?




 6:49 pm on Feb 23, 2011 (gmt 0)

I generally like to phase in large changes.

What is your toolbar pagerank? Would you mind sharing a ballpark estimate on how many unique visits you have a day?

I find that the more pagerank and traffic a site has, the more successfully it can expand its size.


 6:57 pm on Feb 23, 2011 (gmt 0)

I've added 5 to 10 thousand pages at once onto sites with about 10 pages. Didn't notice any problem. In your case I might suggest doing it in two stages set apart by a couple of months.

I also later removed that content with no ill effects.

Google has publicly stated that you should be fine within limits. I don't recall their exact terminology, but my spidey sense suggests that you're probably on the edge - thus my recommendation to break up the release.

Now, that being said, I'm working on releasing 50,000 pages of content on a brand new site. I expect to release them all at once. But that site is intended as a link magnet only; I'm not specifically driving for SE traffic.


 7:10 pm on Feb 23, 2011 (gmt 0)

Goodroi - Thanks for the reply. Our TPR is currently 2 but I don't really worry about that. All my inner pages that rank well are also TPR 2 and have been for a little while. I do feel the site is stronger than advertised as we rank very well.

Our monthly uniques are around 7-10k depending on the season. On a normal day we get from 200-250 uniques. But you have to keep in mind, our market is very small and I think we are about maxed on traffic. (Which is why we are expanding.) Google crawls any new pages within minutes of our adding them, and the home page is cached about every 3 days.

Wheel - Thanks for the information, good to hear about real world results.

Another question. With the new pages, we will be submitting an xml sitemap that is separate from our regular content pages. This is a listing sitemap only. Is it okay to have two separate sitemaps? Or should we just combine both of them?


 8:29 pm on Feb 23, 2011 (gmt 0)

I think Matt Cutts said in some video that you should slowly roll out such large chunks of new content.

I would agree!


 8:44 pm on Feb 23, 2011 (gmt 0)

In this case, I'm not so sure that it would be a crawling problem at launch. But since the pages are real estate listings with a significant churn, you will want to be sure that you handle their removal very cleanly - ideally with a 410 Gone header. Otherwise crawling problems may evolve over time. I'm working with a parallel case right now, and they have a nice PR5 home page.
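The thread doesn't include any code, but the "handle removal cleanly with a 410 Gone header" advice can be sketched as a minimal WSGI handler. This is only an illustration under assumed data: the `ACTIVE_LISTINGS` and `SOLD_LISTINGS` sets are hypothetical stand-ins for a real listing database.

```python
# Sketch: serve "410 Gone" for listings that have permanently left the
# market, "404 Not Found" for URLs that never existed, and "200 OK" for
# active listings. Listing data here is hypothetical.
ACTIVE_LISTINGS = {"/listing/mls-1001", "/listing/mls-1002"}
SOLD_LISTINGS = {"/listing/mls-0042"}  # gone for good: signal 410, not 404

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in ACTIVE_LISTINGS:
        status, body = "200 OK", b"listing page"
    elif path in SOLD_LISTINGS:
        # 410 tells crawlers the removal is permanent, so the URL can be
        # dropped from the index faster than a generic 404 would suggest.
        status, body = "410 Gone", b"listing permanently removed"
    else:
        status, body = "404 Not Found", b"no such page"
    start_response(status, [("Content-Type", "text/plain")])
    return [body]
```

The same three-way distinction can be made in any server stack; the point is that expired listings get an explicit "permanently gone" response rather than falling through to a catch-all 404.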

Just because I said it won't be a problem, that doesn't mean that I see any significant advantage, either. Those kinds of listing pages are more for the visitor than the search engines. It's hard to get the URLs to stay in the index or to get even long tail traffic, because so many other sites will often have something nearly duplicate. The vertical is jam-packed and Google appears to have an almost standardized way of ranking individual URLs - or should I say NOT ranking them ;)


 8:58 pm on Feb 23, 2011 (gmt 0)

@Max - I remember seeing a video by MC about the pages, but I know a lot of real estate sites that add and remove listings all the time that seem to fare well. I think we will roll them out over the next couple of months as everyone here is suggesting.

@Tedster - Our site is set up so that listings have unique titles and meta descriptions. Each listing also comes with local community and neighborhood information that G crawls in order to keep them cached.

For every listing we have, if you were to search the MLS# or the address, we rank in the top 5 every time; many times we are the first result, ahead of sites like T, Z, and the big R. (I'm sure you know what those sites are if you have been in the real estate SEO game.)

We get a lot of long-tail traffic that converts into showing appointments from people searching MLS#'s and addresses. This is the reason we want to add the extra listings. Even though we don't really service those markets, we can refer out the leads to other agents for a nice chunk of change.

Anyway, what about the dual sitemaps? Is that a good idea to keep them separate or just integrate them both? Thanks!


 9:14 pm on Feb 23, 2011 (gmt 0)

I think the current sitemap limit is 50,000 URLs, so you're only a little over half way there. You might go with multiple sitemaps and a sitemap index file if that's easier for you to manage - but there's no inherent advantage in crawling or indexing.
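For reference, the 50,000-URL per-file cap comes from the sitemaps protocol, and a sitemap index is just an XML file pointing at the individual sitemap files. A sketch of splitting a URL list into compliant chunks plus an index might look like this (the host and file names are made up for illustration):

```python
# Sketch: chunk a URL list into sitemap files of at most 50,000 URLs each
# (the protocol's per-file limit) and emit a sitemap index that points at
# them. Host and file names are hypothetical.
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50000
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemaps(urls, base="http://www.example.com"):
    """Return (index_xml, [(filename, sitemap_xml), ...])."""
    files = []
    for i in range(0, len(urls), MAX_URLS_PER_SITEMAP):
        chunk = urls[i:i + MAX_URLS_PER_SITEMAP]
        name = "sitemap-%d.xml" % (len(files) + 1)
        entries = "\n".join(
            "  <url><loc>%s</loc></url>" % escape(u) for u in chunk)
        files.append((name,
                      '<?xml version="1.0" encoding="UTF-8"?>\n'
                      '<urlset xmlns="%s">\n%s\n</urlset>'
                      % (SITEMAP_NS, entries)))
    refs = "\n".join(
        "  <sitemap><loc>%s/%s</loc></sitemap>" % (base, n)
        for n, _ in files)
    index = ('<?xml version="1.0" encoding="UTF-8"?>\n'
             '<sitemapindex xmlns="%s">\n%s\n</sitemapindex>'
             % (SITEMAP_NS, refs))
    return index, files
```

Whether the listings and content sitemaps live in one file or several, the crawler treats them the same; the split is purely a management convenience, as noted above.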

We've got a forum devoted to Sitemaps [webmasterworld.com], robots.txt and meta tags. If you get into some issue or other with your sitemap, try posting in there.


 5:53 pm on Mar 21, 2011 (gmt 0)

Quick Update - We implemented the new listings sitemap on February 25th. We went with the full 25,000 pages all at once. I know it wasn't what we planned, but we just went for it. Turns out it was a good idea. Here are some stats as of today, just under a month later.

Pages Indexed: 10,900
SE Traffic Increase: 8.0%+

We went with two separate sitemaps, a listings sitemap and a content sitemap. All things seem to be rolling along fine. One thing I want to add: it took about two weeks before Google started showing the bulk of the new indexed listing pages. They would index a few at a time for the first couple of weeks, but it started to show A LOT more about two weeks in, and it continues to increase every day.

Also, crawl rates went through the roof in the first week and just last week it exploded again. First week of new pages we saw 800 pages crawled a day and as I look at WMT today, we hit a peak of 1,800 over the weekend. Thanks for all the advice and I'll update again in a month to see how we stand.


 6:55 pm on Mar 21, 2011 (gmt 0)

The only problem is dupe content. Real estate listings are in many sites, no?


 7:03 pm on Mar 21, 2011 (gmt 0)

Yes, quite a lot of sites. We have combated this problem by adding unique titles and descriptions to each page. Each listing also comes with unique statistics that only our website has as well as local community information pulled from a feed no one else uses in our market. These extras make our content stand out enough to get indexed and rank ahead of Trulia, Zillow, and the main Realtor websites.


 10:11 pm on Mar 21, 2011 (gmt 0)

We have combated this problem by adding unique titles and descriptions to each page

Google is looking at content as well as the titles and descriptions to weight the quality of the page. You need to keep these pages fresh and consider ways to differentiate them uniquely to improve your rankings without scripted content. Not easy when the content is aggregated - but there's always a way.


 10:19 pm on Mar 21, 2011 (gmt 0)

Each listing also comes with unique statistics that only our website has as well as local community information pulled from a feed no one else uses in our market.

We have unique content related to each listing that is automatically pulled from the feed that no one else is using. It works for now. We rank in the top 5 for almost all indexed listing pages if not the top 2 beating out the big boys, so I think we're on the right track. But, of course, who knows what will happen as the years go by.


 1:40 pm on Apr 27, 2011 (gmt 0)

25,000 new spider-able listings to the site and our longtail searches should increase quite dramatically

In the light of the Panda update you may want to reflect on this a bit harder. More indexed pages is not necessarily a good idea any more.


 4:17 pm on Apr 27, 2011 (gmt 0)

Hi Whitey! I appreciate your concern but the launch went extremely well. I think the real estate niche was all but unaffected by this update. I haven't seen any changes in the markets I monitor during both updates.

Our website as of this morning is up in traffic by 37.88% compared to last month. We now have 20,200 pages indexed and our rankings really haven't budged. To be honest, I haven't seen any movement in the real estate niche in my area. I've also talked to a lot of real estate webmasters and they also haven't seen any movement in these two updates.

Maybe you can glean some information about the update by studying why real estate websites are largely unaffected? I know that every site I monitor has basically no ads, that could be something?


 5:05 pm on Apr 27, 2011 (gmt 0)

every site I monitor has basically no ads, that could be something?

I think it could be. Real estate sites usually take pains to make sure their content engages the visitor at the top of the page.

[edited by: tedster at 5:23 pm (utc) on Apr 27, 2011]


 5:19 pm on Apr 27, 2011 (gmt 0)

@tedster - You're absolutely correct. It's very important to have content at the top. Here is how our landing pages for city searches are set up.

Top = Flash Intro

Just Below Flash = Main Navigation

Left Neighborhood Nav - Middle Content - Right Quick Home Search

The flash intro takes up about a quarter of the page and then it's straight to content related to the city followed directly by listings, and further below the listings (we do 10 on each page) is more content. Each page has between 600-1000 words of unique content.

Zero ads anywhere on our site.


 5:34 pm on Apr 27, 2011 (gmt 0)

The flash intro takes up about a quarter of the page

At what resolution? For what percentage of visitors?


 5:40 pm on Apr 27, 2011 (gmt 0)

Sorry, I'm not very web savvy in that regard. How do I check the resolution and the percentage? ;-)

(I didn't build the site; it's a custom-built site from a large real estate web development company.)


 11:08 pm on Apr 27, 2011 (gmt 0)

That's good news hispdcha & thanks for sharing. Two things stand out to me:

- how your vertical structures its pages for engagement
- the vertical itself

Is Google scoring verticals differently? E.g. shopping, real estate, travel


 11:54 pm on Apr 27, 2011 (gmt 0)

Is Google scoring verticals differently? E.g. shopping, real estate, travel.

I definitely think Google ranks web pages in the real estate vertical oddly. For example, if you look at some of the highest ranking real estate sites they are all involved in some sort of reciprocal link scheme.

They also use keyword rich anchor text religiously. I know SEOMoz brought this up with the highest ranking real estate site in Nashville, TN. I don't know if I can link to the SEOMoz study, but you can find it by Googling "competitive backlink analysis seomoz".

I talk to quite a lot of real estate webmasters (hundreds of them) on forums. Not one of them has said anything about Panda affecting their rankings. I also run a blog that has general real estate articles on it, and recently it took over some top spots from about.com. :)


 12:24 am on Apr 28, 2011 (gmt 0)

My sense has long been that Google tailors their algorithm to different types of sites, different taxonomies if you will. That didn't just begin with Panda.


 12:26 am on Apr 28, 2011 (gmt 0)

So I take it these might be characteristics of the RE industry that are different:

Content is aggregated, but...

- it's fresh / recycled quickly (other aggregated content sits around a long time, except for small pricing elements)
- the content is relatively restricted in its distribution (versus travel / shopping sites where product is reproduced thousands of times)

So I wonder if it is the vertical, or rather the way the vertical structures and recycles new data.

Were the 25,000 new pages property listings that will disappear over time and be replaced?


 12:45 am on Apr 28, 2011 (gmt 0)

@Whitey - They go off our website when they go under contract. Once they go off they will never return so we serve a 404 page. They aren't necessarily replaced but anytime a new listing hits the market, they are added to our website. A lot of churn with our pages, and currently we have 5,000+ errors in WMT due to these pages.

The feed is called an IDX and it's provided by an MLS. MLSs are controlled by Realtors, and the only way to access this data is to be a Realtor or, like the big real estate websites, to pay an enormous amount of cash for the feed.

Just our feed in my state, which covers about 20 counties out of 28, costs us $6,000 a year. Plus we pay an additional $200 a year for auditors to check our site once in a while for compliance.

The long-standing content we have on our pages is the city/county/neighborhood pages. Each of these pages is designed for one or two keywords, but the listings that sit on those pages recycle out a lot.


 1:58 am on Apr 28, 2011 (gmt 0)

My sense has long been that Google tailors their algorithm to different types of sites, different taxonomies if you will. That didn't just begin with Panda.

Normally the last thing I'd be doing is adding more pages to a Panda-prone site, but this thread fascinates me as it stands as a success in the face of the update.

Tedster - could taxonomy & usability scrutiny have been an element that was ramped up in this update?

By default, I'm thinking that if done well, it's a form of engagement that presumably Google might like.

hispdcha - A few questions:

On those pages, would you say users interact heavily with the navigation elements (county / neighbourhood / listings)? And,

What is the bounce rate like on your listing pages?

What is, and/or how is, the code base for the search elements generally set up in your vertical?


 2:14 am on Apr 28, 2011 (gmt 0)

Whitey, that's a pretty theoretical/abstract question, but I'll make a couple comments.

I don't think Google has a direct measure of usability in their algorithm, only what they can infer from user data as to how pleasing or not the page is for the visitor. I'm sure that taxonomies of all kinds (query intention, document classifier, user type, etc) play into ranking, but it's more like the framework that Panda operates within - again not something directly part of Panda.


 2:18 am on Apr 28, 2011 (gmt 0)

On those pages would you say users interact heavily with the navigation elements ( county / neigbourhood / listings ) ?

No, not really. They are more for internal links and landing pages, I would say. The only thing visitors really interact with is the top navigation and the right-side quick search where they can search listings. Left-side navigation gets almost no play. Our visitors really only care about searching listings; the rest is fluff.

Bounce rate for listing pages is quite high, actually; I would estimate 75% to 80%. Many will come directly to listing pages from an MLS# search or an address search. They get what they want and leave. The people who stick around are the ones that come to the site through a city/neighborhood/county search.

I am not a programmer so I wouldn't really know. I know our site is php because after each URL it says .php. - lol :).

[edited by: tedster at 2:43 am (utc) on Apr 28, 2011]


 4:05 am on Apr 28, 2011 (gmt 0)

I am not a programmer so I wouldn't really know. I know our site is php because after each URL it says .php.

I was wondering if some of the code was hidden within the CSS, and if the standards of coding on sites you observe were generally good.

It's interesting to hear about your bounce rates - a lot of folks appear to be saying it's not an issue for consideration in the recent update.

Given that you've proved sceptics like myself wrong about loading so many pages, do you think, based on your experience and observations, you have a simple message for sites hit by Panda that might be considering losing pages to get ranked again? Or indeed for sites nervous about adding pages?

What basis did you work from that stood the test so many others are not so sure about right now?


 5:12 am on Apr 28, 2011 (gmt 0)

Yes, the searching capabilities and the IDX setup are definitely not open source. There are hundreds of IDX providers out there. There are two companies that are above the rest, and their technology is the best I've seen. I use one of them for mine. I've seen many programmers ask how the webmasters that built my site set up their coding, but they aren't talking.

I will say I've only been doing SEO since 2007, so I am fairly new to the field. I can't really say anything about this update as I only really monitor one niche. What I do know is that for some reason the real estate industry wasn't hit, which I think you may be able to study to glean some answers.

One thing that may contribute to our success is our return visitors. We have roughly 25-30% of our traffic returning multiple times. We force people to register after viewing one listing if they try to look at another one. We then set them up on a drip email campaign where we send them the new listings that hit our site for that day and they can click directly to our site through email. This may be a signal that our site is quality due to the amount of visits from gmail.

I can't comment on other people's sites. All that I can say is our site increased traffic after adding over 25,000 pages just after the Panda update. Why? I think it's due to our content. You can't get real estate listings just anywhere; they have to come from a site owned by a trusted professional such as a Realtor. All of our listing pages are unique in some sense with the extra feeds we add for statistical information, neighborhood data, and local business data.

All of our neighborhood pages are constantly being updated with new listings as they come on the market and old listings are going under contract. None of our pages have been scraped. We have a decent link profile for a real estate site comparatively. And we have zero ads. Our site is all about the user, because we want them to stick around so we can nurture them into clients later on. People searching online for homes are 6 months to a year out from buying, so it's important to make sure they want to keep coming back.

All of the content on our neighborhood/city/county pages is written by me, and I used to be a Realtor. I know the markets I write in very well, and the content can't be found anywhere else. You would have to live here to write the content I write. Hope this helps.


 6:32 am on Apr 28, 2011 (gmt 0)

hispdcha - It looks like you have passed the SEO litmus test very well over 2 years, and could teach much - well done.

- Repeat visitors
- Solid, unique content
- Useful information
- Fresh listings

Great signals and genuinely compelling - next time I'll ask you for advice :)

Do you bother with other signals like FB likes, Twitter, UGC etc. on those pages?
