Best way to avoid sandbox with huge site launch?

2 Million pages before even any real content is added

7:40 pm on July 18, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 22, 2000
posts:487
votes: 0


I am looking into launching a review site of sorts, and I assume I will be sandboxed from the start like almost everyone, but I want to keep the damage as low as possible. From what I have seen, I do subscribe to the idea that a site's growth rate is a factor in sandboxing, along with many other factors.

I have almost nailed down the site structure, and it will be something to this effect:
domain.com/brand/state/city/, with each directory having its own index page targeted towards the brand and its local outlets.
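
Roughly, the rewrite behind a structure like that would be a single script plus mod_rewrite rather than millions of physical files. A minimal .htaccess sketch (the script name and query parameters here are placeholders, not the real setup):

    # Map /brand/state/city/ onto one handler script.
    # "index.php" and the parameter names are placeholders.
    RewriteEngine On
    RewriteRule ^([a-z0-9-]+)/([a-z0-9-]+)/([a-z0-9-]+)/?$ index.php?brand=$1&state=$2&city=$3 [L,QSA]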

Based on the numbers I have run with all cities in the US, the directory structure alone will create close to 2 million pages of navigation content. Even if I cut the cities down by three quarters to cover only semi-decent-sized cities, it will still be half a million pages, or maybe 100K pages with really deep cuts. And this is before even adding the reviews into the mix.

I could launch tomorrow with that many pages, but I just don't know. I have thought about setting up the system so it only shows links to brands, cities, and states once an actual listing has been added to the database. I bought a list from a data source, but the listings need to be scrubbed before each goes live. Scrubbing the listings into the database manually is going to take a long time anyway; I figure 100 new listings a day, depending on how many hours I put in.

Going in this direction, I could work on scrubbing a single city at a time so as not to add extra directory pages until that city is done. 100 pages a day would, I assume, look a lot more natural than 100K-2 million from the start.
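
The release script itself could be as simple as a daily cron job flipping a "live" flag on the next 100 scrubbed rows. A minimal sketch, assuming an invented MySQL schema (table and column names are made up):

    <?php
    // release.php - run once a day from cron (sketch; schema is invented).
    // Flips the next 100 scrubbed-but-unreleased listings live, ordered by
    // city so one city finishes before the next starts to appear.
    $db = new mysqli('localhost', 'user', 'pass', 'reviews');
    $db->query(
        "UPDATE listings
            SET live = 1
          WHERE scrubbed = 1 AND live = 0
          ORDER BY city_id, id
          LIMIT 100"
    );
    ?>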

So, what are others' opinions on this? Is all this worry for nothing?

8:39 pm on July 19, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member beedeedubbleu is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Feb 3, 2004
posts:6109
votes: 6


You will have to design and write for people instead of the engines.

Jeremy, check what you wrote a couple of weeks ago: [webmasterworld.com...]

I take it you have changed your mind?

10:39 pm on July 19, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 9, 2003
posts:735
votes: 0


Your worry is very real. I think attempting to launch 2 million pages at one time is suicide.

2:14 am on July 20, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 22, 2000
posts:487
votes: 0


BDD,

The question I was answering was

how to optimize if the results will always be different from user to user?

The big word being "if". I do not believe the SERPs are at the point where the only person you should be writing for is the user. They are still predictable based on certain principles. Give it a few years and that may be true, but for now, both what the user expects and what the search engine expects have to be taken into consideration.

6:09 pm on July 20, 2005 (gmt 0)

Preferred Member from TH 

10+ Year Member

joined:Mar 4, 2003
posts:421
votes: 0


Google may still take a long time to index 2 million pages; your content might not show in the listings within a couple of weeks.

6:54 pm on July 20, 2005 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member ken_b is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 5, 2001
posts:5753
votes: 81


JeremyL: I have no experience with a site this size, but I still have a couple of questions/comments.

I could launch tomorrow with that many pages

How many pages, 100,000 or 2,000,000?

I have thought about setting up the system so it only shows links to brands, cities, and states once an actual listing has been added

This sounds like a good idea, especially if you start with the 100,000 and add a bunch of pages each day.

Can you make an obvious notice that new content is added daily, to encourage folks to return and see what's new?

I'm assuming that a site this size will have an internal search. If so, can you coordinate the new content with the unsuccessful search results? Can you automate a message on the results page thanking the person for the inquiry and telling them that the content they are looking for is coming soon, or something similar?
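
Something along those lines could be only a few lines in the search script. A sketch, assuming an existing $db mysqli connection and an invented schema (none of these names come from the thread):

    <?php
    // search.php fragment (sketch; $db and all names are invented).
    // If nothing is live for this area yet but unreleased data exists,
    // thank the visitor and say the content is on its way.
    $zip  = $db->real_escape_string($_GET['zip']);
    $live = $db->query("SELECT 1 FROM listings WHERE zip = '$zip' AND live = 1 LIMIT 1");
    $any  = $db->query("SELECT 1 FROM listings WHERE zip = '$zip' LIMIT 1");
    if ($live->num_rows == 0 && $any->num_rows > 0) {
        echo "<p>Thanks for your inquiry - listings for this area are being added now. Please check back soon.</p>";
    }
    ?>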

7:42 pm on July 20, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:July 5, 2004
posts:470
votes: 0


I launched a large site about two months ago with 10,000 pages (which, until I read your post, I thought was a good number of pages). At any rate, all pages were indexed almost immediately by Google but are still sandboxed. MSN indexed a few hundred pages within about three weeks, and they went right to the top of the SERPs. Yahoo has managed the laborious task of indexing the homepage. SERPs are still pretty anemic about two months out. PR debuted at 6 a few days ago. Links are growing at a healthy clip.

I think it was a mistake for me to wait to launch until the site was "ready". I feel I should have launched with a bare minimum and built slowly; I'd be in much better shape at the moment, I'm sure. With 2 million pages, you really should have launched a long time ago, I think.

Google likes to see action, while Yahoo moves at the speed of molasses. I won't be waiting to launch again until everything is perfect; after all, these are websites, not the space shuttle.

8:21 pm on July 20, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 22, 2000
posts:487
votes: 0


If so, can you automate a message on the results page thanking the person for the inquiry and telling them that the content they are looking for is coming soon, or something similar?

Very good idea. It actually just gave me another idea. The main way people will find locations is via a zip code (or maybe city) search. I can load 100% of the data into the search function and write a script to release 100 listings a day to the directory pages. The directory structure is actually htaccess-created, so the SEs can see domain.com/dir/dir, but the people doing the search will do so on domain.com/listing.php?var=2452, which I can deny in the robots file.
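
The robots file side of that is only a couple of lines, assuming the listing.php pattern above:

    User-agent: *
    Disallow: /listing.php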

3:15 pm on July 21, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Mar 28, 2004
posts:224
votes: 0


What about launching the site as you have it, but blocking Google from most directories and pages with robots.txt? Then, maybe once a week, unblock a directory in robots.txt. Would that work? I don't know from experience, but it sounds like a decent idea. That way you could let other engines like Yahoo get the full site and not scare off Google too badly.
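
That is, something like this, deleting one Disallow line each week (the directory names are just for illustration):

    User-agent: Googlebot
    Disallow: /examplebrand/
    Disallow: /otherbrand/
    # ...one Disallow per still-blocked directory;
    # remove a line to release that section to Google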

I'd love to know if this works because I might be in a similar position soon.

5:29 pm on July 21, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 22, 2000
posts:487
votes: 0


I've always been fearful of doing that for pages I know I will want to rank well later. That, and it would be one huge robots file.

5:35 pm on July 21, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 16, 2003
posts:107
votes: 0


The directory structure is actually htaccess-created, so the SEs can see domain.com/dir/dir, but the people doing the search will do so on domain.com/listing.php?var=2452, which I can deny in the robots file.

This is a mistake. People will make links to the URLs they see, and since you are disallowing spiders access to those pages, you will not get ranking benefit from those links. Pages should only be visible under one address.
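
If the dynamic address has to exist at all, a permanent redirect onto the one static address keeps the links consolidated. A sketch (static_url_for() is an invented helper that maps a listing id to its static page):

    <?php
    // listing.php fragment (sketch; the lookup helper is invented).
    // Send visitors and spiders to the single static address so only
    // one URL ever accumulates links.
    $static = static_url_for((int) $_GET['var']);
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://domain.com' . $static);
    exit;
    ?>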

6:18 pm on July 21, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 22, 2000
posts:487
votes: 0


Yeah, I didn't really describe exactly how it will be. The search results will be at domain.com/search.php?search=blah, but the result list will actually link to domain.com/location/listing.htm. There is no way to get around having the search results look dynamic, and I don't want to. But the actual listing and listing reviews will be the same as if they were found through the directory.
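
So the results loop just prints static hrefs, something like this sketch ($results and the column names are invented):

    <?php
    // search.php results loop (sketch; names are invented).
    // The search page stays dynamic, but every result links to the
    // spiderable static address under /location/.
    foreach ($results as $row) {
        printf("<a href=\"/%s/%s.htm\">%s</a><br>\n",
            urlencode($row['location']),
            urlencode($row['slug']),
            htmlspecialchars($row['name']));
    }
    ?>
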
9:07 pm on July 21, 2005 (gmt 0)

Senior Member

joined:Oct 27, 2001
posts:10210
votes: 0


Your worry is very real. I think attempting to launch 2 million pages at one time is suicide.

Especially if they're just empty vessels waiting to be filled by users.

9:53 pm on July 21, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Oct 6, 2004
posts:216
votes: 0


I would just put a meta noindex tag on the pages that don't have content yet and set it up so the noindex tag goes away when the page is updated with content.

Google will index the pages as soon as the noindex tag comes off. I did this with a link directory that had many categories but few links, and it worked great. As soon as links were added to a category, the noindex tag was changed to index, and Google indexed the page two weeks later.
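
The template logic for that is only a few lines. A sketch, assuming an existing $db connection and an invented count query (table and column names are made up):

    <?php
    // Page template fragment (sketch; $db and all names are invented).
    // Emit noindex until the page actually has live content; once
    // listings exist, the tag simply stops being printed.
    $row = $db->query("SELECT COUNT(*) AS c FROM listings WHERE city_id = $city_id AND live = 1")->fetch_assoc();
    if ($row['c'] == 0) {
        echo '<meta name="robots" content="noindex,follow">';
    }
    ?>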

6:35 am on July 22, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member beedeedubbleu is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Feb 3, 2004
posts:6109
votes: 6


Especially if they're just empty vessels waiting to be filled by users.

Yeah! This sounds like another of those really interesting sites :(

6:48 am on July 22, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 30, 2003
posts:625
votes: 0


I've had several sites that have been sandboxed and some that have not.

The biggest factor in being sandboxed is the theme/subject (or the theme as perceived by G). If it triggers the sandbox, then you're in; links, the size of the site, etc. will not come into it.

2:45 pm on July 22, 2005 (gmt 0)

Junior Member from IT 

10+ Year Member

joined:Oct 22, 2002
posts:127
votes: 0


If the site is any good, people will link to it. 2 million pages is a lot of site; it will become important, and Google will note that.

Why don't you start with 10 of the biggest cities and see how those pages work out, then roll it out accordingly? That way you can see how effective the site is in terms of converting customers.

How much better would the web be for users without databases?