|You will have to design and write for people instead of the engines. |
Jeremy, check what you wrote a couple of weeks ago, [webmasterworld.com...]
I take it you have changed your mind?
Your worry is very real. I think attempting to launch 2 million pages at one time is suicide.
The question I was answering was
|how to optimize if the results will be always different from user to user? |
The big word being "if". I do not believe the SERPs are at the point where the only person you should be writing for is the user. They are still predictable based on certain principles. Give it a few years and that may be true, but for now, both what the user and the search engine are expecting have to be taken into consideration.
Google might still take time to index 2 million pages. Your content might not show in the listings for a couple of weeks.
JeremyL, I have no experience with a site this size, but I still have a couple of questions/comments.
|I could launch tomorrow with that many pages |
How many pages, 100,000 or 2,000,000?
|I have thought about setting up the system so it only shows links to brands, cities, and states when an actual listing has been added |
This sounds like a good idea, especially if you start with the 100,000 and add a bunch of pages each day.
Can you make an obvious notice that new content is added daily to encourage folks to return and see what's new?
I'm assuming that on a site this size you'll have an internal search. If so, can you coordinate the new content with the unsuccessful search results? Could you automate a message on the results page thanking the person for the inquiry and telling them the content they are looking for is coming soon, or something similar?
I launched a large site about two months ago with 10,000 pages (which, until I read your post, I thought was a good number of pages). At any rate, all pages were indexed almost immediately by Google, but are still sandboxed. MSN indexed a few hundred pages within about three weeks and they went right to the top of the SERPs. Yahoo has managed the laborious task of indexing the homepage. SERPs are still pretty anemic about two months out. PR debuted at 6 a few days ago. Links are growing at a healthy clip.
I think it was a mistake for me to wait to launch until the site was "ready". I feel I should have launched with a bare minimum and built slowly. I'd be in much better shape at the moment, I'm sure. With 2 million pages, you really should have launched a long time ago, I think.
Google likes to see action, while Yahoo moves at the speed of molasses. I won't wait to launch again until everything is perfect; after all, these are websites, not the space shuttle.
|Could you automate a message on the results page thanking the person for the inquiry and telling them the content they are looking for is coming soon, or something similar? |
Very good idea. It actually just gave me another idea. The main way people will find locations is via a zip code (maybe city) search. I can load 100% of the data into the search function and write a script to release 100 listings a day to the directory pages. The directory structure is actually htaccess created, so the SE's can see the domain.com/dir/dir, but the people doing the search will do so on domain.com/listing.php?var=2452 which I can deny in the robots file.
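A rough sketch of that drip-release idea, for what it's worth. Everything here is hypothetical: it assumes the unpublished listings are just a list of IDs, and a real version would read and write the database instead.

```python
# Hypothetical sketch: release a fixed batch of listings to the
# directory pages each day, keeping the rest in the backlog.
# 'unpublished' stands in for a database query of hidden listings.

def next_release_batch(unpublished, batch_size=100):
    """Return today's batch of listing IDs and the remaining backlog."""
    batch = unpublished[:batch_size]
    remaining = unpublished[batch_size:]
    return batch, remaining

# Example: 250 listings waiting, release 100 today.
batch, backlog = next_release_batch(list(range(250)), batch_size=100)
```

A daily cron job could call something like this, mark the batch as published, and leave the backlog for tomorrow.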
What about launching the site as you have it, but block Google from most directories and pages with robots.txt. Then, maybe once a week, unblock a directory in robots.txt. Would that work? I don't know from experience, but it sounds like a decent idea. Then, you could let other engines like Yahoo get the full site and not scare off Google too badly.
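Just to illustrate the staged-unblock idea (the directory names here are made up), the robots.txt might start out something like this, with one Disallow line removed each week as a directory is released:

```
# Hold back most directories from Googlebot at launch
User-agent: Googlebot
Disallow: /chicago/
Disallow: /houston/
Disallow: /phoenix/
# /new-york/ has no Disallow line, so it is crawlable now.
# Remove one Disallow line per week to release that directory.
```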
I'd love to know if this works because I might be in a similar position soon.
I've always been fearful of doing that for pages I know I will want to rank well later. That, and it would be one huge robots file.
|The directory structure is actually htaccess created, so the SE's can see the domain.com/dir/dir, but the people doing the search will do so on domain.com/listing.php?var=2452 which I can deny in the robots file. |
This is a mistake. People will make links to the URLs they see, and if you disallow spiders access to those pages, you will not get any ranking benefit from those links. Pages should only be visible under one address.
Yeah, I didn't really describe exactly how it will be. The search results will be domain.com/search.php?search=blah, but the result list will actually link to domain.com/location/listing.htm. There is no way to get around having the search results look dynamic, and I don't want to. But the actual listing and listing reviews will be the same as if they were found through the directory.
|Your worry is very real. I think attempting to launch 2 million pages at one time is suicide. |
Especially if they're just empty vessels waiting to be filled by users.
I would just put a meta noindex tag on the pages that don't have content yet and set it up so the noindex tag goes away when the page is updated with content.
Google will index the pages as soon as the noindex tag comes off. I did this to a link directory with many categories but few links, and it worked great. As soon as links were added to a category, the noindex tag was changed to index, and Google indexed the page two weeks later.
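As a rough illustration of that toggle (the function name and logic are mine, not from the poster), the page template could pick the robots meta tag based on whether the category actually has content yet:

```python
# Hypothetical sketch: noindex pages that are still empty, and let
# pages with real content be indexed. A template would call this
# once per page when rendering the <head>.

def robots_meta(listing_count):
    """Return the robots meta tag for a page with this many listings."""
    if listing_count == 0:
        return '<meta name="robots" content="noindex,follow">'
    return '<meta name="robots" content="index,follow">'

print(robots_meta(0))   # empty category: keep it out of the index
print(robots_meta(12))  # filled category: indexable
```

The "follow" keeps spiders moving through the empty pages' links even while the pages themselves stay out of the index.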
|Especially if they're just empty vessels waiting to be filled by users. |
Yeah! This sounds like another of those really interesting sites :(
I've had several sites that have been sandboxed and some that have not.
The biggest factor in being sandboxed is the theme/subject (or the theme as perceived by G); if it triggers the sandbox then you're in, and links, size of site, etc. will not come into it.
If the site is any good, people will link to it. Two million pages is a lot of site; it will become important, and Google will note that.
Why don't you start with the 10 biggest cities and see how those pages work out, then roll it out accordingly? That way you can see how effective the site is in terms of converting customers.
How much better would the web be for users without databases?