I am working on a new site that was created in OsC. It uses session IDs, but the only way to start a session is through a JavaScript dropdown.
The site is a low PR5, has been around for a while, and probably has at least 10,000 pages (guessing here), but only about 150 are indexed.
What are the best ways for me to go about experimenting to find out why the site isn't getting more fully indexed?
I have made some changes (like adding the sitemap to every page) that I am hoping will help, but suggestions would definitely be appreciated.
have a looksee at your log files and see if a spider is trying to access some of your links but is finding only error pages.
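A rough sketch of that log check, in case it helps. It assumes the Apache/Nginx "combined" log format and that the spider identifies itself with "Googlebot" in the user agent; the sample lines are made up for illustration, so point it at your own log file instead.

```python
import re
from collections import Counter

# Assumption: "combined" log format. Adjust the pattern if your server logs differently.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def spider_errors(lines, bot_token="Googlebot"):
    """Count (path, status) pairs where the named spider got a 4xx or 5xx response."""
    errors = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if not m or bot_token.lower() not in m.group("agent").lower():
            continue  # unparseable line, or not the spider we care about
        status = int(m.group("status"))
        if status >= 400:
            errors[(m.group("path"), status)] += 1
    return errors

# Made-up sample lines; replace with open("access.log") for a real run.
sample = [
    '66.249.66.1 - - [10/Oct/2005:13:55:36 -0700] "GET /product_info.php?products_id=12 HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2005:13:55:40 -0700] "GET /index.php HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.5 - - [10/Oct/2005:13:56:01 -0700] "GET /missing.php HTTP/1.1" 404 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

for (path, status), hits in spider_errors(sample).most_common():
    print(status, hits, path)
```

If the same paths keep showing up with errors, those are the links the spider is trying and failing to follow.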
also, that sounds like an awful lot of pages. i hope there aren't links to all of them on one page! google limits itself to about 100 links per page, any more than that and it balks.
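To see how many links a given page really carries, something like this would do. It is only a sketch using Python's standard HTML parser; the 100 figure is the rough number quoted above, and the generated test page is hypothetical.

```python
from html.parser import HTMLParser

LINK_BUDGET = 100  # the rough per-page figure mentioned above, not a hard rule

class LinkCounter(HTMLParser):
    """Collects the href of every <a> tag that has one."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

def count_links(html):
    parser = LinkCounter()
    parser.feed(html)
    return len(parser.hrefs)

# Hypothetical page with 150 product links on it.
page = "".join(f'<a href="/p/{i}/">Product {i}</a>' for i in range(150))
n = count_links(page)
print(n, "links:", "over budget" if n > LINK_BUDGET else "ok")
```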
what you may also want to consider is duplicate content. make sure the pages that are indexed don't have a huge amount of similar content to those that are not.
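One quick way to spot near-duplicates is to compare page text pairwise. The sketch below uses difflib from the standard library; the 0.9 threshold and the sample pages are assumptions for illustration, and nobody outside Google knows what cutoff it actually applies.

```python
from difflib import SequenceMatcher
from itertools import combinations

def near_duplicates(pages, threshold=0.9):
    """Return (url_a, url_b, ratio) for every pair of pages at or above the threshold."""
    hits = []
    for (url_a, text_a), (url_b, text_b) in combinations(sorted(pages.items()), 2):
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if ratio >= threshold:
            hits.append((url_a, url_b, round(ratio, 2)))
    return hits

# Hypothetical page texts: two product pages that differ by one word.
pages = {
    "/widget-red/": "Red widget, 10cm. Ships in 2 days. Returns accepted within 30 days of purchase.",
    "/widget-blue/": "Blue widget, 10cm. Ships in 2 days. Returns accepted within 30 days of purchase.",
    "/contact/": "Contact us by phone or email.",
}

for url_a, url_b, ratio in near_duplicates(pages):
    print(url_a, url_b, ratio)
```

Comparing every pair gets slow on thousands of pages, so run it on a sample of indexed and non-indexed pages rather than the whole catalogue.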
It uses session ID's
I think that might be the very first issue you need to address. Google does not like session IDs. I've seen it index some sites with IDs in the string, but I personally feel that is the biggest roadblock for any spider.
You may want to consider rewriting the URIs and removing all parameters from the string...
www.example.com/category/product/12/
...or something to that effect.
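To make the idea concrete, here is a small sketch of the two pieces: dropping the session ID from a URL, and building a parameter-free path. It assumes the session parameter is named osCsid (check what your install actually uses), and the slug scheme and sample names are hypothetical. The real rewrite would live in the server or shop configuration; this only shows the mapping.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAMS = {"osCsid"}  # assumption: the name of the session parameter

def strip_session_id(url):
    """Return the URL with any session parameters removed from the query string."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

def slugify(text):
    """Lowercase the text and collapse anything non-alphanumeric into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def clean_uri(category, product, product_id):
    """Build a parameter-free path such as /category/product/12/."""
    return f"/{slugify(category)}/{slugify(product)}/{product_id}/"

print(strip_session_id("http://www.example.com/product_info.php?products_id=12&osCsid=abc123"))
print(clean_uri("Garden Tools", "Steel Rake", 12))
```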
There are plenty of programs out there that will spider your site. Download one and run the spider. See what it finds. In most cases that is what the SEs will find.
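If you are curious what such a spider does, the core is just a breadth-first walk over plain links. The sketch below is not any particular program: the fetch function is passed in, and the tiny in-memory site is made up, so that it runs without touching the network. Note that the "toys" section, reachable only through a script-driven dropdown, never gets found.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class _Links(HTMLParser):
    """Collects href values from <a> tags only, roughly what a basic spider follows."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

def crawl(start_url, fetch, max_pages=1000):
    """Breadth-first crawl of same-host links; fetch(url) returns HTML or None."""
    host = urlsplit(start_url).netloc
    seen, queue, reachable = {start_url}, deque([start_url]), []
    while queue and len(reachable) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # treat as an error page
        reachable.append(url)
        parser = _Links()
        parser.feed(html)
        for href in parser.hrefs:
            target = urljoin(url, href).split("#")[0]
            if urlsplit(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return reachable

# Hypothetical site: /toys/ is only reachable through a script-driven dropdown.
site = {
    "http://www.example.com/": '<a href="/tools/">Tools</a> <select onchange="go()"><option>Toys</option></select>',
    "http://www.example.com/tools/": '<a href="/tools/rake/12/">Rake</a>',
    "http://www.example.com/tools/rake/12/": "<p>Rake</p>",
    "http://www.example.com/toys/": '<a href="/toys/kite/7/">Kite</a>',
}

found = crawl("http://www.example.com/", site.get)
print(len(found), "of", len(site), "pages reachable")
```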
Once you've done the rewrite, you will need to devise a way to direct the spider to those new URIs. If the database is capable of generating 10,000 pages, then you have some work to do. I usually like to provide index pages for each main category. Those index pages contain the rewritten URIs. Once a bot gets in there and starts traversing the rewritten URIs, tis a beautiful sight! ;)
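For what it's worth, those category index pages can be generated straight from the product list. This is only a sketch: the product rows and URIs below are hypothetical, and in practice they would come from the shop database.

```python
from collections import defaultdict
from html import escape

def build_category_indexes(products):
    """Group (category, name, uri) rows into one plain HTML link list per category."""
    by_category = defaultdict(list)
    for category, name, uri in products:
        by_category[category].append((name, uri))

    pages = {}
    for category, items in by_category.items():
        links = "\n".join(
            f'  <li><a href="{escape(uri)}">{escape(name)}</a></li>'
            for name, uri in sorted(items)
        )
        pages[category] = f"<h1>{escape(category)}</h1>\n<ul>\n{links}\n</ul>"
    return pages

# Hypothetical rows; the URIs are the rewritten, parameter-free ones.
products = [
    ("Garden Tools", "Steel Rake", "/garden-tools/steel-rake/12/"),
    ("Garden Tools", "Hand Trowel", "/garden-tools/hand-trowel/13/"),
    ("Toys", "Kite", "/toys/kite/7/"),
]

indexes = build_category_indexes(products)
print(indexes["Garden Tools"])
```

With very large categories you would want to split each index across several pages so no single page carries too many links.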
Note: It will most likely take a few crawls from Google before it gets most of what you have available.
P.S. There are a few OsC specific topics floating about discussing these issues. I'd search the board and review those before jumping into this.