Forum Moderators: open
This way we are sure that each page earns a pagerank and can pass it on.
However, it is very cumbersome and adding properties takes a long time. We are developing a database which would make the whole thing easier, but we are concerned that if we replace the static content with database pages:
1) they may not all get crawled because Google will not try every possible query. Perhaps a solution to this is to create an HTML link to each database page we want crawled?
2) the database pages may be seen as transitory and not earn real pagerank (this takes 2 months to propagate properly). Can these pages earn real pagerank? Can they pass it on?
Thanks for your thoughts.
Next, make sure all your old URLs still work, if necessary by setting up redirects for each and every one of your old files. The best solution would probably be to maintain your old namespace while moving from static to dynamic content.
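To make the redirect idea concrete, here is a minimal sketch in Python of a 301 redirect map from old static files to their new dynamic equivalents. The paths are hypothetical, and in practice this logic would live in your server config (e.g. a rewrite rule) rather than application code:

```python
# Hypothetical mapping of old static URLs to their new dynamic equivalents.
REDIRECTS = {
    "/properties/ny.html": "/Page.asp?Page=1",
    "/properties/ca.html": "/Page.asp?Page=2",
}

def redirect_for(old_path):
    """Return (status, new_path) for an old URL, or (404, None) if unmapped."""
    new_path = REDIRECTS.get(old_path)
    if new_path is None:
        return (404, None)
    # 301 (Moved Permanently) tells crawlers to transfer the old URL's
    # standing to the new one, rather than treating it as a new page.
    return (301, new_path)
```

The key point is the permanent (301) status: a temporary redirect would not tell the bot that the new URL has replaced the old one.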
René.
That's not the case anymore - just keep the querystrings short and make it so that there's no "superfluous" data in the strings.
To make a map, just make a master/detail list and the bot will crawl the master list - you should have one of those anyway.
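As a sketch of that master/detail idea: generate one plain HTML list page that links to every database record, so the crawler can reach each detail page through an ordinary link. The record data and URL pattern here are made up for illustration:

```python
# Illustrative records; in reality these would come from the database.
records = [
    {"id": 1, "name": "Westchester Colonial"},
    {"id": 2, "name": "Albany Ranch"},
]

def master_list_html(records):
    """Build a static HTML list linking to every detail page."""
    links = "\n".join(
        f'<li><a href="/Page.asp?Page={r["id"]}">{r["name"]}</a></li>'
        for r in records
    )
    return f"<ul>\n{links}\n</ul>"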
If you aren't much of a database developer, this project still sounds easy - 1 table with a dozen or so fields in it for all of the site information. You could probably get someone to build it for you for under $300 (translated: 30 hours of redundant data entry).
G.
/Page.asp?Page=1
These typically show one backlink from a PR5 and carry PR4 themselves. Some of these are linked from PR4s and show no backlinks, yet are still in the index. I think you get into trouble where your pages look like this:
/Page.asp?State=NY&County=Westchester&Property=25
Gbot seems to have more trouble with multiple variables or avoids them altogether. IMHO, that is precisely where you want to make the URLs look like this:
/Page/NY/Westchester/25
or whatever.
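A real site would do this mapping with a server rewrite rule (e.g. Apache's mod_rewrite); the following Python sketch just shows the translation a rule like that performs, turning the "directory-style" URL back into the query variables the script expects. The path segments and variable names mirror the hypothetical example above:

```python
def parse_pretty_url(path):
    """Map '/Page/NY/Westchester/25' back to the script's query variables.

    Returns None for paths that don't fit the expected shape, so the
    server can fall through to normal handling.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "Page":
        return None
    return {"State": parts[1], "County": parts[2], "Property": parts[3]}
```

To the bot, `/Page/NY/Westchester/25` looks like an ordinary static subdirectory path, while the script still receives the same three variables it would have gotten from the querystring.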
Another site I'm involved with is database-driven and has absolutely no trouble getting pages spidered. That site uses 3 or 4 variables per page, but we rewrite the URLs so that the variables look like subdirectories. So I'd have to conclude that databases whose URLs look "normal" are the way to go.
Sticky me if you want a list of DB to static HTML/Web programs.
I think the lingering problems many folks are having getting dynamic pages indexed come down to the (resulting) page source being just too damned complicated. On all the sites where I've heard complaints, it is inevitably a page with so much muck that the bot gets confused.
Things like:
Too many languages/scripts: Many sites dump a date (or another variable) from the database via ASP (or PHP, or whatever), then parse it into a readable format client-side via Javascript or some other snippet they picked up on a web site. Why? You're using ASP (or PHP, or whatever) - keep it all the same. You're confusing the robot by saying "Do you gusta play al futball?" rather than just saying "Do you like to play soccer?"
Not making a dynamic title to go along with the dynamic content. Googlebot hits a few pages and sees lots of words on the page, but the titles on all of them are "My Site: The Best Site on the Web!". To the bot, they are therefore all about the same thing despite the content, so there's no need to index more than one page.
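The fix for the title problem is trivial once you're generating pages server-side anyway: build the `<title>` from the record being displayed. A minimal sketch, with field names that are purely illustrative:

```python
def page_title(record, site_name="My Site"):
    """Build a page-specific <title> from the displayed record,
    instead of one sitewide title repeated on every page."""
    return f'{record["name"]} in {record["county"]}, {record["state"]} - {site_name}'
```

Every page now announces its own subject to the bot, instead of all pages claiming to be about the same thing.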
In the end: If you're going to make a data driven site - make a data driven site and process everything on the server side and spit it out as HTML. The only javascript should be in forms for setting focus and validation (stuff the bots don't even look at anyway).
And, as mentioned - there can't be stuff in the URL that doesn't affect the content of the page. In other words, don't use session IDs, click tracking codes, or other things like that.
Everyone seems to be making "work" out of nothing. Do it clean and uniformly in the first place and you won't be worried about "tricks" to get the bot in later.
G.