Next June it'll be three years since I started working in the SEO world at a professional level. I've always been a programmer, even while working in SEO, and I always will be. For this reason, I can't stick to static HTML pages when I'm given the chance to add scripting and programming, which from an SEO point of view can be a problem :P :S
In addition, when I started in this world, I found it hard to find a starting point for learning... I learnt a bit about Google and PageRank, and only then was I able to understand the many comments floating around here, and go on from there.
So, for these reasons, I'm providing here some guidelines to help beginners get started in this world.
First of all, the very first source any webmaster should visit is the Google Guidelines for Webmasters ([google.com]). The part I feel is most important is the last lines, which say:
Webmasters who spend their energies upholding the spirit of the basic principles listed above will provide a much better user experience and subsequently enjoy better ranking than those who spend their time looking for loopholes they can exploit.
Anyhow, even though these guidelines are very useful, they don't say much about how to improve the crawling of dynamic pages; they only point out that these pages are not as easily crawled as static ones. So, I'll give here some points to help with that.
- A crawler can easily get lost in a dynamic site if it's not well structured and designed. Even so, good design work can help your rankings by taking advantage of the fact that the site is dynamic.
- Most search engines care a lot about the presence of keywords in a URL, and dynamic URLs are a great chance to include keywords in a URL, if used well.
- All search engines work hard against SEO. That is, they want their algorithms to put the most relevant site in first place, not the best optimized one. The best long-term SEO is to make your site truly relevant, and to help the engines notice that.
- In their struggle against spammy SEO, the engines hate everything that can be taken as an attempt to artificially alter a site's ranking in the SERPs (Search Engine Results Pages). Automated and/or repetitive techniques are the best example of this.
- Engines avoid visiting dynamic pages where there is an 'ID' parameter. They do this because they don't want to publish content that is supposed to require identification to be accessed. So, avoid the 'id' parameter (either on its own or as part of another parameter, e.g. 'sectionid') altogether, unless that parameter truly is a user identifier.
- A very delicate point with dynamic URLs is sorting and apparently duplicated content. That is, if you have the URLs http://www.example.com/index.php?section=main&lang=en_us and http://www.example.com/index.php?lang=en_us&section=main, then the search engine will consider them different URLs (in fact, they are different), without noticing that they refer to the same resource. Then, when crawling them separately, they may be taken as duplicates (they do fall under the definition of duplicate content: different URLs providing the same content). To avoid that, the best bet is to take the list of all parameters used by all the dynamic pages on the site, sort it, and then always apply that order when putting the parameters into URLs (links, form actions, and so on).
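The sorting trick above can be sketched in a few lines. This is a hypothetical helper (the function name and URLs are mine, invented for illustration, not from any SEO tool), assuming your site can run its links through something like Python's urllib.parse:

```python
# Sketch: canonicalize query-parameter order so every link to the same
# resource uses exactly the same URL. The URLs below are invented examples.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def canonicalize(url):
    """Return the URL with its query parameters sorted alphabetically."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query))
    return urlunsplit(parts._replace(query=urlencode(params)))

a = canonicalize("http://www.example.com/index.php?section=main&lang=en_us")
b = canonicalize("http://www.example.com/index.php?lang=en_us&section=main")
print(a == b)  # both collapse to the same canonical URL
```

If every link and form action on the site goes through one helper like this, the crawler only ever sees one URL per resource.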
- As I said above, parameters like sectionid are discouraged. In addition, it's better to use names for sections, languages, etc., instead of IDs. E.g., a parameter section=3 gives no info to the searcher, but something like section=downloads will help that downloads section rank better in any query containing the word 'downloads', because the engine will know from the URL that this resource has something to do with downloads. Anyhow, following the earlier comments about artificial optimization, a URL like [download.example.com...] would hurt you more than help you: it would be completely artificial! As a summary of this point, here's a method to tell whether a URL is good or not: take the following examples:
Now, which one do you feel tells a human reader best what the page is about? Of course, it'd be the first one, because it says you are starting the product_name download from the example site. In fact, that's the one that will benefit most from search engines, both in the amount of visitors sent and in those visitors' interest in your product. It's true that the second one could rank better when searching for 'download', but most users arriving from that search will find it's not what they were looking for, they'll leave, and they'll get annoyed whenever they see that page making noise among the 'relevant' results. The last option may seem better from a pure programming point of view, because it uses fewer server resources (numeric operations are far simpler than text-based ones), but it tells neither the visitor nor the crawler what the page is about.
- Summarizing all the previous points: a good dynamic URL should be 100% free of id parameters, should have sorted parameters with descriptive (but as short as possible) names and values, and should not contain the same keyword more than once unless it truly makes sense. E.g., tales.php?author=nickname_author&tale=title would be OK, but not tales.php?taleauthor=author&taletitle=title, because that would be spammy: tales.php already says this page has something to do with tales; making clear which tale to show is fine, but since the code in tales.php obviously 'knows' it's about tales, the parameters should not mention tales again.
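As an illustration only, the checks summarized above can be turned into a rough lint function. This is my own reading of the points in this post, not anything the engines publish, and the heuristics (substring match on 'id', purely numeric values) are deliberately crude:

```python
# Sketch: warn about dynamic-URL problems discussed above. The rules are
# a rough, personal interpretation, not an official checklist.
from urllib.parse import urlsplit, parse_qsl

def url_warnings(url):
    """Return a list of warnings for a dynamic URL."""
    warnings = []
    parts = urlsplit(url)
    params = parse_qsl(parts.query)
    names = [name for name, _ in params]
    if any("id" in name.lower() for name in names):
        warnings.append("contains an 'id'-like parameter")
    if names != sorted(names):
        warnings.append("parameters are not in sorted order")
    for name, value in params:
        if value.isdigit():
            warnings.append("parameter '%s' has a purely numeric value" % name)
    return warnings

print(url_warnings("index.php?sectionid=3"))  # id-like name + numeric value
print(url_warnings("tales.php?author=nick&tale=title"))  # no warnings
```

A real site would run something like this over every generated link as part of its tests, rather than checking URLs by hand.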
Well, I hope this helps newbies get the basics of dynamic URLs. Now, also for newbies in this world, I'm going to comment on some 'urban legends', putting in a single place which of these are true, which are false, and which are partially true. So, here I go:
- Frames: Do Google and the other search engines ban pages for using frames? The answer is: not at all! There is no ban, nor any intention to harm 'framers', but the crawlers are not able to access the framed files properly. As a solution, if you use frames, you should include alternative content in the noframes tag, and it's a very good idea to link the framed files from there, so the engines will find and crawl them.
- Meta description: Do the engines ban sites that include a keyword list as the description? There's no short answer to this one. Some engines may do that, and IMHO it would be a good practice, because meta tags already provide the 'keywords' tag for that purpose. Anyhow, if the list is not of keywords but of variations of the same keywords, the risk of being banned gets higher. An example of a bad description tag would be something like 'foo, foo articles, foo projects, foo downloads', and an example of a good one could be 'foo.com provides you all about foo: articles, projects, downloads and so on'. Note that this one includes all the keywords of the other example, but it's 'natural' text and won't be taken as artificial by the engines.
- Sitemaps: Does my site need a sitemap to rank well? Short answer: no. Long answer: it's not a requirement, but it's good practice. A sitemap is a page that lets your visitors view the structure of the site and reach any page on it. It also gives a great feed to the engines, which by crawling it get a very complete list of the files on your site, so they can be indexed more efficiently. In addition, Google Sitemaps Beta allows you to provide Google with a sitemap file (txt or XML) listing all the URLs to be crawled. Since it's a beta, how the service works isn't publicly known yet. Anyhow, with a bit of XSLT work applied to an XML Google Sitemap, you can generate a human-readable sitemap for your site from the same file you give Google, but that goes beyond the topic.
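For those who want to generate the XML flavour programmatically, here's a minimal sketch. The urlset/url/loc element names follow the publicly documented sitemap format; the namespace shown is the one from the public sitemap schema, so double-check Google's own documentation for the exact schema they currently expect, and the URL list is invented for the example:

```python
# Sketch: build a minimal XML sitemap file from a list of URLs.
# Element names follow the published sitemap format; URLs are invented.
import xml.etree.ElementTree as ET

def build_sitemap(urls):
    """Return a sitemap XML document (as a string) listing the given URLs."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

print(build_sitemap(["http://www.example.com/",
                     "http://www.example.com/index.php?lang=en_us&section=main"]))
```

The nice part is that the list of URLs can come straight from the same canonical-link helper the rest of the site uses, so the sitemap never disagrees with the links the crawler sees.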
- Browser optimization: Which browser should I optimize my page for? None! If you do, you're actually un-optimizing your page. The best optimization you can do for a page is this: make sure it's valid HTML, and valid CSS if you use it (see www.w3.org for the official validators), that there are no broken links, and that visitors on dial-up won't get bored waiting for it to download. If one of your files has to be big, e.g. an image gallery, include a note explaining that it'll take a bit longer than usual to download and why. If a browser doesn't properly display a valid HTML file, it's the browser's fault, not yours. Also, if a browser does properly display a non-valid HTML file, that's a conceptual aberration: if the HTML is invalid, then the proper display isn't defined at all, and any attempt to display it is pure speculation. Most of that speculation is acceptable, such as presenting code like
some <b>text <i>and</b> so</i> on
as if it was
some <b>text <i>and</i></b> <i>so</i> on
; that's OK. But if your file is XHTML and contains the first version of the code, you'll get an error message when trying to view it.
- HTML version: Which version of HTML should I use for my documents? That's a tricky one, given the great recent work by the W3 Consortium. IMHO, the best choice would be XHTML 2.0 [w3.org], but it's still a working draft, so I suggest going with the previous version, XHTML 1, which is a redefinition of HTML 4 as XML. In fact, if you can get your document to validate as XHTML 1 Strict, crawlers will be able to parse the structure of your document well enough. Anyhow, keep an eye on updates to XHTML 2, because it will allow crawlers to understand documents written in that language.
Well, I hope that with this, newbies coming around here will find a good reference to start SEOing their sites. If anybody feels that I've forgotten something, or that something here is wrong, I'm open to constructive criticism. Also, if anyone has a question about this, I'll be glad to answer it (if I'm able).
Herenvardö, Happy Hippie Heviatta, a.k.a. H4