Forum Moderators: open
1. All pages have a unique and descriptive title
2. The pages don't have excessively long URLs. Once you get above about 50 characters, the chance of the page being indexed declines rapidly; over 90, almost never.
3. The page is not deep in a directory such as: domain.com/i-am-great/why/reasons/press-releases/he-is-great.php. Better: domain.com/he-is-great.php
4. There are several links to the page. In other words, you should have good site navigation.
Any other tips you would add?
Then try making a site map with only one variable in the URL. In the May '02 update our three-variable pages were dropped from the index (which was most of our site), so I built a site map with one-variable URLs. It was a little difficult and some info on each page was missing, but it did get the pages back into the index.
Also make sure you have NO REDIRECTS in your site navigation. Many off-the-shelf programs will redirect to different pages if variables are missing from the URL. For example, if one category of your site has no sub-categories, it might redirect to the content page after looking for sub-categories. If there are sub-categories, the page is shown with the sub-categories listed.
Home -> category -> sub-category list -> content
Home -> category -> redirect -> content
So your sitemap would look like this:
widgets/fuzzybluewidgets/links to each individual widget
/fuzzygreenwidgets/links to each individual widget
/nonfuzzybluewidgets/links to each individual widget
/nonfuzzygreenwidgets/links to each individual widget
All you would have to set up would be the widgets page and the sub-category pages one level down from that.
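A sketch of what that two-level, one-variable site map could look like in PHP - the category names here come from the post, but widget.php, sitemap.php and the get_widgets_in() lookup are made-up placeholders, not anyone's actual code:

```php
<?php
// sitemap.php - two levels, never more than one variable per URL.
$categories = array('fuzzybluewidgets', 'fuzzygreenwidgets',
                    'nonfuzzybluewidgets', 'nonfuzzygreenwidgets');

// Hypothetical lookup; in real life this would hit your database.
function get_widgets_in($cat) {
    return array(101 => 'Example widget', 102 => 'Another widget');
}

$cat = isset($_GET['cat']) ? $_GET['cat'] : '';

if (in_array($cat, $categories)) {
    // Sub-category page: link straight to each content page (one variable).
    foreach (get_widgets_in($cat) as $id => $name) {
        echo '<a href="widget.php?id=' . $id . '">'
           . htmlspecialchars($name) . "</a>\n";
    }
} else {
    // Top-level site map: link to each sub-category page (one variable).
    foreach ($categories as $c) {
        echo '<a href="sitemap.php?cat=' . $c . '">' . $c . "</a>\n";
    }
}
```

The point is that every URL the spider can reach from the map carries at most one variable, which is the structure the poster says got his pages re-indexed.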
I read that index.php?threadid=nnn should work for Google, but no love...
I just spent the morning creating a hack that would allow me to output all the content in a way that each post would show up as:
domain.com/1/index.php
where 1=postid...but after all that, I got stopped at post 35,000 or so because it looks like Linux won't allow more than 35,000 subdirectories..
The rewrite trick isn't working for me, which I suppose is just as well, as I read somewhere that mod_rewrite sends a certain header out and that Google can detect it...
This is an extremely annoying problem.
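For what it's worth, the subdirectory hack shouldn't be needed if the rewrite can be made to work (and the ceiling you hit is, as far as I know, the roughly 32,000 subdirectories-per-directory limit of the ext2/ext3 filesystems, not Linux as a whole). A minimal .htaccess sketch, assuming the forum script is index.php and takes threadid:

```apache
Options +FollowSymLinks
RewriteEngine on

# Internally map domain.com/1234/index.php onto the real script.
# The rewrite happens inside Apache, so the visitor (and Googlebot)
# only ever sees the static-looking URL.
RewriteRule ^([0-9]+)/index\.php$ /index.php?threadid=$1 [L]
```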
[httpd.apache.org...]
"The great thing about mod_rewrite is it gives you all the configurability and flexibility of Sendmail. The downside to mod_rewrite is that it gives you all the configurability and flexibility of Sendmail."
I found that if the rewrite rules didn't stop the httpd daemon from loading altogether, then they didn't do what I expected them to... I had similar experiences with Sendmail :)
BTW, I just read an article interviewing Brett. Geeez! You wrote this forum code yourself? Awesome job Brett!
You should resell it! I'm very happy with vbulletin and have no plans to switch, but this is certainly marketable code...
The key to this is having structured URLs. For example:
domain.com/12343223/keyword4/keyword3/keyword2/keyword.html
Of course Google sees this as a structured site with keyword-rich URLs.
The script could do a $key = explode("/", $_SERVER['QUERY_STRING']); {following me here, PHP guys?} This would then mean that $key[1] == 12343223, which can be used to output your content accordingly.
This isn't the best explanation, but by utilising Apache 404 error documents (and forcing the header to "200 OK") it tricks Google into indexing the big URLs (without the query strings).
You can use this as a search function (php.net utilises this for URLs such as http://www.php.net/explode, although they then do a header redirect).
This isn't the best explanation in the world, but I've just come back from the pub :-p
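A rough sketch of that trick in PHP - the script layout, handler name and variable names here are mine, not the poster's:

```php
<?php
// Apache would be configured with:  ErrorDocument 404 /handler.php
// so a request for /12343223/keyword4/keyword3/keyword2/keyword.html
// ends up here. Outside a web server we fall back to an example path.
$path = isset($_SERVER['REQUEST_URI'])
      ? $_SERVER['REQUEST_URI']
      : '/12343223/keyword4/keyword3/keyword2/keyword.html';

header('HTTP/1.0 200 OK');      // stop Apache sending the 404 status

$key = explode('/', $path);     // $key[1] == '12343223'
$id  = (int) $key[1];           // numeric id to fetch the content with
// ...look $id up and output the matching page...
```

The leading "/" in the request means $key[0] is an empty string, which is why the id lands in $key[1] exactly as the post says.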
BTW, is there a way to tell, from the browser or some other method, what the header is?
Grumpus also mentioned to me in a sticky that I have some no-cache headers, which is just vBulletin stuff; could that also have an effect on Google?
Server Header Checker [searchengineworld.com]
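If you have shell access, curl will show you the raw headers too (the URL below is just an example):

```shell
# -I makes a HEAD request and prints only the response headers,
# so you can see the status line (200 / 302 / 404) and any
# no-cache headers your forum software is adding.
curl -I "http://www.example.com/index.php?threadid=123"
```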
Rule 1:
Keep your URL variables in a structure so that any particular URL will always (or as often as possible) return the same content. When Google returns to the page it expects to see similar, or at least on-topic, content - not an error or a 404. So "once only" URL strings are bad.
Rule 2:
Each page should have a unique url, so session identifiers are very bad. If google visits the links:
something.php?sessionId=123,
something.php?sessionId=465 and
something.php?sessionId=789
and the same content is presented for each request, I *think* it will get fed up with your site pretty quickly.
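One common workaround from that era was to stop handing session IDs to the crawler at all, so every page has one canonical URL no matter how often it's fetched. A simplistic sketch - the user-agent test here is illustrative only, not an exhaustive spider check:

```php
<?php
// Outside a web server, pretend Googlebot is visiting (illustration only).
$ua = isset($_SERVER['HTTP_USER_AGENT'])
    ? $_SERVER['HTTP_USER_AGENT']
    : 'Googlebot/2.1 (+http://www.googlebot.com/bot.html)';

$is_spider = (strpos(strtolower($ua), 'googlebot') !== false);

if ($is_spider) {
    // Clean URLs for the crawler: never rewrite links to add a session ID.
    ini_set('session.use_trans_sid', '0');
} else {
    // Ordinary visitors still get their session as usual.
    session_start();
}
```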
Fuzzy stuff:
Short and descriptive is probably good for users, but I don't think Google cares too much (going overboard with length will probably have a negative impact, but I don't have proof). I have a site that uses URL variables in the form something.php?x=123,456,789,123 (kind of like Vignette does/did). This is meaningless to users, but Google doesn't seem to mind it at all; this site does, however, conform to rules 1 & 2.
There are ways of "hiding" URL variables; one way I have used is something.php/page/contact_us. This works by tricking the user agent into thinking that contact_us is a folder, while the web server knows that something.php is the file being requested. You then grab the path-info environment variable and strip the variable-name/variable-value pairs off the end of it (in this case page=contact_us). How you do this depends on what you are using in terms of a web server and scripting language (do a search on "search safe urls php", or whatever your language is). Anyway, I think this is rapidly becoming redundant and probably not worth worrying about. Personally I don't worry about hiding the "?" but concentrate on rules 1 & 2, and things seem to be fine.
Another quick thing about that "hiding" method - the user agent thinks it's looking at the contact_us folder, so you must use absolute paths for all link and image references, which can be a pain, especially if you are trying to implement this on an existing site.
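A sketch of how the something.php/page/contact_us style can be unpacked in PHP (variable names here are mine):

```php
<?php
// For /something.php/page/contact_us Apache puts "/page/contact_us"
// into PATH_INFO. Outside a web server, fall back to an example value.
$info = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '/page/contact_us';

$parts = explode('/', trim($info, '/'));   // array('page', 'contact_us')

// Fold the pieces into name => value pairs.
$vars = array();
for ($i = 0; $i + 1 < count($parts); $i += 2) {
    $vars[$parts[$i]] = $parts[$i + 1];
}
// $vars['page'] is now 'contact_us'
```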
Anyway, hope that's clear. While it's not gospel, it seems to work well on the sites I've been watching for the last six months. Maybe Brett_T or GG can confirm?
This is done by the following (hopefully fool-proof) code:
Options +FollowSymlinks
RewriteEngine on
RewriteRule ^news/(.*)\.php$ [newsfile].php?id=$1 [L]
In the same way, I redirect images to GD to be thumbnailed:
RewriteRule ^images/news/small_(.*)$ [gdfile].php?src_img=images/news/$1 [L]
I would not advise using the '404' method, as this could potentially return '200 OK' when a genuine 404 has been encountered. A good example of 404 usage is the php.net site, where it will redirect to the closest page or, if it can't find one, send the URL to the search page.
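To keep the two cases straight, the 404 route can be sketched like this (the handler name is my own placeholder):

```apache
# .htaccess: send every miss through one script
ErrorDocument 404 /handler.php
```

Inside handler.php, if the requested path maps to real content, force the status to "200 OK" and serve it; if it doesn't, leave Apache's 404 status alone so spiders don't end up indexing junk pages.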
HTH