I've been more of a reader here than a poster, but I must say this forum is a great place for learning, and I've learnt a lot. So I'd just like to start with a thanks to everyone here!
OK, my company and I launched a site last year, in Oct 2003: a site index that contains a bunch of resources (over 5,000). Using .htaccess mod_rewrite rules, we've managed to make all the URLs static-looking.
However, after more than 3 months there are no backlinks in Google and very, very minimal SE traffic (I'm talking 10 - 12 visitors in total from Google). The home page and one sublevel show PR0, while deeper pages show a grey bar.
The site is linked to from PR7 and PR6 sites and has been since its launch. It has over 600 backlinks on AllTheWeb and is listed in 3 different categories in DMOZ.
The server is dedicated, and we recently launched a ONE-page site that got a PR6 within a week (launched mid-December).
The site has AdSense, and the AdSense spider has no problem crawling. All the ads are relevant to the page / category they are on. I've also seen multiple crawlers (crawler4, crawler5, etc.) with over 300 visits in a month.
Something's obviously wrong, but I can't for the life of me figure out what. No spam, no over-optimising at all. As a matter of fact, the entire optimization has been 'natural'.
I'm guessing that it's got something to do with the code or mod_rewrite, because another site that launched about a month ago on the same backend seems to have the same problem: it's listed in the Google index, but PR0. Two of our other sites launched on the same server but using a different backend have been PR'd fine, although they really show up nowhere in the SERPs.
I've been racking my brains and I'm going insane... Google support replies with the same canned responses...
Any ideas, anyone? TIA.
RewriteEngine On
RewriteBase /
#
# Rewrite rules
#
RewriteCond %{REQUEST_URI} !^/[^/]+/[^/]+/[^/]+$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+[^/])$ $1/
# subcategory
RewriteCond $3 (^$)|(^index\.html$)
RewriteRule ^([^/]+)/([^/]+)/([^/]*)$ /subcategory.php?Cat=$1&Subcat=$2 [L]
#category
RewriteCond $2 (^$)|(^index\.html$)
RewriteRule ^([^/]+)/([^/]*)$ /category.php?Cat=$1 [L]
# subcategory paged view (must be ABOVE the script rule)
RewriteRule ^([^/]+)/([^/]+)/page([0-9]+)([a-z])([ad])\.html$ /subcategory.php?Cat=$1&Subcat=$2&Page=$3&SortBy=$4&Order=$5 [L]
# script
RewriteRule ^([^/]+)/([^/]+)/(.+)\.html$ /listing.php?Cat=$1&Subcat=$2&Listing=$3 [L]
#RewriteCond %{REQUEST_FILENAME} !-f
#RewriteCond %{REQUEST_FILENAME} !-d
#RewriteRule ^(.+[^/])$ $1/ [N]
php_flag session.use_only_cookies on
php_flag session.use_trans_sid off
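To see what these rules do with a given URL, here is a rough Python model of them. It is a simplified sketch, not how Apache itself evaluates the file: the "is this a real file" test is passed in as a flag, the restart of rule processing after an [L] rewrite is ignored, and the category and listing names used with it (widgets, blue, no-such-thing) are made up.

```python
import re

# "Empty or index.html", as in the two RewriteCond lines.
INDEX = re.compile(r"^$|^index\.html$")


def rewrite(path: str, is_file: bool = False) -> str:
    """Map a requested path (no leading slash) to the script it would reach.

    Simplified model of the posted rules; is_file stands in for the
    RewriteCond %{REQUEST_FILENAME} !-f test.
    """
    # Append a trailing slash unless the URL has exactly three segments
    # or names a real file. No [L] flag, so later rules see the result.
    if not re.search(r"^/[^/]+/[^/]+/[^/]+$", "/" + path) and not is_file:
        m = re.search(r"^(.+[^/])$", path)
        if m:
            path = m.group(1) + "/"

    # subcategory
    m = re.search(r"^([^/]+)/([^/]+)/([^/]*)$", path)
    if m and INDEX.search(m.group(3)):
        return f"/subcategory.php?Cat={m.group(1)}&Subcat={m.group(2)}"

    # category
    m = re.search(r"^([^/]+)/([^/]*)$", path)
    if m and INDEX.search(m.group(2)):
        return f"/category.php?Cat={m.group(1)}"

    # subcategory paged view
    m = re.search(r"^([^/]+)/([^/]+)/page([0-9]+)([a-z])([ad])\.html$", path)
    if m:
        cat, sub, page, sort_by, order = m.groups()
        return (f"/subcategory.php?Cat={cat}&Subcat={sub}"
                f"&Page={page}&SortBy={sort_by}&Order={order}")

    # script (listing)
    m = re.search(r"^([^/]+)/([^/]+)/(.+)\.html$", path)
    if m:
        return f"/listing.php?Cat={m.group(1)}&Subcat={m.group(2)}&Listing={m.group(3)}"

    return "/" + path  # no rule matched


if __name__ == "__main__":
    for example in ("widgets", "widgets/blue", "widgets/blue/no-such-thing.html"):
        print(example, "->", rewrite(example))
```

In this model every one-, two- or three-segment URL ends up at one of the three PHP scripts whether or not the category or listing exists, so only the scripts themselves can ever answer with a 404.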
Any more ideas?
[webmasterworld.com...]
My directory and article pages, however, do use mod_rewrite with simple static URLs and rank wonderfully with Google, often at number 1 or on the first page for dozens of very competitive words/phrases in my industry.
I agree that Google shouldn't care about mod_rewrite and should just follow each URL as a page, but the only thing I can see in common between the two sites that are PR0, and have been for over 4 months, is that they both use the same backend CMS with mod_rewrite.
We've previously launched new sites on the same server (i.e. same IP) and they've done just fine in terms of PR and SERPs.
This is driving me nuts!
I would be a bit concerned about the last bit of code where it's looking for file-not-found and directory-not-found (!-f & !-d), and then appending a trailing slash and rewriting anyway. I see it's commented-out now, but it may previously have done some damage.
If your site is designed such that it is impossible or almost impossible for a robot to get a 404-Not Found, they will be leery of indexing very deeply in your site. You'll find a lot of threads here asking, "Why is Ask/Google/Ink/whoever requesting this funny-looking URL from my server? - It does not exist and there's no such link!" The answer is that the server is being tested for its 404 response.
Any site where it's impossible to get a 404 is considered as a potential trap by spiders - They are "afraid" they won't be able to get out again, and so don't go deep. For the same reason, depth is limited on sites with complex query strings or "very deep" static URLs (which they can surmise to be query-string aliases). Because it is potentially-impossible to get to the "end" of such sites, an arbitrary link crawling count must be set. Therefore, page rank can suffer somewhat if you depend on deeper pages "feeding back" PR to other pages.
I may have misinterpreted your code, but I suggest reviewing that last part specifically.
None of your rules invoke external redirects, so your mod_rewrite code is invisible to search engines. Others in this thread may wish to check their servers for unexpected responses (such as unexpected publicly-visible 301 or 302 redirects) using the Server Headers checker [webmasterworld.com]. Best results will usually be had if your site always returns an appropriate server response code [w3.org], and you don't use any "tricks" such as using 404's to create script calls.
Jim
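For anyone who wants to run the same two checks from a script rather than the Server Headers checker, a minimal Python sketch follows. It requests a made-up URL that should not exist and reports the raw status without following redirects. The function names, the User-Agent string and the wording of the verdicts are all assumptions for illustration, and http://www.example.com/ is a placeholder for your own site.

```python
import http.client
import uuid
from urllib.parse import urlsplit


def random_probe_path() -> str:
    """Build a path that almost certainly does not exist on the site."""
    return f"/{uuid.uuid4().hex}.html"


def raw_status(url: str, timeout: float = 10.0):
    """Send one GET and return (status, Location header).

    http.client does not follow redirects, so a 301/302 is reported as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        # Hypothetical User-Agent, chosen only for this sketch.
        conn.request("GET", target, headers={"User-Agent": "status-probe-sketch"})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe(status: int) -> str:
    """Turn the status returned for a missing page into a short verdict."""
    if status in (404, 410):
        return "ok: missing page reported as missing"
    if status in (301, 302, 303, 307, 308):
        return "redirect: missing page is redirected, check the target"
    if 200 <= status < 300:
        return "soft 404: missing page answered with success"
    return f"other: status {status}"


if __name__ == "__main__":
    base = "http://www.example.com"  # placeholder, replace with your own site
    status, location = raw_status(base + random_probe_path())
    print(status, location, describe(status))
```

A site in good shape should report the 404 verdict for the random URL; a 200 or a redirect is the kind of unexpected response worth tracking down.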
Thanks a lot for that! Although yes, it was commented out, the CMS was built to send its own error message for any not-found document without sending a 404 header, so I'm assuming this did confuse / 'scare' the bots. It's fixed now...
This also seems to be the case, as allinurl:domain.com returns only around 300 or so results, whereas the site itself should have about 5,000 results.
So I guess all I can really do now is sit tight and wait, right?
Thanks again!
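The fix described in this post (keep the CMS's own error message, but send it with a real 404 status) looks roughly like the sketch below. The CMS in question is PHP and its code was not posted, so this is a Python WSGI stand-in, and KNOWN_PAGES is a hypothetical lookup in place of the real database query.

```python
# Hypothetical content store standing in for the CMS database lookup.
KNOWN_PAGES = {
    "/": b"<h1>Home</h1>",
}


def app(environ, start_response):
    """Serve known pages with 200 and everything else with a friendly 404."""
    path = environ.get("PATH_INFO", "/")
    body = KNOWN_PAGES.get(path)
    if body is None:
        # Custom message for the visitor, but the status line still says
        # "not found", which is the part a spider looks at.
        body = b"<h1>Sorry, that page does not exist.</h1>"
        status = "404 Not Found"
    else:
        status = "200 OK"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Local test server only; port 8000 is an arbitrary choice.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

The visitor sees the same error page either way; only the status line changes, from 200 to 404.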
So, if Google was aware of the old addresses and now of the new ones, you may have a whole set of duplicate pages, IF somewhere in your navigation system you have any links at all to the old URL system (from before applying mod_rewrite).
It may be worth rechecking your navigation system to ensure there are no links pointing to querystring urls.
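One way to do that recheck is to scan the HTML of each page for links that still carry a query string. The following is a small Python sketch using only the standard library; the class and function names are invented for this example, and the sample markup is made up.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class QueryLinkFinder(HTMLParser):
    """Collect href values of <a> tags that contain a query string."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and urlsplit(value).query:
                self.hits.append(value)


def find_querystring_links(html: str) -> list:
    """Return every link in the page that points at a query-string URL."""
    finder = QueryLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.hits


if __name__ == "__main__":
    sample = (
        '<a href="/widgets/blue/">static-looking link</a>'
        '<a href="/category.php?Cat=widgets">old-style link</a>'
    )
    for href in find_querystring_links(sample):
        print("query-string link:", href)
```

Any link it reports is a second address for a page that also has a static-looking URL, which is where the duplicates would come from.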
I used the server header checker tool and noticed that a 404 error was not returned when I typed in a file that did not exist. Instead the browser is redirected to a custom error page using the feature in Plesk 5 called "Custom Apache Error Docs" - this returns a 200 status code.
I have now unchecked this box in Plesk and proper 404 messages are shown - I hope that helps things!
Alex