Forum Moderators: open
Just a quick question; the following URL will be easily spiderable by the SE's - true or false?
www.example.com/directory1/cms/page.asp?121
I believe that the SE's would have no problems spidering this url!
Many thanks guys.
[edited by: Xoc at 8:23 pm (utc) on Nov. 20, 2004]
[edit reason] changed to example.com [/edit]
We have several websites with very similar URLs .... all have been listed in all SE's we have submitted to - including Google.
Google in fact states in its help pages that it can read '.asp?......', however it does advise not to go over the top, i.e. filename.asp?something=1&something=1&something=1&something=1 etc.
Hope this helps
Just a quick question; the following URL will be easily spiderable by the SE's - true or false? www.example.com/directory1/cms/page.asp?121
I believe that the SE's would have no problems spidering this url!
They can spider it but it will more than likely not produce any PR. I personally stay away from such directories when looking for backlinks to build PR.
The URL not being able to build PR - that is a problem. How can the pages rank with a PR of 0? Is there any way to build PR to these pages? I'm guessing that if we get backlinks to the pages, this can be achieved.
The reason directories don't gain PR is that the pages are dynamic and produced from a database rather than static, and thus search engines cannot read the contents. Some search engines can now catalog what is in databases, but whether or not they transfer PR I'm not sure. You need to check your software to see if there is some way to produce static pages for the search engines. I manage a classified site run from a database and it has the option to produce static pages for the search engines.
...and thus search engines cannot read the contents. Some search engines can now catalog what is in databases ...
That comment isn't entirely true. Dynamic pages are just as easily spiderable as static pages (provided your querystring isn't insane). All that matters is the HTML output by your server-side code. If you are curious as to what the spider will see when it tries to index the page, just view the source of your page in the browser. If everything looks good, you will be good.
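To sketch that point (a hypothetical setup, not anyone's actual server): the snippet below serves a dynamic-looking URL locally, then fetches it the way a crawler would. The page content, path, and port are all invented for illustration - the takeaway is that the spider simply receives whatever HTML the server-side code emits, query string or not.

```python
# Minimal sketch: a local server answering a "dynamic" URL, fetched as a
# spider would fetch it. All names here are hypothetical.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class DynamicPage(BaseHTTPRequestHandler):
    def do_GET(self):
        # The query string (everything after '?') doesn't change the fact
        # that the crawler just receives the HTML we emit here.
        body = b"<html><body><h1>Widget 121</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), DynamicPage)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Fetch the "dynamic" URL exactly as a crawler would.
html = urlopen(f"http://127.0.0.1:{port}/cms/page.asp?121").read().decode()
print(html)
server.shutdown()
```

Viewing source in the browser shows you the same thing: the rendered HTML, which is all the spider ever sees.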
You need to check your software to see if there is some way to produce static pages for the search engines.
Be careful in this scenario. While you may think Google cannot get through various query strings, think again. I've seen Googlebot chomp through multiple variables and index the URI (yuck!).
If you have an option to produce static pages from any software, then you need to make sure that your dynamic pages are invisible or you stand a chance of duplicate content issues. I've reviewed many sites where this type of situation exists and they are suffering from the indexing of multiple URIs all leading to the same content.
If you are producing static pages from dynamic pages, then your dynamic pages should be a directory by themselves and disallowed via the robots.txt file. I would even go one step further and place a robots meta tag on each of the main dynamic template pages...
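For the robots.txt side of that, something along these lines would do it (/dynamic/ is a made-up directory name - substitute whatever directory actually holds the dynamic template pages):

```
User-agent: *
Disallow: /dynamic/
```

That keeps well-behaved spiders out of the dynamic versions while the static pages get indexed.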
<meta name="robots" content="none">

ISAPI filters are the only way to go in a Windows environment. We use ISAPI_Rewrite and have been using it for years; the product is flawless and allows infinite rewrite routines. If your host is reluctant to install the global .ini file for ISAPI_Rewrite, they should first review the product, which I think will ease their reluctance.
What's nice about ISAPI_Rewrite is once the global .ini is installed at the server root, all you need to do is drop a .ini file at the root of each web and configure from there. It does not affect anything but the web it resides in. All sites that we manage on Windows now have a root .ini which contains this...
[ISAPI_Rewrite]
RewriteCond Host: ^example\.com
RewriteRule (.*) http\://www\.example\.com$1 [I,RP]

The above is a simple 301 for permanently redirecting non-www requests to www.
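For anyone who wants to see what that rule actually does on the wire, here is a rough illustration in plain Python (not ISAPI_Rewrite itself): any request arriving on the bare domain gets a 301 pointing at the www hostname, path and query preserved. The hostnames are the example.com placeholders from above.

```python
# Sketch of the non-www -> www permanent redirect, simulated locally.
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectNonWWW(BaseHTTPRequestHandler):
    def do_GET(self):
        host = self.headers.get("Host", "")
        if not host.startswith("www."):
            self.send_response(301)  # permanent redirect
            self.send_header("Location", "http://www.example.com" + self.path)
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), RedirectNonWWW)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Simulate a request that arrived as http://example.com/page.asp?121
conn = HTTPConnection("127.0.0.1", port)
conn.request("GET", "/page.asp?121", headers={"Host": "example.com"})
resp = conn.getresponse()
status = resp.status
location = resp.getheader("Location")
print(status, location)
server.shutdown()
```

A header checker against the live site should show exactly this: a 301 status with a Location header pointing at the www URL.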
I've got a few sites built using a custom 404 page that serve rewritten URLs. Easily the most useful thing I could ever imagine.
Another one I'd be real careful of. I'm sure yours are correct but I've seen many that are not. Their custom 404 pages were returning a 200 status instead of 404.
Always, always, verify that your server headers are returning the proper HTTP Status Codes.
Although there would be no need to use one if your URLs have fewer than 3 parameters.
If you are going to rewrite URIs, which is strongly advised in today's environment, you should strip all variables from the string. Trim that puppy down as far as you can so it becomes user friendly.
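As a sketch of what "stripping the variables" looks like in practice (the parameter names and path scheme here are invented, not from anyone's actual site): map the querystring parameters onto clean path segments, and parse them back out on the server side.

```python
# Hypothetical mapping between a parameterized URL and a trimmed-down
# friendly path. Parameter names are invented for illustration.
from urllib.parse import urlsplit, parse_qs

def to_friendly(url):
    """/page.asp?cat=widgets&id=121  ->  /widgets/121/"""
    qs = parse_qs(urlsplit(url).query)
    return "/{}/{}/".format(qs["cat"][0], qs["id"][0])

def from_friendly(path):
    """/widgets/121/  ->  {'cat': 'widgets', 'id': '121'}"""
    cat, item = path.strip("/").split("/")
    return {"cat": cat, "id": item}

print(to_friendly("/page.asp?cat=widgets&id=121"))
print(from_friendly("/widgets/121/"))
```

The rewrite engine (ISAPI_Rewrite, a custom 404 handler, whatever) does the from_friendly step; your templates and links do the to_friendly step.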
Another one I'd be real careful of. I'm sure yours are correct but I've seen many that are not. Their custom 404 pages were returning a 200 status instead of 404.
That is a VERY important point. You absolutely need to run through numerous spider simulators/header checkers before deploying to production. Make sure all your pages are returning 200 where they need to, 404 where they need to, and your 301's from your previous files are working correctly (if you are changing your structure on an existing site).
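The 404 trap described above is easy to demonstrate. This sketch (a made-up site, not any poster's setup) serves a friendly custom error page but makes sure the status line still says 404; a page that exists returns 200. A header check should show exactly this split - if both come back 200, the error page is getting indexed.

```python
# Bare-bones status-code check: a custom 404 page must still send a 404
# status, not 200. Paths and content are hypothetical.
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

class Site(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/index.html":
            self.send_response(200)
        else:
            # Custom 404 page: friendly HTML body, but the status line
            # must still say 404 or spiders will index the error page.
            self.send_response(404)
        body = b"<html><body>page</body></html>"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), Site)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

def status_of(path):
    conn = HTTPConnection("127.0.0.1", port)
    conn.request("GET", path)
    return conn.getresponse().status

ok = status_of("/index.html")
missing = status_of("/no-such-page")
print(ok, missing)
server.shutdown()
```

Run the same kind of check against every template on the live site before going to production.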
Become friends with SearchEngineWorld's Header Checker [searchengineworld.com].
Good luck. I have done #1 and verified. It's quicker and easier. We are short on resources, long on projects.