
Google SEO News and Discussion Forum

    
Crawling and ranking issues with a dynamic site
austinwells
5+ Year Member
Msg#: 4022292 posted 10:25 am on Nov 10, 2009 (gmt 0)

Hi All,

I am currently optimizing a dynamic website and facing certain crawling and ranking issues. I hope I will get a proper solution here.

My site is a fully dynamic, database-driven website. There is one index.php page, and in the root there is a folder called /pages/. In the /pages/ folder we maintain the other dynamic files, which are included together to make up each page.

These pages are include files. For example, if I need the footer portion, I include the footer file into the main index.php file. Pages are generated on click: when somebody clicks a link they go to index.php, and the query looks like www.example.com/index.php?view=file. This shows the content of the file.php file that sits in the /pages/ folder. On top of that I am using an .htaccess file for mod_rewrite, so the same URL given above ends up looking like http://www.example.com/category/file/index.html.
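To give an idea, the rewrite rule behind this is something like the following (simplified; the exact pattern and the view parameter name here are only illustrative):

RewriteEngine On
# Turn the friendly URL back into the real query,
# e.g. /category/file/index.html -> /index.php?view=file
RewriteRule ^([^/]+)/([^/]+)/index\.html$ index.php?view=$2 [L,QSA]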

Each single page ends up with / and index.html. Please tell me, is that OK? Can I make all pages end in /index.html?

One more thing: since the site is dynamic, a number of pages each day return 404 or are deleted from their location. In those cases I am redirecting that particular page to its upper-level category page, using a 301 for the redirection. Is it proper to use a 301 redirect for every not-found page?

Or is it better to have a custom 404 page for a high volume of not-found errors, especially on a dynamic website?

I am also using a canonical tag on each URL, as well as a base tag. Since it's a dynamic website, I need them to tell Googlebot that for each page it should crawl and index only the canonical URL, because sometimes the bot picks up the session-ID URLs and creates duplicates.

It's a very large website and I am putting a lot of effort into it, but I don't know why it takes so long until the next crawl. Also let me know if anybody has a formula specifically for indexing and crawling a dynamic site.

Thanks for any help in advance.

[edited by: Receptional_Andy at 11:41 am (utc) on Nov. 10, 2009]
[edit reason] Please use example.com - it can never be owned [/edit]

 

FranticFish
WebmasterWorld Senior Member 5+ Year Member
Msg#: 4022292 posted 11:56 am on Nov 10, 2009 (gmt 0)

Each single page ends up with / and index.html

One or the other, not both. If you're using mod_rewrite anyway, why not rewrite the URL to be extensionless?

i.e. http://www.example.com/folder-name/subfolder/page-name-here
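The .htaccess side of that can stay very small; roughly something like this (a sketch only, assuming your existing index.php?view= setup):

RewriteEngine On
# Anything that isn't a real file or folder goes to the script
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)$ index.php?view=$1 [L,QSA]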

redirecting that particular page to its upper-level category page

Yes I recommend doing this. However I only do this when a page has existed. If the page has never existed I serve a 404. Do some checking and see what happens when you request urls with wrong folder or file names. You can trap these with a 404 or a 301 - but make sure that they don't return a page with a 200 header response.
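As a rough sketch of that decision in PHP (the lookup helper and the 'deleted' flag are hypothetical - only the header() calls matter):

<?php
// Hypothetical lookup against your pages table
$page = lookup_page($_GET['view']);
if ($page === false) {
    // Never existed: return a genuine 404
    header('HTTP/1.1 404 Not Found');
    include 'pages/not-found.php';
    exit;
}
if ($page['deleted']) {
    // Existed once: 301 up to its category page
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.example.com/' . $page['category'] . '/');
    exit;
}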

custom 404 page

The bigger the site, the more useful these are. Just make sure that the header response for your custom page is 404 NOT 200.
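If Apache serves the custom page for you, the status is kept automatically as long as you use a local path (the path here is illustrative):

# Apache keeps the 404 status when the error page is a local path
ErrorDocument 404 /pages/not-found.php

If your own script generates the 404 page instead, send header('HTTP/1.1 404 Not Found') yourself before any output.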

sometimes the bot picks up the session-ID URLs

Then don't pass session IDs or anything like that in the URL. If you're going to use cookies or sessions then keep them out of the URL, and don't have anything like this

basket.php?location=page-name

or you'll end up with hundreds or thousands of duplicate pages.
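If session IDs have already leaked into indexed URLs, you can 301 them away; a sketch for PHP's default session name (note this drops the whole query string, which is only safe if the session ID is the last parameter standing):

# Redirect any URL carrying a PHPSESSID to the clean version
RewriteCond %{QUERY_STRING} (^|&)PHPSESSID= [NC]
RewriteRule ^(.*)$ /$1? [R=301,L]

On the PHP side, session.use_only_cookies = 1 (and session.use_trans_sid = 0) stops the IDs being appended in the first place.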

austinwells
5+ Year Member
Msg#: 4022292 posted 12:10 pm on Nov 12, 2009 (gmt 0)

Hi FranticFish,

Thanks for the reply. Actually my URLs already end with an extension, i.e. they all end with /index.html, and many of them are cached with the /index.html extension. What should I do if I want to remove the /index.html part and implement the new URL structure you suggest here?

If I do that, I will need to remove everything from Google's index first and then start again from scratch. I will also lose the link popularity I gained during the last link-building campaign, and the newly implemented URLs will have to start from zero. So is there any way to pass the link benefit from the old pages to the newly launched ones?

And my main question is about the single index.php through which we access all the pages with the help of .htaccess. Does it have any drawback or adverse effect on my site's positioning? The site has a very slow crawl rate; I think it's because of too many redirects. What can I do to improve the crawl rate?

Please advise.

FranticFish
WebmasterWorld Senior Member 5+ Year Member
Msg#: 4022292 posted 4:15 pm on Nov 12, 2009 (gmt 0)

Re: speed
If you have lots and lots of includes, this can slow a page down. Google Webmaster Tools shows you page download time in the 'Diagnostics' section. Some CMS systems (e.g. WordPress) have plugins that render static HTML pages from the dynamic ones to improve load time; there may be something like that you could use for your site.
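Even without a plugin, the basic idea is easy to sketch in plain PHP (the cache path and one-hour lifetime are arbitrary, not taken from any particular plugin):

<?php
// Serve a pre-rendered copy if we have a fresh one
// (assumes a writable cache/ directory)
$cacheFile = 'cache/' . md5($_SERVER['REQUEST_URI']) . '.html';
if (file_exists($cacheFile) && time() - filemtime($cacheFile) < 3600) {
    readfile($cacheFile);
    exit;
}
ob_start();
// ... build the page from the database as usual ...
file_put_contents($cacheFile, ob_get_contents());
ob_end_flush();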

As far as the redirects slowing things down - could be. You shouldn't have chains of redirects (I believe these may also have spammy connotations, as 'blackhats' use them).
I'm not an expert on crawling issues or mod_rewrite (the guys in the Apache forum here are), but I believe commands in .htaccess are carried out in order, so if two rules apply to the same 'page' you need to make sure you order them correctly. You want as few rules as possible.
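For example (both rules are made up for illustration), a specific redirect has to sit above the general catch-all, because the [L] flag stops processing at the first match:

# Specific rule first...
RewriteRule ^old-section/(.+)$ /new-section/$1 [R=301,L]
# ...general catch-all last
RewriteRule ^(.+)$ index.php?view=$1 [L,QSA]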

all my URLs end with /index.html

So every page in your site is the root page for a virtual folder. I don't think that's ideal. If you are going to make a change (and this is something to consider very carefully) I'd ideally recommend a hierarchy where you have something like

http://www.example.com/a-page
http://www.example.com/a-folder/
http://www.example.com/a-folder/another-page

If you're extensionless then the / is what signifies the difference between a page and a folder. However if you start getting into this you need to make sure that you have the correct 301s and 404 traps in place. It's easy to screw your site up properly if you get it wrong. We let our dynamic pages serve the 301s and 404s (i.e. these are done in PHP as part of the db lookup) and use htaccess as little as possible - really just for stripping off the extensions.
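'As little as possible' can really mean a couple of lines (a sketch; the path parameter name is an assumption):

# Hand anything that isn't a real file to the script,
# which then decides whether to answer 200, 301 or 404
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.php?path=$1 [L,QSA]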

I will need to remove everything from Google's index first

WOAH! That's a bit drastic, if you mean the url removal tool. I wouldn't use that at all. As you say, you'll be wasting your IBLs.

If you're already using mod_rewrite for your urls and then want to change them and redirect old to new, then you can run into real problems with loops.

One method you can use is to 'rename' the file path slightly, so that
/products/widget1.html becomes /our-products/widget-1
or
/contact.html becomes /contact-us
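The matching redirects are then simple one-to-one rules; using the examples above:

# Old extensioned URLs 301 to the renamed extensionless ones;
# the new paths can't match the old patterns, so there's no loop
RewriteRule ^products/widget1\.html$ /our-products/widget-1 [R=301,L]
RewriteRule ^contact\.html$ /contact-us [R=301,L]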

There may be a way of doing it without renaming the file, but like I said I'm not an expert on htaccess.

Basically I think you should get some advice on the Apache / Linux forum about what you can do to speed up your page load time, and then consider if you also want to change the urls.

mcskoufis
10+ Year Member
Msg#: 4022292 posted 10:54 am on Nov 14, 2009 (gmt 0)

Might be over paranoid here, but I am a bit stressed with no extensions... What if someone links to example.com/our-products/widget-1/ ?

Your CMS might be OK with it and display content; however, you may end up having

example.com/our-products/widget-1/
example.com/our-products/widget-1

as duplicates...

I would go for the .html extension...

FranticFish
WebmasterWorld Senior Member 5+ Year Member
Msg#: 4022292 posted 12:23 pm on Nov 14, 2009 (gmt 0)

Your CMS might be OK with it and display content; however, you may end up having

example.com/our-products/widget-1/
example.com/our-products/widget-1

That's what I meant when I said you have to have 301 and 404 traps set up.

If you're going to go extensionless then you need to make sure that your CMS can't be made to display non-existent urls.

If you don't do it properly you can really fudge your site up, so don't do it unless you know you can do it properly.

TheMadScientist
WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member
Msg#: 4022292 posted 12:13 am on Nov 15, 2009 (gmt 0)

Might be over paranoid here, but I am a bit stressed with no extensions...

I'll second that thought...
(LOL)

All my newer sites are extensionless and I'd remove 'em from the older ones, but I remember reading about how cool urls don't change (once upon a time, in a far far away place) and I don't want to take the 'transfer time' hit in traffic, otherwise I would.

IMO it's easiest, most consistent, and probably safest to just make the 'directory' (URL with a trailing /) a 'page' in the next level up, since directory1/directory2/ is really just an alias for the page /directory1/directory2/index.html (or whatever you may have changed your indexes to).

So, where you might save and upload index.html to /directory1/directory2/, I would rename index.html to directory2.html, upload it to directory1, and then strip the trailing / and index.html from the URL whenever either is requested. That means I never have to worry about 'duplicate content' at /directory1/directory2/ and /directory1/directory2/index.html, because both are (easily) redirected to /directory1/directory2 (no index, no trailing /) and served the information from /directory1/directory2.html, which seems to make sense...

If you have a full sub-directory of pages within a directory, I would think you could (should?) have a page in the parent directory that talks about the topic of that sub-directory; if you can't, the sub-directory probably doesn't belong where it is.

Stripping all the trailing /s and index.ext requests from URLs really simplifies the matter of going extensionless and leaves you with:

example.com/
example.com/page
example.com/dir
('index' for /dir/)

example.com/dir/page
example.com/dir/dir2
('index' for /dir2/)

example.com/dir/dir2/page

You don't really need it if you do the preceding, but I always run with Options -Indexes in my .htaccess as a 'backup' for the redirects.
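Sketched out in .htaccess terms (ignoring edge cases like real directories and direct requests for the .html files, which need their own guarded redirects), the whole scheme is roughly:

Options -Indexes
RewriteEngine On
# 301 any .../index.html request down to the bare URL
RewriteRule ^(.+)/index\.html$ /$1 [R=301,L]
# 301 any trailing slash away (real directories excepted)
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ /$1 [R=301,L]
# Internally serve /dir/page from the real file /dir/page.html
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(.+)$ /$1.html [L]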

<aside>
Thinking about it, I actually can't remember the last time I used a directory (/directory/) as a URL, even with extensions... hmmm... It's been at least 3 years, maybe longer.

I often only have one index.ext on a site, which is what you get at the root, and happens to make it easy to not accidentally upload an index.ext to a directory it doesn't belong in... (Been there, done that, and it might have been one of the reasons I switched.)
</aside>
