Apache server log shows directory and page as two separate entries

duplicate log entries for same page

         

rep82

5:43 am on Aug 13, 2008 (gmt 0)

10+ Year Member



I have a log problem I've been trying to address for some time now. My Apache server log (along with logs on other services such as eXtreme) shows two separate entries for the same file. Example: xyz.com/blue-widgets and xyz.com/blue-widgets/index.html show as separate entries in the logs. Can anyone tell me if this is normal, or could it be a bug of some kind? Also, is there any way to fix this so only one log entry will show up? Any help on this would be appreciated.

P.S. I use exact URLs in my site navigation, e.g., xyz.com/blue-widgets/index.html, if that makes any difference.

jdMorgan

6:46 am on Aug 13, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm not sure what you're asking, since we can't see the relevant log entries...

If you see two entries for the same client request, and the first one indicates a 301, 302, 303, or 307 redirect response, while the second one shows a 200-OK or a 304-Not Modified response, then that indicates that you've got a redirect on your server from the first-requested URL-path to the second.
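As a rough way to check a log for this pattern, here is a minimal Python sketch. It assumes the log is in Apache's common or combined format; the two sample lines, the IP address, and the function name `find_redirect_pairs` are made up for illustration and are not taken from the original poster's log.

```python
import re

# Common/combined log format: host ident user [time] "METHOD path PROTO" status bytes ...
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)

REDIRECTS = {301, 302, 303, 307}   # redirect responses named in the post
SERVED = {200, 304}                # "content was served" responses

def find_redirect_pairs(lines):
    """Return (host, first_path, redirect_status, second_path) for each
    redirect that is immediately followed, for the same client host,
    by a 200 or 304 response."""
    pending = {}   # host -> (path, status) of that host's latest redirect
    pairs = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue   # skip lines that are not in the expected format
        host = match["host"]
        path = match["path"]
        status = int(match["status"])
        if status in REDIRECTS:
            pending[host] = (path, status)
        else:
            previous = pending.pop(host, None)
            if previous and status in SERVED:
                pairs.append((host, previous[0], previous[1], path))
    return pairs

# Hypothetical sample entries, for illustration only.
sample = [
    '192.0.2.10 - - [13/Aug/2008:05:43:00 +0000] "GET /blue-widgets HTTP/1.1" 301 245',
    '192.0.2.10 - - [13/Aug/2008:05:43:00 +0000] "GET /blue-widgets/ HTTP/1.1" 200 5120',
]

for host, first, status, second in find_redirect_pairs(sample):
    print(f"{host}: {first} ({status}) -> {second}")
```

If this reports pairs, the "duplicate" entries are a redirect plus the follow-up request. If it reports nothing, the two URLs are probably being requested and served independently.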

If you're saying there are two different 'kinds' of log entry for the same 'home page' and you only want to see one 'style' or the other, then that is something you need to fix in your linking.

BTW, it is considered 'best practice' to omit "/index.html" from all links, and simply link to "/". Defining index.html as your DirectoryIndex obviates the need to link to /index.html.

If none of the above is helpful, please post a small number of relevant samples from your log file, so we can discuss things more specifically.

Jim

g1smd

11:39 pm on Aug 13, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Note that "/" and "/index.html" are two different URLs, so it is correct for them to be listed as separate entries. They could even return different content: depending on the entries in the DirectoryIndex settings, "/" might be served by an index.php or index.htm rather than by index.html.

If both of them work and directly serve content, then you have a Duplicate Content issue.

You should link to "/" and you should set up a 301 redirect so that requests for "/index.html" are redirected to "/".

rep82

11:33 am on Aug 14, 2008 (gmt 0)

10+ Year Member



Thanks g1smd and jdMorgan.

This has opened up a whole new can of worms for me. You are both right as all my internal linking is done as dir/index.html as opposed to just dir/. As it turns out, this is exactly why I'm getting duplicate entries. My site has been online since 1998 and I can't believe this issue is just coming up now as I've always done my internal linking this way.

I'm contemplating making changes, e.g., removing index.html from all directory links. I'm a little afraid of what this change will do to my SERPs (specifically on "G"). Do either of you think this change will have an adverse effect on them? Could you give me a few pointers on the best way to make these changes without screwing up my SERPs? E.g., should I just strip the index.html, or also do a 301 for the links I change?

Also, on another note, my site map is set up the same way with dir/index.html. Should I also change it to just dir/? Sorry to sound like such an idiot, but this has really thrown me for a loop.

Thanks again for the help.

jdMorgan

3:38 pm on Aug 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, I'd recommend that you "go slow," since we know nothing of your site, whether you depend on it for your living, etc. Really, only you can decide such matters. If you do change your URLs, I'd recommend doing only a few pages at a time.

Do a search here on WebmasterWorld (link at top of every page) for subjects such as "canonical domain", "canonical URL", "change /index to /" (or root), "duplicate content", and the similar phrases you'll find in these threads -- There are many threads on these subjects here, some of them very recent [webmasterworld.com], but older ones of value as well. The Library for each forum can be of help as well. Some forums also have a "hot topics" thread pinned at the top of their individual thread lists, and their Forum Charters may contain links to useful references.

Reviewing these resources will help you get up to speed on these issues, and allow an informed decision.

That said, if this were my site, I would start by changing "/xyz/index.html" links to "/xyz/" on a small percentage (say 10 to 20%) of your lowest-level pages, and adding specific redirects for each, working up toward the top of your site's structure. The idea is that changing the lower-level page URLs won't "hurt" as much if they lose ranking temporarily.

Once the URLs for the lowest-level pages have all been changed and re-indexed, and have returned to normal ranking, work up to the next level. By the time you get to the top-level pages, the lower-level pages will provide a solid linking foundation to "support" the changes on the higher-level page URLs and the home page.

After all URLs have been updated, you can replace the (probably numerous) individual and/or group redirect directives with a single site-wide index-to-slash redirect.

It is critical that these changes be implemented correctly. When creating 301 redirects, install and use the "Live HTTP Headers" add-on for Firefox/Mozilla to verify that your redirects return a proper 301-Moved Permanently redirect HTTP response header, and that any "error" in a requested URL is "corrected" with a single 301 redirect to the proper URL.
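For those who would rather script this check than read headers in the browser, here is a minimal Python sketch. It relies on the fact that the standard library's `http.client` does not follow redirects, so the first response is seen as-is. The function names and the example.com URLs are assumptions for illustration only.

```python
import http.client
from urllib.parse import urlsplit

def fetch_status_and_location(url):
    """Send one HEAD request, without following redirects, and return
    (status code, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()

def is_single_301(status, location, expected_url):
    """True only if the first response is a 301 that points straight at
    the final URL, with no intermediate hop."""
    return status == 301 and location == expected_url

# Usage (hypothetical URLs; requires network access to your own site):
# status, location = fetch_status_and_location("http://example.com/xyz/index.html")
# print(is_single_301(status, location, "http://www.example.com/xyz/"))
```

A 302, or a 301 that points at some intermediate URL, fails the check, which is the situation to look for when testing.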

For example, if redirecting "www.example.com/xyz/index.html" to "www.example.com/xyz/", then a request for "example.com/xyz/index.html" (no 'www') should also be redirected to "www.example.com/xyz/" by the same single redirect -- correcting both the domain and the URL-path at the same time. Not only must your individual redirect directives be coded correctly, but they must be in the correct order to make this happen. Test, test, test, and then test again... :)
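The "one redirect fixes everything" idea can be sketched in Python as follows. This is not server configuration, only an illustration of the logic: compute the fully corrected URL first, then redirect once if the requested URL differs. The names `CANONICAL_HOST`, `INDEX_NAME`, `canonical_url`, and `redirect_target` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"   # assumed preferred hostname
INDEX_NAME = "index.html"            # assumed DirectoryIndex file name

def canonical_url(url):
    """Apply every correction (hostname and trailing index file) at once."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/" + INDEX_NAME):
        path = path[: -len(INDEX_NAME)]   # keep the trailing slash
    return urlunsplit((parts.scheme, CANONICAL_HOST, path, parts.query, ""))

def redirect_target(url):
    """Return the single 301 target for url, or None if it is already canonical."""
    target = canonical_url(url)
    return None if target == url else target

print(redirect_target("http://example.com/xyz/index.html"))
print(redirect_target("http://www.example.com/xyz/index.html"))
print(redirect_target("http://www.example.com/xyz/"))
```

Both non-canonical requests map directly to the same final URL in one step, and the already-correct URL needs no redirect at all.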

Jim

rep82

9:15 pm on Aug 14, 2008 (gmt 0)

10+ Year Member



Thanks Jim -

Your advice is very sound. I've already started to scour WW for the threads you mentioned. I think I will take a long-term approach to this, as my traffic/income is great right now and gathering all the info I need from WW threads will take time. Bottom-up is a great idea, especially since I can track results in the SERPs on some of those pages.

P.s. Any thoughts on what to do with my sitemap links? Should I try the same bottom to top fix on that too?

Thanks again for your help!

g1smd

10:02 pm on Aug 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You should link to "/" and you should set up a 301 redirect so that requests for "/index.html" are redirected to "/".

Likewise /folder/index.html should redirect to /folder/ too.

In any case, most search engines seem to prefer listing the shorter of the two versions of the URL.

jdMorgan

10:07 pm on Aug 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Rewrites, redirects, sitemaps, and robots.txt should all be kept "in sync" throughout the project.

Jim

rep82

5:07 am on Aug 15, 2008 (gmt 0)

10+ Year Member



Thanks guys for the expert advice -- much appreciated indeed. I'm positive I can take it from here.

P.s. If I ever get to the point where I know 50% of this stuff it'll be a miracle. lol

g1smd

10:11 pm on Aug 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You'll be able to teach me, 'cus I doubt I know more than 20% of it. :-)