Google and renamed urls by 404SEF module

Old and new SEF url... duplicate content issue?

         

drako

2:28 pm on Jun 14, 2006 (gmt 0)

10+ Year Member



I've built a site using Mambo open source. It got indexed by Google fine.
It had the usual url structure Mambo produces:
www.mydomain.nl/site/index.php?option=com_weblinks&catid=18&Itemid=22

Then someone pointed me to this module: 404-SEF for Mambo.
It produces Search Engine Friendly urls.

Now the same page is re-named by this module to:
www.mydomain.nl/site/weblinks/pagename.html

When I do a site check in Google I see both urls in the index.
These URLs represent the same actual page with the same content.
Will Google consider this to be duplicate content or will it remove the old
www.mydomain.nl/site/index.php?option=com_weblinks&catid=18&Itemid=22
in time?

If it doesn't... what should I do?

thanks.

leadegroot

11:36 pm on Jun 14, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If the bot can reach both pages it will index both pages.
I'm not familiar with Mambo or that module, but what is probably happening is that inside your site, the pages are really coming from the old URL.
This means you have a problem: if you redirect (301, .htaccess) all the old URLs you will probably make the pages stop working. :(

BillyS

12:35 am on Jun 15, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Watch out for duplicate content, mambo and joomla (which I use) will produce what seem like duplicate pages.

You might want to stop spiders from indexing those non-SEF URLs via robots.txt:

User-agent: *
Disallow: /index.php
Disallow: /index2.php?option=com_content&task=view

I use these commands, the first to stop the problem you're talking about, the second to stop indexing of pdf files that mambo can generate too.

The first command is especially important on your home page... As with all advice, you know your website better than we do, so make sure you understand what will happen before making changes.

I also used the following in my .htaccess file to tell the SEs that these particular pages no longer exist. Just something to think about. Jim over at the Apache forum can help you with those problems.

# Getting Rid of PDF Files
# ([G] makes Apache answer matching requests with 410 Gone)
RewriteCond %{QUERY_STRING} ^option=com_content&do_pdf=
RewriteRule ^index2\.php$ - [G]
#
# Getting Rid of print preview Files
RewriteCond %{QUERY_STRING} ^option=content&task=view&id=
RewriteRule ^index2\.php$ - [G]
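To see which requests those two rules would catch, here is a rough Python model of the matching logic. It is only a sketch for reasoning about the patterns, not how Apache works internally; the function name `is_gone` and the sample query strings are made up for illustration. It assumes the rules sit in a per-directory .htaccess file, where the RewriteRule pattern sees the path without a leading slash.

```python
import re

# (query-string pattern, path pattern) pairs copied from the rules above.
GONE_RULES = [
    (re.compile(r"^option=com_content&do_pdf="), re.compile(r"^index2\.php$")),
    (re.compile(r"^option=content&task=view&id="), re.compile(r"^index2\.php$")),
]

def is_gone(path, query):
    """Return True if a request would match one of the [G] rules.

    `path` has no leading slash and `query` is the raw query string.
    """
    return any(q.search(query) and p.search(path) for q, p in GONE_RULES)

# Hypothetical requests:
print(is_gone("index2.php", "option=com_content&do_pdf=1&id=5"))  # True
print(is_gone("index.php", "option=com_content&do_pdf=1&id=5"))   # False
# The ^ anchor means parameter order matters: this one slips through.
print(is_gone("index2.php", "id=5&option=com_content&do_pdf=1"))  # False
```

Note the last case: because the query-string patterns are anchored with ^, the same URL with its parameters in a different order would not get the 410.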

drako

9:06 am on Jun 15, 2006 (gmt 0)

10+ Year Member



Thanks leadegroot and BillyS.

BillyS, if I understand you correctly it would be wise to find all old search engine unfriendly URLs Google has indexed and put these in the robots.txt file?

The index.php also? This is still the URL for my homepage. The 404SEF module did not rename this to anything else. When I view the source code of the index.php file in my browser all URLs in there are the Search Engine Friendly ones, so that should be ok I think?

thanks

icedowl

12:53 pm on Jun 15, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Watch out for duplicate content, mambo and joomla (which I use) will produce what seem like duplicate pages.

I use Joomla and Mambo too. If you can live without it, unpublish the 'blog' module as it definitely creates duplicate pages.

BillyS

8:17 pm on Jun 16, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>BillyS, if I understand you correctly it would be wise to find all old search engine unfriendly URLs Google has indexed and put these in the robots.txt file?

What I'm trying to say is that mambo produces non-sef urls that start with a similar pattern. From W3:

The "Disallow" field specifies a partial URI that is not to be visited. This can be a full path, or a partial path; any URI that starts with this value will not be retrieved.

Make sure you understand what the impact of a change will be before you make it!
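One way to check the impact before uploading anything is to test the rules locally. The sketch below uses Python's standard `urllib.robotparser`; the `/site/` variant of the rule is my own assumption based on the URLs posted at the top of the thread, and `build_parser` is just a helper name for this example.

```python
from urllib.robotparser import RobotFileParser

def build_parser(rules):
    """Parse robots.txt text held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser

# Rule as posted earlier in the thread: the prefix starts at the site root.
root_rule = build_parser("User-agent: *\nDisallow: /index.php\n")
# Hypothetical variant for an install that lives under /site/.
site_rule = build_parser("User-agent: *\nDisallow: /site/index.php\n")

old_url = "http://www.mydomain.nl/site/index.php?option=com_weblinks&catid=18&Itemid=22"
sef_url = "http://www.mydomain.nl/site/weblinks/pagename.html"

print(root_rule.can_fetch("*", old_url))  # True: /site/... does not start with /index.php
print(site_rule.can_fetch("*", old_url))  # False: prefix matches, old URL is blocked
print(site_rule.can_fetch("*", sef_url))  # True: the SEF URL stays crawlable
```

Because the match is a plain prefix match from the root, a rule written for a root install does nothing for a site in a subdirectory, and a rule for /site/index.php also blocks the homepage if it is only reachable under that URL.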

>>The index.php also? This is still the url for my homepage. The 404SEF module did not rename this to anything else.

I'm kinda surprised at this response. Usually it's set up to redirect to something like [foo.tld...]

If that's not the case, you'd want to KEEP:

/index.php

Or you'd prevent the indexing of your home page.

>>When I view the sourcecode of the index.php file in my browser all urls in there are the Search Engine Friendly ones, so that should be ok I think?

Do you publish an RSS feed? If so, the unfriendly URLs can leak from there too.

rden17

9:06 pm on Jun 16, 2006 (gmt 0)

10+ Year Member



I'm using 404SEF also and am running into a similar problem.

What you can do about your /index.php url is to go into the 404sef component and edit the sef url to what you want it to be.

The problem I am having is that I am getting the following duplicate pages from this component:

category/widget.html
category/widget-2.html
category/widget-3.html
category/widget-4.html

Anyone else seeing this and have a solution?

Think this would raise a red flag with SE's?

BillyS

11:57 pm on Jun 16, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



rden17 -

I use SEF Advance with Joomla 1.0.8. Emir offers great support and there was no reliable alternative when I started. So I cannot comment on your particular question. I would be worried if you're not returning 404s for non-existent pages.

rden17

4:51 am on Jun 17, 2006 (gmt 0)

10+ Year Member



I'm returning 404's for non existent pages. It's just that joomla's structure has a bad habit of creating different pointers, and sef404 component gives each one a new url, making it look like duplicate content. Does SEF Advance do this at all? I think I'll try that one on my next joomla project.

BillyS

11:51 am on Jun 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not sure if I follow your question.

I do know about a problem with pages listed in a section or category. For example if you had 10 pages in a section before going to a new page:

section/whatever/10/0

Is the same as

section/whatever/

I use my .htaccess file to deal with these...

# Getting Rid of Duplicate Zero Pages
RedirectMatch 301 ^/Section/Whatever/10/0/$ [sitename.tld...]

If joomla / mambo (the core) produces the "extra" page name, SEF Advance WILL try to produce another page.
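If you have a list of crawled paths and want to see which ones are "zero pages" of this kind, something like the following Python sketch could help. It assumes the duplicate always ends in /<limit>/0 as in the example above; the regex and the name `canonical_path` are my own illustration, not anything SEF Advance or Joomla provides.

```python
import re

# Trailing "/<limit>/0" (with or without a final slash): first page of a listing.
ZERO_PAGE = re.compile(r"/\d+/0/?$")

def canonical_path(path):
    """Map a first-page listing path to the bare section path; leave others alone."""
    return ZERO_PAGE.sub("/", path)

print(canonical_path("/section/whatever/10/0"))   # /section/whatever/
print(canonical_path("/section/whatever/10/10"))  # unchanged: a real second page
```

Any path where `canonical_path(path) != path` is a candidate for a 301 rule like the one above.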

malachite

12:33 pm on Jun 17, 2006 (gmt 0)

10+ Year Member



Billy_S said
I use SEF Advance with Joomla 1.0.8. ... I would be worried if you're not returning 404s for non-existent pages.

I also use various incarnations of Joomla! with SEF Advance, and on one site (running 1.0.5) it's set up to automatically redirect any calls on non-existent pages to the homepage.

This was a feature of Joomla up to 1.0.5, which IIRC, was changed from 1.0.6 onwards to give a 404 page for non-existents.

I think I actually preferred the old method, as at least visitors were taken to a valid part of the site instead of getting a 404.

texasville

2:32 pm on Jun 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Once again we are jumping thru hoops to satisfy google because they can't create an algo that doesn't penalize for the dup content on a single site.
It bothers me that google has so much power that instead of them having to fix their bugs, thousands of webmasters spend countless hours trying to work around THEIR problems.

BillyS

12:22 pm on Jun 18, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



texasville - I agree with what you're saying, but it's Google's playground right now.

I set up a custom 404 page to help redirect traffic:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>404 Not Found</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<meta http-equiv="imagetoolbar" content="no" />
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div align="center"><font face="Verdana, Arial, Helvetica, sans-serif" size="7"><b>Error
404</b></font><br />
<br />
<font face="Verdana, Arial, Helvetica, sans-serif"><b>The page you are trying to
access does not exist on our server.</b></font><br />
<br />
<br />
<br />
<br />
<font face="Verdana, Arial, Helvetica, sans-serif"><b>Click here to go
to our Home Page: <a href="http://www.mywebsite.tld/">MyWebsite.tld</a></b></font></div>
</body>
</html>

rden17

1:47 pm on Jun 18, 2006 (gmt 0)

10+ Year Member



404sef does a great job at handling the 404's. You can either customize a default 404, use any static content page as a 404, or just have it go to your main homepage.

texasville, I wholeheartedly agree with you. It's frustrating to have to work around google's algo.

I think the duplicate pages come up if you have a page on 2 different menus. mainmenu might create a url pointing to category/page.html and then if that page is also on usermenu, it will create another url like category/page-2.html

Both will be the same page, but it looks like there are 2 of them.
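To find these in a list of URLs (from a sitemap or a crawl, say), a small Python sketch like this can group the suspects. It assumes the duplicates always follow the name-N.html pattern described above; `group_variants` and the sample URLs are hypothetical.

```python
import re
from collections import defaultdict

# Matches a trailing "-<number>" right before ".html", e.g. "widget-2.html".
SUFFIX = re.compile(r"-\d+(?=\.html$)")

def group_variants(urls):
    """Group URLs that differ only by a numeric "-N" suffix."""
    groups = defaultdict(list)
    for url in urls:
        groups[SUFFIX.sub("", url)].append(url)
    # Keep only base names that more than one URL maps to.
    return {base: found for base, found in groups.items() if len(found) > 1}

urls = [
    "category/widget.html",
    "category/widget-2.html",
    "category/widget-3.html",
    "category/other.html",
]
print(group_variants(urls))
```

The groups are only candidates: a page that legitimately ends in a number could be grouped by mistake, so each one still needs a manual check.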

I was just in the process of seeing if it was going to be a problem with Google, when they started updating a few days ago and removed 1/2 of my pages. Now, 4 days later, the site is starting to get re-indexed. My AdSense and traffic took a nosedive with this last update.

docbird

10:34 pm on Jun 18, 2006 (gmt 0)

10+ Year Member



I used 404sef for a short time and likewise had problems, as it produced duplicate URLs for the same content. This is in large part a Mambo issue (evidently one that annoyed the 404sef creator when he was asked about it!)

Switched to Xaneon Extensions. It worked for me and gives unique URLs (though you need care with all the possible internal links in Mambo).

Now, with the site running Joomla, I'm using OpenSEF, which developed from Xaneon.

ruip

7:27 am on Jun 19, 2006 (gmt 0)

10+ Year Member



I solved this problem by removing the default nav bar in the core.