Forum Moderators: Robert Charlton & goodroi
Then someone pointed me to this module: 404-SEF for Mambo.
It produces Search Engine Friendly urls.
Now the same page is re-named by this module to:
www.mydomain.nl/site/weblinks/pagename.html
When I do a site check in Google I see both urls in the index.
These URLs represent the same actual page with the same content.
Will google consider this to be duplicate content or will it remove the old
www.mydomain.nl/site/index.php?option=com_weblinks&catid=18&Itemid=22
in time?
If it doesn't... what should I do?
thanks.
You might want to stop spiders from indexing those non-SEF URLs via robots.txt:
Disallow: /index.php
Disallow: /index2.php?option=com_content&task=view
I use these two directives: the first stops the problem you're talking about, and the second stops indexing of the PDF files that Mambo can also generate.
The first directive is especially important on your home page... As with all advice, you know your website better than we do, so make sure you understand what will happen before making changes.
I also used the following in my .htaccess file to tell the SEs that these particular pages no longer exist. Just something to think about. Jim over at the Apache forum can help you with those problems.
# Getting Rid of PDF Files
RewriteCond %{QUERY_STRING} ^option=com_content&do_pdf=
RewriteRule ^index2\.php$ - [G]
#
# Getting Rid of print preview Files
RewriteCond %{QUERY_STRING} ^option=content&task=view&id=
RewriteRule ^index2\.php$ - [G]
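For anyone copying those rules: the [G] flag makes Apache answer "410 Gone", which tells spiders the page is permanently dead. If you'd rather keep any link value, an alternative is to 301 the PDF URL back to the article itself. This is only a sketch -- it assumes the do_pdf=1&id= query-string ordering of stock Mambo, and /content/view/%1/ is just an example target that may not match your SEF scheme:

```apache
# Hypothetical alternative: 301 the auto-generated PDF URL back to
# the article instead of answering 410 Gone. %1 re-uses the article
# id captured in the RewriteCond; the trailing "?" on the target
# drops the old query string from the redirect.
RewriteCond %{QUERY_STRING} ^option=com_content&do_pdf=1&id=([0-9]+)
RewriteRule ^index2\.php$ /content/view/%1/? [R=301,L]
```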
Billys, if I understand you correctly, it would be wise to find all the old search-engine-unfriendly URLs Google has indexed and put these in the robots.txt file?
The index.php also? That is still the URL for my homepage; the 404SEF module did not rename it to anything else. When I view the source code of index.php in my browser, all the URLs in there are the Search Engine Friendly ones, so that should be OK, I think?
thanks
What I'm trying to say is that Mambo produces non-SEF URLs that start with a similar pattern. From the W3C:
The "Disallow" field specifies a partial URI that is not to be visited. This can be a full path, or a partial path; any URI that starts with this value will not be retrieved.
Make sure you understand what the impact of a change will be before you make it!
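So a single prefix line covers every query-string variant. For example (made-up URLs, and note that robots.txt comments start with #):

```text
Disallow: /index.php

# Because it's a prefix match, that one line blocks all of:
#   /index.php
#   /index.php?option=com_weblinks&catid=18&Itemid=22
#   /index.php?option=com_content&task=view&id=5
# but NOT /site/index.php -- the match starts from the site root.
```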
>>The index.php also? This is still the url for my homepage. The 404SEF module did not rename this to anything else.
I'm kinda surprised at this response. Usually it's set up to redirect to something like [foo.tld...]
If that's not the case, you'd want to KEEP:
/index.php
Or you'd prevent the indexing of your home page.
>>When I view the sourcecode of the index.php file in my browser all urls in there are the Search Engine Friendly ones, so that should be ok I think?
Do you publish an RSS feed? If so, the unfriendly URLs can leak from there too.
What you can do about your /index.php URL is go into the 404SEF component and edit the SEF URL to what you want it to be.
The problem I am having is that I am getting the following duplicate pages from this component:
category/widget.html
category/widget-2.html
category/widget-3.html
category/widget-4.html
Anyone else seeing this and have a solution?
Think this would raise a red flag with SE's?
I do know about a problem with pages listed in a section or category. For example, if you had 10 pages in a section before it rolls over to a new page:
section/whatever/10/0
Is the same as
section/whatever/
I use my .htaccess file to deal with these...
# Getting Rid of Duplicate Zero Pages
RedirectMatch 301 ^/Section/Whatever/10/0/$ [sitename.tld...]
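If you have more than a couple of sections, one pattern can catch all of the trailing /N/0/ duplicates at once. A sketch, assuming your URLs follow the section/name/N/0/ shape above -- test it against your own URL scheme before relying on it:

```apache
# Redirect any "/section/<name>/<number>/0/" duplicate back to the
# plain section URL. $1 keeps the section name; paths are examples.
RedirectMatch 301 ^/(section/[^/]+)/[0-9]+/0/$ /$1/
```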
If joomla / mambo (the core) produces the "extra" page name, SEF Advance WILL try to produce another page.
I use SEF Advance with Joomla 1.0.8. ... I would be worried if you're not returning 404s for non-existent pages.
I also use various incarnations of Joomla! with SEF Advance, and on one site (running 1.0.5) it's set up to automatically redirect any calls to non-existent pages to the homepage.
This was a feature of Joomla up to 1.0.5, which, IIRC, was changed from 1.0.6 onwards to return a 404 page for non-existent URLs.
I think I actually preferred the old method, as at least visitors were taken to a valid part of the site instead of getting a 404.
I set up a custom 404 page to help redirect traffic:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>404 Not Found</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<meta http-equiv="imagetoolbar" content="no" />
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div align="center"><font face="Verdana, Arial, Helvetica, sans-serif" size="7"><b>Error
404</b></font><br />
<br />
<font face="Verdana, Arial, Helvetica, sans-serif"><b>The page you are trying to
access does not exist on our server.</b></font><br />
<br />
<br />
<br />
<br />
<font face="Verdana, Arial, Helvetica, sans-serif"><b>Click here to go
to our Home Page: <a href="http://www.mywebsite.tld/">MyWebsite.tld</a></b></font></div>
</body>
</html>
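One thing worth checking with a page like that: Apache only serves it if you point the ErrorDocument directive at it. Assuming the file is saved as /404.html (the path is an example), the .htaccess line would be:

```apache
# Use a local path, not a full http:// URL -- a full URL makes
# Apache redirect (302) instead of returning a real 404 status.
ErrorDocument 404 /404.html
```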
texasville, I wholeheartedly agree with you. It's frustrating to have to work around google's algo.
I think the duplicate pages come up if you have a page on 2 different menus. mainmenu might create a URL pointing to category/page.html, and then if that page is also on usermenu, it will create another URL like category/page-2.html
Both will be the same page, but it looks like there are 2 of them.
I was just in the process of seeing if it was going to be a problem with Google when they started updating a few days ago and removed half of my pages. Now, 4 days later, it's starting to get re-indexed. My AdSense and traffic took a nosedive with this last update.
Switched to Xaneon Extensions - it worked for me and gives unique URLs (though you need to take care with all the possible internal links in Mambo).
Now, with the site running Joomla, I'm using OpenSEF, which developed from Xaneon.