fix doubled (part of) url

     
10:05 am on Jul 25, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 2, 2006
posts:2240
votes: 8


A broken CMS that generates sitemaps automatically created URLs like this:

http://www.example.com/sub/http://www.example.com/sub/

/sub/ is where that CMS is located. The rest of the site is fine.

First, I fixed the sitemap, and then I went and did this:

RewriteRule ^sub/http\://www\.example\.com/sub/(.*) http://www.example.com/sub/$1 [R=301,L,NC]

The above does not work. I played with it a bit to make sure I had not missed an escape, and searched the web, but still no luck.

1. Which rule can fix it?
2. I was surprised to see that the content was still served with the doubled root URL. How come?

Thanks
11:27 am on July 25, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Apr 11, 2015
posts: 328
votes: 24


Apache internally collapses multiple slashes in the URL-path, which is why you are having problems trying to match the double slash in "http://www". As far as the URL-path matched by the RewriteRule pattern is concerned, there is only one slash. So, you could try a rule such as:


RewriteRule ^sub/http:/www\.example\.com/sub/(.*) http://www.example.com/sub/$1 [R=301,L]


No need to escape the colon (:) in the RewriteRule pattern, as it carries no special meaning here. I suspect the NC (NOCASE) flag is unnecessary also?

Note that this will need to go before any CMS routing directives.

If you needed to match the full URL as seen in the request - with multiple slashes - then you could use a RewriteCond directive and check against the %{REQUEST_URI} server variable.
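
A minimal sketch of that RewriteCond approach. It assumes %{REQUEST_URI} still carries the doubled slash at this point; the "/+" tolerates either one slash or several, and if a given server behaves differently, %{THE_REQUEST} (the raw request line) is the usual fallback. Because the condition tests the full path, the rule itself can simply match everything:

```apache
RewriteEngine On

# Test the full requested path, including the slashes after "http:".
# "/+" accepts one or more slashes, so "http:/www" and "http://www" both match.
RewriteCond %{REQUEST_URI} ^/sub/http:/+www\.example\.com/sub/(.*)
# %1 is the group captured by the RewriteCond above (not by the RewriteRule).
RewriteRule ^ http://www.example.com/sub/%1 [R=301,L]
```

As with the rule above, this would need to sit before any CMS routing directives.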


2. I was surprised to see that the content was still served with the doubled root URL. How come?


This may just be to do with the way the CMS processes and routes the URLs? (Although arguably it should really reject it.) Otherwise, without some kind of front controller you'd probably expect a 403 or 404.
12:27 pm on July 25, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 2, 2006
posts:2240
votes: 8


Thanks.

Did not work. I did try changing it, but this thing keeps returning that double root url. It simply does not react to such directives. If I test it outside of this CMS (other parts of my site), I do get 404. It must be some sort of rewrite within CMS.
3:13 pm on July 25, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Apr 11, 2015
posts: 328
votes: 24


Cache cleared etc? In what way did it not work? Did anything happen at all when you visit http://www.example.com/sub/http://www.example.com/sub/? If nothing (ie. no additional network traffic) then it kinda suggests something more fundamental (no .htaccess, no mod_rewrite, incorrect pattern...?), rather than the CMS taking over.


It simply does not react to such directives.


Is Apache .htaccess / mod_rewrite enabled? Are there any other directives in .htaccess?


...this thing keeps returning that double root url.


You mean the "double root" is appearing in internal links etc (and everywhere)?
5:36 pm on July 25, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 2, 2006
posts:2240
votes: 8


Everything is in place, but the URL simply does not change. Internal links are fine; it all started with the broken system that doubled the root URL in the sitemap, which got picked up by Google. Plus, the way the CMS rewrites URLs, no matter what you put into a URL, it returns the same content (as long as the article parameters are there, I guess). Quite wrong. The system uses canonical tags, so some believe that's enough. I don't like it, and Google comes back with duplicate content in WMT. Actually, I can post the lines from the CMS's .htaccess; it's not too long:

# always follow the symlinks
Options +FollowSymlinks -MultiViews -Indexes

# X-Frame-Options to prevent clickjacking
Header always append X-Frame-Options SAMEORIGIN

# if you want to use mod_rewrite, set this 'On'
RewriteEngine On

# the path to your phpMyFAQ installation
RewriteBase /sub/

# Maintenance Section - Uncomment the lines below according to your need
# Write below your client IP address (e.g.: 127.0.0.1)
# if you need to keep your web access during maintenance
#RewriteCond %{REMOTE_ADDR} !^127.0.0.1$
# Choose your way of closing the access to PMF:
# a. you can simply forbid access (HTTP 403 Error)
#RewriteRule ^(.*)$ underMaintenance.html [F,L]
# b. return the user an explanation
#RewriteRule ^(.*)$ underMaintenance.html [L]

# show all categories
RewriteCond %{REQUEST_URI} /showcat\.html$ [NC]
RewriteRule ^(.*)$ index.php?action=show [L,QSA]

# the search page
RewriteCond %{REQUEST_URI} /search\.html$ [NC]
RewriteRule ^(.*)$ index.php?action=search [L,QSA]

# opensearch
RewriteCond %{REQUEST_URI} /opensearch\.html$ [NC]
RewriteRule ^(.*)$ opensearch.php [L,QSA]

# the add content page
RewriteCond %{REQUEST_URI} /addcontent\.html$ [NC]
RewriteRule ^(.*)$ index.php?action=add [L,QSA]

# the ask question page
RewriteCond %{REQUEST_URI} /ask\.html$ [NC]
RewriteRule ^(.*)$ index.php?action=ask [L,QSA]

# the open questions page
RewriteCond %{REQUEST_URI} /open\.html$ [NC]
RewriteRule ^(.*)$ index.php?action=open [L,QSA]

# the help page
RewriteCond %{REQUEST_URI} /help\.html$ [NC]
RewriteRule ^(.*)$ index.php?action=help [L,QSA]

# the contact page
RewriteCond %{REQUEST_URI} /contact\.html$ [NC]
RewriteRule ^(.*)$ index.php?action=contact [L,QSA]

# the glossary page
RewriteCond %{REQUEST_URI} /glossary\.html$ [NC]
RewriteRule ^(.*)$ index.php?action=glossary [L,QSA]

# the overview page
RewriteCond %{REQUEST_URI} /overview\.html$ [NC]
RewriteRule ^(.*)$ index.php?action=overview [L,QSA]

# a page with a record (backward compatibility)
# * http://[...]/1_1_en.html
RewriteCond %{REQUEST_URI} ([0-9]+)_([0-9]+)_([a-z\-]+)\.html$ [NC]
RewriteRule ^(.*)_(.*)_(.*)\.html$ index.php?action=artikel&cat=$1&id=$2&artlang=$3 [L,QSA]

# a category page with page count (backward compatibility)
# * http://[...]/category1_1.html
RewriteCond %{REQUEST_URI} category([0-9]+)_([0-9]+)\.html$ [NC]
RewriteRule ^category(.*)_(.*)\.html$ index.php?action=show&cat=$1&seite=$2 [L,QSA]

# a category page (backward compatibility)
# * http://[...]/category1.html
RewriteCond %{REQUEST_URI} category([0-9]+)\.html$ [NC]
RewriteRule ^category(.*)\.html$ index.php?action=show&cat=$1 [L,QSA]

# start page
RewriteRule ^index.html$ index.php [PT]

# sitemap (backward compatibility)
RewriteCond %{REQUEST_URI} sitemap-([a-zA-Z0-9]*)_([a-z\-]+)\.html$ [NC]
RewriteRule ^sitemap-(.*)_(.*)\.html$ index.php?action=sitemap&letter=$1&lang=$2 [L,QSA]

# a solution id page
RewriteCond %{REQUEST_URI} solution_id_([0-9]+)\.html$ [NC]
RewriteRule ^solution_id_(.*)\.html$ index.php?solution_id=$1 [L,QSA]

# PMF faq record page
# * http://[...]/content/1/1/<LANGUAGE CODE>/<FAQ TOPIC>.htm
# * http://[...]/content/1/1/<LANGUAGE CODE>/<FAQ TOPIC>.html
RewriteRule content/([0-9]+)/([0-9]+)/([a-z\-]+)/(.+)\.htm(l?)$ index.php?action=artikel&cat=$1&id=$2&artlang=$3 [L,QSA]

# PMF category page with page count
# * http://[...]/category/1/<PAGE NUMBER/<CATEGORY NAME>.htm
# * http://[...]/category/1/<PAGE NUMBER/<CATEGORY NAME>.html
RewriteRule category/([0-9]+)/([0-9]+)/(.+)\.htm(l?)$ index.php?action=show&cat=$1&seite=$2 [L,QSA]

# PMF category page
# * http://[...]/category/1/<CATEGORY NAME>.htm
# * http://[...]/category/1/<CATEGORY NAME>.html
RewriteRule category/([0-9]+)/(.+)\.htm(l?)$ index.php?action=show&cat=$1 [L,QSA]

# PMF news page
# * http://[...]/news/<ID>/<LANGUAGE CODE>/<HEADER>.htm
# * http://[...]/news/<ID>/<LANGUAGE CODE>/<HEADER>.html
RewriteRule news/([0-9]+)/([a-z\-]+)/(.+)\.htm(l?)$ index.php?action=news&newsid=$1&newslang=$2 [L,QSA]

# PMF sitemap
# * http://[...]/sitemap/<LETTER>/<LANGUAGE CODE>.htm
# * http://[...]/sitemap/<LETTER>/<LANGUAGE CODE>.html
RewriteRule sitemap/([^\/]+)/([a-z\-]+)\.htm(l?)$ index.php?action=sitemap&letter=$1&lang=$2 [L,QSA]

# PMF Google sitemap
# * http://[...]/sitemap.xml
# * http://[...]/sitemap.gz
# * http://[...]/sitemap.xml.gz
#RewriteRule sitemap.xml$ sitemap.xml.php [L]
RewriteRule sitemap.gz$ sitemap.xml.php?gz=1 [L]
RewriteRule sitemap.xml.gz$ sitemap.xml.php?gz=1 [L]

# PMF tags page with page count
# * http://[...]/tags/<ID>/<PAGE NUMBER>/<HEADER>.htm
RewriteRule tags/([0-9]+)/([0-9]+)/(.+)\.htm(l?)$ index.php?action=search&tagging_id=$1&seite=$2 [L,QSA]

# PMF tags page
# * http://[...]/tags/<ID>/<HEADER>.htm
RewriteRule tags/([0-9]+)/([^\/]+)\.htm(l?)$ index.php?action=search&tagging_id=$1 [L,QSA]

# REST/JSON API
# * http://[...]/api/<ACTION>/<LANGUAGE CODE>/<...>
RewriteRule api/getVersion api.php?action=getVersion [L,QSA]
RewriteRule api/getApiVersion api.php?action=getApiVersion [L,QSA]
RewriteRule api/getCount api.php?action=getCount [L,QSA]
RewriteRule api/getDefaultLanguage api.php?action=getDefaultLanguage [L,QSA]
RewriteRule api/search/([a-z\-]+)/([a-z\-]+)$ api.php?action=search&lang=$1&q=$2 [L,QSA]
RewriteRule api/getCategories/([a-z\-]+) api.php?action=getCategories&lang=$1 [L,QSA]
RewriteRule api/getFaqs/([a-z\-]+)/([0-9]+) api.php?action=getFaqs&lang=$1&categoryId=$2 [L,QSA]
RewriteRule api/getFaq/([a-z\-]+)/([0-9]+) api.php?action=getFaq&lang=$1&recordId=$2 [L,QSA]
RewriteRule api/getAllFaqs/([a-z\-]+) api.php?action=getAllFaqs&lang=$1 [L,QSA]
RewriteRule api/getFaqAsPdf/([a-z\-]+)/([0-9]+)/([0-9]+) api.php?action=getFaqAsPdf&lang=$1&categoryId=$2&recordId=$3 [L,QSA]
RewriteRule api/getAttachmentsFromFaq/([a-z\-]+)/([0-9]+) api.php?action=getAttachmentsFromFaq&lang=$1&recordId=$2 [L,QSA]
RewriteRule api/getPopular/([a-z\-]+) api.php?action=getPopular&lang=$1 [L,QSA]
RewriteRule api/getLatest/([a-z\-]+) api.php?action=getLatest&lang=$1 [L,QSA]
RewriteRule api/getNews/([a-z\-]+) api.php?action=getNews&lang=$1 [L,QSA]
RewriteRule api/getPopularSearches/([a-z\-]+) api.php?action=getPopularSearches&lang=$1 [L,QSA]
RewriteRule api/getPopularTags api.php?action=getPopularTags [L,QSA]
RewriteRule api/getFAQsByTag/([a-z\-]+)/([0-9]+) api.php?action=getFAQsByTag&lang=$1&tagId=$2 [L,QSA]


Of all of the above, I believe that this is the line that rewrites the URL of an article:

RewriteRule content/([0-9]+)/([0-9]+)/([a-z\-]+)/(.+)\.htm(l?)$ index.php?action=artikel&cat=$1&id=$2&artlang=$3 [L,QSA]

Thanks
7:55 pm on July 25, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 2, 2006
posts:2240
votes: 8


Just wanted to conclude that the trouble was on the programmatic side, being fixed. No point in trying it via .htaccess until the root cause is corrected.
4:17 pm on July 26, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Apr 11, 2015
posts: 328
votes: 24



the trouble was on the programmatic side, being fixed.


Understood. Just a curiosity... where is the above .htaccess file located? Is it in the /sub folder? Or the document root (parent folder)? Do you have any other .htaccess files in parent folders? To which .htaccess file were you adding the above redirect?

Just twigged... If the .htaccess file to which you were adding this redirect is also in the /sub folder (yes, you did say the CMS was in the /sub folder), then the RewriteRule pattern should not start with "sub/", since the directory-prefix is removed from the URL-path before it is matched. For example:


# When .htaccess file is in the /sub folder...
RewriteRule ^http:/www\.example\.com/sub/(.*) http://www.example.com/sub/$1 [R=301,L]


If you have multiple .htaccess files along the URL/directory path then you could have problems with mod_rewrite directives being ignored since mod_rewrite directives are not inherited by default.
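
If the redirect did have to live in a parent .htaccess file, mod_rewrite's RewriteOptions directive is one way to stop the rules in /sub/.htaccess from overriding it. This is only a sketch, assuming a parent file in the document root that holds the redirect. Inherited rules are applied after the child's own rules by default, so the result still needs testing against the CMS routing:

```apache
# /sub/.htaccess (child directory)
RewriteEngine On

# Pull in the mod_rewrite rules from the parent directory's configuration.
# By default, the inherited rules run after the rules defined in this file.
RewriteOptions Inherit
```

In this thread it is simpler to keep the redirect in /sub/.htaccess itself, where its position relative to the CMS rules is explicit.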
5:41 pm on July 26, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 2, 2006
posts:2240
votes: 8


Thanks very much for looking further into this.

I tried it and it did not work, not because the code was wrong, but because I was putting it too "low."

So I tried changing it, and finally moved it right below the RewriteEngine On directive, and it worked.

But here is an interesting part, for me at least. This is what I first used to make it work:

RewriteRule www\.example\.com/sub/(.*) http://www.example.com/sub/$1 [R=301,L]

Then I tried your solution, and that worked as well. It would have worked in the first place if I had put it closer to the top of the .htaccess in /sub/.

So, while I get the same results with either code, is there any reason why I would use one over another?

Thanks
9:51 pm on July 26, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Apr 11, 2015
posts: 328
votes: 24



...but because I was putting it too "low."


Yes, the order of the directives is important. Generally, external redirects should always come before internal rewrites (which are what all the CMS/routing directives are).
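
A rough sketch of that ordering for the .htaccess file in /sub/, using only directives already posted in this thread. The likely reason the redirect was skipped when placed lower down: the CMS patterns are not anchored with "^", so a rule such as the "content/..." one presumably matched the doubled path first and, with the L flag, ended that pass before the redirect was reached.

```apache
# /sub/.htaccess
RewriteEngine On
RewriteBase /sub/

# 1. External redirects first: fix the doubled URL before anything else runs.
RewriteRule ^http:/www\.example\.com/sub/(.*) http://www.example.com/sub/$1 [R=301,L]

# 2. Internal rewrites (the CMS routing) afterwards, for example:
RewriteRule content/([0-9]+)/([0-9]+)/([a-z\-]+)/(.+)\.htm(l?)$ index.php?action=artikel&cat=$1&id=$2&artlang=$3 [L,QSA]
```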


...is there any reason why I would use one over another?


In terms of it "working" it probably doesn't matter: every URL that contains "http://www.example.com" also contains "www.example.com", so your pattern matches everything mine does (and I assume there are no instances of "www.example.com" in the path part without the "http://" prefix?). However, your rule is "potentially" slower, although the difference is likely to be negligible and make no real-world difference.

The difference between your initial solution and mine is that yours is less specific... it matches "www.example.com" anywhere in the URL, whereas mine matches "http://www.example.com" at the start of the URL. In order to match something anywhere in the URL, the regex engine will keep searching until it's exhausted all possibilities and only then will it fail (computationally more expensive). Whereas, in order to match something exactly (ie. it must start with a specific string) then it can start testing left to right and fail immediately when there is a mismatch. It can fail quicker. So, for normal (well-formed) URLs it will be "quicker".

As a general rule, you should go for the most specific pattern that gets the job done - it's safer (won't match more than you expect) and often means less work for the regex engine (as it fails sooner).
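
Side by side, the two patterns from this thread look like this. The sample path in the first comment is hypothetical, and only one of the two rules would be enabled at a time:

```apache
# Unanchored: matches "www.example.com/sub/" anywhere in the path, so it
# would also catch a path such as "foo/www.example.com/sub/x" (hypothetical).
#RewriteRule www\.example\.com/sub/(.*) http://www.example.com/sub/$1 [R=301,L]

# Anchored: only matches when the path (relative to /sub/) starts with
# "http:/www.example.com/sub/", so ordinary URLs fail within the first
# few characters.
RewriteRule ^http:/www\.example\.com/sub/(.*) http://www.example.com/sub/$1 [R=301,L]
```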
2:25 am on July 27, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15652
votes: 800


Apache internally collapses multiple slashes in the URL-path, which is why you are having problems trying to match the double slash in "http://www".

Addressing this point only:

Double slashes in the pattern of a RewriteRule will never match, because the path it tests has already had them collapsed. But you can say // in a Condition and it will be recognized. (I know this from direct personal experience. Took me months to locate the mistake that was causing search engines to request directory//subdir like that.)
 
