Welcome to WebmasterWorld Guest from 54.226.55.151

Forum Moderators: coopster & jatar k

Securely Redirecting

     
12:12 pm on Sep 24, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2004
posts:885
votes: 1


Quick background here, I moving a forum from one domain to another. I know enough PHP to be dangerous... to myself. I will only be redirecting the topics and forums related to the main topic of the forum. I'm also taking the opportunity to rid myself of this useless .html rewrite I added many years ago.

I'm looking for some advice on the security of the script I wrote and any other advice you might have.

My .htacces file will be redirecting the other URL's and forums i want redirected. At the end of the .htaccess file I've added this as catch all:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /myredirect.php?url=$1 [NC,L]



The forum URL's have this pattern:
/forum-123.html OR /forum-123-4567.html
123 is the forum ID that can be 1 to 3 numeric digits and 4567 is the start parameter that could be up to 4 numeric digits.

Similarly the topic URL's have this pattern:
/about123.html OR /about123-4567.html
123 is the topic ID that can be 1 to 5 numeric digits and 4567 is the start parameter that could be up to 4 numeric digits.

Here is the script I wrote.

<?php
$url= false;
$topic_id= false;
$forum_id= false;
$start= false;

if (isset($_GET['url']))
{

preg_match('/(about|forum-)([0-9]{1,5})((?:-)([0-9]{1,4}))?.html/', $_GET['url'], $matches);

//see if we matched 'about' or 'forum-'
if (isset($matches[1]) && $matches[1] == 'forum-' || $matches[1] == 'about')
{
//Check for the start parameter which could be present in the topics or forums
if (isset($matches[4]))
{
$start = (int)$matches[4];
}

//If it's forum URL we only need to generate a link for output because any forums we want redirected have already been redirected
if ($matches[1] == 'forum-')
{
$forum_id = (int)$matches[2];
$url = 'https://newdoamin.com/forum/viewforum.php?f=' . $forum_id . ($start ? '&start=' . $start : '');
}

if ($matches[1] == 'about')
{

$topic_id = (int)$matches[2];

//Get the forum ID so we know which forum the topic is in
$link = mysqli_connect( 'localhost', 'readonly_mysqluser', 'password', 'db_name');
$sql = "SELECT forum_id FROM phpbb_topics WHERE topic_id = $topic_id";

if ($result = mysqli_query($link, $sql))
{

while ($row=mysqli_fetch_row($result))
{
$forum_id = $row[0];
$url = 'https://newdomain.com/forum/viewtopic.php?f=' . $forum_id . '&t=' . $topic_id . ($start ? '&start=' . $start : '');
}
mysqli_free_result($result);

}

//Only topics in these forums will be redirected
$valid_forums = array('1', '2', '3');

if (in_array($forum_id, $valid_forums))
{

header("Location: $url",TRUE,302);
exit;

}
}
}

//At this point $url should be false or hold the URL for a forum or topic that wiil not be redirected.

header("HTTP/1.0 404 Not Found");
//Switch to 410 after a few days once the logs are checked for anything that was missed
//header("HTTP/1.0 410 Gone");

}
?>

Output some HTML ....If $url is not false
The page you requested in no longer avaiable here but can be found here:

echo $url

-or-

page not found, blah blah blah


As best i can tell this is working as I expect it to, my biggest concern is if I have made any mistakes with the security of it.
2:13 pm on Sept 24, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 25, 2005
posts:1683
votes: 239


I would probably just stick with the .html rewrite since /viewtopic.php?etc isn't any friendlier, and I would do a 301 redirect, but in terms of security I don't see any risks here. It's always good practice to escape and surround with single quotes any data you pass to MySQL, but I think you're already making sure that the forum IDs, topic IDs and start positions are numerical, so that shouldn't be a problem (though I haven't looked at the regex in detail, and would probably opt for a strpos() type of search myself because it's faster and less prone to errors than regexes).

if (isset($matches[1]) && $matches[1] == 'forum-' || $matches[1] == 'about')

Minor thing, but this will probably throw a warning if $matches[1] isn't set. You need to separate the && (AND) from the || (OR) like so:

if (isset($matches[1]) && ($matches[1] == 'forum-' || $matches[1] == 'about'))

Note the extra parentheses.

Also, in phpBB it's optional to pass the forum ID in the URL of a thread, so you could technically do this with .htaccess alone.
4:20 pm on Sept 24, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2004
posts:885
votes: 1


Thanks Rob.

I would probably just stick with the .html rewrite since /viewtopic.php?etc isn't any friendlier


I realize that but it's maintenance issue that I'll be glad to be rid of and I'm also upgrading the forum software to latest version. If if i wanted to keep them I'd have to do a lot modifications from scratch. The rewrite was installed back in '04 or something like that when "anything.html" was the rage before decent friendly URL's were being used widely. I've pretty much kicked myself in the ass since because I've had to maintain them all these years through multiple updates/upgrades. If there was official support from phpBB for decent friendly URL's then I'd use them but there isn't. Because it's such an extensive modification I'll be sticking with the stock URL's until it's supported in the core which could be forever based on their attitudes towards them.

and I would do a 301 redirect,


Oops, thanks for spotting that and I already added the additional parentheses after posting this.

Also, in phpBB it's optional to pass the forum ID in the URL of a thread, so you could technically do this with .htaccess alone.


I'm also aware of that, you don't even need the right forum ID. The only thing that is used for is "Who is online" for that forum.

Two things though, I'm only redirecting topics in specific forums related to the main topic so I need to grab the forum ID anyway. The other forums are politics, chit chat and whatever. A lot of those topics have been indexed in the past but they are currently denied though phpBB permissions. The new domain will never have those topics indexed. I don't want to be redirecting the bots to unauthorized page. I'll have a valid URL for those URL's I'm not redirecting and output some HTML with a link for someone who has it bookmarked or they followed a link from elsewhere.

Secondly most of the URL's phpBB generates have the -f parameter and those that don't I've already modified so they do, in particular the canonical URL. I've also modified viewtopic to redirect if the -f parameter is excluded or the wrong one. You can end up with wrong one if it's indexed/bookmarked and the topic is moved afterwards.

I can care less about the f parameter but it's either include it everywhere or don't. There may be one thing it's useful for, advertisers can use it to target specific forums.

.
6:44 pm on Sept 24, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 25, 2005
posts:1683
votes: 239


I realize that but it's maintenance issue that I'll be glad to be rid of and I'm also upgrading the forum software to latest version

Ugh, yes, that does bring back a few memories of a commercial, closed-source module I once had to rewrite my phpBB URLs. It did a good job at that but it also got in the way of just about every other customization I wanted.

I assume you'll also be escaping any HTML characters in $url when you echo it to the 404/410 page. Can't think of any other security issues.
7:54 pm on Sept 24, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14444
votes: 578


Minor point:
/(about|forum-)([0-9]{1,5})((?:-)([0-9]{1,4}))?.html/

What is the significance of
(?:-)
? Since it's a single character and it isn't optional and it's not to be captured, the parentheses would seem to be superfluous. (Oh and psst! You really need to escape the literal period in \.html, since a . alone would also match a digit.)

If I'm counting right,
[1] = about OR forum-
[2] = the first group of numbers
[3] = - plus the rest of the numbers
[4] = the second group of numbers, excluding the -
If the - element is always present, do you really need to capture [3] and [4] separately? The more parentheses, the more chances for blunders.
8:31 pm on Sept 24, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 25, 2005
posts:1683
votes: 239


I was hoping you'd pop in for the regex ;-)
9:24 pm on Sept 24, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2004
posts:885
votes: 1


[3] = - plus the rest of the numbers

...If the - element is always present,


It's not always present and I had difficulties with this and was going to mention it. The minus sign and the group of number after it will appear together or not all. I'm testing for the minus sign to insure it's there(at least that's what I think I'm doing), if it's not there then there shouldn't be a second group of numbers. I do not need to capture it. I only need the numbers that follow it IF they are present. A first I was just capturing it with the numbers in one group and using str_replace to remove it.

(Oh and psst! You really need to escape the literal period in \.html, since a . alone would also match a digit.)


Thanks.
9:37 pm on Sept 24, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2004
posts:885
votes: 1



Ugh, yes, that does bring back a few memories of a commercial, closed-source module


This one wasn't closed source. I can even remember the name of it or the author able2know, it goes back to phpBB2. There was an SEO mod for phpBB3.0 that did friendly URL's but I only used it to maintain my current URL's. I ended up having to modify the modification to make it work.... It's just an endless pit of work.

I assume you'll also be escaping any HTML characters in $url


Just finishing that up, thanks for the reminder.
12:12 am on Sept 25, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14444
votes: 578


The minus sign and the group of number after it will appear together or not all.
...
I do not need to capture it. I only need the numbers that follow it

Got it. The pattern then is
(about|forum-)([0-9]{1,5})(?:-([0-9]{1,4}))?\.html
where [3] = the second set of numbers, if present. Yes, the group will be captured even if it's inside a (?: blahblah) non-capture envelope.

Now, if I were writing the code I'd just say \d+ each time, for a total savings of fourteen bytes, but that's more about individual coding style. It's only crucial if you need to exclude URLs that have more than five or four numerals.
(about|forum-)(\d+)(?:-(\d+))?\.html
12:21 pm on Sept 25, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2004
posts:885
votes: 1


Lucy, I tried that before and I must of had the matches screwed up leading to my confusion, Working now.

It's not critical but the limits on the numbers matches are there because they won't exceed them. The forum script will try and return a page even if the values for the start parameter are not legitimate. e.g. if the start parameter for a topic only goes up to 30 you could type in one million and it will return the last page. I don't know of any particular issues I have with this as far as indexing goes but I will address it when I have more time.