
Forum Moderators: phranque


Usernames in the URL, compliance with Google's PII policy

     
7:03 am on May 18, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Mar 15, 2013
posts: 1201
votes: 119


I used to put each user's username in the URL like so:

<?php
$u = urlencode($username);
echo "www.example.com/$u";
?>


It was never pretty, since some usernames contain spaces and/or special characters, but it worked.
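For reference, here is a small sketch of what PHP's two built-in encoders produce for usernames like the ones discussed in this thread. urlencode() is intended for query strings, so a space becomes +. rawurlencode() follows RFC 3986 and produces %20, which is the variant usually suggested for path segments. Either way the result is valid, but it isn't something people will remember.

```php
<?php
// Compare the two built-in encoders on usernames from this thread.
// urlencode():    query-string style, space => "+", "~" => "%7E"
// rawurlencode(): RFC 3986 style,     space => "%20", "~" left as-is
$names = ['csdude', '~csdude~', 'c.s. dude!'];

foreach ($names as $name) {
    printf("%-12s urlencode: %-16s rawurlencode: %s\n",
        $name, urlencode($name), rawurlencode($name));
}
```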

Then, a few years ago, I received a warning from Google that I was violating their new PII policy (you can't show any personally identifying information *). In a near panic to make my entire site compliant, I didn't realize that the only problem they had was with usernames that were email addresses. So I COULD have just replaced the @ with a - or something and been fine, but I didn't know that until a couple of weeks ago.
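A minimal sketch of that quick fix, assuming the only offending usernames are ones that validate as email addresses. maskEmailUsername() is a hypothetical name. Note that the address stays easy to read back, which is the concern raised later in the thread.

```php
<?php
// Hypothetical helper: only usernames that validate as email addresses are
// changed; every other username is passed through untouched.
function maskEmailUsername(string $username): string
{
    // filter_var() with FILTER_VALIDATE_EMAIL returns the string on success, false otherwise
    if (filter_var($username, FILTER_VALIDATE_EMAIL) !== false) {
        return str_replace('@', '-', $username);
    }
    return $username;
}

echo maskEmailUsername('travis@example.com'), "\n"; // travis-example.com
echo maskEmailUsername('c.s. dude!'), "\n";         // c.s. dude!
```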

So now I'm back where I started... I'd like to use the username in the URL again, but I need it to be both pretty and compliant with Google's PII policy.

My first thought was to replace any space or special character with a . (dot). Then, on the page where I run a MySQL query based on the username, I could simply use LIKE in the query:

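// In LIKE, "_" matches exactly one character, so each "." from the URL stands in for one space or special character
$user = str_replace('.', '_', $_GET['user']);

$query = sprintf("SELECT username FROM users WHERE username LIKE '%s' LIMIT 1",
    mysqli_real_escape_string($dbh, $user));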
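One catch with the query above is that mysqli_real_escape_string() does not escape % or _, so those characters typed into the URL would also act as wildcards. Below is a sketch of a pattern builder that escapes them first. slugToLikePattern() is a hypothetical helper, and a prepared-statement version of the query is shown in comments.

```php
<?php
// Hypothetical helper: turn the ".user." segment from the URL into a LIKE pattern.
// Literal %, _ and backslashes from $_GET are escaped first, so only the dots
// we deliberately convert act as single-character wildcards.
function slugToLikePattern(string $slug): string
{
    $escaped = addcslashes($slug, '\\%_');   // "\" is MySQL's default LIKE escape character
    return str_replace('.', '_', $escaped);  // "_" matches exactly one character in LIKE
}

// With mysqli, a prepared statement keeps the value out of the SQL string:
// $stmt = $dbh->prepare('SELECT username FROM users WHERE username LIKE ? LIMIT 1');
// $pattern = slugToLikePattern($_GET['user'] ?? '');
// $stmt->bind_param('s', $pattern);
// $stmt->execute();

echo slugToLikePattern('c.s..dude'), "\n"; // c_s__dude
echo slugToLikePattern('100%_real'), "\n"; // 100\%\_real
```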


But I might have a user with a special character at the beginning or end of their name, and since this would look weird:

example.com/.csdude./

I would want to trim the . from both ends.
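Here is a sketch of the dot-and-trim idea, assuming "special character" means anything other than an ASCII letter or digit. dottedSlug() is a hypothetical helper. It shows where the scheme breaks down: ~csdude~ and csdude produce the same slug, and trimming the ! from c.s. dude! leaves a pattern (c_s__dude) that no longer matches that user's full name.

```php
<?php
// Hypothetical helper: every character that is not an ASCII letter or digit
// becomes ".", then dots are trimmed from both ends.
// The /u flag makes the replacement per UTF-8 character rather than per byte;
// preg_replace() returns null on invalid UTF-8, hence the "?? ''".
function dottedSlug(string $username): string
{
    $dotted = preg_replace('/[^A-Za-z0-9]/u', '.', $username) ?? '';
    return trim($dotted, '.');
}

foreach (['csdude', '~csdude~', 'c.s. dude!'] as $name) {
    echo $name, ' => ', dottedSlug($name), "\n";
}
```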

Both of these lead to more potential problems, though. I could realistically have these completely different usernames registered:

csdude
~csdude~
c.s. dude!

But the system above would think they're all the same.

Soooo, my next thought was to create a MySQL table with all registered usernames (about 500,000) in one column and the encoded username in another, then write a script to strip the special characters from each one and insert the result into column B... and if there's already a match, add or increment a number at the end, e.g.:

csdude     | csdude
~csdude~   | csdude1
c.s. dude! | csdude2
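Here is a sketch of the lookup-table idea, assuming the same "ASCII letters and digits only" rule. stripToSlug() and uniqueSlug() are hypothetical names, and the $taken array stands in for column B. Keys are lowercased because MySQL's default collations compare strings case-insensitively.

```php
<?php
// Hypothetical helpers: strip everything but ASCII letters and digits, then
// append 1, 2, 3... until the slug is unused.
function stripToSlug(string $username): string
{
    return preg_replace('/[^A-Za-z0-9]/', '', $username) ?? '';
}

function uniqueSlug(string $username, array &$taken): string
{
    $base = stripToSlug($username);
    if ($base === '') {
        $base = 'user'; // assumption: fallback for names with no letters or digits
    }
    $slug = $base;
    // keep incrementing the suffix while the slug is already taken
    for ($n = 1; isset($taken[strtolower($slug)]); $n++) {
        $slug = $base . $n;
    }
    $taken[strtolower($slug)] = true; // reserve it
    return $slug;
}

$taken = [];
foreach (['csdude', '~csdude~', 'c.s. dude!'] as $name) {
    printf("%-10s | %s\n", $name, uniqueSlug($name, $taken));
}
```

The same function could run once as a batch over the existing usernames and then again at registration time. In the real table, a UNIQUE index on column B is what actually guarantees that no two users end up with the same slug.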


This is becoming a bit more complicated than I wanted, though, and it has a lot more potential for errors. So before I spend the next week or two writing the programs to do all of this (!), can you guys suggest an easier method that I've overlooked?


* Adsense PII policy: [support.google.com...]
7:16 am on May 18, 2018 (gmt 0)

Preferred Member

Top Contributors Of The Month

joined:Mar 25, 2018
posts:500
votes: 101


Instead of the username, I would use the user ID; your database surely already assigns a unique numeric ID to each row/user. I doubt that having the username in the URL has any SEO benefit, and the advantage of the user ID is that if a username is changed, the URL stays the same. Just a thought.
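Here is a sketch of the numeric-ID approach. It assumes URLs take the example.com/456879832/ form used later in the thread and that the path comes from parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH). userIdFromPath() is a hypothetical helper. Because only digits are accepted, nothing user-supplied besides an integer ever reaches the query.

```php
<?php
// Hypothetical route: /456879832/ => user ID 456879832.
// Up to 18 digits, so the value always fits in a 64-bit PHP int.
function userIdFromPath(string $path): ?int
{
    if (preg_match('#^/([1-9][0-9]{0,17})/?$#', $path, $m) === 1) {
        return (int) $m[1];
    }
    return null; // not an ID URL; let other routes handle it
}

var_dump(userIdFromPath('/456879832/'));
var_dump(userIdFromPath('/travis/'));
```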

And regarding the EU GDPR, this would avoid problems if someone used their real name, or if their username could otherwise be considered personal information (e.g. if the username is famous or unique).
7:30 am on May 18, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Mar 15, 2013
posts: 1201
votes: 119


Well, the logic behind a readable, easy-to-remember username in the address field is to encourage users to share it on social media. For example,

www.example.com/travis/

would take you to a list of all the classified ads you've listed. You could easily share that or tell people about it, whereas

www.example.com/456879832/

is impossible to remember.

My site also includes restaurant listings and a business directory, so I'm trying to make it easy for those businesses to share their link, too.
7:42 am on May 18, 2018 (gmt 0)

Preferred Member

Top Contributors Of The Month

joined:Mar 25, 2018
posts:500
votes: 101


Yes, I see.

So to answer your original question: yes, you should add some kind of filtering. If the only problem is email addresses used as usernames (yes, people still do this), remove the @ and the dots... More generally, removing all non-alphanumeric characters might be a good idea, to handle cases you may not have thought of. I would also convert special alphabetic characters to their plain equivalents (I don't know the right term for it): there can be characters with accents, so I would convert é => e, for example. Because if you just "encode" the username, you can end up with &xxx or %xxx sequences.
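Here is a sketch of that suggestion. plainUsername() is a hypothetical helper, and its accent map is deliberately partial (lowercase only). Where the intl extension is installed, its Transliterator class with a "Latin-ASCII" rule is a more complete option.

```php
<?php
// Hypothetical helper: fold a few common accented letters to plain ASCII,
// then drop everything that is not a letter or digit (this removes "@" and "." too).
// NOTE: the map below is only a partial, illustrative list.
function plainUsername(string $username): string
{
    $folded = strtr($username, [
        'á' => 'a', 'à' => 'a', 'â' => 'a', 'ä' => 'a', 'ã' => 'a',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'í' => 'i', 'ì' => 'i', 'î' => 'i', 'ï' => 'i',
        'ó' => 'o', 'ò' => 'o', 'ô' => 'o', 'ö' => 'o', 'õ' => 'o',
        'ú' => 'u', 'ù' => 'u', 'û' => 'u', 'ü' => 'u',
        'ñ' => 'n', 'ç' => 'c',
    ]);
    return preg_replace('/[^A-Za-z0-9]/', '', $folded) ?? '';
}

echo plainUsername('José.Núñez@gmail.com'), "\n"; // JoseNunezgmailcom
```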
11:27 pm on May 18, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10556
votes: 1115


When you set up an account for the user, you have an internal UNIQUE ID, and you have their email address to validate. At the same time, you can REQUEST a DISPLAY name for the URL from the user (watch them abuse that!). That solves the immediate problem, as long as your backend verifies that the display name is also unique across the site.
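Here is a sketch of checking a requested display name. The rules (3 to 30 letters, digits or hyphens) are an assumption, not a standard, and displayNameError() is a hypothetical helper. Because "@" is not allowed, an email address can't be used as a display name. In MySQL, a UNIQUE index on the display-name column is what actually enforces uniqueness. This in-memory check only illustrates the logic.

```php
<?php
// Hypothetical validation for a user-chosen display name.
// Returns an error message, or null when the name is acceptable.
function displayNameError(string $requested, array $takenLower): ?string
{
    // assumption: 3-30 ASCII letters, digits or hyphens (so no "@", no spaces)
    if (preg_match('/^[A-Za-z0-9-]{3,30}$/', $requested) !== 1) {
        return 'Use 3-30 letters, digits or hyphens.';
    }
    // compare case-insensitively, as MySQL's default collations do
    if (in_array(strtolower($requested), $takenLower, true)) {
        return 'That name is already taken.';
    }
    return null;
}

$taken = ['travis', 'csdude'];
echo displayNameError('Travis', $taken) ?? 'ok', "\n";     // That name is already taken.
echo displayNameError('joes-diner', $taken) ?? 'ok', "\n"; // ok
```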

The idea sounds nice, but long and short, this reveals PII too, and for that the GDPR requires getting permission first. Some users might balk.

If you automagically derive the ID by stripping their email address, you can bet someone will reverse-engineer it, and here come the spammers and a horde of unhappy users.

I'd go with a generated ID, unique to each classified ad (not to the individual), and provide an easy cut-and-paste link for the user to share around. YMMV
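Here is a sketch of a generated per-listing ID. newListingId() is a hypothetical helper, and the 10-character length is an arbitrary choice. random_bytes() (PHP 7+) uses a cryptographically secure generator, so the ID says nothing about the seller. With a UNIQUE index on the column, the rare duplicate would simply be caught on insert and retried.

```php
<?php
// Hypothetical helper: a random public ID for each classified ad.
// 5 random bytes => 10 hex characters => 16^10 (about 1.1 trillion) possible values.
function newListingId(): string
{
    return bin2hex(random_bytes(5));
}

echo newListingId(), "\n"; // e.g. a value like 3f9a0c17b2 (random on every run)
```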
12:56 am on May 19, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Mar 15, 2013
posts: 1201
votes: 119


The thing is, I've noticed a HUGE drop in the number of times these links have been shared on social media since I changed things to be compliant. It could be coincidence, tied to the growth of mobile usage; I'm not sure, but I won't know until I try. And since I'm rebuilding anyway, now's the time to do it, I guess.

Good point on the spammers, though. I can always just strip the end of an email address, turning example@gmail.com into example. And I guess I'll need to write a program to let users change their ID if they want; I'm just not sure what to call it so that the general public would understand.
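Here is a sketch of keeping only the part before the @. stripEmailDomain() is a hypothetical helper. Two addresses with the same local part (example@gmail.com and example@yahoo.com) would collide, so the result would still need the same uniqueness check as any other slug.

```php
<?php
// Hypothetical helper: keep only the part of the username before "@".
// Usernames without "@" are returned unchanged.
function stripEmailDomain(string $username): string
{
    $local = strstr($username, '@', true); // false when there is no "@"
    return $local === false ? $username : $local;
}

echo stripEmailDomain('example@gmail.com'), "\n"; // example
echo stripEmailDomain('csdude'), "\n";            // csdude
```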