My customer wanted to organize a photo contest. Users upload their photos, browse photos, and vote for the ones they like. Most votes = 1st place; fewer votes = a lower place. Voting is anonymous.
As always, it all looked simple in the very beginning.
Before you start coding, you'll probably think things over for a while; that's what I tried to do too, with the emphasis on "tried".
The main question was: 'what should count as a vote?' (which later became 'what should count as a "valid" vote?').
From my point of view (and state of 'knowledge' at the time), there were 2 ways to address the issue.
1) IP-fixed voting
'One IP may vote once per day' (which means for one photo only)
You would implement this approach in the following way.
You have a 'votes' table. Before inserting an entry into it, you check that it doesn't already contain an entry for the same date and IP. If it does, you turn down the attempt.
The drawback of this approach is that all the 'employees' sitting behind a corporate router with NAT lose the ability to vote individually: the entire office can vote for one photo once per day.
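A minimal sketch of that date-IP check, assuming an in-memory array stands in for the 'votes' table and that its keys are "date|ip" strings (a hypothetical format; a real site would query the database instead). The function name `tryIpVote` is made up for illustration.

```php
<?php
// Sketch of "one IP may vote once per day".
// Assumption: $votes stands in for the 'votes' table, keyed by "date|ip".

function tryIpVote(array &$votes, string $ip, int $photoId, string $date): bool
{
    $key = $date . '|' . $ip;      // the date-IP pair to look for
    if (isset($votes[$key])) {
        return false;              // an entry exists: turn the attempt down
    }
    $votes[$key] = $photoId;       // no entry yet: record the vote
    return true;
}

$votes = [];
var_dump(tryIpVote($votes, '192.0.2.1', 7, '2009-05-14')); // bool(true)
var_dump(tryIpVote($votes, '192.0.2.1', 9, '2009-05-14')); // bool(false), same IP, same day
var_dump(tryIpVote($votes, '192.0.2.1', 9, '2009-05-15')); // bool(true), next day
```

In SQL the same rule can be enforced atomically with a UNIQUE constraint over the date and IP columns, so two simultaneous requests can't both slip past a "check, then insert" sequence.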
I don't know how it is in other countries, but in Russia, you wouldn't really want to lose such a big audience. Thus, we decided to avoid this approach.
2) Cookie/Session-fixed voting.
It might be wrong to lump the two together; many books I've read treat them separately, but in essence they are the same thing. When you start a session for a browser, you actually set a cookie on the user's machine (probably called SID, PHPSESSID or something like it).
This approach is a bit more complex. You set a session cookie for the user. Once a vote has been made, you do something like:
$_SESSION['voted'] = true;
(you use this to prevent the user from voting again within the current session)
setcookie('voted', true, mktime(24, 0, 0) [...] );
(you use this to keep a 'returning user' from voting again; mktime(24, 0, 0) makes the cookie expire at the coming midnight)
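The two flags can be sketched as plain functions so the logic runs outside a web request. Assumption: `$session` and `$cookies` stand in for `$_SESSION` and `$_COOKIE`; the function names are hypothetical, and the real `setcookie()` call is only shown as a comment.

```php
<?php
// Sketch of session/cookie-fixed voting.
// $session and $cookies stand in for $_SESSION and $_COOKIE.

function hasAlreadyVoted(array $session, array $cookies): bool
{
    // empty() avoids "undefined array key" warnings when the keys are absent
    return !empty($session['voted']) || !empty($cookies['voted']);
}

// Marks the session and returns the cookie expiry (the coming midnight).
function markVoted(array &$session): int
{
    $session['voted'] = true;        // blocks a second vote in this session
    $expires = mktime(24, 0, 0);     // hour 24 rolls over to 00:00 tomorrow
    // In a real request you would now call:
    // setcookie('voted', '1', $expires);
    return $expires;
}

$session = [];
var_dump(hasAlreadyVoted($session, []));          // bool(false)
markVoted($session);
var_dump(hasAlreadyVoted($session, []));          // bool(true)
var_dump(hasAlreadyVoted([], ['voted' => '1']));  // bool(true), returning user
```

Note that `setcookie()` adds an HTTP header, so it has to be called before any output is sent.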
The drawback here is that all you need to do to vote again is clear your cookies.
So, in the end, the first approach really cuts into your audience, while the second assumes really naive users.
But, here's the best part.
The first wave of attackers were the 'brutalists': people who turned off cookies and hit the voting page rapidly (it lay out in the open under the very revealing name 'vote.php?id=xx', where 'xx' is the photo number O-).
The brutalists were partly fended off by adding two checks. The first idea was that the voting page should not be hit directly: you check that some session variable has already been set, because the voting page is supposed to pick up a previously started session rather than start a new one (although it calls session_start() all the same). The second check was on the number of page hits: voting is only possible after at least one page (e.g. the photo-browsing page) has been hit. So, in order to vote, you first had to land on some page (not necessarily the photo-browsing page), get a session, and only then could you vote.
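A sketch of those two checks, assuming `$session` stands in for `$_SESSION`; the key names 'started' and 'hits' follow the check list later in this thread, while the function names are hypothetical.

```php
<?php
// Sketch of the two anti-"brutalist" checks. $session stands in for $_SESSION.

// Called on every ordinary page after session_start().
function registerPageHit(array &$session): void
{
    $session['started'] = true;
    $session['hits'] = ($session['hits'] ?? 0) + 1;
}

// Called on the voting page: reject requests that skipped the normal pages.
function passesSessionChecks(array $session): bool
{
    return !empty($session['started']) && ($session['hits'] ?? 0) >= 1;
}

$session = [];                             // direct hit on vote.php, no prior page
var_dump(passesSessionChecks($session));   // bool(false)
registerPageHit($session);                 // the user browsed a page first
var_dump(passesSessionChecks($session));   // bool(true)
```

Assuming the session ID travels only in a cookie, a client with cookies turned off arrives at the voting page with a fresh, empty session every time, so it never passes.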
Then came the "slavers". They did something wicked, but something every web developer should probably experience once and expect from then on.
Forums and blogs: most of them don't restrict the use of <img src="external_url"> and <a href=""> tags. Well, guess what?
They place two images pointing at your photo contest site in their profile, their avatar slot or their posts. Then they start posting.
The first image gets a session and cookie set in every browser that views any page where their profile or post appears (and it does so quietly: I use Firefox to debug things, and it didn't even ask me about setting a third-party cookie, but it did set one). The second, third, fourth, etc. image submits the vote O-).
The math is something like this:
Entries with references × references per entry × page hits (slaves) = votes
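To get a feel for the scale, here is the same estimate with purely hypothetical numbers (not figures from this contest), reading the page hits as views per entry: 10 posts, each carrying 3 vote-submitting images, each post viewed 200 times.

$$V = E \times R \times H = 10 \times 3 \times 200 = 6000$$

Here $E$ is the number of entries carrying the references, $R$ the references per entry that submit a vote, and $H$ the page hits per entry.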
You look into the DB and see that within the last few hours two or three photos have gained quite a number of votes from lots of different IPs.
The conclusion you draw from this is that having something like vote.php out in the open and so clearly named is a bad idea (use front-controller frameworks, read more, think more, and then you might get good ideas).
So, that's how I got stuck with a number of questions.
- Hide vote.php? Keep moving it around the project? (thus breaking all links like vote.php?id=xx)
- Obfuscate it? (add some additional parameters so that links are unique for each vote? It seems this would also break such linking attempts)
- Try moving it all into Flash? The photo browsing is done via a Flash app that makes requests to an AMFPHP gateway. Perhaps voting could be built into it as well.
- Or was the idea of not restricting voting by IP wrong from the very beginning, and there is no way to create valid anonymous voting?
- Is web development really something I should be involved in? O-)
All opinions on any issues are appreciated.
I think I like the obfuscate idea from your options. Set a random string in the session, and also put that string in the official voting form/link. If they don't match when the vote is submitted (or if that session has already voted), it doesn't count.
They can still delete the cookie and vote again, but it will stop direct links to the voting page from working. I doubt it's possible to stop cheaters completely, especially in an anonymous system; you can only slow them down.
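A minimal sketch of that suggestion, assuming `$session` stands in for `$_SESSION`; the 'vote_token' key, the 't' link parameter and the function names are hypothetical. `random_bytes()` and `hash_equals()` are standard PHP (7.0+ and 5.6+ respectively).

```php
<?php
// Sketch: a random string kept in the session and repeated in the official link.
// $session stands in for $_SESSION; 'vote_token' and 't' are hypothetical names.

// On the photo-viewing page: create the token once per session.
function issueVoteToken(array &$session): string
{
    if (empty($session['vote_token'])) {
        $session['vote_token'] = bin2hex(random_bytes(16)); // 32 hex characters
    }
    return $session['vote_token'];
}

// On the voting page: the submitted token must match the session's token.
function tokenIsValid(array $session, ?string $submitted): bool
{
    if (empty($session['vote_token']) || $submitted === null) {
        return false;
    }
    return hash_equals($session['vote_token'], $submitted); // constant-time compare
}

$session = [];
$token = issueVoteToken($session);
$link = 'vote.php?id=12&t=' . urlencode($token);   // link rendered on the viewer page
var_dump(tokenIsValid($session, $token));          // bool(true)
var_dump(tokenIsValid($session, 'guessed'));       // bool(false)
var_dump(tokenIsValid([], $token));                // bool(false), fresh session
```

This is what defeats the hotlinked images: a forum post is static, so an `<img src="vote.php?id=12">` cannot carry the viewer's own token; at best it carries the attacker's, which won't match the viewer's session.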
So, my check order looks something like this:
- dumb check (id is a number; id is used in a query, so it should at least be escaped, or better, checked to be a number)
- !empty($_SESSION['started']) && ($_SESSION['hits'] ?? 0) >= 1; 'should' keep people from hitting the page directly ('brutalists')
- isset($_GET['x']) && $_GET['x'] === md5($_SERVER['REMOTE_ADDR']); 'x' is provided on the 'viewer' screen, so it should have been initialized there (of course, you can try guessing it, and it will barely take any time to break); can't guess it? -> get it from the proper page ('viewer')
- empty($_SESSION['voted']) && empty($_COOKIE['voted']); locks voting for a session after a vote and stops returning users from voting again (at least those who don't know a thing about cookies and leave them turned on)
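The four checks can be chained in the order listed. A sketch, assuming `$get`, `$session`, `$cookies` and `$server` stand in for `$_GET`, `$_SESSION`, `$_COOKIE` and `$_SERVER`; `voteAllowed` is a hypothetical name. It keeps md5(REMOTE_ADDR) as described, which, as noted above, is trivially guessable.

```php
<?php
// Sketch of the check order above; the superglobals are passed in as arrays.

function voteAllowed(array $get, array $session, array $cookies, array $server): bool
{
    // 1. Dumb check: id must be a positive integer
    $id = filter_var($get['id'] ?? null, FILTER_VALIDATE_INT,
                     ['options' => ['min_range' => 1]]);
    if ($id === false) {
        return false;
    }
    // 2. Session started elsewhere, with at least one page hit
    if (empty($session['started']) || ($session['hits'] ?? 0) < 1) {
        return false;
    }
    // 3. 'x' must match what the viewer page handed out
    $expected = md5($server['REMOTE_ADDR'] ?? '');
    $x = $get['x'] ?? null;
    if (!is_string($x) || !hash_equals($expected, $x)) {
        return false;
    }
    // 4. Not already voted in this session or by a returning cookie
    return empty($session['voted']) && empty($cookies['voted']);
}

$server  = ['REMOTE_ADDR' => '192.0.2.1'];
$session = ['started' => true, 'hits' => 2];
$get     = ['id' => '12', 'x' => md5('192.0.2.1')];
var_dump(voteAllowed($get, $session, [], $server));                     // bool(true)
var_dump(voteAllowed(['id' => '12abc'] + $get, $session, [], $server)); // bool(false)
```

Checking the cheap, input-only conditions first means malformed requests are rejected before anything touches the session or the database.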
Because this photo contest is not really popular, you can't really expect two different machines to get the same IP within one day (and vote for the same photo). And that's the other part: as I actually implemented it, the same IP can vote for multiple photos, but only once per day for each.
Now, having ruthlessly cut out a huge cancerous growth (126,000 entries down to 960 O-), I'm beginning to see a new wave: similar IPs, the same photo voted for, 30 seconds to 3 minutes apart. Can it really be that people are rebooting their routers?
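One way to surface that pattern in the vote log is sketched below. Assumptions, all hypothetical: "similar IPs" means the same IPv4 /24 prefix, the window is 180 seconds (the upper end of what was observed), votes arrive as arrays sorted by time, and the sketch only flags votes for manual review rather than deleting anything.

```php
<?php
// Sketch: flag votes for the same photo from the same /24 within a time window.

function ipv4Prefix24(string $ip): string
{
    return implode('.', array_slice(explode('.', $ip), 0, 3));
}

/** @param array<array{ip:string, photo:int, time:int}> $votes sorted by time */
function suspiciousVotes(array $votes, int $windowSeconds = 180): array
{
    $lastSeen = [];   // "photo|prefix" => timestamp of the previous vote
    $flagged = [];    // indexes of votes that followed too closely
    foreach ($votes as $i => $v) {
        $key = $v['photo'] . '|' . ipv4Prefix24($v['ip']);
        if (isset($lastSeen[$key]) && $v['time'] - $lastSeen[$key] <= $windowSeconds) {
            $flagged[] = $i;
        }
        $lastSeen[$key] = $v['time'];
    }
    return $flagged;
}

$votes = [
    ['ip' => '198.51.100.7',  'photo' => 5, 'time' => 1000],
    ['ip' => '198.51.100.91', 'photo' => 5, 'time' => 1045], // same /24, 45 s later
    ['ip' => '203.0.113.3',   'photo' => 5, 'time' => 1050],
    ['ip' => '198.51.100.15', 'photo' => 5, 'time' => 1500], // same /24, 455 s later
];
var_dump(suspiciousVotes($votes) === [1]); // bool(true), only index 1 is flagged
```

A dynamic-IP reconnect usually lands in the ISP's pool, which is often, but not always, within a nearby prefix, so the /24 grouping is a rough heuristic at best.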
As for the serious part (thanks to a friend): if the goal is anonymous voting, with people who have no accounts taking part in a contest, there are two options:
1) you track them by IP (with something similar to, or much better than, what I have)
2) you don't track anything, because on the internet you really can't be certain of things. Sessions, like IPs, are easy to drop, fake, spoof, etc. So you put the entries on stage, put buttons underneath them and set a delay between button hits.
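If the delay in option 2 is enforced on the server, something still has to remember the last hit. A minimal sketch, assuming the delay is keyed on some client key such as the session ID (easy to drop, as noted, so this only slows repeat clicks down) and an arbitrary 30-second delay; the function name and the array store are hypothetical.

```php
<?php
// Sketch: enforce a minimum delay between button hits per client key.
// $lastHit maps client key => timestamp of the last accepted hit.

function allowButtonHit(array &$lastHit, string $clientKey, int $now, int $delaySeconds = 30): bool
{
    if (isset($lastHit[$clientKey]) && $now - $lastHit[$clientKey] < $delaySeconds) {
        return false;              // too soon after the previous accepted hit
    }
    $lastHit[$clientKey] = $now;   // accept and remember this hit
    return true;
}

$lastHit = [];
var_dump(allowButtonHit($lastHit, 'sess-a', 100)); // bool(true)
var_dump(allowButtonHit($lastHit, 'sess-a', 110)); // bool(false), 10 s later
var_dump(allowButtonHit($lastHit, 'sess-a', 130)); // bool(true), 30 s after the first
```

Rejected hits don't reset the timer, so a client hammering the button still gets through once every 30 seconds rather than being locked out indefinitely.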
If you want to go further, you can slow bots down a little by adding a CAPTCHA such as reCAPTCHA.
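On the server side, a reCAPTCHA answer is checked by POSTing it to Google's siteverify endpoint and reading the `success` field of the JSON reply. A sketch, assuming the current siteverify API (`secret`, `response`, optional `remoteip`) and the v2 form field `g-recaptcha-response`; the secret key is a placeholder, the function names are hypothetical, and treating a network error as a failed check is a design choice.

```php
<?php
// Sketch of server-side reCAPTCHA verification via the siteverify endpoint.

const SITEVERIFY_URL = 'https://www.google.com/recaptcha/api/siteverify';

function siteverifyFields(string $secret, string $token, ?string $ip = null): array
{
    $fields = ['secret' => $secret, 'response' => $token];
    if ($ip !== null) {
        $fields['remoteip'] = $ip;   // optional
    }
    return $fields;
}

// $json is the raw response body, or false if the request failed.
function siteverifySucceeded($json): bool
{
    if (!is_string($json)) {
        return false;
    }
    $data = json_decode($json, true);
    return is_array($data) && ($data['success'] ?? false) === true;
}

function captchaPassed(string $secret, string $token, ?string $ip = null): bool
{
    $context = stream_context_create(['http' => [
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => http_build_query(siteverifyFields($secret, $token, $ip)),
        'timeout' => 5,
    ]]);
    $json = @file_get_contents(SITEVERIFY_URL, false, $context);
    return siteverifySucceeded($json);   // network error => false
}

// On the voting page:
// if (!captchaPassed('YOUR_SECRET_KEY', $_POST['g-recaptcha-response'] ?? '', $_SERVER['REMOTE_ADDR'])) { /* reject */ }
```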
From the site owner's point of view, the second scheme might even be a bit more interesting, because it would, in a way, encourage users to keep their browsers pointed at the contest and might create a fairly steady audience for a while. And during this time, more banners can be served O-).
Maybe someone will find this useful.