On one of our sites, we have reviews of services and products, written either by us or by our customers.
The current script requires the author to be a valid, registered member of our community. He or she can only write text in some pre-defined categories (plus some rating options with 1-5 stars).
Seeking a way to increase the number of reviews written by our customers, the management department decided on a new approach.
The new form is updated with more options, questions and text fields, plus the ability to upload 1-5 images per review. Anyone can write there; no registration is required.
The moment you hit the submit button though, your page will be refreshed with all your options/texts intact, your photo uploads will be completed and you will see a small thumbnail of each picture (with an option to delete it). If you are not logged in, a new page will appear over the current page (not like a popup, but rather like a page loaded in a lightbox) asking you to login or register in order to publish your review.
What do you see in this approach? (performance issues, user friendly issues.. such things)
What pitfalls to watch for? (image file size is an obvious one).
Take note that the website has many hundreds of thousands of visitors during summer, dropping to a few thousand during winter.
What do you see in this approach? (performance issues, user friendly issues.. such things)
What pitfalls to watch for?
1. If "anyone" can use it, it needs to be moderated. If you're not doing it already, each submission should have an "active" field that defaults to 0, and the submission only goes live when you set it to 1.
2. Opening it to "everyone" will make it much more of a target, so make sure your programming is as tight as possible.
3. Re: image file size: Do not just allow raw uploads, period, resize them server side. Most people on the web know nothing at all about file size, and don't care. Most of them will be uploading possibly 5-20 MB images, raw from the camera.
If you're using PHP, you know about the 2MB limit inherent in default settings, and how to crank up memory_limit and upload_max_filesize to accommodate, but this is a bad idea. The GD toolkit is horribly inefficient for manipulating larger uploads, and cranking up the memory usage for PHP on a busy site can easily cause it to tank - or if shared, cause your ISP to pull the plug. Install and use ImageMagick, which is a lot less memory intensive.
4. This:
The moment you hit the submit button though, your page will be refreshed with all your options/texts intact....
Do you have methods in place to manage "abandoned" articles? Are they editable, and can you come back at a later time, log in, and further update them? I see a maintenance nightmare here. One day your server gets full, and you know there are a bazillion files that are not active, but have no way to separate them from files attached to "live" articles. So you need to do something about that, if it's not already in place.
Using the "active" scenario, I would have them post and complete the article first, THEN allow images to be attached to that article. You'd have a "completed" field initiated by the user as a mark of a completed article. This will help separate abandoned from complete (in combination with the below.)
A second aspect, although anyone can do this, you should allow them to save an uncompleted article, then return to it later to finish and annotate, attach images. So you would want a third field, call it "user_saved" or something; this way when you run a cron job to "clean house" you can remove items more than X days old and user_saved=0 and completed=0. This, of course, would require an account and a login for the "free users," but it would make it more usable, I would think. An additional bonus, when they upload, you can now attach the images themselves to an article ID, so when cleaning house, you simply look in the images table to locate images associated with abandoned articles, and know you're deleting dead weight.
Overall, just like shopping carts, you have to have some method in place to manage "abandonment" or your file system and database become utter chaos. See point #5.
5. This,
The moment you hit the submit button though ... you will ... (be asked)... to login or register in order to publish your review.
I would feel so miffed. You said no registration required, now I have to create an account. This would increase the abandonment exponentially, like carts that require login to see shipping. Put the info up front, make them create an account before adding the article. I know the reasoning, get them to invest and once they invest they might be motivated to create an account . . . . I am not a psychologist, but I think this is baaaad and will have the opposite effect. Not only will they abandon, they will likely be peeved and generate bad mojo on you.
6. Formatting: We have all these cool Javascript widgets to format our documents, but many users don't bother, or they are on a copy and paste mission to get as many docs done as possible. They will not always use them, and don't care - it's **your** problem. So if it's not in place, you need some reliable system server side to format unformatted text. A down and dirty I often do, if no markup code is found in the text, I format it for them, creating paragraph tags around double-spaced lines.
Corollary to that, see #2, you will have to be extremely careful about what tags, if any, are allowed to be posted. This is why most boards use "BBcode" style tags []. If you allow straight HTML, on submit it should be compared against an allowed tag list to avoid annoying things like <blink> and dangerous things like <script>. Turn the angle brackets (< and >) and double quotes (") into entities before they go into the DB, and back into angle brackets for display, but for editing, leave them as entities - you probably know the relevance of this if you've ever seen what happens when editing raw html in a textarea.
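A rough PHP sketch of the "down and dirty" formatting from point 6, under the simplifying assumption that no markup at all is accepted: everything is escaped first, then blocks separated by blank lines are wrapped in paragraph tags. The function name formatReviewText() is made up for illustration and is not from anyone's actual code.

```php
<?php
declare(strict_types=1);

/**
 * Escape user text for HTML output and wrap blank-line-separated
 * blocks in <p> tags. Hypothetical helper, for illustration only.
 */
function formatReviewText(string $raw): string
{
    // Normalise Windows and old Mac line endings to "\n".
    $text = str_replace(["\r\n", "\r"], "\n", trim($raw));

    // Escape &, <, > and both quote styles so nothing is treated as markup.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Split on one or more blank lines; drop empty pieces.
    $blocks = preg_split('/\n\s*\n/', $safe, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $html = [];
    foreach ($blocks as $block) {
        // Single line breaks inside a paragraph become <br />.
        $html[] = '<p>' . nl2br(trim($block)) . '</p>';
    }

    return implode("\n", $html);
}
```

Because the escaping happens before the paragraph tags are added, a pasted <script> tag ends up as harmless visible text instead of markup.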
Technical: Images
You really need to be careful about this one. Are you having the users simply provide a link/URL to an already online image? If so, there are far fewer concerns on your end. But you do need to consider: linking directly to an image hosted on another website (hotlinking) is "unethical" (best word?). You're stealing their bandwidth for your own site. I would not recommend or condone this (unless of course it is to an imaging service, a la Flickr/etc). But this is about your only concern I think.
However... if you are allowing your user to upload the image (file) to your system... be extremely cautious!
As a note, PHP will automatically process the upload and put the file details in the $_FILES superglobal array. This includes file size, file name, etc. For a casual user... you need to do basic checks like: (1) check the file size; don't let large files onto the system; a user could easily consume your webspace with huge images. That is a significant problem if you're expecting ~100kb images. So consider what you want your limit to be. (2) Even though this can be thwarted (user manually changes the extension), I always check the file extension. If you want just JPEG, then have it only accept files with *.jpeg, *.jpg, etc extensions. I would say at minimum, these are the bare basics. This is largely effective for casual users... in preventing them from uploading unintended file types, largely because casual users are unaware of file extensions (that they exist, what they do, etc).
Major issue here is this: You're allowing a user to place a file onto your server. This is a huge security concern... a malicious user (even a malicious individual who is not a user) could potentially take advantage of this, and upload malicious files onto your server!
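For what it's worth, here is a minimal PHP sketch of those two bare-basics checks (size and extension) run against one entry of the files array. The function name checkUploadBasics(), the error strings and the limits passed in are all assumptions for illustration; pick whatever limit suits your site.

```php
<?php
declare(strict_types=1);

/**
 * Cheap first-pass checks on one entry of $_FILES.
 * Returns an error message, or null when the entry passes.
 * Hypothetical helper; names and messages are illustrative.
 */
function checkUploadBasics(array $file, int $maxBytes, array $allowedExt): ?string
{
    // PHP reports transport problems (over the php.ini limit, partial upload, ...) here.
    if (($file['error'] ?? UPLOAD_ERR_NO_FILE) !== UPLOAD_ERR_OK) {
        return 'upload failed';
    }

    // Reject anything over our own limit.
    if (($file['size'] ?? 0) > $maxBytes) {
        return 'file too large';
    }

    // Compare the lower-cased extension against a whitelist.
    $ext = strtolower(pathinfo((string) ($file['name'] ?? ''), PATHINFO_EXTENSION));
    if (!in_array($ext, $allowedExt, true)) {
        return 'extension not allowed';
    }

    return null;
}
```

In a real handler this would be called once per uploaded photo, for example checkUploadBasics($_FILES['photo'], 2 * 1024 * 1024, ['jpg', 'jpeg']), before the file is moved anywhere with move_uploaded_file(). The field name 'photo' is again just an example.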
My suggested approach to help alleviate this as follows:
(1) Check the file type (MIME/etc). Be aware that the "type" value PHP puts in the files array is simply whatever the browser sent, so it is trivially faked and only useful as a first hint. For a real check, detect the type from the file contents on the server (PHP's Fileinfo extension does this) and compare it against the expected value(s). Changing a file extension will not fool a content-based check, although a determined attacker can still craft a file that passes it, so treat this as one layer among several.
(2) Store files outside the web root! Your web root looks something like: "/home/mysite/public_html/". Instead of putting these images in a place like "/home/mysite/public_html/reviews/images/"... put them somewhere like this: "/home/mysite/files/review-photos/". Why? Someone uploading a malicious file (disguised as an image) will try to access/use the file to run/execute it. They can do this themselves if you keep the files in a publicly accessible directory. But by putting the files "outside the web root"... you are putting them where only the server has access.
(2) (a) By doing this however, you'll need to make a "wrapper" that takes public requests for these images and grabs the file via the server end, and then serves them out. Something like "/reviews/images/my-widget-review.jpg" could become "/reviews/getimage.php?id=my-widget-review" (where 'id' could be the file name, a database ID to the file, etc).
(2) (b) In doing this, I would recommend setting up a database table to track files. Nothing too complex. But it should minimally store a table/row ID, file name on server. You could then use this info to map a request like 'getimage.php?id=285' to the image file. On my own setup, I include such additional fields as: file size, file type, file extension, original file name (useful for returning file to user since I strip out the original file name), when it is posted, ip of poster, user who posted, etc.
(2) (c) I would also store the files with an arbitrary file name on the server. For example, if someone uploads 'my-widget-is-very-cool.jpg', it would be stored on my server as something like '100128-121455_review_mywidget.jpg' (YYMMDD-HHMMSS_type_filename.extension where the filename is stripped of non-alpha/numeric characters and limited to ~8 characters --- I just like it this way lol). Even though having the files outside the web root protects them from direct access, changing them helps as well (especially if file is inside the web root). Most importantly, renaming them ensures that no new file will overwrite/conflict with existing file. In my setup, including the date/time and characters of the original file name in the new/final file name helps protect against this by ensuring with high probability that no two filenames will match. You could implement a database/PHP "does this filename exist" mechanism, but it is just extra code/processing that is entirely unnecessary (unless your site is getting HUGE traffic in the intervals of minutes/seconds).
(2) (d) Since you are using a "go between" file to access the image(s), put some safety here. You could have PHP send headers that: (a) force the user's browser to download the file; or (b) force the user's browser to see an intended MIME type (ie: JPEG image --- will make the browser read/load the file as an image instead of trying to execute it as something else).
(3) This only applies if you are working with an image... which you are! Use the function getimagesize() to validate that the uploaded file is in fact an image! This function returns an array of info about the image file. However, if the file IS NOT an image, it returns false. So set up a check in your processing to utilize this --- if the file is not an image, reject the upload!
(4) Avoid allowing a user to execute an uploaded file. And avoid renaming the uploads with any type of executable extension (.php, etc). A number of image formats support "comments". Hidden PHP code can be embedded in these comments, which, if executed on your server through PHP, could do damage/harm.
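To make points (2)(c) and (3) a bit more concrete, here is a rough PHP sketch, not my actual code: one function that uses getimagesize() to reject anything that is not an allowed image, and one that builds the YYMMDD-HHMMSS_type_filename.extension style name described above. Both function names, the list of allowed types and the extension map are assumptions for illustration. Note the extension is chosen by the server from the detected image type, never taken from the upload.

```php
<?php
declare(strict_types=1);

/**
 * Return the detected IMAGETYPE_* constant if $path is an allowed image,
 * or null otherwise. Hypothetical helper.
 */
function detectAllowedImage(
    string $path,
    array $allowedTypes = [IMAGETYPE_JPEG, IMAGETYPE_PNG, IMAGETYPE_GIF]
): ?int {
    // getimagesize() returns false for non-images; @ hides read notices on junk input.
    $info = @getimagesize($path);
    if ($info === false) {
        return null;
    }
    return in_array($info[2], $allowedTypes, true) ? $info[2] : null;
}

/**
 * Build a server-side file name: YYMMDD-HHMMSS_kind_name.ext, where name is
 * the original base name stripped to letters/digits and cut to 8 characters.
 */
function buildStoredName(string $originalName, string $kind, int $imageType, int $timestamp): string
{
    // Safe extensions only; anything else is refused.
    $extensions = [IMAGETYPE_JPEG => 'jpg', IMAGETYPE_PNG => 'png', IMAGETYPE_GIF => 'gif'];
    if (!isset($extensions[$imageType])) {
        throw new InvalidArgumentException('unsupported image type');
    }

    $base = pathinfo($originalName, PATHINFO_FILENAME);
    $base = strtolower((string) preg_replace('/[^A-Za-z0-9]/', '', $base));
    $base = substr($base, 0, 8);
    if ($base === '') {
        $base = 'file'; // the original name had no usable characters
    }

    // gmdate() keeps the result independent of the server's timezone setting.
    return gmdate('ymd-His', $timestamp) . '_' . $kind . '_' . $base . '.' . $extensions[$imageType];
}
```

With a timestamp of 28 Jan 2010 12:14:55 UTC, an upload named 'my-widget-is-very-cool.jpg' comes out as '100128-121455_review_mywidget.jpg'. Two uploads with the same name in the same second would still collide, so a busy site may want to append a database ID or random suffix as well.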
All in all, you're likely to be fine. But understand that you should always treat user data/info as suspect! And you should always treat file uploads as if they were the plague. :)
Good Intentions?
One area of concern: You decided to do this in an effort to *increase* users making reviews? I'm not so sure your approach will work as intended/expected. It sounds like you just went from a simple short-review/star system to a system requiring (or offering) images, additional questions, etc. In my experience, the more you ask a user to do, the less they'll do.
In other words... if all you asked from users was a star rating; you'd probably get a lot of people "reviewing" --- why? Because it is easy/simple. But by adding additional questions, offering picture upload, etc... although you increase the options to the user... and even if these fields are optional... I guarantee you a group of your users will simply look at all that and decide not to review, because they may not want to spend the time filling in all the fields.
However, this is not universal (just something that is generally common). Things like Twitter are popular not just because people can blurb about their moments, but because it is also very simple/easy. Imagine if with every tweet you had to: enter your exact current location, what you're wearing (lol), etc. I imagine a lot less would bother due to the effort required.
Of course, if your user base is of --high quality-- or your user base is specifically asking for more detailed reviews... then having a review system as you describe would probably be perfect. I am just tossing it out there that I hope you've considered this fully. If at minimum, I would hope that you make most of these "extended" fields optional, so a user could skip them.
Most certainly, you want to resize images server-side! As rocknbil states, users will upload files all over the place in size. You probably want the images in an intended size range (file size and perhaps even dimensions). Use the various features of ImageMagick to make these changes! It will hugely benefit you. ImageMagick is A+++ in image handling/manipulation with PHP. It does a lot; it does it very well; and it does it incredibly efficiently! I even love it so much I installed ImageMagick on my PC to run personal processing on my images. :)
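As a small illustration of the "intended size range" idea, here is a PHP sketch that only works out the target dimensions: it fits an image inside a bounding box, keeps the aspect ratio, and never enlarges a small image. The actual resampling would still be handed to ImageMagick. The function name fitWithin() and the 1024x768 box in the example are assumptions, not a recommendation.

```php
<?php
declare(strict_types=1);

/**
 * Compute the largest width/height that fits inside the given box while
 * keeping the aspect ratio. Images already inside the box are left alone.
 * Hypothetical helper, for illustration only.
 *
 * @return int[] [width, height]
 */
function fitWithin(int $width, int $height, int $maxWidth, int $maxHeight): array
{
    if ($width <= 0 || $height <= 0 || $maxWidth <= 0 || $maxHeight <= 0) {
        throw new InvalidArgumentException('dimensions must be positive');
    }

    // The smaller ratio is the limiting side; 1.0 prevents upscaling.
    $scale = min($maxWidth / $width, $maxHeight / $height, 1.0);

    return [
        max(1, (int) round($width * $scale)),
        max(1, (int) round($height * $scale)),
    ];
}
```

For example, a 4000x3000 camera image fitted into 1024x768 comes out at 1024x768, while an 800x600 image is returned unchanged. ImageMagick's geometry syntax expresses the same "shrink only" rule with a trailing > (as in 1024x768>).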
I think rocknbil has offered you a lot of great ideas for your setup/management. Between all this advice, you should have a lot of food for thought. :)
The moderation ideas rocknbil provided seem good. But I was thinking of a different approach: store all user data in a temp table (or even a separate database) and just copy it over to the main table(s) as soon as registration/login is complete (or it gets approval from an admin/moderator). The 'garbage' data is easy to delete afterwards with a cron job.
The reason for the two different tables/databases is to lighten the load on the main core of the review/rating system as much as possible, due to heavy traffic most of the year.
No, we were not thinking of allowing any tags in the article. Our experience shows that an average user writes somewhere around 50-150 words in a review, so no styling is needed (except for some major keywords, which I can bold with a regex).
The 'user-unfriendly' issues find me in total agreement.. the less the user has to do, the more he will participate. But the decision is not mine on this one.
Again, thank you for all these ideas/observations.
Image size and allowed file types are all being taken into consideration (but with GD scripting).
Take ImageMagick head on, before you start. :-) It's well worth your trouble. Sure you could have text instructing users to size their images sensibly so it doesn't blow up your server, or time out, or whatever so that GD can manipulate it - but half the problem is people don't read, the other half, they don't understand. I did this, and managed to make GD work, but if the site gets busy it's a recipe for disaster.
Two anecdotes: 1) If you leave some small hole through which users can mess up your plan, it is a mathematical certainty that they will find it, and 2) The absolute worst thing about programming and developing is doing it over. Corollary: especially if you saw the possible problem at the outset and ignored it.
.... I was thinking of a different approach: store all user data in a temp table (or even a separate database) and just copy it over to the main table(s) as soon as registration/login is complete ... reason for the two different tables/databases is to lighten the load on the main core of the review/rating system as much as possible, due to heavy traffic most of the year.
I think you'll be increasing the load, overall, as well as increasing the points of failure and the need for maintenance. Think about the programming required to copy data from one table to the other, as opposed to
UPDATE reviews SET active = 1 WHERE id = :id;  -- "reviews" is a placeholder table name; bind :id as a parameter rather than pasting $id into the SQL
The whole idea of database normalization is that you rarely, if ever, store the same bit of data twice in any area of your database. This is what relational tables are for. So you can see this is completely counter to normalization.
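To show how little code the one-table approach needs, here is a rough PHP/PDO sketch combining the "active" flag with the "user_saved"/"completed" house-cleaning idea from earlier in the thread. It runs against an in-memory SQLite database purely so it is self-contained; the table layout, column names and both function names are assumptions, and your real schema will differ. The extra "active = 0" condition in the clean-up is my own safety net so a live review can never be purged.

```php
<?php
declare(strict_types=1);

// In-memory SQLite stands in for the real database (an assumption for this sketch).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE reviews (
    id         INTEGER PRIMARY KEY,
    body       TEXT    NOT NULL,
    active     INTEGER NOT NULL DEFAULT 0,  -- 1 = approved and visible
    user_saved INTEGER NOT NULL DEFAULT 0,  -- 1 = user chose to keep a draft
    completed  INTEGER NOT NULL DEFAULT 0,  -- 1 = user marked it finished
    created_at INTEGER NOT NULL             -- Unix timestamp
)');

/** Publishing is a one-row update; no data is copied between tables. */
function publishReview(PDO $db, int $id): void
{
    $stmt = $db->prepare('UPDATE reviews SET active = 1 WHERE id = ?');
    $stmt->execute([$id]);
}

/** Cron-style clean-up: delete old rows nobody saved, completed or published. */
function purgeAbandoned(PDO $db, int $now, int $maxAgeDays): int
{
    $stmt = $db->prepare(
        'DELETE FROM reviews
          WHERE active = 0 AND user_saved = 0 AND completed = 0
            AND created_at < ?'
    );
    $stmt->execute([$now - $maxAgeDays * 86400]);

    return $stmt->rowCount(); // number of rows removed
}
```

If uploaded images are tracked in their own table keyed by review ID, the same clean-up pass can look up and delete the files belonging to the purged rows.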
Take another scenario. "Oops, we need another field here." So you have to add it to both tables, then track down the programming wherever it affects both tables, debug it, etc . . .
This idea will tend to infect other areas of the programming. I'm on a project revision that is a duplication-of-services nightmare, with tons of temporary tables, multi-script processes where a single script would do, and duplicate data spread across three or four tables that should have been one. It's extremely hard to follow the logic and very easy to break things.
Less is more. :-)