Using alt text with CAPTCHA image verification

KimmoA

4:27 pm on Sep 20, 2005 (gmt 0)

How are we pedantic Web nazis supposed to create an image verification mechanism without repeating the string shown in the image as text in the alt attribute? :)

Personally, I don't like those things at all. I often can't see what it's supposed to say anyway, and I imagine that whatever the technology is called that lets a computer read letters and numbers from bitmaps will soon be far superior to the human brain at it.

There are pretty bullet-proof ways of doing this without a stupid, hard-to-code image verification process. For example, you could just ask a question like: "What color is the sky?" and let the user answer that. If you still get spam, you could just randomize it a little. Or pick less guessable questions.
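Something like this is what I have in mind. It's only a rough sketch in Python: apart from the sky question, the questions, the accepted answers and the function names are all made up for illustration, and a real forum would need a much bigger pool plus a session to remember which question was asked.

```python
import random

# Hypothetical question pool: (question text, set of accepted answers).
# Only the first question comes from the post; the rest are placeholders.
QUESTIONS = [
    ("What color is the sky?", {"blue"}),
    ("How many days are in a week?", {"7", "seven"}),
    ("What is the opposite of hot?", {"cold"}),
]

def pick_challenge(rng=random):
    """Pick a random question; return (index, text) so the index can be stored server-side."""
    index = rng.randrange(len(QUESTIONS))
    return index, QUESTIONS[index][0]

def check_answer(index, reply):
    """Normalize the reply (case, spaces, trailing punctuation) and compare it."""
    normalized = reply.strip().lower().rstrip(".!")
    return normalized in QUESTIONS[index][1]
```

The point of returning the index rather than the answer is that the accepted answers never leave the server, so there's nothing in the page source (or an alt attribute) for a bot to read.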

Tapolyai

4:37 pm on Sep 20, 2005 (gmt 0)

An interesting concept.

You can do something more precise, like writing out a simple mathematical formula. "How many are in a dozen?" "What is half of two?"

KimmoA

4:49 pm on Sep 20, 2005 (gmt 0)

>> You can do something more precise, like writing out a simple mathematical formula. "How many are in a dozen?" "What is half of two?"

Yeah. I've thought of that. It wouldn't be too hard to make it say things like: "What's five times forty-four, plus nine?"
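Roughly like this, as a sketch only. The number ranges and the names to_words and make_question are my own assumptions, and the spelling routine only covers 0 to 99.

```python
import random

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def to_words(n):
    """Spell out an integer from 0 to 99 in English."""
    if not 0 <= n <= 99:
        raise ValueError("only 0..99 supported in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"

def make_question(rng=random):
    """Return (question text, expected integer answer)."""
    a = rng.randint(2, 9)    # small multiplier
    b = rng.randint(10, 99)  # two-digit number
    c = rng.randint(1, 9)    # small addend
    text = f"What's {to_words(a)} times {to_words(b)}, plus {to_words(c)}?"
    return text, a * b + c
```

With a=5, b=44, c=9 this produces exactly the question above, and the expected answer is 229. Writing the numbers as words at least stops a bot from just pulling digits out of the text.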

Then again, people are stupid and lazy...

BertieB

6:07 pm on Sep 20, 2005 (gmt 0)

There was a discussion of this about a month ago on /. The general feeling was that current image verification (i.e. CAPTCHA) is getting less effective at distinguishing humans from bots, because:

- There are bots with good enough OCR to 'see' words that are considered well-obfuscated

- The more a word or character sequence is obfuscated, the greater the chance a human (especially one with a visual impairment) will not be able to read it

The W3C disagrees with the use of CAPTCHAs because of these accessibility issues. There are alternatives, such as the natural language and math questions mentioned, which could be effective -- so long as there is sufficient variety in the questions asked. However, eventually bots will become adept at either answering these questions straight off, or learning the answers to them.

The other problem is that all you need to defeat a bot-detection scheme is a person who is willing to be paid nothing or next-to-nothing to 'authenticate' as a human for you. That can be tricky to stop.

So aside from keeping some folks at Berkeley busy working on the problem, what do you think can be done to keep image verification effective?

BertieB

11:39 pm on Sep 20, 2005 (gmt 0)

Link [w3.org] to W3C document on problems with image verification in general (Inaccessibility of Visually-Oriented Anti-Robot Tests - Problems and Alternatives)

They also prepared a slide show [w3.org] on the issue (Escape from CAPTCHA)

Sorry for the double post but I ran out of time to edit.

KimmoA

11:57 pm on Sep 20, 2005 (gmt 0)

BTW... I'm very fascinated by how this can even be coded.

There are a few things I can't understand how to code:

* A chess game's AI.
* Any FPS, including Wolfenstein 3D.
* Reading text from an image.

Of course, there are many other things that I wouldn't realistically be able, or have the need/will, to create, but the mentioned topics are the main ones where I wouldn't know where to start.

BertieB

12:36 am on Sep 21, 2005 (gmt 0)

Some topics are fascinating from a programming POV, simply because they involve thinking outside the usual for... loops and if... statements. I quite agree that these could be very interesting to program.

>> a chess game's AI

Have a standard set of opening moves, and store which ones work best at countering an opponent's opening while leaving good options for later on. Then, have an algorithm to work through sequences of moves, and score moves higher if they (eventually or inevitably) result in better positions or pieces taken.
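The "work through sequences of moves and score them" part is usually done with some form of minimax search. Here's a minimal sketch of the idea in Python, on a toy take-away game rather than chess, since a real chess move generator won't fit in a forum post. The game rules and all the names are assumptions for illustration: players alternately remove 1 to 3 stones, and whoever takes the last stone wins.

```python
from functools import lru_cache

MAX_TAKE = 3  # toy rule (assumption): a move removes 1 to 3 stones

def legal_moves(stones):
    """All moves available in the current position."""
    return [n for n in range(1, MAX_TAKE + 1) if n <= stones]

@lru_cache(maxsize=None)
def score(stones):
    """Score for the player to move: +1 = forced win, -1 = forced loss."""
    if stones == 0:
        return -1  # the opponent just took the last stone and won
    # Negamax form of minimax: my best outcome is the best of the
    # opponent's outcomes with the sign flipped.
    return max(-score(stones - take) for take in legal_moves(stones))

def best_move(stones):
    """Choose the move that leaves the opponent in the worst position."""
    return max(legal_moves(stones), key=lambda take: -score(stones - take))
```

For example, best_move(7) takes 3 stones and leaves the opponent with 4, which is a lost position for whoever has to move. Chess can't be searched to the end like this, so a chess program would stop at some depth and score the position with a heuristic (material, position) instead of a plain win/loss.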

Naturally, a good knowledge of chess is a must for either the senior programmer or the project leader ;)

>> Any FPS, including Wolfenstein 3D.

Have an engine to track where everything is, what state it is in, etc., and evaluate what happens when events occur. Use a 3D engine of some description to selectively render elements to the player.

(I have no idea either :0) )

>> Reading text from an image.

Tricky, but this is how I would assume it works. Check the colour of each pixel of the image. Assume that the large majority of pixels (say, more than 90%) that share the same or a similar colour are the background (say, white), and that the rest of the pixels make up the characters (say, black). Combining horizontal and vertical scanning, find the boundary of each character (optional). For each block of pixels, compare parts to known shapes -- eg letters a-z -- and keep iterating until you find a match. Disregard parts that you cannot find a match for. Optionally, if the sequence of characters is known to be a word, run a spell checker on the OCR output to verify it makes sense.

For obfuscated characters the process would undoubtedly involve a few stages of 'cleanup' and disregarding 'noise' pixels. It still comes down to pattern recognition though.
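To make the basic steps concrete (background detection, finding character boundaries, comparing against known shapes), here's a minimal sketch in Python. Everything in it is an assumption for illustration: the image is a plain grid of pixel values, the three 3x5 letter templates are invented, characters are assumed to be separated by at least one empty column, and there is no noise cleanup at all.

```python
from collections import Counter

# Hypothetical 3x5 glyph templates: '#' is ink, '.' is background.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

def binarize(pixels):
    """Treat the most common pixel value as background, the rest as ink."""
    background = Counter(v for row in pixels for v in row).most_common(1)[0][0]
    return [[v != background for v in row] for row in pixels]

def split_glyphs(ink):
    """Scan columns; each run of columns containing ink is one character."""
    width = len(ink[0])
    glyphs, start = [], None
    for x in range(width + 1):
        has_ink = x < width and any(row[x] for row in ink)
        if has_ink and start is None:
            start = x                      # a character begins here
        elif not has_ink and start is not None:
            glyphs.append([row[start:x] for row in ink])
            start = None                   # the character ended
    return glyphs

def match(glyph):
    """Return the same-sized template with the fewest differing pixels, or None."""
    best, best_diff = None, None
    for letter, rows in TEMPLATES.items():
        template = [[c == "#" for c in r] for r in rows]
        if len(template) != len(glyph) or len(template[0]) != len(glyph[0]):
            continue
        diff = sum(a != b for tr, gr in zip(template, glyph)
                   for a, b in zip(tr, gr))
        if best_diff is None or diff < best_diff:
            best, best_diff = letter, diff
    return best

def read_text(pixels):
    """Run the whole pipeline, skipping blocks that match no template."""
    glyphs = split_glyphs(binarize(pixels))
    return "".join(m for m in map(match, glyphs) if m)
```

Fed a clean 11x5 image of the letters L, I and T drawn with these exact shapes, read_text returns "LIT". A distorted CAPTCHA breaks every assumption here (touching characters, warped shapes, noisy backgrounds), which is exactly why the cleanup stages matter.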

I simplify matters somewhat, as this is a tricky field to be in for both the people developing the visual challenge systems, and those devising bots to pass the tests. It is essentially an arms race, with each party trying to stay slightly ahead of the other.

Disclaimer: I have never programmed a chess AI, an FPS, or any form of OCR :)

Edited to keep it slightly more on-topic

KimmoA

1:03 am on Sep 21, 2005 (gmt 0)

>> I have never programmed a chess AI, an FPS, or any form of OCR :)

Well... at least you tried explaining how you assume that it's done. I'm always interested in reading about stuff that I care about.

(A multi-player only chess game is the best I could do (but haven't yet). I have only made a "fake" 3D game, using four directions and "steps" of texture "boxes" to simulate a 3D view. It's one of those projects I want to finish one day, BTW...)

[edited by: encyclo at 1:22 am (utc) on Sep. 21, 2005]

Tapolyai

2:43 pm on Sep 22, 2005 (gmt 0)

I have a very simplistic way to resolve this. It requires one two-factor authentication to create an account and then another two-factor authentication to allow users to log in.

These particular solutions are dirt cheap - it's mostly code, and the deployment is, in general, simple. I have already done conceptual testing with plain old vBulletin and phpBB. In both cases the solution is "transparent" and easily deployed.

There are some commercial implications that could be of great value, but I don't have the time to develop it.

The first two-factor step would be "what you know" & "what you are", and the second would be "what you know" & "what you have".

If I delve into it any further I might give the idea away, and I would like to find someone to write the code for it.