Many people obfuscate their email address on websites in the hope that bots will be unable to harvest it. That could look like the following:
email [AT] example [dot] tld
This approach is used, for example, by Mailman’s archiver pipermail and by the MARC mail interface used for the KDE mailing lists. Some people even ask others to “not quote the e-mail address unobfuscated in message bodies”.
So is it useful to obfuscate the email address? Does that add any security?
The answer is no. This obfuscation is a kind of CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), which relies on a computer being unable to solve the underlying Artificial Intelligence problem even when it has access to all the information required to create the test. Yesterday I tried to prove that this kind of CAPTCHA is broken and wrote a small application that extracts the email addresses from a public pipermail archive. The application is less than 300 lines of code. It automatically downloads all emails for a given month and year and extracts each sender’s address by collecting all <a> elements from the online emails and applying a regular expression to their text. I only intended to spend half an hour on it; in the end I had to compile Qt 4.6 because I needed the new QWebElement 😉 If anyone is interested in the source code, I could create a repository on gitorious.
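The original tool used Qt 4.6’s QWebElement and is not published here. The following minimal Python sketch illustrates the same idea using only the standard library. It collects the text of every <a> element in an archive page and applies a regular expression to it. It assumes the archive renders a sender as “user at example.org” inside a link, which is pipermail’s usual obfuscation. It does not download anything, and the function and class names are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumption: the archive shows the sender as "user at example.org" inside a link.
ADDR_RE = re.compile(r"([\w.+-]+)\s+at\s+([\w-]+(?:\.[\w-]+)+)", re.IGNORECASE)


class AnchorTextCollector(HTMLParser):
    """Collects the text content of every <a> element in a page."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False
        self.texts = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase, so <A HREF=...> matches too.
        if tag == "a":
            self.in_anchor = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "a" and self.in_anchor:
            self.in_anchor = False
            self.texts.append("".join(self._buf))

    def handle_data(self, data):
        if self.in_anchor:
            self._buf.append(data)


def extract_addresses(html):
    """Return every de-obfuscated address found in the link texts of `html`."""
    parser = AnchorTextCollector()
    parser.feed(html)
    parser.close()
    return [f"{user}@{domain}"
            for text in parser.texts
            for user, domain in ADDR_RE.findall(text)]
```

A full harvester would also need to fetch the monthly index and each message page (for example with urllib.request), which is most of the remaining work.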
The following image shows the result of an “attack” on the plasma-devel archive. For privacy reasons I blurred the user part of the mail address.
I don’t think there is any reliable way to obfuscate an email address using simple text: if there is an algorithm to obfuscate the address, there is a regular expression to unobfuscate it. The only way to protect an email address is to not include it anywhere a bot could harvest it. In other words, replace it with a “real” CAPTCHA that reveals the email address once it is solved. For websites, one example is reCAPTCHA’s Mailhide API. For mailing lists this is completely useless, as the email address is already included in plain text in the email headers. Instead of parsing websites, bots could just subscribe to the mailing lists.
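The following minimal Python sketch makes the “algorithm in, regex out” point concrete. A hypothetical obfuscate() produces the “[AT] … [dot]” form shown at the top of this post, and unobfuscate() reverses it with two regular expression substitutions. The rule is an assumption chosen to match that example, not any particular site’s implementation.

```python
import re


def obfuscate(address):
    """Hypothetical rule: 'user@example.tld' -> 'user [AT] example [dot] tld'."""
    user, domain = address.split("@", 1)
    return f"{user} [AT] " + " [dot] ".join(domain.split("."))


def unobfuscate(text):
    """Reverse the rule above with two regex substitutions."""
    text = re.sub(r"\s*\[AT\]\s*", "@", text)
    return re.sub(r"\s*\[dot\]\s*", ".", text)
```

For example, obfuscate("email@example.tld") returns "email [AT] example [dot] tld", and unobfuscate() turns it back into the original address.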
So please stop obfuscating your email addresses. It is useless and makes it impossible to just click on an email link. Instead the reader has to solve the useless CAPTCHA.
The point is that the authors of mail address harvesting bots usually don’t go to the trouble of detecting obfuscated addresses; there are enough plain addresses out there in the wild that it is not worth it. So plain-text obfuscation does help there.
If you think a real CAPTCHA will help you, bear in mind that a lot of them can be broken by a computer; see, for instance, http://caca.zoy.org/wiki/PWNtcha
The question is whether it is worth it for the developer of the harvester to write the regexp and then spend the extra CPU time, or whether they get enough results without caring about obfuscated addresses.
Another approach that works rather well is to encrypt the address using a more complex algorithm and then provide a simple JavaScript/ActionScript snippet to decode it. People gathering email addresses tend not to take the time to execute the JavaScript on the pages.
However, it also prevents people whose browser has JavaScript disabled (for whatever reason) from reading the address.
Another solution is to put the address directly in an image. For the reader, that is not really more complicated than reading a CAPTCHA, typing what’s in it and then getting the mail link.
@Thomas and Cyrille: I assume that the bots use regular expressions to harvest the unobfuscated email addresses. Adding a few derived regexes that also match the “standard” obfuscation rules is probably no problem, and CPU time doesn’t matter at all for bots.
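Here is a Python sketch of what such derived regexes might look like. It uses one pattern that accepts a plain “@” or a few common spellings of “at” and “dot”: [AT], (at), {at}, or the bare words surrounded by spaces. The list of separators is an assumption for illustration. It says nothing about what real harvesters actually match.

```python
import re

# Assumed separator spellings: a literal '@'/'.', a bracketed word such as
# [AT], (at), {dot}, or the bare word surrounded by whitespace.
AT = r"(?:@|\s*[\[({]\s*at\s*[\])}]\s*|\s+at\s+)"
DOT = r"(?:\.|\s*[\[({]\s*dot\s*[\])}]\s*|\s+dot\s+)"
LABEL = r"[A-Za-z0-9-]+"

ADDR_RE = re.compile(rf"([A-Za-z0-9._%+-]+){AT}({LABEL}(?:{DOT}{LABEL})+)", re.IGNORECASE)
DOT_RE = re.compile(DOT, re.IGNORECASE)


def harvest(text):
    """Return normalized addresses found in plain or lightly obfuscated text."""
    return [f"{m.group(1)}@{DOT_RE.sub('.', m.group(2))}" for m in ADDR_RE.finditer(text)]
```

Note that it misses a personal variant such as the “ate … °” form quoted further down. Every extra rule is cheap to add, but only if the harvester’s author bothers to add it.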
Btw, I know that most CAPTCHAs are broken; well, probably all of them.
@Anarky: image-based CAPTCHAs are broken, and more complex algorithms don’t help either. Including the JavaScript in particular does not help, as you hand the attacker the very tool needed to break the CAPTCHA.
Alternatively, just accept that you will have your email address in plain text somewhere on the internet and just get a decent spam filter.
That’s my approach; it makes things far easier for people trying to contact me, and far easier for me than having to worry about all this.
I get maybe one spam message a week, if that, and none of my legitimate mail gets wrongly flagged as spam.
@Mike Arthur: exactly. That’s the right approach.
How about adding this algorithm to KMail? Then you could just copy and paste an address in the “obfuscated” format into the relevant field, and KMail would translate the string into a proper mail address.
That would be really useful IMHO.
I like the idea. So my code could be useful after all 🙂
I would say that Thomas McGuire is correct. It’s low-hanging fruit vs. not-so-low-hanging fruit. Some obfuscators are quite easy to guess and are therefore almost as “low”, but in my experience obfuscating is definitely better than not obfuscating at all.
I’ve had success with a completely different method. I use a relatively simple JavaScript routine and inline document.write() calls, and avoid having any recognisable email addresses in the source (and the ‘@’ character doesn’t appear anywhere).
This relies on the fact that bots may read the source (including JS) but don’t actually execute any JavaScript. The result is a clickable mailto: link, and I haven’t had any more spam to those addresses since doing that.
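Steve did not post his routine, so the following is only a hypothetical illustration of the general idea, written in Python for consistency. The encode_for_page() function emits an inline script that rebuilds the address from character codes with String.fromCharCode and writes the mailto: link with document.write(). As a result, neither the ‘@’ nor the address appears in the page source. The companion decode_like_browser() illustrates the objection raised in the replies: a bot that recognizes the pattern can recover the address without executing any JavaScript.

```python
import re


def encode_for_page(address):
    """Hypothetical encoder: emit inline JS that rebuilds the address from char codes."""
    codes = ",".join(str(ord(c)) for c in address)
    return (
        "<script>"
        f"var a=String.fromCharCode({codes});"
        "document.write('<a href=\"mailto:'+a+'\">'+a+'</a>');"
        "</script>"
    )


def decode_like_browser(script):
    """Recover the address from the script text alone, without running any JavaScript."""
    match = re.search(r"fromCharCode\(([\d,]+)\)", script)
    if match is None:
        return None
    return "".join(chr(int(n)) for n in match.group(1).split(","))
```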
Of course, once there’s no more low-hanging fruit, the nasty people will reach up higher, or maybe get a ladder. If that happens, I’ll have to be more creative.
Steve
@Steve: I would like to see proof that bots do not execute JS 🙂 I’d say that’s an assumption. Nevertheless it’s a nice idea, but it has the same problem as all such ideas: as soon as all users adopt this approach, it is broken. What we need is a general solution to the problem, so that we don’t have to play hide and seek any more.
@Martin: Full Ack! The worst thing about it is the time I have to invest in order to “manually” de-obfuscate the addresses. As if we didn’t have a hard enough time fighting the SPAM in our own mailboxes…
BTW, if I were a Spammer, I wouldn’t even go through the trouble of using a bot, let alone programming one. I’d just use one of the semi-public listings already containing the millions of addresses I need.
I came across this site recently: http://scr.im/
It’s like a URL-shortener for email addresses, with the added benefit of CAPTCHA protection for the address. For example here’s my address: http://scr.im/yawar
Where you start to go wrong is when you say “I assume that the bots…” Just have a look at the results of an empirical investigation of this problem:
http://koeln.ccc.de/schnucki/
http://www.0x11.net/schnucki/
Cheers, Peter
@Peter: sorry, but the data is more than two years old. I would not count on it still being valid.
@Yawar: yes, like reCAPTCHA. I just looked at the site; it seems to have some accessibility problems, and it uses a kind of CAPTCHA that has already been broken.
I still think that obfuscating is a good idea when e.g. posting in technical forums. I have never seen anyone stumped by something like:
“Contact me at grasch ate simon-listens ° org” (which is what I use quite often, and I get no spam to that address at all).
And if someone is too lazy to edit the e-Mail address and start the mail program manually instead of clicking on a link I most likely don’t want to know what he had to say anyways :).
However, I agree that obfuscation is no solution when, e.g., dealing with customers. Since most non-tech users browse with JavaScript on, I use a JavaScript snippet to print the rot13-encoded e-Mail address and provide a noscript alternative that is obfuscated. That way even the hardcore links-users can see an e-Mail address, and spam bots have a hard enough time that they (apparently) lose interest.
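Here is a minimal Python sketch of the rot13 variant, assuming the page is generated server-side. The hypothetical helper contact_markup() embeds the rot13-encoded address in a small inline script that decodes it and writes the link. It also adds a <noscript> fallback. The “ at ”/“ dot ” fallback format is an assumption, since the comment does not say which obfuscation it uses.

```python
import codecs


def rot13(text):
    # ROT13 rotates ASCII letters by 13; '@', '.', '-' and digits pass through unchanged.
    return codecs.encode(text, "rot13")


def contact_markup(address):
    """Hypothetical helper: rot13 address decoded by inline JS, plus a noscript fallback."""
    encoded = rot13(address)
    # Assumed fallback format for visitors without JavaScript.
    fallback = address.replace("@", " at ").replace(".", " dot ")
    script = (
        "<script>"
        f"var s='{encoded}';"
        # Undo ROT13 in the browser: shift each letter by 13 within its case.
        "var a=s.replace(/[a-zA-Z]/g,function(c){"
        "var b=c<='Z'?65:97;"
        "return String.fromCharCode((c.charCodeAt(0)-b+13)%26+b);});"
        "document.write('<a href=\"mailto:'+a+'\">'+a+'</a>');"
        "</script>"
    )
    return script + f"<noscript>{fallback}</noscript>"
```

ROT13 only rotates ASCII letters, so the ‘@’ and ‘.’ survive and the encoded string still looks like an address: “crgre@rknzcyr.bet” for peter@example.org. A naive harvester would collect that bogus address rather than the real one.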
Greetings,
Peter
@Martin: The data is still valid and obfuscation still works, for exactly the reasons @Steve gave. I run a bunch of spam traps containing both obfuscated and unobfuscated addresses. The obfuscated ones hardly get any spam.
One reason is that you can’t just run a RE over the whole net and recognize all the subtle ways of writing mail addresses. Your bot also can’t really execute all the JavaScript it finds. Well, it could, but why should a harvester bother when he’s got enough addresses without doing so? Maybe one day this approach will be broken, but until now it works fine (except for the people who have to correct the addresses manually, but that’s not really much work).
Anyway, what I wanted to write was a reply to @Keopa: KMail 1.9 already had such a feature. I think it’s gone in KDE4 (I can’t try it currently). You could insert an address in the format foo at example dot com into the recipient field and it Just Worked 🙂
Heh. Obfuscating e-mails is totally wrong. You lose the usability of clickable mail links and the bots get it anyway.
And then you get spam no matter what you do. A better spam filter is the answer.