At the beginning of January, I tried to add as many IP address blocks as possible to the blacklist and to filter unwanted keywords more aggressively, unfortunately with limited results. The situation improved dramatically once I implemented a custom spam filter based on the following observations:

  • IP address ranges were widely distributed; while some recurrences could be seen, fewer than half of the spam comments were caught by the blacklist.
  • The text seemed to be generated from highly adaptable templates, so no individual words could be blacklisted, e.g.
    “{Hello|Hi} there, {simply|just} {turned into|became|was|become|changed into} {aware of|alert to} your {blog|weblog} {thru|through|via} Google, {and found|and located} that {it is|it's} {really|truly} informative. {I'm|I am} {gonna|going to} {watch out|be careful} for brussels. {I will|I'll} {appreciate|be grateful} {if you|should you|when you|in the event you|in case you|for those who|if you happen to} {continue|proceed} this {in future}. […]”
  • A review of the Apache logs did not reveal any further distinctive keyword (e.g. in the user agent).
  • The only interesting field was the email address provided, which almost always followed this pattern: “Word 1 starting with a capital letter” + “Word 2 starting with a capital letter” + “number between 10 and 9999” (at) “one of a small list of major free email providers”.
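To illustrate why blacklisting words fails against such templates, here is a minimal PHP sketch. It expands a `{a|b|c}` template the way a spam tool presumably does and counts how many combinations a single sentence fragment yields. The helpers `spinExpand()` and `spinCount()` are hypothetical. The sketch assumes flat (non-nested) groups, as in the sample above. The actual tool used by the spammers is unknown.

```php
<?php
// Hypothetical helpers illustrating template ("spintax") expansion.
// Assumption: groups are flat, i.e. no {..} inside another {..}.

// Replace every {a|b|c} group with one randomly chosen alternative.
function spinExpand(string $template): string
{
    return preg_replace_callback('/\{([^{}]*)\}/', function (array $m): string {
        $options = explode('|', $m[1]);
        return $options[array_rand($options)];
    }, $template);
}

// Number of combinations = product of the option counts of all groups.
function spinCount(string $template): int
{
    preg_match_all('/\{([^{}]*)\}/', $template, $matches);
    $total = 1;
    foreach ($matches[1] as $group) {
        $total *= count(explode('|', $group));
    }
    return $total;
}

// Shortened excerpt of the spam template quoted above.
$template = '{Hello|Hi} there, {simply|just} {turned into|became|was|become|changed into} '
          . '{aware of|alert to} your {blog|weblog} {thru|through|via} Google.';

echo spinExpand($template), "\n";  // one random variant per run
echo spinCount($template), "\n";   // number of possible variants
```

For this excerpt alone, `spinCount()` gives 2 × 2 × 5 × 2 × 2 × 3 = 240 variants. The rest of the message multiplies that further, so no single wording recurs often enough to blacklist.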

The email pattern from the last observation is exactly the logic implemented in dcCustomSpamFilter. It uses the following regular expression, with a high success rate:

    public $regexEmail = '([A-Z][a-z]+){2}([0-9]{2,4})@(123mail\.net|aol\.com|googlemail\.com|gnumail\.com|yahoomail\.com|hotmail\.com|mail\.com|gmail\.com|aim\.com)';
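A quick way to see what this expression accepts is to run it against a few addresses. The sample addresses below are invented for illustration. The expression is copied verbatim, and the `/` delimiters are the same ones the filter adds in isSpam().

```php
<?php
// The expression from dcCustomSpamFilter, copied verbatim.
$regexEmail = '([A-Z][a-z]+){2}([0-9]{2,4})@(123mail\.net|aol\.com|googlemail\.com|gnumail\.com|yahoomail\.com|hotmail\.com|mail\.com|gmail\.com|aim\.com)';

// Invented sample addresses => whether the expression should match.
$samples = [
    'JohnSmith482@gmail.com'    => true,  // two capitalised words + 3 digits + listed provider
    'MaryJones57@hotmail.com'   => true,  // two digits are enough
    'john.smith@gmail.com'      => false, // no capitalised words, no digits
    'JohnSmith@gmail.com'       => false, // no number before the @
    'JohnSmith482@example.org'  => false, // provider not in the list
    'abcJohnSmith482@gmail.com' => true,  // pattern is unanchored: a prefix does not prevent a match
];

foreach ($samples as $email => $expected) {
    $isMatch = preg_match('/' . $regexEmail . '/', $email) === 1;
    printf("%-27s %-8s (expected %s)\n", $email, $isMatch ? 'match' : 'no match', $expected ? 'match' : 'no match');
}
```

Because the expression has no `^` anchor, any address that merely contains the pattern is caught. That is harmless here but worth keeping in mind if the pattern is ever loosened.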

The complete code for this custom DotClear spam filter follows. It was placed in a newly created folder, [DotClearRoot]/plugins/custom_antispam/:


    <?php
    // [DotClearRoot]/plugins/custom_antispam/_define.php
    if (!defined('DC_RC_PATH')) { return; }

    $this->registerModule(
        /* Name */        "Custom_antispam",
        /* Description */ "Custom Anti Spam Filter",
        /* Author */      "",
        /* Version */     '0.1',
        /* Permissions */ 'usage,contentadmin',
        /* Priority */    200
    );


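    <?php
    // [DotClearRoot]/plugins/custom_antispam/_prepend.php
    if (!defined('DC_RC_PATH')) { return; }

    global $__autoload, $core;
    // Tell Dotclear where to load the filter class from, then register it as a spam filter
    $__autoload['dcCustomSpamFilter'] = dirname(__FILE__).'/class.dc.filter.custom.antispam.php';
    $core->spamfilters[] = 'dcCustomSpamFilter';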



    <?php
    // [DotClearRoot]/plugins/custom_antispam/class.dc.filter.custom.antispam.php
    class dcCustomSpamFilter extends dcSpamFilter
    {
        public $name = 'Custom anti spam Filter';
        public $has_gui = false;
        public $regexEmail = '([A-Z][a-z]+){2}([0-9]{2,4})@(123mail\.net|aol\.com|googlemail\.com|gnumail\.com|yahoomail\.com|hotmail\.com|mail\.com|gmail\.com|aim\.com)';

        protected function setInfo()
        {
            $this->description = __('My custom anti spam filter');
        }

        /*
         * This method takes the following parameters:
         *   $type    : the comment type (comment or trackback)
         *   $author  : the author's name
         *   $email   : the author's email address
         *   $site    : the URL of the author's website
         *   $ip      : the author's IP address
         *   $content : the content of the comment
         *   $post_id : the ID of the post on which the comment was made
         * The last variable, $status, must be declared by reference (&$status),
         * since it passes back the comment's status when it is marked as spam.
         *
         * This method must return true if the message is spam, and null if it cannot tell.
         */
        public function isSpam($type,$author,$email,$site,$ip,$content,$post_id,&$status)
        {
            // The pattern is a class property, so it must be read through $this
            if (preg_match('/'.$this->regexEmail.'/',$email)) {
                $status = 'Filtered';
                return true;
            }
            return null; // undecided: leave the decision to the other filters
        }

        public function getStatusMessage($status,$comment_id)
        {
            return sprintf(__('Filtered by %s. - generated email match'),$this->guiLink());
        }
    }
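dcSpamFilter and the `__()` translation function exist only inside Dotclear, so the filter class cannot be run on its own. The sketch below checks the isSpam() contract from the command line. The two stubs at the top are hypothetical stand-ins written for this example, not Dotclear's real implementations. Only the filter class itself mirrors the plugin code.

```php
<?php
// ASSUMPTION: __() and dcSpamFilter are minimal hypothetical stubs,
// not Dotclear's actual implementations.
function __($str) { return $str; }            // stub: no translation

abstract class dcSpamFilter
{
    public $name;
    public $description;
    public $has_gui = false;

    public function __construct() { $this->setInfo(); }
    protected function setInfo() {}
    public function guiLink() { return $this->name; } // stub: just returns the filter name
}

// Same filter logic as the plugin class.
class dcCustomSpamFilter extends dcSpamFilter
{
    public $name = 'Custom anti spam Filter';
    public $has_gui = false;
    public $regexEmail = '([A-Z][a-z]+){2}([0-9]{2,4})@(123mail\.net|aol\.com|googlemail\.com|gnumail\.com|yahoomail\.com|hotmail\.com|mail\.com|gmail\.com|aim\.com)';

    protected function setInfo()
    {
        $this->description = __('My custom anti spam filter');
    }

    public function isSpam($type, $author, $email, $site, $ip, $content, $post_id, &$status)
    {
        if (preg_match('/' . $this->regexEmail . '/', $email)) {
            $status = 'Filtered';
            return true;
        }
        return null; // undecided
    }

    public function getStatusMessage($status, $comment_id)
    {
        return sprintf(__('Filtered by %s. - generated email match'), $this->guiLink());
    }
}

$filter = new dcCustomSpamFilter();

// A "generated" address: flagged, and $status is set through the reference.
$status = null;
$verdict = $filter->isSpam('comment', 'John', 'JohnSmith482@gmail.com', '', '192.0.2.1', 'Hi there', 1, $status);
var_dump($verdict, $status);

// An ordinary address: no verdict, $status untouched.
$status2 = null;
var_dump($filter->isSpam('comment', 'Jane', 'jane@example.org', '', '192.0.2.2', 'Hello', 1, $status2), $status2);
```

Returning null rather than false for non-matching comments keeps the filter neutral. Other filters in the chain still get a chance to decide.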