In early January, I tried adding as many IP address blocks as possible to the blacklist and filtering more aggressively on unwanted keywords, unfortunately with limited results. The situation improved dramatically once I implemented a custom spam filter based on the following observations:

  • IP address ranges were widely distributed; although some recurrence could be seen, less than half of the spam was caught by this list
  • The text seems to be generated from highly adaptable templates, so blacklisting specific words does not work, e.g.
    “{Hello|Hi} there, {simply|just} {turned into|became|was|become|changed into} {aware of|alert to} your {blog|weblog} {thru|through|via} Google, {and found|and located} that {it is|it's} {really|truly} informative. {I'm|I am} {gonna|going to} {watch out|be careful} for brussels. {I will|I'll} {appreciate|be grateful} {if you|should you|when you|in the event you|in case you|for those who|if you happen to} {continue|proceed} this {in future}. […]”
  • A review of the Apache logs did not yield any further distinctive keyword (e.g. in the user-agent).
  • The only interesting field was the provided email address, which almost always followed this pattern: “Word 1 starting with capital letter” + “Word 2 starting with capital letter” + “number between 10 and 9999” (at) “a small list of predefined major free email providers”, e.g. MailletQuijas95@yahoomail.com
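
To illustrate why keyword blacklists struggle against such templates, here is a minimal sketch that expands a short excerpt of the template quoted above. The function names spin() and countVariants() are hypothetical and not part of DotClear or of the spammers' tooling; the sketch only assumes that each {a|b|c} group is replaced by one of its alternatives.

```php
<?php
// Hypothetical helper: replace every {a|b|c} group with one random alternative.
function spin(string $template): string
{
    return preg_replace_callback(
        '/\{([^{}]*)\}/',
        function (array $m): string {
            $options = explode('|', $m[1]);
            return $options[array_rand($options)];
        },
        $template
    );
}

// Hypothetical helper: number of distinct texts a template can produce,
// i.e. the product of the number of alternatives in each group.
function countVariants(string $template): int
{
    preg_match_all('/\{([^{}]*)\}/', $template, $matches);
    $total = 1;
    foreach ($matches[1] as $group) {
        $total *= count(explode('|', $group));
    }
    return $total;
}

// Excerpt of the template quoted in the list above.
$template = '{Hello|Hi} there, {simply|just} {turned into|became|was|become|changed into} '
          . '{aware of|alert to} your {blog|weblog}';

echo spin($template), "\n";          // one random variant
echo countVariants($template), "\n"; // 2 * 2 * 5 * 2 * 2 = 80
```

Even this short excerpt yields 80 different openings, so blacklisting any single wording catches only a fraction of the comments.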

This last point is exactly the logic implemented in dcCustomSpamFilter, using the following regular expression, with a great success rate:

    public $regexEmail = '([A-Z][a-z]+){2}([0-9]{2,4})@(123mail\.net|aol\.com|googlemail\.com|gnumail\.com|yahoomail\.com|hotmail\.com|mail\.com|gmail\.com|aim\.com)';
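
The pattern can be tried outside DotClear with a few lines of PHP. This is only a sketch: the helper looksGenerated() is a hypothetical name, and all sample addresses except the one quoted above are made up for illustration.

```php
<?php
// Same pattern as above; the '/' delimiters are added when calling preg_match().
$regexEmail = '([A-Z][a-z]+){2}([0-9]{2,4})@(123mail\.net|aol\.com|googlemail\.com|gnumail\.com|yahoomail\.com|hotmail\.com|mail\.com|gmail\.com|aim\.com)';

// Hypothetical helper: true when the address matches the generated-email pattern.
function looksGenerated(string $email, string $regex): bool
{
    return preg_match('/' . $regex . '/', $email) === 1;
}

$samples = [
    'MailletQuijas95@yahoomail.com', // example from the observations above: matches
    'john.doe@example.org',          // made-up address: no capitalized words, no match
    'JohnSmith1984@gmail.com',       // made-up address: matches too (possible false positive)
];

foreach ($samples as $email) {
    echo $email, ' => ', looksGenerated($email, $regexEmail) ? 'spam' : 'unknown', "\n";
}
```

The third sample shows the trade-off: a genuine commenter whose address happens to follow the same shape would be filtered as well. The pattern is also not anchored with ^ and $, so it matches anywhere inside the address.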

The full code for this custom DotClear spam filter is below; it was placed in a newly created folder [DotClearRoot]/plugins/custom_antispam/:

_define.php

<?php
if (!defined('DC_RC_PATH')) { return; }
 
$this->registerModule(
    /* Name */            "Custom_antispam",
    /* Description*/        "Custom Anti Spam Filter",
    /* Author */            "www.ness.ch/misc/",
    /* Version */            '0.1',
    /* Permissions */        'usage,contentadmin',
    /* Priority */            200
);
?>

_prepend.php

<?php
if (!defined('DC_RC_PATH')) { return; }
 
global $__autoload, $core;
$__autoload['dcCustomSpamFilter'] = dirname(__FILE__).'/class.dc.filter.custom.antispam.php';
$core->spamfilters[] = 'dcCustomSpamFilter';
?>

class.dc.filter.custom.antispam.php

<?php   
//Source: http://fr.dotclear.org/documentation/2.0/resources/plugins/antispam

class dcCustomSpamFilter extends dcSpamFilter
{
    public $name = 'Custom anti spam Filter';
    public $has_gui = false;
    public $regexEmail = '([A-Z][a-z]+){2}([0-9]{2,4})@(123mail\.net|aol\.com|googlemail\.com|gnumail\.com|yahoomail\.com|hotmail\.com|mail\.com|gmail\.com|aim\.com)';
 
    protected function setInfo()
    {
        $this->description = __('My custom anti spam filter');
    }

   
    /*
This method takes the following parameters:

$type : the type of comment (comment or trackback)
$author : the author's name
$email : the author's email address
$site : the URL of the author's site
$ip : the author's IP address
$content : the content of the comment
$post_id : the ID of the post on which the comment was posted
The last variable, $status, must be declared by reference (&$status), since it is used to pass back the status of the comment if it is marked as spam.

This method must return true if the message is spam and null if this cannot be determined.
    */
   
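    public function isSpam($type,$author,$email,$site,$ip, $content,$post_id,&$status)
    {
        // $regexEmail is a class property, so it must be accessed through $this
        if (preg_match('/'.$this->regexEmail.'/',$email)) {
            $status = 'Filtered';
            return true;
        }
        // No match: return null, meaning this filter cannot decide
        return null;
    }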
   
    public function getStatusMessage($status,$comment_id)
    {
        return sprintf(__('Filtered by %s. - generated email match'),$this->guiLink());
    }
}
?>