Monday, February 18 2013

High increase in spammy blog comments

As already stated, spam in comments is unfortunately not uncommon, but the volume hitting this blog increased sharply around Christmas 2012.

As shown in the graph below, the increase started in December 2012, with a little less than half of the comments not recognised by any DotClear spam filter (value in light blue, legend NULL):

image

At the beginning of January, I tried adding as many IP address blocks as possible to the blacklist and filtering more aggressively on unwanted keywords, unfortunately with limited results. The situation improved dramatically once I implemented a custom spam filter based on the following observations:

  • IP address ranges were widely distributed and, while some recurrence could be seen, less than half of the spam was caught by this list
  • The text seems to be built from highly adaptable templates, so blacklisting given words does not work, e.g.
    “{Hello|Hi} there, {simply|just} {turned into|became|was|become|changed into} {aware of|alert to} your {blog|weblog} {thru|through|via} Google, {and found|and located} that {it is|it's} {really|truly} informative. {I'm|I am} {gonna|going to} {watch out|be careful} for brussels. {I will|I'll} {appreciate|be grateful} {if you|should you|when you|in the event you|in case you|for those who|if you happen to} {continue|proceed} this {in future}. […]”
  • The review of the Apache logs did not yield any further distinctive keyword (e.g. in the user-agent).
  • The only interesting field was the provided email, which almost always followed this pattern: “Word 1 starting with capital letter” + “Word 2 starting with capital letter” + “number between 10 and 9999” (at) “a small list of predefined major free email providers”, e.g. MailletQuijas95@yahoomail.com
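To give a feel for why keyword blacklisting struggles against such templates, here is a minimal sketch in Python (not part of the DotClear plugin) that counts how many distinct texts a template fragment can produce. The helper name count_variants is hypothetical, and the sample is only the first few groups of the template quoted above.

```python
import re
from math import prod


def count_variants(template: str) -> int:
    """Count the texts a {a|b|c}-style template can expand to."""
    # Each {...} group contributes one choice among its |-separated options.
    groups = re.findall(r"\{([^{}]*)\}", template)
    return prod(len(group.split("|")) for group in groups)


# First groups of the spam template quoted in the list above.
sample = (
    "{Hello|Hi} there, {simply|just} "
    "{turned into|became|was|become|changed into} "
    "{aware of|alert to} your {blog|weblog}"
)

print(count_variants(sample))  # 2 * 2 * 5 * 2 * 2 = 80
```

Five groups already give 80 variants, and the count multiplies with every further group, so a fixed word list has little chance of keeping up.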

The email pattern from the last observation is exactly the logic implemented in dcCustomSpamFilter, with the following regular expression and a great success rate:

    public $regexEmail = '([A-Z][a-z]+){2}([0-9]{2,4})@(123mail\.net|aol\.com|googlemail\.com|gnumail\.com|yahoomail\.com|hotmail\.com|mail\.com|gmail\.com|aim\.com)';
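The expression can be tried outside DotClear before deploying it. The following is a minimal sketch in Python rather than PHP; the pattern is copied unchanged from the line above, while the function name is_generated and the non-spam test addresses are made up for illustration. re.search is used because, like preg_match with this pattern, it is not anchored to the start or end of the string.

```python
import re

# Same pattern as $regexEmail above, unchanged.
REGEX_EMAIL = re.compile(
    r"([A-Z][a-z]+){2}([0-9]{2,4})@"
    r"(123mail\.net|aol\.com|googlemail\.com|gnumail\.com|yahoomail\.com"
    r"|hotmail\.com|mail\.com|gmail\.com|aim\.com)"
)


def is_generated(email: str) -> bool:
    """True if the address matches the generated-looking pattern."""
    return REGEX_EMAIL.search(email) is not None


for address in (
    "MailletQuijas95@yahoomail.com",  # example from the post
    "john.doe@gmail.com",             # hypothetical regular address
    "AliceSmith2013@example.org",     # hypothetical, provider not in the list
):
    print(address, is_generated(address))
```

Because the pattern is unanchored, it also matches when the generated-looking part sits in the middle of a longer address; whether that is desirable depends on how many false positives one is willing to accept.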

The whole code for this custom DotClear spam filter is below and was placed in a newly created folder [DotClearRoot]/plugins/custom_antispam/:

_define.php

<?php
if (!defined('DC_RC_PATH')) { return; }
 
$this->registerModule(
    /* Name */            "Custom_antispam",
    /* Description*/        "Custom Anti Spam Filter",
    /* Author */            "www.ness.ch/misc/",
    /* Version */            '0.1',
    /* Permissions */        'usage,contentadmin',
    /* Priority */            200
);
?>

_prepend.php

<?php
if (!defined('DC_RC_PATH')) { return; }
 
global $__autoload, $core;
$__autoload['dcCustomSpamFilter'] = dirname(__FILE__).'/class.dc.filter.custom.antispam.php';
$core->spamfilters[] = 'dcCustomSpamFilter';
?>

class.dc.filter.custom.antispam.php

<?php   
//Source: http://fr.dotclear.org/documentation/2.0/resources/plugins/antispam

class dcCustomSpamFilter extends dcSpamFilter
{
    public $name = 'Custom anti spam Filter';
    public $has_gui = false;
    public $regexEmail = '([A-Z][a-z]+){2}([0-9]{2,4})@(123mail\.net|aol\.com|googlemail\.com|gnumail\.com|yahoomail\.com|hotmail\.com|mail\.com|gmail\.com|aim\.com)';
 
    protected function setInfo()
    {
        $this->description = __('My custom anti spam filter');
    }

   
    /*
This method takes the following parameters:

$type: the comment type (comment or trackback)
$author: the author's name
$email: the author's email address
$site: the URL of the author's site
$ip: the author's IP address
$content: the content of the comment
$post_id: the ID of the post on which the comment was posted
The last variable, $status, must be declared by reference (&$status), since it carries the comment status back when the comment is flagged as spam.

This method must return true if the message is spam and null if it cannot tell.
    */
   
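    public function isSpam($type,$author,$email,$site,$ip, $content,$post_id,&$status)
    {
        // $regexEmail is a class property, so it must be read through $this
        if (preg_match('/'.$this->regexEmail.'/',$email)) {
            $status = 'Filtered';
            return true;
        }
        // No opinion on this comment: let the other filters decide
        return null;
    }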
   
    public function getStatusMessage($status,$comment_id)
    {
        return sprintf(__('Filtered by %s. - generated email match'),$this->guiLink());
    }
}
?>

Thursday, March 1 2012

Spam in blog comments

Having spam on your blog is unfortunately unavoidable these days, especially if you leave comments open indefinitely, as is the case on this blog.

With DotClear, comments on posts are closed after several weeks. But since I wanted to “play” a little with my blog, I decided to leave comments open permanently.

To avoid serving as an echo chamber for spammers of all stripes, I nevertheless took a few precautions. First, I checked that the following DotClear anti-spam extensions were active:

  • IP Filter: lets you define a blacklist or a whitelist
  • Bad Words: saves you from having to think twice about comments containing certain dubious keywords
  • IP Lookup
  • Links Lookup
  • Fair Trackbacks

To be honest, the last three filters see little use: once you have built an IP list of spammers (filtering not just on the IP but on the whole range of the operator in question), few surprises remain.

Below is a list of the latest spam received, which clearly shows that the article Limits of not having Visual Studio when developing attracts a huge number of unsolicited comments:

image

Why such interest in this page? It turns out that this winter, 3 strange comments were posted on precisely this article, with http://google.com as the linked site, hellojohnatan@aol.com as the email address, and all coming from IP address 81.25.45.63:

image

Curious to see a link to Google, I left the comment online. The result, several dozen spam comments on the same article, speaks for itself. A recognition marker, namely the presence of 08)) together with a URL pointing to http://google.com, lets the spammer find blogs that accept comments. An exact search for the term, marker included, returns several dozen pages:

image

Finally, I looked into the path and the logs left by these recurring visitors. Do they at least see my ads or boost my statistics? Unfortunately that seems unlikely, given how little they interact with the site, just:

  • A GET on the page, with the page itself as referrer and no user-agent
  • Immediately afterwards, a POST with the advertising comment; no other resource of the site is loaded
  • Finally, following the redirect to the confirmation page
89.33.1.174 - - [20/Feb/2012:01:09:11 +0100] "GET /misc/?post/2011/02/07/Limits-of-not-having-Visual-Studio-when-developing HTTP/1.1" 200 12463 "https://www.ness.ch/misc/?post/2011/02/07/Limits-of-not-having-Visual-Studio-when-developing" "-"
89.33.1.174 - - [20/Feb/2012:01:09:12 +0100] "POST /misc/?post/2011/02/07/Limits-of-not-having-Visual-Studio-when-developing#pr HTTP/1.1" 302 12613 "https://www.ness.ch/misc/?post/2011/02/07/Limits-of-not-having-Visual-Studio-when-developing" "-"

89.33.1.174 - - [20/Feb/2012:01:09:13 +0100] "GET /misc/?post/2011/02/07/Limits-of-not-having-Visual-Studio-when-developing&pub=0 HTTP/1.1" 200 12577 "https://www.ness.ch/misc/?post/2011/02/07/Limits-of-not-having-Visual-Studio-when-developing&pub=0" "-"
89.33.1.174 - - [20/Feb/2012:01:09:13 +0100] "GET /misc/?post/2011/02/07/Limits-of-not-having-Visual-Studio-when-developing HTTP/1.1" 200 12463 "https://www.ness.ch/misc/?post/2011/02/07/Limits-of-not-having-Visual-Studio-when-developing" "-"
89.33.1.174 - - [20/Feb/2012:01:09:30 +0100] "POST /misc/?post/2011/02/07/Limits-of-not-having-Visual-Studio-when-developing#pr HTTP/1.1" 302 12615 "https://www.ness.ch/misc/?post/2011/02/07/Limits-of-not-having-Visual-Studio-when-developing" "-"
89.33.1.174 - - [20/Feb/2012:01:09:30 +0100] "GET /misc/?post/2011/02/07/Limits-of-not-having-Visual-Studio-when-developing&pub=0 HTTP/1.1" 200 12577 "https://www.ness.ch/misc/?post/2011/02/07/Limits-of-not-having-Visual-Studio-when-developing&pub=0" "-"
89.33.1.174 - - [20/Feb/2012:01:09:31 +0100] "GET /misc/?post/2011/02/07/Limits-of-not-having-Visual-Studio-when-developing HTTP/1.1" 200 12463 "https://www.ness.ch/misc/?post/2011/02/07/Limits-of-not-having-Visual-Studio-when-developing" "-"
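The behaviour visible in these lines (a POST with no user-agent whose referrer is the commented page itself) can be searched for automatically. Below is a minimal sketch in Python, assuming the Apache combined log format shown above; the function name suspicious_ips and the third sample line (a regular browser from a documentation IP range) are hypothetical and only there for contrast.

```python
import re

# Apache "combined" log line: ip, identity, user, [time], "request",
# status, size, "referer", "user-agent".
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def suspicious_ips(lines):
    """IPs that POST with an empty user-agent and the page itself as referrer."""
    flagged = set()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # not a combined-format line
        path = m["path"].split("#", 1)[0]  # drop the "#pr" fragment
        if (m["method"] == "POST"
                and m["agent"] in ("", "-")
                and m["referer"].endswith(path)):
            flagged.add(m["ip"])
    return flagged


PAGE = "/misc/?post/2011/02/07/Limits-of-not-having-Visual-Studio-when-developing"
sample = [
    # First two lines are taken from the log excerpt above.
    f'89.33.1.174 - - [20/Feb/2012:01:09:11 +0100] "GET {PAGE} HTTP/1.1" '
    f'200 12463 "https://www.ness.ch{PAGE}" "-"',
    f'89.33.1.174 - - [20/Feb/2012:01:09:12 +0100] "POST {PAGE}#pr HTTP/1.1" '
    f'302 12613 "https://www.ness.ch{PAGE}" "-"',
    # Hypothetical regular visitor with a real user-agent.
    f'192.0.2.7 - - [20/Feb/2012:01:10:00 +0100] "POST {PAGE}#pr HTTP/1.1" '
    f'302 12613 "https://www.ness.ch{PAGE}" "Mozilla/5.0"',
]

print(suspicious_ips(sample))
```

This is only a heuristic: a missing user-agent is easy for a spammer to fix, so such a check is more useful for investigating logs than as a long-term filter.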

In short, apart from increasing the number of comments in the spam folder, this type of visitor brings nothing but nuisance. For now I have decided to keep the 3 “beacon” comments active, but I may well revisit this choice in the near future.

Monday, February 20 2012

Security of Smart Meters

There has been some controversy lately about the (in)security of smart meters. It started with the talk “Smart Hacking for Privacy” at 28c3 between Christmas and New Year.

While some articles about the 28c3 talk were quite accurate and neutral, others were a little more sensationalist and probably generalised the topic a little too quickly.

As a happy owner of such a smart meter as part of a study, I wanted to have a closer look at the security of my installation. As mentioned in the previously linked article, there is, as far as I’m aware, no direct data connection back to the EWZ in this study. Data is kept for a few weeks on the smart meter, retrieved by the Android tablet and stored on that device. This is also the reason why I must start the Android tablet at least every couple of weeks, so that no data gets lost.

So there is no Internet connection in our EWZ study, and therefore far fewer potential problems ;-). Let’s have a look at how the data transfer between the smart meter and the tablet is done.

The first communication between the smart meter and the flat is done using a Devolo device over the electric wiring. I’m not aware of any special security problem for this technology, so let’s assume for now we’re safe there.

Once in the flat, the information transits via a second Devolo device, which sends the feed to the Android tablet over WiFi. A closer look at the settings in use (on Windows, run netsh wlan show all in a console) shows that they are pretty safe, as WPA2 is in use:

SSID [XX] : ewz[SOME_NUMBERS]
Network type            : Infrastructure
    Authentication          : WPA2-Personal
    Encryption              : CCMP
BSSID 1                 : [BSS_MAC_ADDRESS]
Signal             : 99%
Radio type         : 802.11n
Channel            : 2
Basic rates (Mbps) : 1 2 5.5 11
Other rates (Mbps) : 6 9 12 18 24 36 48 54

But is there even a way to snoop on some of the traffic and get a better understanding of this solution? A neighbour could conceivably crack the WPA2 password of the Devolo, but I tend to disconnect it once I’m done with the tablet anyway, which is one more layer of security to bypass for an attacker wanting to “hack” my smart meter.

The key for a curious local person to gain access to the network traffic is to use one of the three Ethernet connectors of the Devolo device that also serves as WiFi access point. The network traffic shown below is therefore only accessible to local attackers who have access to the AP, to the smart meter, or maybe to the electrical wiring.

In normal use, we see a low-level network broadcast of 50 bytes of data every 2 seconds. This matches exactly the immediate consumption update the tablet gets:

smartmeter_datastream

The data field therefore most likely contains the measurement, which is readable by anyone who can see the broadcast. But as said before, this is limited to local attackers (unless flaws exist in the Devolo devices which I don’t know about).

When shutting down the tablet and restarting it, we get a more interesting capture, where what seems to be a fixed IP address, 192.168.200.10, is looking for 192.168.200.1.

smartmeter_broadcast_bootArchos

Filtering the captured traffic for ARP requests only shows that the Android device successively tries to find:

  • 192.168.200.20
  • 192.168.200.1
  • 192.168.200.2

    smartmeter_arp_filter
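For readers who want to pull the same information out of raw frames rather than a capture tool, here is a minimal sketch in Python of how an ARP request is laid out on Ethernet and how the sender and target IPs can be extracted. The helper names and the MAC address are hypothetical; the frames are built locally for the demonstration and are not taken from the actual capture.

```python
import socket
import struct


def build_request(src_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build a broadcast Ethernet frame carrying an ARP request."""
    eth = b"\xff" * 6 + src_mac + struct.pack("!H", 0x0806)  # EtherType ARP
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)  # Ethernet, IPv4, request
    arp += src_mac + socket.inet_aton(sender_ip)
    arp += b"\x00" * 6 + socket.inet_aton(target_ip)  # target MAC unknown
    return eth + arp


def parse_arp_request(frame: bytes):
    """Return (sender_ip, target_ip) for an ARP request, else None."""
    if len(frame) < 42 or frame[12:14] != b"\x08\x06":
        return None
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", frame[14:22])
    if (htype, ptype, hlen, plen, oper) != (1, 0x0800, 6, 4, 1):
        return None
    # Sender IP follows the 6-byte sender MAC; target IP is the last field.
    return socket.inet_ntoa(frame[28:32]), socket.inet_ntoa(frame[38:42])


# Hypothetical MAC for the tablet; the IPs are the ones seen in the capture.
mac = bytes.fromhex("0002aabbccdd")
frames = [build_request(mac, "192.168.200.10", ip)
          for ip in ("192.168.200.20", "192.168.200.1", "192.168.200.2")]

targets = [parse_arp_request(f)[1] for f in frames]
print(targets)
```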

    A ping confirms that 192.168.200.10 and 192.168.200.20 are both valid IP addresses in this configuration, but that 192.168.200.20 seems either much slower or further away than 192.168.200.10:

    C:\>ping 192.168.200.10
    Pinging 192.168.200.10 with 32 bytes of data:
    Reply from 192.168.200.10: bytes=32 time=4ms TTL=64
    Reply from 192.168.200.10: bytes=32 time=1ms TTL=64
    Reply from 192.168.200.10: bytes=32 time=1ms TTL=64
    Reply from 192.168.200.10: bytes=32 time=1ms TTL=64
    Ping statistics for 192.168.200.10:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
    Approximate round trip times in milli-seconds:
    Minimum = 1ms, Maximum = 4ms, Average = 1ms
    C:\>ping 192.168.200.20
    Pinging 192.168.200.20 with 32 bytes of data:
    Reply from 192.168.200.20: bytes=32 time=17ms TTL=255
    Reply from 192.168.200.20: bytes=32 time=11ms TTL=255
    Reply from 192.168.200.20: bytes=32 time=7ms TTL=255
    Reply from 192.168.200.20: bytes=32 time=7ms TTL=255
    Ping statistics for 192.168.200.20:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
    Approximate round trip times in milli-seconds:
        Minimum = 7ms, Maximum = 17ms, Average = 10ms
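The comparison can be made explicit with a few lines of code. This is a minimal sketch in Python that parses the reply lines of Windows ping output like the one above and computes the mean round-trip time and the TTL values per host; the function name summarize is hypothetical.

```python
import re
from statistics import mean

# Matches Windows ping reply lines such as
# "Reply from 192.168.200.10: bytes=32 time=4ms TTL=64".
REPLY_RE = re.compile(r"Reply from (\S+): bytes=\d+ time[=<](\d+)ms TTL=(\d+)")


def summarize(ping_output: str):
    """Return host, mean round-trip time (ms) and the set of TTLs seen."""
    rows = REPLY_RE.findall(ping_output)
    if not rows:
        return None
    return {
        "host": rows[0][0],
        "mean_ms": mean(int(time) for _, time, _ in rows),
        "ttls": {int(ttl) for _, _, ttl in rows},
    }


# Reply lines copied from the two ping runs above.
ping_10 = """Reply from 192.168.200.10: bytes=32 time=4ms TTL=64
Reply from 192.168.200.10: bytes=32 time=1ms TTL=64
Reply from 192.168.200.10: bytes=32 time=1ms TTL=64
Reply from 192.168.200.10: bytes=32 time=1ms TTL=64"""
ping_20 = """Reply from 192.168.200.20: bytes=32 time=17ms TTL=255
Reply from 192.168.200.20: bytes=32 time=11ms TTL=255
Reply from 192.168.200.20: bytes=32 time=7ms TTL=255
Reply from 192.168.200.20: bytes=32 time=7ms TTL=255"""

print(summarize(ping_10))
print(summarize(ping_20))
```

The means come out at 1.75 ms versus 10.5 ms. The differing TTLs (64 versus 255) also hint at two different network stacks, which is consistent with, but does not prove, the second host being a different kind of device.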

    The suspicion that 192.168.200.20 is the smart meter itself is confirmed by an nmap scan of the network, where aside from the scanning machine only two other hosts answer:

    C:\>nmap 192.168.200.10/24 -p80
    Starting Nmap 5.50 ( http://nmap.org ) at 2012-02-13 16:34 W. Europe
    Standard Time
    Stats: 0:00:35 elapsed; 64 hosts completed (2 up), 66 undergoing Host Discovery
    Parallel DNS resolution of 1 host. Timing: About 0.00% done
    Nmap scan report for 192.168.200.10
    Host is up (0.00s latency).
    PORT   STATE    SERVICE
    80/tcp filtered http
    MAC Address: 00:02:XX:XX:XX:XX (Samsung Electro-Mechanics Co.)
    Nmap scan report for 192.168.200.20
    Host is up (0.00s latency).
    PORT   STATE    SERVICE
    80/tcp filtered http
    MAC Address: 00:0F:XX:XX:XX:X (Landis+Gyr)

    Based on these observations, the whole EWZ smart meter setup looks considerably more secure than the example given during the 28c3 talk. There are still possibilities to “hack around”, such as looking at the behaviour of the tablet if IP 192.168.200.1 or 192.168.200.2 starts to answer (e.g. are they used as gateways or configuration servers?), but this would breach the written agreement with the power company about not interfering with the measurement devices.

    Read all the articles about the EWZ Smart Meter using the tag SmartMeter
