Abstract
Spammers often alter the words in their e-mail messages so that the messages pass unnoticed through spam filters. These changes must be small to keep the messages intelligible. Approximate search can detect the modified spam words. Unconstrained approximate search, however, generates many false positives, because it accepts any distribution of changes in the original spam words. In this paper, we describe a new spam filtering scenario that uses approximate search with constraints on the number of change operations or on the maximum length of runs of change operations. We present a generic spam filtering algorithm that uses constrained approximate search implemented in a bit-parallel manner. We discuss the accuracy and efficiency of this spam filter and present comparative experimental results obtained with unconstrained and constrained approximate search algorithms.
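To make the idea of bit-parallel constrained approximate search concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes the simplest possible constraint, a cap on the number of character substitutions, and uses the well-known Shift-And scheme extended to mismatches. The names `find_with_substitutions` and `max_subs` are hypothetical and chosen for illustration only; insertions, deletions, and limits on run lengths are not covered.

```python
from typing import Dict, List


def find_with_substitutions(pattern: str, text: str, max_subs: int) -> List[int]:
    """Return the end indices in `text` where `pattern` occurs with at most
    `max_subs` substituted characters (illustrative sketch only)."""
    m = len(pattern)
    if m == 0 or max_subs < 0:
        return []

    # Bit i of masks[c] is set when pattern[i] == c.
    masks: Dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)   # bit that signals a full-length match
    full = (1 << m) - 1     # keeps every state word m bits wide

    # Bit i of states[d] is set when pattern[:i + 1] matches the text ending
    # at the current position with at most d substitutions.
    states = [0] * (max_subs + 1)
    hits: List[int] = []

    for pos, ch in enumerate(text):
        char_mask = masks.get(ch, 0)
        prev_old = 0  # value of states[d - 1] before this text character
        for d in range(max_subs + 1):
            old = states[d]
            # Extend a match with the same error count by a matching character.
            new = ((old << 1) | 1) & char_mask
            if d > 0:
                # Or extend a match with one error fewer by a substitution.
                new |= (prev_old << 1) | 1
            states[d] = new & full
            prev_old = old
        if states[max_subs] & accept:
            hits.append(pos)
    return hits


if __name__ == "__main__":
    print(find_with_substitutions("viagra", "buy v1agra now", 1))
```

With a budget of one substitution the disguised word `v1agra` is found, while a word with two changed characters such as `v1@gra` is rejected. Tightening the budget in this way is one simple form of constraint that trades recall for fewer false positives.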