OEM Spam Filtering Engines
Customer
Founded in 1999, with offices in the US, Europe, and Asia, the customer is
a leading provider of OEM spam filtering and anti-phishing software,
competing with Symantec, McAfee, and Cloudmark. More than 5,000 companies
and 10 million consumers worldwide rely on its services.
Business Case
The goal was to develop a spam filtering engine that offered up-to-date information and categorization, scalability, reliability, and a flexible, extensible framework. The client needed to scan as many URLs as possible for as many categories of threat as possible, including fraudulent practices, phishing, spam, and viruses.
Solution
Fifteen developers worked for almost three years to produce an OEM
spam filtering engine with the following components:
- SDK
Anti-Spam SDK is the engine for anti-spam solutions, providing powerful
functions to perform spam filtering and other advanced email message
analysis.
In addition to the usual originating information (often falsified) in the
message header, the engine can identify additional information in the
message. Additional functions use this information to calculate
sophisticated Bayesian statistics to determine the probability that any
given message is spam or legitimate email. The engine can identify up to
99.9% of spam with a near-zero false positive rate, even for
foreign-language spam.
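As an illustration of the Bayesian approach described above, the following is a minimal sketch, not the engine's actual implementation. It assumes equal prior probabilities for spam and legitimate mail, treats tokens as independent (the usual naive Bayes simplification), and clamps per-token probabilities to the range 0.01 to 0.99. The function names `token_spam_probability` and `spam_probability` are hypothetical.

```python
import math


def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Estimate P(spam | token) from training counts (hypothetical helper).

    spam_hits / ham_hits: number of spam / legitimate messages containing the token.
    n_spam / n_ham: total spam / legitimate messages in the training set (must be > 0).
    """
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


def spam_probability(token_probs):
    """Combine per-token probabilities into one message-level probability.

    Works in log-odds space so that long messages do not underflow.
    Assumes equal priors and independent tokens.
    """
    log_odds = 0.0
    for p in token_probs:
        p = min(max(p, 0.01), 0.99)  # clamp so log() never sees 0 or 1
        log_odds += math.log(p) - math.log(1.0 - p)
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```

A message whose tokens all lean toward spam pushes the combined probability close to 1, while a message with no tokens at all stays at the neutral value of 0.5.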
A finely tuned implementation can analyze up to 1,100 messages per second
per process. The engine scales to arbitrarily large numbers of machines,
limited only by external factors such as incoming bandwidth.
The engine can identify:
- The language or languages of a message;
- Phone numbers;
- Domain names;
- URLs;
- Images;
- Key words and phrases.
A key component of the engine is the ability to identify key words and
phrases even when hidden or obfuscated by the spammer. V14gr4 is
identified as Viagra; www dot im a spammer dot com is identified as
www.imaspammer.com. Sophisticated pattern analysis can identify even more
subtly hidden key words and phrases.
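The two examples above can be reproduced with a very small normalizer. This is only a sketch under stated assumptions: the character substitution table `LEET` and the helper names are hypothetical, and the engine's own pattern analysis is described as considerably more sophisticated. The domain helper also assumes the spelled-out fragment has already been isolated from the surrounding text.

```python
import re

# Hypothetical substitution table for common character swaps.
LEET = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})


def normalize_word(word):
    """Lower-case a word and undo simple character substitutions."""
    return word.lower().translate(LEET)


def rejoin_spelled_domain(fragment):
    """Turn 'www dot example dot com' style text back into a domain name.

    Assumes `fragment` contains only the spelled-out domain.
    """
    dotted = re.sub(r"\s+dot\s+", ".", fragment.strip(), flags=re.IGNORECASE)
    return re.sub(r"\s+", "", dotted).lower()


def find_keywords(text, keywords):
    """Return the original words whose normalized form is a known keyword."""
    words = re.findall(r"[\w@$]+", text)
    return [w for w in words if normalize_word(w) in keywords]
```

For example, `normalize_word("V14gr4")` yields `viagra`, and `rejoin_spelled_domain("www dot im a spammer dot com")` yields `www.imaspammer.com`.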
The SDK API (Application Programming Interface) allows developers to integrate Anti-Spam solutions with a wide variety of applications.
Scores of configuration options allow OEMs to balance memory usage, throughput and detection.
- Statistics and Analysis
Spammers do not rest; they are always looking for ways to overcome
anti-spam measures. The engine not only evaluates messages, but also
collects feedback and analyzes messages so analysts can respond to new
spam techniques.
OEMs can use the SDK to:
- Monitor spam detection information and collect statistics;
- Automatically remove redundant and irrelevant keywords;
- Suggest new patterns to identify spam.
The client also uses the engine internally to implement a continuous information-gathering process that analyzes data from hundreds of thousands of messages collected from "honeypots" and known-good sources. This analysis continually refines and enhances the engine's ability to detect spam.
- Image Spam
The engine looks for image attributes that are unlikely to exist in legitimate email, including:
- Jigsaw puzzle-style images;
- CAPTCHA-style images that intentionally obscure content;
- Images designed to emulate plain text.
The engine treats images, and parts of images, as attributes that can be
extracted and tracked over large numbers of messages.
- Graylisting
Ordinary graylisting solutions temporarily reject an email message from a
new sender once, assuming that legitimate senders will try to resend the
message, but spammers will not bother.
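The ordinary graylisting behavior described above can be sketched as follows. This is a minimal illustration, not the server described in this case study. It assumes senders are keyed by the triplet of client IP, envelope sender, and recipient, and that a retry is only accepted after a minimum delay; both are common practice but are assumptions here. The class name `Greylist` and the `"defer"`/`"accept"` return values are hypothetical. A `"defer"` result would map to a temporary (4xx) SMTP rejection.

```python
class Greylist:
    """Minimal in-memory graylist keyed by (client IP, sender, recipient)."""

    def __init__(self, min_delay=300.0):
        # Seconds a sender must wait before a retry is accepted (assumed value).
        self.min_delay = min_delay
        self.first_seen = {}

    def check(self, client_ip, sender, recipient, now):
        """Return 'defer' for a first attempt or early retry, else 'accept'.

        `now` is the current time in seconds, passed in explicitly so the
        logic is easy to test.
        """
        key = (client_ip, sender.lower(), recipient.lower())
        first = self.first_seen.get(key)
        if first is None:
            self.first_seen[key] = now  # remember the first attempt
            return "defer"
        if now - first < self.min_delay:
            return "defer"  # retried too quickly
        return "accept"
```

A production server would also expire old entries and persist state; both are omitted here to keep the sketch short.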
SolveITLabs developed a graylisting server that does much more: it also
establishes and maintains a reputation for hundreds of thousands of
different servers and originators to help weed out unreliable senders. It
uses proprietary technology to offer the fastest throughput even on
congested networks and can be easily and securely accessed across
firewalls and other network security devices without diverting IT
resources for elaborate configuration.
Sophisticated access and licensing controls allow OEMs to offer secure graylisting services profitably.
- Phishing
The engine combines its own reputation filters with global access to
advanced data networks to block phishing and other forms of email fraud.
Reputation analysis and email authentication help the system identify the
rightful owners of IP addresses, domains, email addresses, and even message
content.
The global data network includes near real-time reporting of phishing
outbreaks. The engine identifies and segregates phishing from other types
of spam, allowing OEMs to reject, delete, or quarantine phishing attacks
before they reach customers' inboxes.
Features
- Scan IP addresses for spam/legit attributes
- Extensive spyware database
- Verify IP address and domain name owners
- Verify software owners
- Ability to prevent, detect, and disinfect zombie machines
- Detection of viruses in real-time with and without signatures
- Ximian Evolution plugin
- Novell GroupWise plugin
- Sendmail plugin
- Microsoft Exchange plugin
- Fully customizable rule evaluation and weighting
- Block spammers who spoof domain names
- Extensive options to tune performance and accuracy
- Adaptability to variable network conditions; reduce or delay analysis during periods of high traffic
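Two of the features listed above, customizable rule weighting and reduced analysis during high traffic, can be illustrated together. The sketch below is an assumption about how such a scheme might look, not the SDK's API: the `Rule` type, the `cost` field, the `max_cost` budget, and the default threshold of 5.0 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    weight: float                   # contribution to the score when the rule fires
    cost: int                       # relative expense of evaluating the rule (assumed)
    matches: Callable[[str], bool]  # predicate applied to the message text


def score_message(message, rules, threshold=5.0, max_cost=None):
    """Sum the weights of matching rules and compare against a threshold.

    When `max_cost` is set (for example during a traffic spike), rules more
    expensive than the budget are skipped rather than evaluated.
    Returns (is_spam, total_score, names_of_rules_that_fired).
    """
    total = 0.0
    fired = []
    for rule in rules:
        if max_cost is not None and rule.cost > max_cost:
            continue  # shed expensive analysis under load
        if rule.matches(message):
            total += rule.weight
            fired.append(rule.name)
    return total >= threshold, total, fired
```

An integrator could tune detection by adjusting weights and the threshold, and trade accuracy for throughput by lowering `max_cost` when queues grow.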
Benefits
At its default settings, the SDK catches more than 95% of spam with less than 0.005% 'false positives.' Virtually all of the false positives are non-English bulk emails such as newsletters and legitimate advertisements.
Tools and Technologies
Supported platforms include: Linux (certified for Red Hat, Mandrake,
and SUSE), Microsoft Windows, Solaris 8 (SPARC), Solaris Intel, FreeBSD,
AIX, Mac OS X, HP-UX
Languages and Tools: C/C++, Perl, PHP, Apache, Sendmail, GCC compiler, GDB debugger, gprof profiler, Valgrind memory leak checker, Flex lexical analyzer generator