% of your email as spam...
We've been monitoring the email stats on our shared/dedicated hosting network since Jan...
What percentage of your email is filtered as spam by spam-filtering software?
Our email servers process an average of
- 50,000-60,000 emails per day
- about 30,000 emails are blocked as spam every day (a tad under 50%)
- about 200 viruses are blocked per day
What are your stats like?
PS - I keep reading articles about big companies buying these magic boxes that sit in front of their email servers and virtually wipe out spam. Has anyone had any experience with those compared to SpamAssassin?
datums 06-01-2004, 12:06 AM An increase across the board due to the CAN-SPAM Act.
The Bayesian and heuristic filters are losing the battle.
A combination of blocklists and signatures/rules.
Have you had any experience with the "magic boxes" placed in front of email servers, which large companies say block virtually all spam?
datums 06-01-2004, 12:39 AM Hello MaB
I've never heard of magic boxes, but in my experience you would usually have a server acting as a relay. The relay cleans/stops the spam and delivers the good mail to your Exchange/qmail/Sendmail server.
Very similar to the way MessageLabs works.
boonchuan 06-01-2004, 01:15 AM Barracuda Networks, you mean? Their boxes are supposed to be put in front of the email server. I haven't tried one yet. Has anyone?
datums,
I was referring more to the effectiveness of those boxes than to the setup. We already use email servers that filter email before it reaches our shared hosting servers, set up the same way those boxes would be. I am wondering about their effectiveness: are they better than SpamAssassin with RBLs and Razor enabled?
diginode 06-01-2004, 06:12 AM We implemented SpamAssassin on our own internal mail server a few months ago due to an uncontrolled explosion in spam traffic. What we learned is that SA works really well: 99.9% of spam is filtered (only 1 spam got through this month) and there have been no false positives yet. Our most heavily spammed address gets over 1000 spam messages a month.
The thing to remember about using SA is that you need to tune it.
- Increase the size of the token bucket and feed both good mail (ham) and spam into the learning system for the Bayesian filter. We started with over 2000 messages in the system, and that yielded good results right away.
- Remember to turn on the network checks. Pyzor, Razor2 and DCC all do a wonderful job as clearinghouses of spam hashes. We turn on all three, since some spam may be in one database and not the others.
- Add your own custom rules. We use several pre-made pharmacy rules that assign high spam scores to drug spam.
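To make the "token bucket" idea concrete, here is a minimal Python sketch of a token-count store that is trained on ham and spam and then scores a new message. It is a simplified naive-Bayes combiner, not SpamAssassin's actual Bayes algorithm; the class name `TokenStore`, the word tokenizer and the add-one smoothing are all illustrative assumptions.

```python
import math
import re
from collections import Counter


class TokenStore:
    """Hypothetical token 'bucket': per-token message counts for spam and ham."""

    def __init__(self) -> None:
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.nspam = 0  # number of spam messages learned
        self.nham = 0   # number of ham messages learned

    @staticmethod
    def tokenize(text: str) -> set:
        # Assumption: lower-cased word tokens, each counted once per message.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def learn(self, text: str, is_spam: bool) -> None:
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.nspam += 1
        else:
            self.ham_tokens.update(tokens)
            self.nham += 1

    def token_prob(self, token: str) -> float:
        # Share of spam vs. ham messages containing the token, with add-one
        # smoothing so an unseen token scores exactly 0.5.
        s = (self.spam_tokens[token] + 1) / (self.nspam + 2)
        h = (self.ham_tokens[token] + 1) / (self.nham + 2)
        return s / (s + h)

    def spam_probability(self, text: str) -> float:
        # Naive-Bayes combination done in log space to avoid underflow.
        log_spam = log_ham = 0.0
        for token in self.tokenize(text):
            p = self.token_prob(token)
            log_spam += math.log(p)
            log_ham += math.log(1.0 - p)
        return 1.0 / (1.0 + math.exp(log_ham - log_spam))
```

After `store.learn("cheap viagra pills online", is_spam=True)` and `store.learn("meeting agenda for monday", is_spam=False)`, `store.spam_probability("cheap pills")` comes out at 0.8. With only one message of each kind the scores are crude, which is why the post stresses starting with a large training set.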
There is also another, more advanced content filter called DSPAM. Unfortunately, it uses only Bayesian and other statistical filtering (no network checks), but its authors claim much better results than SpamAssassin (and diss it now and then).
As a result of implementing SA and the good results obtained by using it, we now offer SA installation and tuning as a service.
goldenplanet 06-01-2004, 07:21 AM We run Declude JunkMail on IMail 6 and have the following stats:
15,000 - 20,000 incoming mails per day
80-85% of it spam
We only catch around 95% of all spam but we can't turn the screws any tighter - we'll start tagging legitimate mail if we do.
What software are you using to get such high percentages?
Right now we hover around 50% filtered as spam - we use:
SpamAssassin with Razor, plus the ORDB-RBL, spamcop.net and SBL+XBL blocklists
We've used some custom rules we wrote as well as rules from http://www.rulesemporium.com/index.html
Does anyone have any sites that have additional rule sets that can be downloaded to improve accuracy?
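The blocklists in that setup are DNS-based: the mail server reverses the octets of the connecting IPv4 address, appends the list's zone, and does an A-record lookup, and any answer means "listed". A minimal Python sketch of that lookup follows. The two zone names are assumptions for illustration (historical names that may have changed), so check each list's own documentation before using them.

```python
import ipaddress
import socket

# Assumed example zones; verify the current names with each list operator.
DNSBL_ZONES = ("bl.spamcop.net", "sbl-xbl.spamhaus.org")


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the lookup name, e.g. 192.0.2.1 -> 1.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def listed_on(ip: str, zones=DNSBL_ZONES) -> list:
    """Return the zones that list `ip` (any A record means listed)."""
    hits = []
    for zone in zones:
        try:
            socket.gethostbyname(dnsbl_query_name(ip, zone))
            hits.append(zone)
        except OSError:
            # NXDOMAIN or any other lookup failure is treated as "not listed".
            pass
    return hits
```

`dnsbl_query_name("192.0.2.1", "bl.spamcop.net")` returns `"1.2.0.192.bl.spamcop.net"`. `listed_on` needs working DNS, and treating lookup failures as "not listed" is a fail-open choice so that a broken blocklist server doesn't block all mail.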
diginode 06-01-2004, 11:28 PM Did you initialize the Bayesian filter by piping enough messages (200 spam and 200 ham) to sa-learn?
datums 06-02-2004, 12:48 AM I don't think a percentage like that can be achieved with a high volume of business email.
For false positives, are you going by customer complaints? How accurate is that, really? There is a lot of room for error.
With most of the spam engines out there, newsletters seem to cause the biggest problems due to their spammy characteristics.
For personal/non-business accounts you can achieve 100% using the greylisting method (ASK).
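For reference, greylisting in its usual form keys on the (client IP, envelope sender, recipient) triplet. The first delivery attempt gets a temporary 4xx rejection, and a retry after a minimum delay is accepted, on the theory that legitimate mail servers retry while much spamware does not. Below is a minimal in-memory Python sketch of that decision. The `Greylist` class and the 300-second delay are assumptions, and this shows the generic technique, not how ASK itself works.

```python
import time
from typing import Dict, Optional, Tuple


class Greylist:
    """Hypothetical in-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay: float = 300.0) -> None:
        # Assumed policy: a retry must arrive at least min_delay seconds later.
        self.min_delay = min_delay
        self.first_seen: Dict[Tuple[str, str, str], float] = {}

    def check(self, ip: str, sender: str, recipient: str,
              now: Optional[float] = None) -> str:
        """Return 'accept' or 'tempfail' (an SMTP 4xx) for this attempt."""
        now = time.time() if now is None else now
        key = (ip, sender.lower(), recipient.lower())
        # Record the first attempt; retries keep the original timestamp.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"
```

A production greylist would also expire stale entries and usually whitelist a triplet or client once it has passed. Those parts are left out of this sketch.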
The only complaints I've gotten were from people who included files in their emails that our virus scanners couldn't open; no false positives on spam.
I haven't tried feeding sa-learn 200 spam and 200 ham messages.
diginode 06-02-2004, 06:46 AM It is our internal server, and we pipe all of the messages identified as spam to a blackhole account (as well as a personal spam folder). We review this account constantly to check for false positives.
You need to know what you're doing, though. SA uses plain Bayesian analysis and doesn't have the advanced approach of DSPAM, but it more than makes up for that with network checks and filter rules.
diginode 06-02-2004, 06:57 AM That's the reason. SA sucks if you don't turn on Bayesian filtering, and that only happens when you have over 200 spam and 200 non-spam (ham) messages in the token bucket. You need to use sa-learn or spamassassin -k to pipe the messages in and train the Bayesian filter.
You will find that after enabling the Bayesian filter your accuracy will rise to over 95%; if you have a good initial selection of messages, it will be even higher.
Enabling the Bayesian filter also enables auto-learn by default. This lets SpamAssassin's Bayesian filter learn automatically from high-scoring messages (recording them as spam) and from very low-scoring ones (recording them as ham).
If anybody needs help initializing the database, contact me directly. I can provide free help for anybody with a working SA installation. I probably should write up a SpamAssassin HOWTO if there isn't one already in the HOWTOs section. :)
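The activation rule and auto-learn behaviour described in this post can be sketched as two small decision functions. The 200/200 minimums come from the post and are treated here as "at least 200", matching SpamAssassin's `bayes_min_spam_num`/`bayes_min_ham_num` defaults. The auto-learn thresholds of 12.0 (spam) and 0.1 (ham) are assumptions modeled on the SpamAssassin 3.x options `bayes_auto_learn_threshold_spam` and `bayes_auto_learn_threshold_nonspam`; older releases may use different names and values. This is an illustration, not SpamAssassin code.

```python
from typing import Optional

# Minimum corpus sizes before Bayes results are used (from the post).
MIN_SPAM = 200
MIN_HAM = 200
# Assumed auto-learn thresholds, modeled on SpamAssassin 3.x defaults.
AUTOLEARN_SPAM = 12.0
AUTOLEARN_HAM = 0.1


def bayes_active(nspam: int, nham: int) -> bool:
    """Bayes only contributes once both corpora reach their minimums."""
    return nspam >= MIN_SPAM and nham >= MIN_HAM


def autolearn_label(score: float) -> Optional[str]:
    """Decide whether a scored message is fed back into the token bucket."""
    if score >= AUTOLEARN_SPAM:
        return "spam"  # very high score: learn it as spam
    if score <= AUTOLEARN_HAM:
        return "ham"   # very low score: learn it as ham
    return None        # middle ground: leave it for manual sa-learn
```

Leaving the middle band unlearned is the point of auto-learn: only messages the other rules are already sure about get fed back, so borderline mistakes don't poison the token database.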