ioiom63
02-19-2009, 08:12 PM
I've been trying to work out an easy way for my users to train SpamAssassin's Bayesian filter. From what I've been able to read, you use a program called sa-learn and feed it "spam" and "ham" (legitimate, non-spam mail), which makes the filtering more accurate.
The only question is how to get the messages to it. It will be trivial to schedule a program to run every so often that feeds sa-learn every message the user has moved to a Spam folder. The tricky part is feeding it ham. Here's the best idea I've had so far:
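For the spam half, a minimal sketch of the scheduled job might look like the following. It assumes each user's Spam folder is a Maildir (the `~/Maildir/.Spam/cur` path is hypothetical) and relies only on sa-learn's `--spam` and `--ham` options, which accept a directory and treat each file in it as one message. The script would be run from cron as each user, so sa-learn updates that user's own Bayes database.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def sa_learn_command(kind: str, folder: Path) -> list[str]:
    """Build an sa-learn invocation that trains on every message file in `folder`."""
    if kind not in ("spam", "ham"):
        raise ValueError(f"kind must be 'spam' or 'ham', not {kind!r}")
    return ["sa-learn", f"--{kind}", str(folder)]


def train(kind: str, folder: Path, dry_run: bool = False) -> list[str]:
    """Run sa-learn on `folder` (unless dry_run) and return the command used."""
    cmd = sa_learn_command(kind, folder)
    if not dry_run:
        # check=True raises CalledProcessError if sa-learn exits non-zero
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical layout: the user's Spam folder lives at ~/Maildir/.Spam
    spam_dir = Path.home() / "Maildir" / ".Spam" / "cur"
    print(train("spam", spam_dir, dry_run=True))
```

The `dry_run` flag is just a convenience for checking what would be run before pointing it at real mailboxes.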
Create a second spam folder in each user's account called "Auto-detected as Spam", and use a filter to deliver detected spam there. Users will be instructed to move any incorrectly identified messages out of the "Auto-detected as Spam" folder. Every so often, rsync the files in this folder to a folder the user can't access. If that private copy ever contains a message that the auto-detected folder no longer does, assume the user has moved it to their inbox and that it is ham.
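The comparison step could be sketched like this, again assuming Maildir storage. One wrinkle: a Maildir message is renamed when the user reads or flags it (a `:2,<flags>` suffix is added or changed, and it moves from `new/` to `cur/`), so comparing raw filenames would wrongly treat every read message as "moved out". The sketch compares only the part of the name before the first `:`. The function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def maildir_key(filename: str) -> str:
    """Return the stable part of a Maildir filename (everything before ':')."""
    return filename.split(":", 1)[0]


def list_keys(folder: Path) -> set[str]:
    """Collect message keys from a Maildir folder's new/ and cur/ directories."""
    keys: set[str] = set()
    for sub in ("new", "cur"):
        d = folder / sub
        if d.is_dir():
            keys.update(maildir_key(p.name) for p in d.iterdir() if p.is_file())
    return keys


def presumed_ham(snapshot_keys: set[str], current_keys: set[str]) -> set[str]:
    """Messages in the private snapshot that are gone from the auto-detected folder."""
    return snapshot_keys - current_keys
```

Because the private copy is built with rsync, a renamed message may end up there under both its old and new names; keying on the stable part collapses those into one entry.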
This should work as long as users don't delete messages from the auto-detected folder in an effort to be tidy. To cover that, a script could also check the deleted items folder for such messages, and could empty messages from the auto-detected folder once they are a week old, so that users would have no need to remove them themselves.
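Those two safeguards might be sketched as follows, under the same Maildir assumption and with hypothetical function names. Age is taken from file modification time, which Maildir renames do not change. One thing the scheme needs that is easy to miss: messages the script expires itself also disappear from the auto-detected folder, so their keys have to be excluded too, or the next comparison will take them for ham.

```python
from __future__ import annotations

import time
from pathlib import Path

WEEK_SECONDS = 7 * 24 * 60 * 60


def maildir_key(filename: str) -> str:
    """Return the stable part of a Maildir filename (everything before ':')."""
    return filename.split(":", 1)[0]


def ham_candidates(missing: set[str], trash: set[str], expired: set[str]) -> set[str]:
    """Keep only messages that left the folder for a reason other than deletion or expiry."""
    return missing - trash - expired


def expired_messages(folder: Path, now: float | None = None,
                     max_age: float = WEEK_SECONDS) -> list[Path]:
    """List message files in a Maildir folder whose mtime is older than max_age seconds."""
    now = time.time() if now is None else now
    old: list[Path] = []
    for sub in ("new", "cur"):
        d = folder / sub
        if d.is_dir():
            old.extend(p for p in d.iterdir()
                       if p.is_file() and now - p.stat().st_mtime > max_age)
    return sorted(old)


def purge(paths: list[Path]) -> set[str]:
    """Delete expired files and return their keys so the comparison can ignore them."""
    keys: set[str] = set()
    for p in paths:
        keys.add(maildir_key(p.name))
        p.unlink()
    return keys
```

Whatever remains from `ham_candidates` could then be fed to `sa-learn --ham`, and dropped from the private copy afterwards so it is not learned twice.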
So, here's the question. What do you all think? Does anyone have a better idea? Has this been done before?
