Reading complex email data, spam and ham

Process a collection of email messages and create an R data frame of “derived” variables that give various measures of each message, e.g., the number of recipients to whom the mail was sent, the percentage of capital words in the body of the text, and whether the message is a reply to another message. See below for a list of all the variables, and also consider other variables you think might help classify a message as SPAM versus HAM. The messages are in 5 different directories (folders), and the name of each directory indicates whether the messages it contains are HAM or SPAM. There are 6,541 messages in total, which is a large amount of data.
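One way to start on this is sketched below in R, the language the assignment asks for. It is a minimal sketch, not the assignment's reference solution, and it rests on assumptions that are not in the task description: each message is a plain-text file, its header ends at the first blank line, and a directory holds SPAM when its name contains "spam" (anything else is treated as HAM). The helper names (`splitMessage`, `unfoldHeader`, `getField`, `countAddresses`, `deriveVariables`, `readEmailDirs`) are hypothetical. Only three of the derived variables are computed: the recipient count (taken here as the number of addresses in the `To` and `Cc` headers), the percentage of capital words (interpreted here as alphabetic tokens written entirely in uppercase), and whether the `Subject` starts with `Re:` as a proxy for a reply.

```r
# Minimal sketch, assuming: one message per plain-text file, the header ends at
# the first blank line, and a directory holds SPAM if its name contains "spam".
# All function names here are hypothetical.

# Split a message (character vector of lines) into header and body lines.
splitMessage <- function(lines) {
  blank <- which(lines == "")[1]
  if (is.na(blank)) return(list(header = lines, body = character(0)))
  list(header = lines[seq_len(blank - 1)], body = lines[-seq_len(blank)])
}

# Join folded header lines (continuation lines start with a space or tab).
unfoldHeader <- function(header) {
  grp <- cumsum(!grepl("^[ \t]", header))
  unname(vapply(split(header, grp), paste, character(1), collapse = " "))
}

# Value of the first header field called `name` (case-insensitive), or NA.
getField <- function(header, name) {
  pattern <- paste0("^", name, ":[[:space:]]*")
  hit <- grep(pattern, header, ignore.case = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  sub(pattern, "", hit[1], ignore.case = TRUE)
}

# Count substrings that look like email addresses in a header value.
countAddresses <- function(value) {
  if (is.na(value)) return(0L)
  m <- gregexpr("[[:alnum:]._%+-]+@[[:alnum:].-]+", value)[[1]]
  sum(m > 0)  # gregexpr returns -1 when there is no match
}

# Derived variables for one message, returned as a one-row data frame.
deriveVariables <- function(lines) {
  parts <- splitMessage(lines)
  header <- unfoldHeader(parts$header)
  words <- unlist(regmatches(parts$body, gregexpr("[[:alpha:]]+", parts$body)))
  data.frame(
    numRecipients = countAddresses(getField(header, "To")) +
      countAddresses(getField(header, "Cc")),
    # "capital word" read here as a token written entirely in uppercase
    perCapsWords = if (length(words) == 0) NA_real_ else
      100 * mean(grepl("^[[:upper:]]+$", words)),
    # grepl returns FALSE when the Subject field is missing (NA)
    isRe = grepl("^re:", getField(header, "Subject"), ignore.case = TRUE),
    stringsAsFactors = FALSE
  )
}

# One row per message across all subdirectories of `root`.
readEmailDirs <- function(root) {
  dirs <- list.dirs(root, full.names = TRUE, recursive = FALSE)
  perDir <- lapply(dirs, function(d) {
    files <- list.files(d, full.names = TRUE)
    files <- files[!dir.exists(files)]
    perFile <- lapply(files, function(f) {
      # marking the text as latin1 treats every byte as a valid character
      lines <- readLines(f, warn = FALSE, encoding = "latin1")
      data.frame(file = basename(f),
                 isSpam = grepl("spam", basename(d), ignore.case = TRUE),
                 deriveVariables(lines), stringsAsFactors = FALSE)
    })
    do.call(rbind, perFile)
  })
  result <- do.call(rbind, perDir)
  rownames(result) <- NULL
  result
}
```

A call such as `emails <- readEmailDirs("path/to/messages")`, where the path is a placeholder for the folder that contains the 5 directories, returns one row per message with columns `file`, `isSpam`, `numRecipients`, `perCapsWords` and `isRe`. Further variables can be added as extra columns in `deriveVariables`. Building one small data frame per message and combining them with `rbind` keeps the code simple; it is not tuned for speed on 6,541 messages.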