SPAM Analysis
January 29, 2014 - Author: Mike Bosland
I received a few viruses in my inbox during the holiday season and naturally, I downloaded and ran this malware to see what its purpose was. What did the authors want to accomplish? How are they planning on achieving said goal? Who are they? Where are they? So now I have a malware analysis lab set up and I am learning how to reverse engineer computer viruses, root kits and other types of malware.
During a particularly slow day of Jury Duty, I wanted to continue researching this malware, but being in a government building, I decided that’s probably a bad idea. So I downloaded the contents of my oldest email account’s spam folder. This is my sacrificial lamb account. I use it whenever I need to sign up for something or I have a feeling my email will be sold to the highest bidder. Naturally, it gets a lot of spam, and a lot of attachments.
Armed with only the most generalized and probably offensive thoughts about spam, I dove in and mined this data for trends and any other tidbits I could find. Here’s what I found:
The spam folder was analyzed. It contained messages from 12/14/13 through 1/14/14. During this 1 month period 151 spam messages were received. These were exported and analyzed using open source software, open source data and custom python scripts to tie it all together. As expected, the majority of spam messages resolved to China. 3 Servers in China. That is the cool part of all this research. 151 messages, on the surface all unique, linked to the same 3 servers in China.
Email messages were parsed using custom Python scripts and Python’s open source mailbox library. The links were harvested with custom Python scripts and queried against the various WHOIS databases online to identify an IPv4 address for each. These IPv4 addresses were geolocated using the pygeoip library and the GeoLiteCity database. Graphs were generated from custom Python scripts using the matplotlib library and Microsoft Excel.
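For readers who want to try something similar, here is a minimal sketch of the parsing and link-harvesting step. It is not the script used for this post. It assumes the spam folder was exported as an mbox file, and the names `URL_RE`, `extract_links` and `links_from_mbox` are made up for illustration.

```python
import mailbox
import re
from email import message_from_string

# A deliberately simple pattern for http/https links (an assumption,
# not the pattern used in the original research).
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_links(msg):
    """Return every http(s) URL found in the text parts of one message."""
    links = []
    for part in msg.walk():
        # Skip multipart containers, images, attachments and so on.
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        links.extend(URL_RE.findall(text))
    return links


def links_from_mbox(path):
    """Pair each message's subject with its links (assumes an mbox export)."""
    return [(msg["Subject"], extract_links(msg)) for msg in mailbox.mbox(path)]


# Small in-memory example so the sketch can be tried without a mailbox file.
sample = message_from_string(
    "Subject: hi\n"
    "Content-Type: text/plain\n"
    "\n"
    "Visit http://example.com/a and https://example.org/b now"
)
print(extract_links(sample))
```

A regular expression is a blunt tool here; links hidden inside HTML attributes or obfuscated text would need a real HTML parser.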
Throughout the research here, one assumption used is that the destination IP and especially country is the final destination intended by the author of the spam message. However, it is possible some of these destinations are in fact intermediaries or compromised servers pulling malicious content from yet more servers or redirecting the user to a different location.
Where did the spam come from?
The spam messages were sent from a variety of locations around the world.
Using each email’s ‘X-Originating-IP’ field, I determined the geolocation of the address sending the spam. This does not mean these are the spammers’ IP addresses; these could be compromised machines, spoofed IP addresses or a variety of other possibilities as well.
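Pulling that header out of a message could look roughly like the sketch below. It assumes the header value is a single address, optionally wrapped in square brackets, which is a common but not universal convention. The function name `originating_ip` is hypothetical.

```python
import ipaddress
from email import message_from_string


def originating_ip(msg):
    """Return the X-Originating-IP value as a clean string, or None."""
    raw = msg.get("X-Originating-IP")
    if raw is None:
        return None
    # Assumption: the value looks like "[203.0.113.7]" or "203.0.113.7".
    candidate = raw.strip().strip("[]")
    try:
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        # Header present but not a valid IP address.
        return None


# 203.0.113.7 is a documentation-only address, used purely as an example.
sample = message_from_string(
    "Subject: hi\nX-Originating-IP: [203.0.113.7]\n\nbody"
)
print(originating_ip(sample))
```

The cleaned address is what would then be handed to a geolocation lookup, such as the pygeoip and GeoLiteCity combination mentioned earlier.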
Each message came from a unique IP address. This makes me think many of these might be spoofed. Here is the breakdown by country of origin. The Other category includes: Belarus, Canada, Japan, Malaysia, Philippines & Taiwan. Each of these had an individual percentage below 1% of the total Spam sent.
If you were to geolocate each of the servers and plot it on a world map…you’d get this:
Attachments are a big concern, so we’ll discuss them first. Thanks to advancements in anti-spam and anti-virus technologies, most of these spam messages did NOT contain attachments. There was 1 message which did. The attachment was a virus sent from a Taiwan IP address. This virus is being researched and will be reported separately.
The other 150 messages contained links of some sort. It has not yet been determined whether these are malicious. More research is required here.
The 151 Spam emails contained a total of 1538 individual links. If we look at the various linked domains and see where the DNS servers resolve them, we can attribute them to countries and individual machines.
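As a rough illustration of that step, the sketch below tallies links per hostname and then resolves each hostname to an IPv4 address with a plain DNS lookup. The function names and the injectable `resolver` parameter are assumptions made for this example, not part of the original scripts.

```python
import socket
from collections import Counter
from urllib.parse import urlparse


def domains_from_links(links):
    """Count how many links point at each hostname."""
    hosts = (urlparse(link).hostname for link in links)
    return Counter(host for host in hosts if host)


def resolve_domains(domains, resolver=socket.gethostbyname):
    """Map each domain to an IPv4 address, or None when the lookup fails."""
    resolved = {}
    for domain in domains:
        try:
            resolved[domain] = resolver(domain)
        except OSError:
            # socket.gaierror is a subclass of OSError; treat any lookup
            # failure as "unknown" rather than stopping the whole run.
            resolved[domain] = None
    return resolved


# Placeholder links; the .example names are reserved for documentation.
links = ["http://a.example/x", "http://a.example/y", "https://b.example/"]
counts = domains_from_links(links)
print(counts.most_common())
```

Grouping the resolved addresses afterwards is what reveals many different domain names pointing at the same handful of machines. Keep in mind that a lookup made today may return a different address than it did when the spam was sent.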
Here is the breakdown by country of destination. In other words, if you click the link, the computer you are accessing is located in that country. The Unknown category includes links or domains that I was not able to verify.
Likewise, if you plot these destination servers on a map…
Interestingly enough, while there are only 3 servers in China, the majority of spam links (69.3%) linked to these machines through a variety of domain names.
As mentioned before, most of the spam resolved to 3 Chinese servers: Server1, Server2 and Server3 for the purposes of this blog post (while I figure out the legality of posting IP addresses…).
Based on the similarity and sequential nature of the IP addresses, it appears the first 2 most likely belong to the same individual, group or organization; however, more research is required. These 2 machines are located in Beijing. The third is located in China, but nothing more specific than that is clear. Multiple domains resolved to these servers. It is possible this is a virtual hosting provider using 1 IP for a variety of domains, but this requires more research.
25 individual domains resolved to Server1. 98 of the 151 spam messages linked to this machine.
These 26 domains link to Server2. 98 of the 151 spam messages linked to this machine.
Most are duplicates of those linking to Server1, lending further evidence to suggest these are machines owned and/or operated by the same individual, group or organization. The highlighted domain is uniquely tied to this machine. Based on the domain name, this could be the main web hosting server. Again, more research is needed.
Only 3 domains resolved to Server3. They seem to be related. 3 of the 151 messages linked to this machine.
This section details the routing the spam messages are supporting. The Originating Country is the country from which the spam message was sent. The Destination Country is the country in which the server linked to by the spam message resides.
|Originating Country||Destination Country||Number of Links|
|Korea, Republic Of||China||422|
|United States||United States||108|
|South Africa||United States||65|
|Russian Federation||Korea, Republic Of||2|
|United States||Czech Republic||1|
|United States||Hong Kong||1|
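A table like the one above can be built by counting (originating country, destination country) pairs, one pair per link. The sketch below shows the idea with placeholder country names; `routing_table` is a hypothetical helper, not the code behind the numbers above.

```python
from collections import Counter


def routing_table(records):
    """Count links per (origin, destination) pair, most frequent first.

    `records` is an iterable of (origin_country, destination_country)
    tuples, one tuple per link found in a spam message.
    """
    return Counter(records).most_common()


# Placeholder data only; these are not the figures from this research.
records = [
    ("Country A", "Country B"),
    ("Country A", "Country B"),
    ("Country C", "Country B"),
]
for (origin, destination), count in routing_table(records):
    print(f"|{origin}||{destination}||{count}|")
```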
As mentioned above, only 1 of these spam messages contained an attachment. This was indeed malware. Preliminary analysis seems to point to this being a downloader and clickjacker. Detailed analysis is underway.
Research is underway to determine which, if any, of the 1538 links contained in these messages are malicious. I’m researching the client side honeypot ‘thug’ to “beat up the hackers and steal their malware”. Awesome stuff.
Subject Line Analysis
Email subjects in the spam messages can be easily assigned to 2 main categories. There are those that are standard plain text and those that are base64 encoded. This encoding could be to support icons in the subject line or a way to evade anti-spam filters.
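Encoded subjects of this kind normally follow the RFC 2047 "encoded-word" form, for example `=?utf-8?b?...?=`, and Python's standard `email.header` module can decode them. The sketch below shows one way to tell the two groups apart and read the encoded ones. The helper names are made up for this example.

```python
import re
from email.header import decode_header, make_header

# Matches the start of a base64 encoded-word such as "=?utf-8?b?".
BASE64_WORD_RE = re.compile(r"=\?[^?]+\?[bB]\?")


def is_base64_subject(raw_subject):
    """True when the raw subject contains a base64 encoded-word."""
    return bool(BASE64_WORD_RE.search(raw_subject))


def decode_subject(raw_subject):
    """Return the human-readable form of a raw Subject header."""
    return str(make_header(decode_header(raw_subject)))


raw = "=?utf-8?b?SGVsbG8=?="
print(is_base64_subject(raw), decode_subject(raw))
```

Plain-text subjects pass through `decode_subject` unchanged, so the same function can be applied to every message.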
Spam messages were categorized subjectively into the following categories:
- Drugs – Messages selling or describing drugs
- False Notification – Messages alerting that you won something, have a message waiting or are required to perform some action
- News – Messages providing links to breaking news stories
- Not Spam – Messages that should not have been in the spam folder
- Porn – Messages pushing, providing or otherwise involving sex or unsolicited online dating services
- Save Money – Messages promising to save you money
- Traditional Looking Ad – Messages that resemble traditional, well-intentioned, mass marketing campaigns
Here is the frequency of each category:
Coming into this research, I expected pornography and drug related spam messages to be the most frequent; however, it is clear from this sample that traditional advertising themed spam messages are the most common.
Perhaps this is due to anti-spam filters concentrating on key words related to the pornography and drug themes. Or maybe spammers are getting craftier and making the emails more convincing. Of course, it could always be that these are legitimate ads being spammed.
Source and Destinations
The majority of the email messages were sent from The Republic of Korea (33%) and China (42%). Most of those messages link back to China (61%) and the United States (38%). To me this suggests Korean systems are used to send spam messages (infected/hacked machines?) that link to servers hosted in the US or China.
A majority of messages were sent from one country and linked to servers in other countries. These messages are suspicious but not definitely malicious. The variety of international machines involved, along with political issues related to law enforcement cooperation, provides shelter to criminals attempting to scam users.
As most messages resolved to 2 servers in China, further research is needed to determine what these computers are doing. Are they a virtual hosting provider, hosting multiple coincidentally suspicious domains from the same IP? Are they criminal servers using a variety of techniques to lure in victims to the same destination? The similarity in IP address makes me think they are more likely a shared hosting service, potentially a load balancing server, as they share a multitude of domains. However, more research is required to verify this.
Encoded Subject Lines
When I noticed the encoded subject lines, my initial thought was that these are malicious emails designed to evade anti-spam filters. However, after a little bit of research I found that this is a common way to support emoticons or icons in subject lines.
I think this data supports expected findings. The most interesting findings, in my opinion, were most spam messages resolving to 2 related servers in Beijing, China and that pornography related spam messages were not the most prevalent. Traditional looking ads are the most common, followed by false alerts.
Congrats if you made it this far…you’re just as geeky as I am…