andrew@theinternet

NetSec Lecture Notes - Lesson 14 - Botnet Detection

Botnet Detection

  • Rules of engagement have changed
  • In the past, enemies were easy to distinguish
  • New technologies such as botnets and APTs have changed this dynamic

Network Monitoring

  • Attack traffic used to be well-defined and obvious e.g.:
    • Payload contains an exploit for a known vulnerability
    • Volume/rate suggests DoS, Spam, etc.
  • Firewalls and network intrusion detection systems
    • Designed to identify attack traffic

Advanced Network Monitoring

  • Traditional firewall/NIDS
    • Are bypassed by mobile devices compromised while outside the network perimeter
  • Attack traffic is now very subtle
    • e.g. botnet HTTP-based command and control traffic looks like normal web traffic
  • Need more advanced network monitoring
    • Identify traffic beyond obvious exploit/attacks
    • In particular, botnet detection systems

Bot Quiz

Fill in the blanks with the correct answers. A bot is often called a zombie because it is a compromised computer controlled by malware without the consent and knowledge of the user.

Botnet Quiz

Fill in the blanks with the correct answers. A botnet is a network of bots controlled by a Bot Master.

  • More precisely, a coordinated group of malware instances that are controlled via command and control channels. C&C architectures include centralized (e.g. IRC, HTTP) and distributed (e.g. P2P)

It is a key platform for fraud and other for-profit exploits.

Botnet Tasks Quiz

Select all the tasks that botnets commonly perform

  • More than 95% of all spam
    • True
  • All distributed denial of service attacks
    • True
  • Click fraud
    • True
  • Phishing and pharming attacks
    • True
  • Key logging and data/identity theft
    • True
  • Distributing other malware, e.g. spyware
    • True
  • Anonymized terrorist and criminal communication
    • True

Why Traditional Security Measures Fail

  • Traditional Anti-Virus Tools
    • Bots use packers, rootkits, and frequent updates to easily defeat AV tools
  • Traditional IDS/IPS
    • Look at only specific aspects
      • e.g. payload with exploit
    • Do not have a big picture
      • Bots are for long-term use
  • Honeypot/Honeynet
    • Not scalable, mostly passively waiting
    • Bots can detect/discover honeypot/honeynet
    • Because it is usually a single host, it is not a good botnet detection tool

Botnet Detection: Challenges

  • Bots are stealthy on the infected machines
    • e.g. rootkit hides the malware
  • Bot infection is usually a multi-faceted and multi-phased process
    • Looking at only one specific aspect is likely to fail
  • Bots are dynamically evolving
    • Static and signature-based approaches may not be effective
  • Botnets can have very flexible designs of C2 channels
    • A solution very specific to a given botnet instance is not desirable

Botnet Detection: Guidelines

  • Distinguish botnet activities from normal network traffic
    • Bot: non-human
    • Net: bots are connected, activities are coordinated
  • Distinguish botnets from other (older) attacks
    • For profit (hosts used as resources)
    • Long-term use (updates to the malware)
    • Net (coordination)

Botnet Detection: Enterprise Networks

  • Deploy detection at gateway or edge router
  • Vertical correlation
    • Looking for correlated events across a time horizon given that a bot has multiple activities in its life cycle
    • e.g. BotHunter
  • Horizontal correlation
    • Looking for similar, or correlated, behaviors across multiple bots
    • e.g. Botsniffer, BotMiner
  • Cause-Effect correlation
    • Inject traffic to elicit a response from a bot, to confirm that traffic is generated by bots vs by humans

BotHunter

  • Vertical dialog correlation
  • Example: Phatbot
  • Egress point (internal or external)
  • Search for duplex communication sequences that map to infection lifecycle model
  • Stimulus does not require strict ordering, but does require temporal locality
  • Characteristics of Bot declarations
    • External stimulus alone cannot trigger bot alert
    • Two internal bot behaviors trigger a bot alert
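
The declaration rule above can be sketched roughly as follows. This is a simplified illustration, not BotHunter's actual correlation engine: the `DialogEvent` type, the event names, the 300-second window, and the `declare_bots` helper are all assumptions made for the example. It only shows the three properties listed above: external stimulus alone never triggers an alert, ordering is ignored, and the internal behaviors must fall close together in time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DialogEvent:
    host: str    # internal host the event is attributed to
    kind: str    # "external" (inbound stimulus) or "internal" (bot behavior)
    name: str    # hypothetical behavior label, e.g. "outbound_scan"
    time: float  # seconds since an arbitrary epoch


def declare_bots(events, window=300.0, min_internal=2):
    """Return hosts showing >= min_internal distinct internal behaviors
    within one time window. External events are never counted."""
    by_host = {}
    for ev in events:
        if ev.kind == "internal":
            by_host.setdefault(ev.host, []).append(ev)

    flagged = set()
    for host, evs in by_host.items():
        evs.sort(key=lambda e: e.time)
        for i, first in enumerate(evs):
            # Order inside the window does not matter, only temporal locality.
            names = {e.name for e in evs[i:] if e.time - first.time <= window}
            if len(names) >= min_internal:
                flagged.add(host)
                break
    return flagged
```

A host that only receives inbound scans and exploits is never flagged here, and neither is a host whose two internal behaviors are separated by more than the window.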

Architecture

  • SCADE
    • Statistical Scan Anomaly Detection Engine
      • Custom malware-specific weighted scan detection system for inbound and outbound sources
      • Memory usage is bounded by the number of inside hosts, making it less vulnerable to DoS attacks
    • Inbound (E1: Initial Scan Phase):
      • suspicious port scan detection using weighted score
      • failed connection to vulnerable port = high weight
      • failed connection to other port = low weight
    • Outbound (E5: Victim Outbound Scan Phase):
      • S1 - Scan rate of V over time t
      • S2 - Scan failed connection rate (weighted) of V over t
      • S3 - Scan target entropy (low revisit rate implies bot search) over t
      • Combine model assessments using an OR, majority voting, or AND scheme
  • SLADE
    • Statistical payLoad Anomaly Detection Engine
      • Suspicious payload detection: new “lossy” n-gram byte distribution analysis over a limited set of network services
      • Implements a lossy data structure to capture the 4-gram hash space: default vector size = 2048 (versus the full space for n=4: 256^4 = 2^32 ≈ 4G entries)
      • Accuracy comparable to the full n-gram scheme: low FP and FN
      • General performance comparable to PAYL (Wang 2004): to detect all 18 attacks, the false positive rate of PAYL is 4.02% and that of SLADE is 0.3601%
  • Signature Engine
    • Signature set
      • Replaces all standard snort rules with five custom rulesets
    • Scope: Dialog content
      • Known worm/bot exploit signatures, shell/code/script exploits, malware update/download, C&C command exchanges, outbound scans
    • Rule sources
      • Bleeding edge malware rule sets
      • Snort community rules
      • Cyber-TA custom bot-specific rules
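
A minimal sketch of two of the ideas above: SCADE's weighted inbound scan score and SLADE's lossy 4-gram vector. Everything concrete here is an assumption for illustration: the port list, the weight values, and the use of CRC32 as the hash are not taken from BotHunter itself. The point is only that failed connections to vulnerable ports count for more, and that hashing each 4-gram into 2048 buckets avoids storing the full 2^32 space.

```python
import zlib

VULNERABLE_PORTS = {135, 139, 445}   # hypothetical list of "vulnerable" ports
HIGH_WEIGHT, LOW_WEIGHT = 3.0, 1.0   # hypothetical weights


def inbound_scan_score(failed_ports):
    """Sum the weights of failed inbound connection attempts from one source."""
    return sum(
        HIGH_WEIGHT if port in VULNERABLE_PORTS else LOW_WEIGHT
        for port in failed_ports
    )


def lossy_ngram_vector(payload: bytes, n=4, size=2048):
    """Hash every n-gram of the payload into a fixed-size count vector.

    Different n-grams may land in the same bucket; that collision is what
    makes the structure lossy. CRC32 is a stand-in hash, not SLADE's.
    """
    vec = [0] * size
    for i in range(len(payload) - n + 1):
        vec[zlib.crc32(payload[i:i + n]) % size] += 1
    return vec
```

An anomaly detector would then compare such a vector against a profile learned from normal traffic for the same service; that comparison step is left out of this sketch.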

BotMiner

  • Why do we need another?
    • BotHunter based on specific infection lifecycles.
    • Botnets can change their C&C content, protocols, structures, C&C servers, infection models, etc.
  • Goal is to have a botnet detection system that is protocol- and structure-independent
  • Use both vertical and horizontal correlation
    • Bots are for long-term use
    • Botnet: communication and activities are coordinated/similar

Architecture

  • C-Plane is for monitoring C&C traffic
    • Temporal related statistical distribution information in
      • Bytes per second
      • Flows per hour
    • Spatial related statistical distribution information in
      • Bytes per packet
      • Packets per flow
    • Clustering of C-flows is done in two steps: coarse-grained, then fine-grained
      • Why? Efficiency
      • Coarse-grained: Using reduced feature space: mean and variance of the distribution of FPH, PPF, BPP, BPS for each C-flow (2 * 4 = 8)
        • Efficient clustering algorithm: X-means
      • Fine-grained clustering: Using full feature space (13 * 4 = 52)
  • A-Plane is for monitoring malicious activities
    • Capture “similar activity patterns”
  • Cross-plane correlation is for detecting machines that behave similarly in both control plane and malicious activities
    • Cross-check the clustering results of A-plane (activities) and C-plane (communications)
      • Intersections provide stronger evidence that a host is in a botnet: similar malicious activities AND C&C patterns
      • The more intersections a host falls into, the stronger the evidence that it is a bot
    • Clustering bots into the same botnet
      • If two hosts appear in the same activity clusters and in at least one common C-cluster, they should be clustered together (as the same botnet)
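
The cross-plane step can be sketched as set intersections. This is a simplified illustration, not BotMiner's actual scoring function: the function names are made up, clusters are plain Python sets of host names, and "same activity clusters" is interpreted here as identical A-cluster membership.

```python
from itertools import combinations


def intersection_counts(a_clusters, c_clusters):
    """Count, per host, how many (A-cluster, C-cluster) intersections contain it.
    A higher count means stronger evidence that the host is a bot."""
    counts = {}
    for a in a_clusters:
        for c in c_clusters:
            for host in a & c:
                counts[host] = counts.get(host, 0) + 1
    return counts


def same_botnet_pairs(a_clusters, c_clusters):
    """Pairs of suspect hosts with identical A-cluster membership
    and at least one C-cluster in common."""
    def membership(host, clusters):
        return frozenset(i for i, cl in enumerate(clusters) if host in cl)

    suspects = sorted(intersection_counts(a_clusters, c_clusters))
    pairs = set()
    for h1, h2 in combinations(suspects, 2):
        same_a = membership(h1, a_clusters) == membership(h2, a_clusters)
        common_c = membership(h1, c_clusters) & membership(h2, c_clusters)
        if same_a and common_c:
            pairs.add((h1, h2))
    return pairs
```

Hosts that appear only in an A-cluster or only in a C-cluster never show up in `intersection_counts`, which matches the idea that evidence from both planes is required.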

Botnet Detection Quiz

Which of these behaviors are indicative of botnets?

  • Linking to an established C&C server
    • True
  • Generating IRC traffic using a specific range of ports
    • True
  • Generating DNS requests
    • False
    • Generating DNS requests is not suspicious behavior. Generating SIMULTANEOUS IDENTICAL DNS requests is suspicious
  • Generating SMTP emails/traffic
    • True
  • Reducing workstation performance/Internet access to the level that it is noticeable to users
    • True
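
The DNS answer above draws a line between ordinary look-ups and simultaneous identical look-ups. A rough sketch of that check follows; the function name, the one-second window, and the three-host threshold are assumptions chosen for the example, not values from the lecture.

```python
def simultaneous_identical_queries(queries, window=1.0, min_hosts=3):
    """queries: iterable of (timestamp, host, domain) tuples.

    Return the domains that at least min_hosts distinct hosts looked up
    within the same window (in seconds).
    """
    by_domain = {}
    for ts, host, domain in queries:
        by_domain.setdefault(domain.lower(), []).append((ts, host))

    flagged = set()
    for domain, hits in by_domain.items():
        hits.sort()
        start = 0
        for end in range(len(hits)):
            # Slide the left edge forward until the window fits.
            while hits[end][0] - hits[start][0] > window:
                start += 1
            hosts = {h for _, h in hits[start:end + 1]}
            if len(hosts) >= min_hosts:
                flagged.add(domain)
                break
    return flagged
```

A popular site looked up by many hosts at unrelated times is not flagged; the same name requested by several hosts within the same second is.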

BotMiner Limitations Quiz

What can botnets do to avoid C-plane clustering?

  • Manipulate communication patterns
  • Introduce noise (in the form of random packets) to reduce similarity between C&C flows.

What can botnets do to avoid A-plane monitoring?

  • Perform slow spamming
  • Use undetectable activities (spam sent with Gmail, download exe from HTTPS server)

Botnet Detection on the Internet

  • A botnet must use Internet protocols or services for efficiency, robustness, and stealth
    • Look-up services (e.g. DNS, P2P DHT)
      • Find C&C servers and/or peers
    • Hosting services (Web servers and proxies)
      • Storage and distribution/exchange of attack-related data, malware download
    • Transport (e.g. BGP)
      • Route (or hide) attack from bots to victims
    • Identify the abnormal use of Internet services that suggests botnet activities
    • This lesson will focus on DNS
      • Used by most bots for locating C&C and hosting sites

Botnet Use of Dynamic DNS Services

  • Dynamic DNS providers are preferred by BotMasters because they allow frequent changes of DNS mapping, allowing rotation of IP addresses as each one gets shut down
  • Detecting bot DNS traffic can be done probabilistically
    • e.g. a botnet domain is looked up by hundreds of thousands of machines across the Internet, but is unknown to Google search. This is a strong signal / anomaly
  • If a domain is identified as being used for a botnet, a Dnstop alert can be issued. DynDNS then updates the CNAME record to point to the GT sinkhole

How else can this be detected?

  • Observation 1: hard-coded C&C domain (string)
    • Domain name purchases use traceable financial information, so there is an incentive to get as much mileage out of a 2LD as possible. Multiple 3LDs can use DDNS service with one package deal.
    • Thus: financial and stealthy motives for botnet authors to “reuse” 2LD with numerous similar/clustered 3LDs.
    • Clustered 3LD look-ups
      • Cluster the 3LDs under a 2LD based on the similarity of their names and the subnets of their resolved IPs
      • Sum up the look-ups to all domains within a cluster. Botnet domains tend to have far higher totals than normal traffic
  • Observation 2: DNS look-up behavior of botnets
    • After boot, bots immediately resolve their C&C
      • Bot DNS requests arrive in exponential spikes, driven by time zones, 9am/5pm work schedules, etc.
    • Normal DNS lookup behavior is a lot smoother
      • Human users don’t all immediately check the same server right after boot
  • Other observations/features
    • Source IP dispersion in DNS look-ups
      • Local or global popularity of the domain
    • Resolved IP dispersion
      • Distributed in many different networks?
    • Number of times resolved IP changed
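
The clustered 3LD look-up idea from Observation 1 can be sketched as follows. This is a deliberately reduced version: it groups 3LDs by their 2LD and by the /24 subnet of the resolved IP, and leaves out name similarity entirely. The function name, the record layout, and the /24 prefix are assumptions, and taking the last two labels as the 2LD is naive (it ignores suffixes such as `co.uk`).

```python
from collections import defaultdict
import ipaddress


def clustered_lookup_totals(records, prefix=24):
    """records: iterable of (fqdn, resolved_ip, lookup_count).

    Return {(2LD, subnet): total look-ups} for all names that have a 3LD.
    """
    totals = defaultdict(int)
    for fqdn, ip, count in records:
        labels = fqdn.lower().rstrip(".").split(".")
        if len(labels) < 3:
            continue  # no 3LD to cluster
        sld = ".".join(labels[-2:])
        # strict=False lets a host address stand in for its enclosing network
        subnet = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        totals[(sld, str(subnet))] += count
    return dict(totals)
```

Each individual 3LD may look unremarkable on its own; summing over the cluster is what exposes the unusually high total.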

Botnet DynDNS Detection at a Large ISP

  • Recursive DNS Monitoring at ISP
  • Analyze DNS traffic from internal hosts to a recursive DNS server(s) of the network
  • Detect abnormal patterns/growth of “popularity” of a domain name
    • Identify botnet C&C domain and bots
    • Monitor for “new and suspicious” domain names that enjoy exponential or linear growth in interest/look-ups
      • Train a Bloom filter for N days to record domain names being looked up, and a Markov model of all the domain name strings
        • On day N+1, consider a name “new” if it is not in the Bloom filter; if it also does not fit the Markov model, it is “suspicious” as well
      • Treat the sequence of look-ups to each new and suspicious domain (on the N+1 day) as a time series
      • Apply linear and exponential regression techniques to analyze the growth of number of look-ups
        • A match of the growth patterns suggests botnet domains
  • Common means of botnet propagation: (worm-like) exploit-based, email-based, and drive-by egg download
    • Exploit-based propagation
      • The number of infected machines grows exponentially in the initial phase
    • Email-based propagation
      • Exponential or linear
    • Drive-by egg download
      • Likely sublinear
  • Anomalous domain names
    • Botnet-related domains usually contain random-looking (sub)strings
    • Many/most sensible domain names have been registered (for legitimate use)
    • In particular, botnet domain name 3LD often looks completely random, and the domain name tends to be very long

Detection of Targeted and Advanced Threats

  • Zero-day exploits, custom-built malware
  • Low-and-slow
  • Lateral movement
  • Need multi-faceted monitoring and analysis
    • Malware analysis
    • Host-based monitoring, forensics, recovery
    • Network monitoring
    • Internet monitoring, threat analysis, attribution

APT Quiz

Which of the following information should be considered in order to identify the source of an APT attack?

  • Source IP address of TCP-based attack packets
  • Coding style of malware
  • Inclusion of special libraries with known authors
  • Motives of the attack
  • Language encoding
  • All of the above
    • True!