Pcapan: a PCAP analysis helper

Written by Simon Marechal - 22/11/2023 - in Tools, Reverse-engineering

This post showcases a small but very useful tool that can be used to classify expected and suspicious traffic in a network capture file and, more importantly, describes the process of writing such a tool.

I recently had to analyze traffic from and to an Android phone that was suspected of having been compromised. I started by capturing all WiFi traffic for a few days on the wireless router. It amounted to a few hundred megabytes stored in a PCAP file. In order to make sure I captured all traffic, I turned off cellular data, hoping that if malware was present, it would not wait to be on cellular data to communicate.

I decided to list all unique IP addresses that had been contacted by the phone, and to sort them into expected, suspicious and malicious. In order to do that, I used tcpdump and a few command line utilities to find all unique IPs. It turns out that even with a brand new phone, with no extra applications installed, the bundled applications alone will contact tens of distinct IP addresses. In my capture, there were hundreds of them. Some automation was needed.
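For readers who would rather script this step than chain command line utilities, here is a minimal Python sketch of the same idea. It is not the command I used: it assumes a classic (non-pcapng) capture with Ethernet framing and no VLAN tags, and the function name ipv4_traffic is made up for the example.

```python
import socket
import struct
from collections import Counter


def ipv4_traffic(pcap_bytes: bytes) -> Counter:
    """Sum the IPv4 bytes seen per address in a classic pcap file."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written by a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written by a big-endian host
    else:
        raise ValueError("not a classic microsecond pcap file")
    # The link type is the last field of the 24-byte global header.
    (linktype,) = struct.unpack(endian + "I", pcap_bytes[20:24])
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled in this sketch")

    seen = Counter()
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Per-packet header: ts_sec, ts_usec, captured length, original length.
        _sec, _usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16]
        )
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        # 14 bytes of Ethernet header, EtherType 0x0800 means IPv4.
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        (total_len,) = struct.unpack("!H", frame[16:18])
        # Source and destination addresses sit at bytes 12 and 16 of the IP header.
        for raw in (frame[26:30], frame[30:34]):
            seen[socket.inet_ntoa(raw)] += total_len
    return seen
```

Calling sorted(ipv4_traffic(open(path, "rb").read())) gives the list of unique addresses; the byte counts come for free and are handy later to ignore near-empty connections.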

First try at automated classification

But how to classify IP addresses? By just having an IP address, there are two sources of information that are easy to query: reverse DNS look-ups and the WHOIS registry.

Reverse DNS look-ups are controlled by the owner of a given IP block, so they are potentially misleading. I quickly gave it a try and it turned out that all answers were either failing (because no name was associated with the IP, or because the query timed out) or unhelpful. For example, one such captured IP was 172.217.169.165. Reverse DNS tells us it is sof02s33-in-f5.1e100.net, most likely a Google service. However, as we will discuss later, it was not obvious to me at that time whether that meant that it hosted a Google service, or that it was a server in their customer cloud.

The other source, the WHOIS registry, is a bit more useful. While the quality of this registry is dependent on how strict the registrar is, I thought I would trust it as a first approximation. For example, one unknown IP was 125.209.192.43. Querying the registry returns:

inetnum:        125.209.192.0 - 125.209.255.255
netname:        NBP-NET
descr:          NAVER Cloud Corp.
country:        KR
admin-c:        IM681-AP
tech-c:         IM681-AP
status:         ALLOCATED PORTABLE
mnt-by:         MNT-KRNIC-AP
mnt-irt:        IRT-KRNIC-KR
last-modified:  2021-03-16T00:14:44Z
source:         APNIC

So, what is that Naver thing? After a bit of searching, it turned out that this Korean company owns the popular LINE instant messaging application, which is installed on the phone. That makes it possible to whitelist the whole IP block. I started with a small Python script that would read the PCAP file and a configuration file. The configuration file contains whitelisted blocks, and the script would extract all unique IPs and print only those that are not whitelisted. The process was then to pick an unclassified IP, check it, and potentially add it to the list of whitelisted IPs.
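The original script is not reproduced in this post, but the whitelisting step can be sketched in a few lines with Python's ipaddress module. The function names and the ALLOW dictionary below are illustrative assumptions, not the script's actual code.

```python
import ipaddress

# Whitelisted blocks and a free-form label, as they could appear in a configuration file.
ALLOW = {"125.209.192.0/18": "line"}


def whois_range_to_cidrs(first: str, last: str) -> list:
    """Turn an inetnum range such as '125.209.192.0 - 125.209.255.255' into CIDR blocks."""
    blocks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(block) for block in blocks]


def unclassified(ips, allow) -> list:
    """Return the addresses that are not covered by any whitelisted block."""
    networks = [ipaddress.ip_network(block) for block in allow]
    remaining = [
        ip for ip in ips
        if not any(ipaddress.ip_address(ip) in net for net in networks)
    ]
    return sorted(remaining, key=ipaddress.ip_address)
```

With the WHOIS answer above, whois_range_to_cidrs("125.209.192.0", "125.209.255.255") yields the single block 125.209.192.0/18, which is the form used in the whitelist.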

Most of the time, however, the result is not as helpful:

Amazon.com, Inc. AMAZO-4 (NET-108-128-0-0-1) 108.128.0.0 - 108.139.255.255

This is not helpful, because we only know it is something at Amazon, most likely in their public cloud, so it can be anything really. In this situation, I would look at the capture using Wireshark, and manually inspect the traffic. As most of the remaining traffic is encrypted, it is kind of hard to know what is being exchanged. However, most of the time, by looking at the traffic in Wireshark, it is possible to find in the TLS Hello packet the following:

A Wireshark capture with the TLS server name extension highlighted

This means that the client application wanted to connect to mail.google.com. A few packets before, there also was a DNS query to that domain, with this IP as a response. For most connections, we have good information about the name of the service the phone wanted to contact, using either:

  • the SNI extension content, which may be forged, but is easy to link to an IP;
  • the DNS request, which should be truthful under the assumption that the DNS provider is not itself compromised, but which happens before the actual connection and requires a bit more work to link to the IP.

Note that the SNI information is bound to become unavailable in the future thanks to privacy-preserving protocols such as ECH (encrypted client hello). Manually doing that for all IP addresses is too time-consuming, so I started adding a TLS parser to the Python script. After a few unsuccessful tries, I ported it to Rust, and a few hours later I had a tool that I used to complete the engagement.
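To give an idea of what extracting the SNI involves, here is a rough Python sketch that walks a ClientHello by hand. It is neither the abandoned Python parser nor the Rust code; it assumes the whole ClientHello sits in a single TLS record at the start of the TCP payload, and extract_sni is a name chosen for the example.

```python
import struct


def extract_sni(record: bytes):
    """Return the server name carried by a TLS ClientHello record, or None."""
    try:
        # Record type 0x16 is a handshake, handshake type 0x01 is a ClientHello.
        if record[0] != 0x16 or record[5] != 0x01:
            return None
        pos = 5 + 4                      # record header + handshake header
        pos += 2 + 32                    # client version + random
        pos += 1 + record[pos]           # session id
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len            # cipher suites
        pos += 1 + record[pos]           # compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:            # server_name extension
                # Layout: list length (2), name type (1), name length (2), name.
                if record[pos + 2] != 0:  # 0 means host_name
                    return None
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        # Truncated or malformed data: treat it as "no SNI found".
        return None
    return None
```

A real parser also has to cope with a ClientHello split across several TCP segments, which is one reason to lean on an existing TLS parsing library instead.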

What now?

A few hours later, we have a program that reuses code from PCAP, TLS parsing, DNS parsing and HTTP parsing libraries. It is a small program, but it does quite a few things that make analysis easier, and might be of use to others. It parses PCAP files and, doing so:

  • parses the TLS handshake, collects server name indications and associates them to IPs,
  • parses HTTP requests, collects the Host fields and associates them to IPs,
  • parses DNS requests, collects the queries and responses for A records and associates them to IPs.
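The "associate names to IPs" part boils down to a map from address to a set of labelled names. The sketch below shows it for the HTTP case only, in Python rather than the tool's Rust; http_host and note are hypothetical helpers, and the label prefix mirrors the HttpHost/ prefix visible in the tool's output.

```python
from collections import defaultdict


def http_host(payload: bytes):
    """Return the Host header of a plain-text HTTP/1.x request, or None."""
    head = payload.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")
    # A request line looks like "GET /path HTTP/1.1"; responses are skipped.
    if b" HTTP/1." not in lines[0]:
        return None
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("ascii", "replace")
    return None


def note(evidence, ip: str, source: str, name) -> None:
    """Remember that `name` was seen for `ip` through `source` (SNI, DNS, HttpHost)."""
    if name:
        evidence[ip].add(f"{source}/{name}")


evidence = defaultdict(set)
request = b"GET / HTTP/1.1\r\nHost: guiatelefonos.com\r\nAccept: */*\r\n\r\n"
note(evidence, "92.118.151.9", "HttpHost", http_host(request))
```

Keeping the source of each name in the label matters, because an SNI value and a DNS answer do not carry the same level of trust.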

The configuration file looks like:

---
dns:
  8.8.8.8: main
  192.168.52.1: local

allow:
  125.209.192.0/18: line
  31.13.64.0/18: whatsapp
  157.240.0.0/16: whatsapp
  163.70.128.60: whatsapp
  216.239.35.0: google time server
  71.18.0.0/16: bytedance (tiktok)

okhosts:
  - twitter.com
  - x.com

oksuffixes:
  - .amazon.com
  - .byteeffecttos-g.com
  - .deepl.com
  - .fbcdn.net
  - .framapiaf.org
  - .cdninstagram.com

su:
  3.0.0.0/9: amazon

  • The dns entry lists the DNS servers, which are used for DNS packet inspection.
  • The allow entry contains IP blocks that are considered safe, and do not warrant further inspection.
  • The okhosts entry lists all hosts that are whitelisted, and the oksuffixes entry lists whitelisted host name suffixes.
  • Finally, the su entry allows annotating some IP blocks with custom information.
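The difference between okhosts and oksuffixes can be illustrated with a short Python sketch. This is my reading of the configuration semantics, not the tool's code: it assumes okhosts entries must match exactly and oksuffixes entries are compared with a plain "ends with" test, and host_is_ok is a made-up name.

```python
OKHOSTS = {"twitter.com", "x.com"}
OKSUFFIXES = (".amazon.com", ".fbcdn.net")


def host_is_ok(host: str, okhosts=OKHOSTS, oksuffixes=OKSUFFIXES) -> bool:
    """True when the host is whitelisted by exact name or by suffix."""
    host = host.lower().rstrip(".")
    if host in okhosts:
        return True
    # The leading dot in each suffix keeps look-alike domains from matching.
    return any(host.endswith(suffix) for suffix in oksuffixes)
```

Under these assumptions www.amazon.com is accepted while notamazon.com is not, which is why the suffixes start with a dot; the bare amazon.com would need its own okhosts entry.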

Usage looks like:

$  cargo run -- --pcap /tmp/log.tcpdump --google -w whitelist.yaml
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/debug/pcapan --pcap /tmp/log.tcpdump --google -w whitelist.yaml`
loading google networks reference
loading google cloud reference
packet 10321, can't parse as DNS: query type 65 is invalid
23.63.240.185: ["SNI/p16-sign.tiktokcdn-us.com"] sz=33303
34.249.108.204: ["SNI/api.axept.io"] sz=9802
96.16.248.15: ?? {80} sz=380 pkt=84510
96.16.248.32: ?? {80} sz=380 pkt=2817
104.244.42.66: ?? {443} sz=5583 pkt=3
108.138.25.249: ?? {443} sz=13133 pkt=4
188.166.203.108: ["SNI/www.canardpc.com", "DNS/www.canardpc.com"] sz=10691
199.103.24.8: ?? {443} sz=60 pkt=84639

The --cutoff 100 option only displays IPs with which more than 100 bytes have been exchanged. The --google option downloads the Google network and cloud IP ranges, and incorporates them into the configured allow list and annotated list (configured with the -w whitelist.yaml option).
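As an illustration of what such a download can look like, here is a Python sketch based on the JSON range lists that Google publishes. The two URLs are the ones I know of for those lists and are an assumption about what the tool fetches; the function names and the labels are invented for the example. The cloud list is checked first because the general list also covers the cloud ranges.

```python
import ipaddress
import json
import urllib.request

# Assumed locations of the published range lists (not taken from the tool's source).
GOOG_URL = "https://www.gstatic.com/ipranges/goog.json"
CLOUD_URL = "https://www.gstatic.com/ipranges/cloud.json"


def parse_prefixes(document: str) -> list:
    """Extract the IPv4 networks from a range list in JSON form."""
    data = json.loads(document)
    networks = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix")  # IPv6 entries use "ipv6Prefix" instead
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def fetch_prefixes(url: str) -> list:
    """Download and parse one of the range lists."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_prefixes(response.read().decode("utf-8"))


def google_label(ip: str, google: list, cloud: list):
    """Tell Google's own services apart from servers rented by cloud customers."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in cloud):
        return "google cloud"
    if any(addr in net for net in google):
        return "google"
    return None
```

This is what settles the question raised earlier about 172.217.169.165: an address that is in the general list but not in the cloud list belongs to a Google service rather than to a customer's virtual machine.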

As can be seen in the example output, the DNS parser doesn't currently handle some of the newest extensions to the protocol (query type 65 is the HTTPS resource record), but it doesn't really matter for now. The remaining unidentified connections are:

  • Connections that happen early in the capture, meaning we could not capture a related DNS query, or the TLS hello packets. They can be identified with the pkt value, which displays the first packet number.
  • A pair of TCP connections with no data exchanged (the sz field, which is the amount of IP traffic, is equal to 380). Another connection has only 60 bytes exchanged, which is a single SYN packet.

The remaining connections are properly identified by their SNI record, and sometimes by DNS queries.

Testing on a challenge file

For illustration, I downloaded the July challenge from this repository. Running the tool for the first time, using the default configuration file with DNS removed:

$ cargo run -- --pcap ~/Downloads/2023-07-Unit42-Wireshark-quiz.pcap --whitelist whitelist.yaml --cutoff 1
0.0.0.0: ?? {68} sz=370 pkt=1
13.66.14.83: ["SNI/sat09prdapp01-canary-opaph.netmon.azure.com"] sz=5741
13.107.6.163: ["SNI/a6d04e539d712e4ef920661af4825316.clo.footprintdns.com"] sz=6365
92.118.151.9: ["DNS/guiatelefonos.com", "HttpHost/guiatelefonos.com", "SNI/guiatelefonos.com"] sz=367883
152.199.24.163: ["SNI/static-ecst.licdn.com"] sz=5989
194.26.135.119: ?? {12432} sz=581061 pkt=1697
195.161.114.3: ["HttpHost/623start.site", "DNS/623start.site"] sz=1202

For information, there are 28 distinct IP endpoints in the capture file, but most of them are whitelisted, which is why we only see seven of them here. Because this is a challenge, we have unencrypted HTTP sessions, which means host information could be extracted. Looking at the actual HTTP sessions, one can find fishy traffic that looks like malware being downloaded:

a screenshot from Wireshark

The other weird connection is the one that doesn't have DNS or SNI information, and uses a strange port. Looking at the exchanged traffic, this looks like data exfiltration:

capture of network traffic, probably data exfiltration

This is obviously a challenge designed to be easily solvable, but it really only took two minutes to solve using this simple tool and Wireshark.

Conclusion

This post describes a simple methodology for quickly sifting through a large PCAP file and singling out suspicious traffic. It also comes with a small but capable tool that can serve as an example of a quick PCAP parser. Do not hesitate to experiment with it and give us feedback!