Existing deep learning approaches have achieved high performance in encrypted network traffic analysis tasks. However, some realistic scenarios, such as open-set recognition on dynamically changing tasks, remain challenging for previous methods. Classic few-shot learning methods are widely used for such tasks in domains such as computer vision and natural language processing. Nonetheless, few-shot open-set recognition for encrypted network traffic remains unexplored. This paper proposes a probability-based, task-adaptive Siamese open-set recognition model for encrypted network traffic classification. Our contributions are threefold. First, we introduce generated positive and negative pairs into the Siamese Neural Network training process, using bidirectional dropout data augmentation, to shape a more precise similarity boundary. Second, we fit a Dirichlet Process Gaussian Mixture Model (DPGMM) to the similarity scores of the negative pairs constructed from the support set of each query task, and use it to create a new open-set recognition metric. Third, by leveraging features extracted at coarse- and fine-grained levels, we construct a hierarchical cross-entropy loss to improve the confidence of the similarity score. Extensive experiments on a network traffic dataset and the Omniglot dataset demonstrate the superiority of our proposed approaches, which obtain gains of up to 4.5% and 1.2% in accuracy and 4.0% and 1.8% in the area under the receiver operating characteristic curve (AUROC), respectively.
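The open-set decision in the second contribution can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: a single Gaussian (`statistics.NormalDist`) stands in for the DPGMM, cosine similarity on fixed embeddings stands in for the Siamese network's learned similarity, and the names `negative_pair_scores`, `open_set_predict`, and the 0.95 threshold are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def negative_pair_scores(support, sim=cosine):
    """Similarity scores of all cross-class (negative) pairs in a support set.

    support: dict mapping class label -> list of embedding vectors.
    """
    labels = list(support)
    scores = []
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            scores.extend(sim(x, y) for x in support[a] for y in support[b])
    return scores


def fit_negative_distribution(scores):
    """Single-Gaussian stand-in (assumption) for the DPGMM fit."""
    return NormalDist(fmean(scores), max(pstdev(scores), 1e-6))


def open_set_predict(query, support, threshold=0.95, sim=cosine):
    """Return (label or None, p_known) for one query embedding.

    p_known is the probability that a negative pair scores below the query's
    best class similarity. A low value means the query looks like a negative
    pair against every class, so it is rejected as unknown (None).
    """
    neg_dist = fit_negative_distribution(negative_pair_scores(support, sim))
    class_scores = {
        label: fmean(sim(query, x) for x in xs) for label, xs in support.items()
    }
    label, best = max(class_scores.items(), key=lambda kv: kv[1])
    p_known = neg_dist.cdf(best)
    return (label if p_known >= threshold else None), p_known
```

A query is rejected when its best class similarity is no higher than what cross-class support pairs typically score. Because the negative pairs come from each task's own support set, the threshold adapts per task; a Dirichlet-process mixture would additionally let the number of components adapt to the shape of the negative-score distribution.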
Abstract: The vast majority of Internet traffic is now end-to-end encrypted, and while encryption provides user privacy and security, it has made network surveillance an impossible task. Various parties are exploiting this limitation to distribute problematic content such as fake news, copyrighted material, and propaganda videos. Recent advances in machine learning have shown great promise in extracting content fingerprints from encrypted traffic captured at various points in IP core networks. Nonetheless, content fingerprinting by listening to encrypted wireless traffic remains challenging because of the difficulty of distinguishing retransmissions and multiple flows on the same link. In this paper, we show the potential of fingerprinting Internet traffic by passively sniffing WiFi frames over the air, without connecting to the WiFi network, and leveraging deep learning methods. First, we show that it is possible to build a generic traffic classifier using a hierarchical approach that identifies the most common traffic types on the Internet and reveals fine-grained details such as the exact content of the traffic. Second, we demonstrate that Multi-Layer Perceptrons (MLPs) and Recurrent Neural Networks (RNNs) can identify streaming traffic, such as video and music, from a closed set by sniffing WiFi traffic that is encrypted at both the Media Access Control (MAC) and transport layers. Overall, our results demonstrate over 95% accuracy in identifying traffic types such as web, video streaming, and audio streaming, as well as in identifying the exact content consumed by the user.
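The hierarchical approach can be illustrated with a small sketch. Because the payload is encrypted at the MAC layer, it assumes the only usable observations are frame timestamps and sizes, aggregated into bytes per time bin; a coarse classifier picks the traffic type, and a type-specific classifier then picks the content. The nearest-centroid classifiers are hypothetical stand-ins for the trained MLP/RNN models, and all names are illustrative.

```python
def bin_frames(frames, bin_s=1.0, n_bins=10):
    """Aggregate sniffed frames into bytes per time bin.

    frames: list of (timestamp_s, size_bytes) tuples for one device. Frames
    more than n_bins * bin_s seconds after the first frame are ignored.
    """
    bins = [0] * n_bins
    if not frames:
        return bins
    t0 = min(t for t, _ in frames)
    for t, size in frames:
        idx = int((t - t0) // bin_s)
        if idx < n_bins:
            bins[idx] += size
    return bins


def nearest_centroid(centroids):
    """Hypothetical stand-in for a trained model: label of the closest centroid."""
    def classify(x):
        return min(
            centroids,
            key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])),
        )
    return classify


class HierarchicalClassifier:
    """Coarse traffic-type prediction followed by a type-specific content model."""

    def __init__(self, coarse, fine_by_type):
        self.coarse = coarse              # features -> traffic type
        self.fine_by_type = fine_by_type  # traffic type -> (features -> content)

    def predict(self, features):
        traffic_type = self.coarse(features)
        fine = self.fine_by_type.get(traffic_type)
        # Types without a fine-grained model (e.g. generic web) stop at level one.
        content = fine(features) if fine is not None else None
        return traffic_type, content
```

Splitting the decision this way keeps each fine-grained model focused on one traffic type, so adding a new video catalogue only requires retraining the video branch.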
Abstract: Video streaming traffic dominates the global network, and the challenges have been exacerbated by the growing popularity of interactive videos, a.k.a. 360 videos, as they require more network resources. However, effectively provisioning network resources for video streaming traffic is problematic because end-to-end encryption prevents video traffic flows from being identified in the network. Despite its benefits for network security and privacy, end-to-end encryption also provides a shield for adversaries. To this end, encrypted traffic classification and content fingerprinting with advanced Machine Learning (ML) methods have been proposed. Nevertheless, achieving high performance requires a significant amount of training data, which is difficult to collect in operational networks due to the sheer volume of traffic and privacy concerns. As a solution, in this paper, we propose a novel Generative Adversarial Network (GAN) based data generation solution to synthesize video streaming data for two different tasks: 360/normal video classification and video fingerprinting. The solution includes a percentile-based data mapping mechanism to enhance the data generation process, further supported by novel algorithms for data pre-processing and GAN model training. Using over 6,600 real video traces to generate over 150,000 new traces, our ML-based traffic classification results show a 5–16% accuracy improvement in both tasks.
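The abstract does not specify how the percentile-based data mapping works. One plausible reading, sketched below purely as an assumption, maps each raw value (for example, bytes per time slot) to its empirical percentile among the real traces so the GAN works in a bounded [0, 1] range, then maps generated percentiles back to values by interpolating between real samples. `PercentileMapper` and its methods are hypothetical names.

```python
import bisect
import math


class PercentileMapper:
    """Map raw values to empirical percentiles in [0, 1] and back (assumed scheme)."""

    def __init__(self, samples):
        if not samples:
            raise ValueError("need at least one training sample")
        self.values = sorted(samples)

    def to_percentile(self, x):
        """Number of training values strictly below x, scaled to [0, 1] (clamped)."""
        n = len(self.values)
        if n == 1:
            return 0.0
        rank = bisect.bisect_left(self.values, x)
        return min(rank / (n - 1), 1.0)

    def from_percentile(self, p):
        """Inverse map: linearly interpolate between neighbouring training values."""
        p = min(max(p, 0.0), 1.0)  # generator outputs may overshoot [0, 1]
        pos = p * (len(self.values) - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, len(self.values) - 1)
        frac = pos - lo
        return self.values[lo] + frac * (self.values[hi] - self.values[lo])
```

Under this reading, heavy-tailed traffic features are squeezed into a uniform-like range before GAN training, and every generated value maps back inside the range of real observations.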
Abstract: The Square Kilometre Array (SKA) Low is a next-generation radio telescope, consisting of 512 antenna stations spread over 65 km, to be built in Western Australia. The Correlator and BeamFormer (CBF) design is central to the telescope's signal processing. The CBF receives 6 terabits per second (Tbps) of station data continuously and processes it in real time with a compute load of 2 peta-operations per second (Pops). The correlator calculates up to 22 million cross products between all pairs of stations, while the beamformers coherently sum station data to form more than 500 beams. The correlator output is up to 7 Tbps, and the beamformer output is 2 Tbps. The design philosophy, called “Atomic COTS”, is based on commercial-off-the-shelf (COTS) hardware. Data routing is implemented in network switches programmed in the P4 language, and the signal processing occurs in COTS FPGA cards. P4 allows routing to be determined from the metadata in the Ethernet packets from the stations; that is, metadata describing the contents of each packet determines its routing. Each FPGA card ingests a fraction of the overall bandwidth for all stations and then implements the processing needed to generate complete science data products. Generating complete science products in a single FPGA is referred to here as Atomic processing. A Tango distributed control system configures the multitude of processing modes and maintains the overall health of the CBF system hardware. The resulting Atomic COTS network-attached accelerator, handling 6 Tbps in and 9 Tbps out with 2 Pops of compute, occupies five racks and consumes 60 kW.
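Metadata-driven routing combined with atomic processing can be sketched as follows. Assumptions: the fraction of bandwidth each FPGA card receives is a contiguous range of frequency channels, the channel number is carried in the packet metadata, and the routing table behaves like a P4 exact-match table. `StationPacketMeta`, `build_routing_table`, and `route` are hypothetical names, not part of the SKA design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StationPacketMeta:
    """Hypothetical header fields a switch could match on."""
    station_id: int
    channel: int  # frequency channel carried in the packet (assumed field)


def build_routing_table(n_channels, n_cards):
    """Map each channel to one FPGA card, giving cards contiguous channel ranges."""
    per_card = -(-n_channels // n_cards)  # ceiling division
    return {ch: ch // per_card for ch in range(n_channels)}


def route(meta, table):
    """Pick the egress card from metadata alone, like an exact-match P4 table.

    station_id is deliberately ignored: every station's data for a given
    channel goes to the same card.
    """
    return table[meta.channel]
```

Because every card receives all stations for its channels, it can form complete correlation and beam products without exchanging data with other cards. For scale, 6 Tbps spread across 512 stations averages roughly 11.7 Gbps per station.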
HTTPS-encrypted traffic flows leak information about the underlying content through statistical properties such as packet lengths and timing, enabling traffic fingerprinting attacks. Recent traffic fingerprinting attacks have leveraged Convolutional Neural Networks (CNNs) to achieve very high accuracies, undermining state-of-the-art defenses. In this paper, we analyze such CNNs to understand their inner workings, which helps in building efficient traffic classifiers and effective defenses. First, we experiment on three datasets and show that website fingerprinting CNNs focus mainly on transitions between uploads and downloads in trace fronts, while video fingerprinting CNNs focus more on the finer shapes of periodic bursts. Next, we show that traffic fingerprinting CNNs exhibit transfer learning capabilities, allowing new websites to be identified with less data. We also demonstrate that traffic fingerprinting CNNs outperform Recurrent Neural Networks (RNNs) because of their resilience to random shifts in data, which are common in network traces. We further generalize these observations to other publicly available network traffic datasets. Leveraging our observations, we propose two new defenses against traffic fingerprinting. Our first defense, FRONT-U, protects website visits by obfuscating transitions between uploads and downloads in trace fronts and provides privacy similar to the state-of-the-art defense FRONT with half the data overhead. Our second defense, STOMA, protects streaming traffic by obfuscating the finer sub-bursts within major bursts of a trace using only the next few seconds of traffic, as opposed to the entire trace as in state-of-the-art defenses.
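The front-of-trace signal that website fingerprinting CNNs rely on, and front-focused padding against it, can be illustrated with a simplified sketch. The padding follows the general FRONT idea of injecting dummy packets with Rayleigh-distributed timestamps so they concentrate near the start of the trace; it is not FRONT-U or STOMA, and real FRONT pads client and server directions separately with randomly drawn parameters. Here, dummy directions are chosen at random (a simplification), and `count_transitions` and `front_pad` are hypothetical names.

```python
import math
import random


def count_transitions(trace, front=50):
    """Count upload/download direction changes among the first `front` packets.

    trace: list of (timestamp_s, direction), direction +1 = upload, -1 = download.
    """
    dirs = [d for _, d in trace[:front]]
    return sum(1 for a, b in zip(dirs, dirs[1:]) if a != b)


def front_pad(trace, n_dummy, w, rng):
    """Simplified FRONT-style padding (not FRONT-U).

    Injects n_dummy dummy packets whose timestamps are Rayleigh(w) samples
    (mode at w seconds), so they concentrate near the start of the trace,
    then re-sorts by time. Dummy directions are random, a simplification.
    """
    dummies = []
    for _ in range(n_dummy):
        u = rng.random()  # in [0, 1), so 1 - u is never 0
        t = w * math.sqrt(-2.0 * math.log(1.0 - u))  # inverse-CDF Rayleigh sample
        dummies.append((t, rng.choice((1, -1))))
    # sorted() is stable, so original packets keep their relative order
    return sorted(trace + dummies, key=lambda p: p[0])


if __name__ == "__main__":
    trace = [(i * 0.1, 1 if i % 5 == 0 else -1) for i in range(40)]
    padded = front_pad(trace, n_dummy=20, w=0.5, rng=random.Random(0))
    print(count_transitions(trace), count_transitions(padded))
```

Comparing `count_transitions` before and after padding gives a quick, rough measure of how much the front of a trace has been perturbed.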