Existing deep learning approaches have achieved high performance in encrypted network traffic analysis tasks. However, some realistic scenarios, such as open-set recognition on dynamically changing tasks, remain challenging for previous methods. Classic few-shot learning methods are widely used for such tasks in domains such as computer vision and natural language processing. Nonetheless, few-shot open-set recognition for encrypted network traffic remains unexplored. This paper proposes a probability-based, task-adaptive Siamese open-set recognition model for encrypted network traffic classification. Our contributions are threefold. First, we introduce generated positive and negative pairs into the Siamese Neural Network training process, using bidirectional dropout data augmentation to shape a more precise similarity boundary. Second, we use a Dirichlet Process Gaussian Mixture Model (DPGMM) to fit the distribution of similarity scores of the negative pairs constructed from the support set of each query task, and derive a new open-set recognition metric from it. Third, by leveraging features extracted at coarse- and fine-grained levels, we construct a hierarchical cross-entropy loss that improves the confidence of the similarity score. Extensive experiments on a network traffic dataset and the Omniglot dataset demonstrate the superiority of the proposed approach. It obtains accuracy gains of up to 4.5% and 1.2%, respectively, and gains in the area under the receiver operating characteristic curve (AUROC) of up to 4.0% and 1.8%, respectively.
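The open-set metric in the second contribution can be illustrated with a minimal sketch.

- The paper fits a DPGMM. To stay dependency-free, this sketch swaps in a single Gaussian (`statistics.NormalDist`) fitted to the negative-pair scores.
- It assumes that cosine similarity between embeddings stands in for the Siamese network's similarity output.
- All function names and the 0.05 threshold are hypothetical.

A query is labeled `"unknown"` when its best similarity to the support set is not unusually high compared with similarities between support samples of different classes.

```python
import math
from statistics import NormalDist, mean, stdev


def cosine(a, b):
    """Cosine similarity; a stand-in for the Siamese network's similarity score."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def negative_pair_scores(support, sim=cosine):
    """Similarity of every pair of support samples that belong to different classes.
    support: list of (embedding, label) tuples for one few-shot task."""
    scores = []
    for i in range(len(support)):
        for j in range(i + 1, len(support)):
            (a, la), (b, lb) = support[i], support[j]
            if la != lb:
                scores.append(sim(a, b))
    return scores


def fit_negative_distribution(scores):
    """Single-Gaussian stand-in for the paper's DPGMM (needs at least 2 scores).
    A tiny floor on sigma avoids a degenerate distribution."""
    return NormalDist(mean(scores), max(stdev(scores), 1e-6))


def classify_open_set(query, support, threshold=0.05, sim=cosine):
    """Return the label of the most similar support sample, or "unknown" if that
    best similarity is plausible under the negative-pair distribution."""
    neg = fit_negative_distribution(negative_pair_scores(support, sim))
    sims = [sim(query, emb) for emb, _ in support]
    best = max(range(len(sims)), key=sims.__getitem__)
    # Upper-tail probability: how often a different-class pair scores this high.
    p_negative = 1.0 - neg.cdf(sims[best])
    return "unknown" if p_negative > threshold else support[best][1]
```

In a real task, the embeddings would come from the trained Siamese branches. A mixture model such as a DPGMM matters when negative-pair scores are multimodal, which a single Gaussian cannot capture.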
Abstract: The vast majority of Internet traffic is now end-to-end encrypted, and while encryption provides user privacy and security, it has made network surveillance an impossible task. Various parties exploit this limitation to distribute problematic content such as fake news, copyrighted material, and propaganda videos. Recent advances in machine learning have shown great promise in extracting content fingerprints from encrypted traffic captured at various points in IP core networks. Nonetheless, content fingerprinting by listening to encrypted wireless traffic remains challenging because it is difficult to distinguish retransmissions and multiple flows on the same link. In this paper, we show the potential of leveraging deep learning to fingerprint Internet traffic by passively sniffing WiFi frames over the air, without connecting to the WiFi network. First, we show that it is possible to build a generic traffic classifier using a hierarchical approach. This classifier identifies the most common traffic types on the Internet and reveals fine-grained details such as the exact content of the traffic. Second, we demonstrate that Multi-Layer Perceptrons (MLPs) and Recurrent Neural Networks (RNNs) can identify streaming traffic, such as video and music, from a closed set. They do so by sniffing WiFi traffic that is encrypted at both the Media Access Control (MAC) and transport layers. Overall, our results show that we can achieve over 95% accuracy in identifying traffic types such as web, video streaming, and audio streaming, as well as in identifying the exact content consumed by the user.
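The hierarchical classifier in the first contribution can be sketched as a two-level dispatch. A first-level model predicts the traffic type, and a per-type second-level model identifies the exact content.

This minimal sketch swaps in nearest-centroid classifiers for the deep models the abstract describes. It uses three hand-crafted features: mean frame size, frame-size spread, and frame rate. The feature set, class names, and the `make_trace` helper for synthetic traces are all hypothetical.

```python
import math
from statistics import mean, pstdev


def features(trace):
    """trace: list of (timestamp_s, frame_size_bytes) for one sniffed flow (hypothetical features)."""
    times = [t for t, _ in trace]
    sizes = [s for _, s in trace]
    duration = (max(times) - min(times)) or 1.0  # avoid division by zero
    return (mean(sizes), pstdev(sizes), len(trace) / duration)


class NearestCentroid:
    """Assigns a feature vector to the label whose training centroid is closest."""

    def __init__(self):
        self.centroids = {}

    def fit(self, X, y):
        groups = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        self.centroids = {
            label: tuple(mean(col) for col in zip(*xs)) for label, xs in groups.items()
        }
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda label: math.dist(x, self.centroids[label]))


class HierarchicalClassifier:
    """Level 1 predicts the traffic type; level 2 predicts the content within that type."""

    def __init__(self):
        self.type_model = NearestCentroid()
        self.content_models = {}

    def fit(self, traces, types, contents):
        X = [features(t) for t in traces]
        self.type_model.fit(X, types)
        for ty in set(types):
            idx = [i for i, t in enumerate(types) if t == ty]
            self.content_models[ty] = NearestCentroid().fit(
                [X[i] for i in idx], [contents[i] for i in idx]
            )
        return self

    def predict(self, trace):
        x = features(trace)
        ty = self.type_model.predict(x)  # coarse decision first
        return ty, self.content_models[ty].predict(x)  # then fine decision


def make_trace(small, large, rate_hz, n=40):
    """Hypothetical synthetic trace: frame sizes alternate small/large at a fixed rate."""
    return [(i / rate_hz, small if i % 2 == 0 else large) for i in range(n)]
```

The features are left unscaled, so large-valued features dominate the distance. A practical version would normalize them or learn the representation, as the deep models in the abstract do.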
Abstract: Video streaming traffic dominates global networks. The challenges have been exacerbated by the growing popularity of interactive videos, a.k.a. 360° videos, which require more network resources. However, provisioning network resources effectively for video streaming traffic is problematic because end-to-end encryption prevents video traffic flows from being identified in the network. Despite its benefits for network security and privacy, end-to-end encryption also provides a shield for adversaries. To this end, encrypted traffic classification and content fingerprinting with advanced Machine Learning (ML) methods have been proposed. Nevertheless, achieving high performance requires a significant amount of training data, which is difficult to collect in operational networks due to the sheer volume of traffic and privacy concerns. As a solution, in this paper, we propose a novel data generation solution based on Generative Adversarial Networks (GANs). It synthesizes video streaming data for two tasks: 360/normal video classification and video fingerprinting. The solution includes a percentile-based data mapping mechanism to enhance the data generation process, further supported by novel algorithms for data pre-processing and GAN model training. Starting from over 6,600 real video traces and generating over 150,000 new traces, our ML-based traffic classification results show an accuracy improvement of 5–16% in both tasks.
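A percentile-based mapping can be read as an empirical-CDF transform. Each raw value (for example, bytes per time slot) is mapped to its percentile among the real traces before GAN training, and generated percentiles are mapped back to raw values afterwards.

The abstract does not detail the mechanism, so this is a minimal sketch under that assumption. The class name `PercentileMapper` and the linear interpolation between order statistics are illustrative choices, not the paper's algorithm.

```python
from bisect import bisect_right


class PercentileMapper:
    """Empirical-CDF mapping between raw values and percentiles in [0, 1]."""

    def __init__(self, real_values):
        if len(real_values) < 2:
            raise ValueError("need at least two real values")
        self.v = sorted(real_values)
        self.n = len(self.v)

    def to_percentile(self, x):
        """Raw value -> percentile, interpolating linearly between order statistics."""
        v = self.v
        if x <= v[0]:
            return 0.0
        if x >= v[-1]:
            return 1.0
        i = bisect_right(v, x)  # guarantees v[i - 1] <= x < v[i]
        frac = (x - v[i - 1]) / (v[i] - v[i - 1])
        return (i - 1 + frac) / (self.n - 1)

    def from_percentile(self, p):
        """Percentile (e.g. a GAN output squashed to [0, 1]) -> raw value.
        Inverse of to_percentile when the real values are distinct."""
        p = min(max(p, 0.0), 1.0)
        pos = p * (self.n - 1)
        i = int(pos)
        if i >= self.n - 1:
            return float(self.v[-1])
        return self.v[i] + (pos - i) * (self.v[i + 1] - self.v[i])
```

Working in percentile space gives the generator a bounded, roughly uniform target instead of heavy-tailed byte counts. Mapping back through the real traces also keeps generated values inside the observed range.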
Abstract: The Square Kilometre Array (SKA) Low is a next-generation radio telescope, consisting of 512 antenna stations spread over 65 km, to be built in Western Australia. The Correlator and BeamFormer (CBF) design is central to the telescope's signal processing. The CBF continuously receives 6 terabits per second (Tbps) of station data and processes it in real time with a compute load of 2 peta-operations per second (Pops). The correlator calculates up to 22 million cross products between all pairs of stations, while the beamformers coherently sum station data to form more than 500 beams. The correlator output is up to 7 Tbps, and the beamformer output up to 2 Tbps. The design philosophy, called “Atomic COTS”, is based on commercial off-the-shelf (COTS) hardware. Data routing is implemented in network switches programmed in the P4 language, and the signal processing occurs in COTS FPGA cards. P4 allows routing to be determined from the metadata in the Ethernet packets from the stations; that is, metadata describing the contents of a packet determines its routing. Each FPGA card receives a fraction of the overall bandwidth for all stations and then implements the processing needed to generate complete science data products. Generating complete science products in a single FPGA is referred to here as Atomic processing. A Tango distributed control system configures the multitude of processing modes and maintains the overall health of the CBF system hardware. The resulting Atomic COTS network-attached accelerator, with 6 Tbps in, 9 Tbps out, and 2 Pops of compute, occupies five racks and consumes 60 kW.
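Metadata-based routing can be pictured as a match-action table that the switch applies to each station packet. This Python emulation assumes a hypothetical `channel` header field and splits channels evenly across FPGA cards. Each card therefore receives a slice of the band for every station, mirroring the idea that each card handles a fraction of the total bandwidth for all stations.

This is not P4 code. Real P4 tables, header layouts, and match kinds are target-specific, and the channel and card counts used here are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StationPacketMeta:
    """Hypothetical header metadata carried by each station packet."""
    station_id: int
    channel: int  # frequency channel index


class ChannelRoutingTable:
    """Emulates a match-action table: range match on channel -> forward to port."""

    def __init__(self):
        self.entries = []  # list of (lo, hi, egress_port), inclusive range

    def add_entry(self, lo, hi, port):
        self.entries.append((lo, hi, port))

    def apply(self, meta):
        # station_id is deliberately not matched: every station's data for a
        # given channel goes to the same card.
        for lo, hi, port in self.entries:
            if lo <= meta.channel <= hi:
                return port
        return None  # default action: drop


def build_table(num_channels, num_fpgas, first_port=1):
    """Give each FPGA card a contiguous block of channels for all stations."""
    table = ChannelRoutingTable()
    per_card = -(-num_channels // num_fpgas)  # ceiling division
    for card in range(num_fpgas):
        lo = card * per_card
        hi = min(lo + per_card, num_channels) - 1
        if lo <= hi:
            table.add_entry(lo, hi, first_port + card)
    return table
```

For example, `build_table(96, 4)` sends channels 0 to 23 to port 1 and channels 72 to 95 to port 4, regardless of which station sent the packet.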
HTTPS encrypted traffic flows leak information about the underlying content through statistical properties such as packet lengths and timing, enabling traffic fingerprinting attacks. Recent traffic fingerprinting attacks have leveraged Convolutional Neural Networks (CNNs) to achieve very high accuracies, undermining state-of-the-art defenses. In this paper, we analyze such CNNs to understand their inner workings, which helps in building efficient traffic classifiers and effective defenses. First, we experiment on three datasets and show that website fingerprinting CNNs focus mainly on transitions between uploads and downloads in trace fronts. In contrast, video fingerprinting CNNs focus more on the finer shapes of periodic bursts. Next, we show that traffic fingerprinting CNNs exhibit transfer learning capabilities, allowing new websites to be identified with less data. We also demonstrate that traffic fingerprinting CNNs outperform Recurrent Neural Networks (RNNs) because of their resilience to random shifts in data, which are common in network traces. We further generalize these observations to other publicly available network traffic datasets. Leveraging our observations, we propose two new defenses against traffic fingerprinting. Our first defense, FRONT-U, defends website visits by obfuscating transitions between uploads and downloads in trace fronts. It provides privacy similar to the state-of-the-art defense FRONT with half the data overhead. Our second defense, STOMA, defends streaming traffic by obfuscating the finer sub-bursts within major bursts of a trace. It uses only the next few seconds of traffic, as opposed to the entire trace as in state-of-the-art defenses.
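The finding that website fingerprinting CNNs rely on upload/download transitions in trace fronts suggests a simple measurement and a simple perturbation. The sketch below does both:

- It counts direction changes among the first packets of a trace, encoded as +1 for upload and -1 for download.
- It injects dummy packets of random direction into that front, loosely in the spirit of front-padding defenses.

This is a hypothetical illustration and does not reproduce FRONT-U's actual padding schedule. The function names and parameters are assumptions.

```python
import random


def front_transitions(directions, front_len):
    """Count upload/download switches among the first front_len packets
    (+1 = upload, -1 = download)."""
    front = directions[:front_len]
    return sum(1 for a, b in zip(front, front[1:]) if a != b)


def pad_front(directions, front_len, num_dummies, seed=None):
    """Insert num_dummies dummy packets with random direction at random positions
    near the start of the trace. Returns (padded_directions, dummy_mask)."""
    rng = random.Random(seed)
    padded = [(d, False) for d in directions]
    limit = min(front_len, len(directions))
    for _ in range(num_dummies):
        pos = rng.randint(0, limit)  # inclusive bounds; always a valid insert index
        padded.insert(pos, (rng.choice((1, -1)), True))
    # Real packets keep their relative order; the mask marks the dummies.
    return [d for d, _ in padded], [is_dummy for _, is_dummy in padded]
```

Comparing `front_transitions` before and after padding shows how much the transition pattern a CNN would latch onto has been disturbed. The dummy count bounds the bandwidth overhead of the perturbation.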
In the last few years, the Input/Output (I/O) bandwidth limitations of legacy computer architectures have forced us to reconsider where and how to store and compute data across a large range of applications. This shift has been made possible by two developments. The first is the concurrent development of SmartNICs and programmable switches that share a common programming language (P4). The second is the advent of High Bandwidth Memory (HBM) attached to SmartNICs/FPGAs. Recently, proposals to use this kind of technology have emerged to tackle computer science problems such as fast in-network consensus algorithms, network-accelerated key-value stores, machine learning, and data-center data aggregation. In this paper, we introduce a novel architecture, which we call the “Atomic COTS” design. It leverages these advances to potentially accelerate and improve radio-astronomy Digital Signal Processing (DSP), such as correlators and beamformers, at unprecedented continuous rates. We give an overview of this new type of architecture for accelerating DSP, which leverages programmable switches and HBM-capable FPGAs. We also discuss how to handle radio astronomy data streams and pre-process them for astronomy science products such as pulsar timing and search. Finally, we illustrate with a proof of concept how we can process emulated data from the Square Kilometre Array (SKA) project to time pulsars.
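The two DSP operations named above can be shown compactly for a single frequency channel:

- A correlator's cross-multiply-and-accumulate stage forms `v_i * conj(v_j)` for every station pair and sums over time.
- A beamformer applies per-station complex weights and sums the results coherently.

This pure-Python sketch assumes already-channelized complex voltages and a phase convention in which a geometric delay τ_k appears as exp(-2πjfτ_k). It illustrates the arithmetic only and says nothing about how the Atomic COTS FPGA firmware is organized.

```python
import cmath


def correlate(samples):
    """X stage for one channel. samples: list over time of per-station complex
    voltages. Returns {(i, j): sum_t v_i(t) * conj(v_j(t))} for i <= j, i.e. the
    n autocorrelations plus n*(n-1)/2 cross-correlations (baselines)."""
    n = len(samples[0])
    vis = {(i, j): 0j for i in range(n) for j in range(i, n)}
    for v in samples:
        for i, j in vis:  # updating existing keys while iterating is safe
            vis[(i, j)] += v[i] * v[j].conjugate()
    return vis


def steering_weights(delays_s, freq_hz):
    """Phase-only weights w_k = exp(+2j*pi*f*tau_k) that undo a delay tau_k,
    assuming the delayed signal carries a factor exp(-2j*pi*f*tau_k)."""
    return [cmath.exp(2j * cmath.pi * freq_hz * tau) for tau in delays_s]


def beamform(sample, weights):
    """Coherent sum over stations for one time sample: sum_k w_k * v_k."""
    return sum(w * v for w, v in zip(weights, sample))
```

With correct steering weights, the station signals add in phase, so the beam's amplitude grows with the number of stations. Unsteered sums partially cancel.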
Traffic fingerprinting and the development of defenses against it have always been an arms race between attackers and defenders. The rapid evolution of deep learning methods makes developing stronger traffic fingerprinting models much easier, while overhead, latency, and deployment constraints restrict what defenses can do. As such, there is a constant need for novel defenses against traffic fingerprinting. In this paper, we propose SMAUG, a novel defense based on Conditional Generative Adversarial Networks (CGANs) that protects video streaming traffic against fingerprinting. We first assess the performance of various GANs in video streaming traffic synthesis using multiple GAN quality metrics. We show that CGANs outperform other types of GANs, such as basic GANs and Wasserstein GANs (WGANs). SMAUG uses CGANs to synthesize video traffic flows and uses those synthesized flows to camouflage the original traffic that needs protection. We compare SMAUG with other state-of-the-art defenses, namely FPA and d*-private methods, as well as a kernel density estimation-based baseline. SMAUG provides better privacy with lower overhead and delay.
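The camouflage idea can be illustrated with per-time-slot byte counts. As a stand-in for SMAUG's CGAN generator, this sketch samples a cover trace from a Gaussian kernel density estimate, similar in spirit to the KDE-based baseline mentioned above. It then shapes the protected flow by padding each slot up to the cover value.

The padding-only combining rule (`max` per slot, no delaying), the bandwidth, and the function names are hypothetical. They are not SMAUG's actual mechanism.

```python
import random


def kde_sample(data, bandwidth, k, seed=None):
    """Draw k values from a Gaussian KDE fitted to data: pick a data point
    uniformly at random, then add N(0, bandwidth^2) noise. Clipping at 0 keeps
    byte counts valid but slightly distorts the density near zero."""
    rng = random.Random(seed)
    return [max(0.0, rng.choice(data) + rng.gauss(0.0, bandwidth)) for _ in range(k)]


def camouflage(original, cover):
    """Hypothetical padding-only shaping: in each time slot send
    max(original, cover) bytes. Returns the shaped trace and the relative
    bandwidth overhead (padding bytes / original bytes)."""
    if len(original) != len(cover):
        raise ValueError("traces must have the same number of time slots")
    shaped = [max(o, c) for o, c in zip(original, cover)]
    overhead = (sum(shaped) - sum(original)) / sum(original)
    return shaped, overhead
```

The overhead ratio shows why generator quality matters. A cover trace that closely tracks realistic video traffic needs less padding to hide the original flow than an arbitrary one does.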