List

Abstract: The vast majority of Internet traffic is now end-to-end encrypted, and while encryption provides user privacy and security, it has made network surveillance an impossible task. Various parties are using this limitation to distribute problematic content such as fake news, copy-righted material, and propaganda videos. Recent advances in machine learning techniques have shown great promise in extracting content fingerprints from encrypted traffic captured at the various points in IP core networks. Nonetheless, content fingerprinting from listening to encrypted wireless traffic remains a challenging task due to the difficulty in distinguishing re-transmissions and multiple flows on the same link. In this paper, we show the potential of fingerprinting internet traffic by passively sniffing WiFi frames in air, without connecting to the WiFi network by leveraging deep learning methods. First, we show the possibility of building a generic traffic classifier using a hierarchical approach that is able to identity most common traffic types in the Internet and reveal fine-granular details such as identifying the exact content of the traffic. Second, we demonstrate the possibility of using Multi-Layer Perceptron (MLP) and Recurrent Neural Networks (RNNs) to identify streaming traffic, such as video and music, from a closed set, by sniffing WiFi traffic that is encrypted at both Media Access Control (MAC) and Transport layers. Overall, our results demonstrate that we can achieve over 95% accuracy in identifying traffic types such as web, video streaming, and audio streaming as well as identifying the exact content consumed by the user.