A Multi-modal Neural Embeddings Approach for Detecting Mobile Counterfeit Apps: A Case Study on Google Play Store

Naveen Karunanayake, Jathushan Rajasegaran, Ashanie Gunathillake, Suranga Seneviratne, Guillaume Jourjon
IEEE Transactions on Mobile Computing
Publication year: 2020.07

Abstract: Counterfeit apps impersonate existing popular apps in attempts to misguide users to install them for various reasons such as collecting personal information or spreading malware. Many counterfeits can be identified once installed; however, even a tech-savvy user may struggle to detect them before installation. To this end, this paper proposes to leverage the recent advances in deep learning methods to create image and text embeddings so that counterfeit apps can be efficiently identified when they are submitted for publication. We show that a novel approach of combining content embeddings and style embeddings outperforms the baseline methods for image similarity such as SIFT, SURF, and various image hashing methods. We first evaluate the performance of the proposed method on two well-known datasets for evaluating image similarity methods and show that content, style, and combined embeddings increase precision@k and recall@k by 10%-15% and 12%-25%, respectively, when retrieving five nearest neighbours. Second, specifically for the app counterfeit detection problem, combined content and style embeddings achieve 12% and 14% increase in precision@k and recall@k, respectively, compared to the baseline methods. Third, we present an analysis of approximately 1.2 million apps from Google Play Store and identify a set of potential counterfeits for top-10,000 popular apps. Under a conservative assumption, we were able to find 2,040 potential counterfeits that contain malware in a set of 49,608 apps that showed high similarity to one of the top-10,000 popular apps in Google Play Store. We also find 1,565 potential counterfeits asking for at least five more dangerous permissions than the original app and 1,407 potential counterfeits having at least five extra third party advertisement libraries.
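
The precision@k and recall@k metrics quoted in the abstract can be illustrated with a minimal sketch. The function name and the toy app identifiers below are hypothetical and are not taken from the paper; the sketch assumes a single query with a ranked list of retrieved items and a known set of relevant items.

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Return (precision@k, recall@k) for one query.

    retrieved: ranked list of item ids, best match first
    relevant:  set of item ids that are true matches for the query
    """
    top_k = retrieved[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: five nearest neighbours retrieved for one app icon.
retrieved = ["app_a", "app_x", "app_b", "app_y", "app_z"]
relevant = {"app_a", "app_b", "app_c"}
p, r = precision_recall_at_k(retrieved, relevant, k=5)
print(f"precision@5={p:.2f} recall@5={r:.2f}")
```

Averaging these two values over all queries gives the dataset-level figures of the kind reported above.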

An SDN Perspective on Multi-connectivity and Seamless Flow Migration

S. Hatonen, T. I. ul Huque, A. Rao, G. Jourjon, V. Gramoli, S. Tarkoma
IEEE Networking Letters
Publication year: 2019.12
Abstract: Devices capable of multi-connectivity currently use static rules for selecting the set of interfaces to use. Such rules are limited in scope and can be counter-productive. We posit that SDN techniques can address this inefficiency. We present an approach that enables an SDN controller to manage the flows traversing the Ethernet, Wi-Fi, and LTE links in our laptop and also migrate the flows from one link to another. Our solution opens avenues that enable end-user devices to negotiate with the network controllers when taking their control plane decisions.

Software defined Network’s Garbage Collection with Clean-Up Packets

MTI ul Huque, G. Jourjon, C. Russell, and V. Gramoli
IEEE Transactions on Network and Service Management
Publication year: 2019.11
Abstract: Rule updates, such as policy or routing changes, occur frequently and instantly in software-defined networks managed by the controller. In particular, the controller software can modify the network routes by introducing new forwarding rules and deleting old ones in a distributed set of switches, a challenge that has received lots of attention in the last few years. In this paper, we present a problem that consists of determining the appropriate point in the rule update where it is safe to garbage collect old rules. To illustrate the difficulty of the problem, we list the previously proposed assumptions, like the upper-bound on the transmission delay of every packet through the network, and we offer a solution that alleviates these assumptions and significantly reduces the rule update time with a guarantee that no data packet is lost due to the rule alteration through the use of dedicated clean-up packets that detect the absence of in-flight packets. We then prove that the proposed technique guarantees per-packet consistency, blackhole-freedom, and loop-freedom. Our evaluations, via network emulations and real deployment in an SDN testbed, demonstrate that by using the proposed garbage collection solution the rule update times of the two phase rule update can be reduced by up to 99%.

Fast Privacy-Preserving Network Function Outsourcing

Hassan Jameel Asghar, Emiliano De Cristofaro, Guillaume Jourjon, Dali Kaafar, Laurent Mathy, Luca Melis, Craig Russell, Mang Yu
Computer Networks
Publication year: 2019.09

Abstract: In this paper, we present the design and implementation of SplitBox, a system for privacy-preserving processing of network functions outsourced to cloud middleboxes, i.e., without revealing the policies governing these functions. SplitBox is built to provide privacy for a generic network function that abstracts the functionality of a variety of network functions and associated policies, including firewalls, virtual LANs, network address translators (NATs), deep packet inspection, and load balancers. We present a scalable design aiming to provide high throughput and low latency, by distributing functionalities to a few virtual machines (VMs), while providing provably secure guarantees. We implement SplitBox inside FastClick, an extension of the Click modular router, using Intel’s DPDK to handle packet I/O. We evaluate our prototype experimentally to find its bottlenecks and stress-test its different components, vis-à-vis two widely used network functions, i.e., firewall and VLAN tagging. Our evaluation shows that, on commodity hardware, SplitBox can process packets close to line rate (i.e., 8.9 Gbps) with up to 50 traversed policies.

A Delay-Tolerant Payment Scheme Based on the Ethereum Blockchain

Yining Hu, Ahsan Manzoor, Parinya Ekparinya, Madhusanak Liyanage, Kanchana Thilakarathna, Guillaume Jourjon, and Aruna Seneviratne
IEEE Access, 2019
Publication year: 2019.04

Digital banking as an essential service can be hard to access in remote, rural regions where the network connectivity is unavailable or intermittent. Payment operators like Visa and Mastercard often face difficulties reaching these remote, rural areas. Although micro-banking has been made possible by Short Message Service (SMS) or Unstructured Supplementary Service Data (USSD) messages in some places, their security flaws and session-based nature prevent them from a wider adoption. Global-level cryptocurrencies enable low-cost, secure and pervasive money transferring among distributed peers, but are still limited in their ability to reach people in remote communities.

We propose a blockchain-based digital payment scheme that can deliver reliable services on top of unreliable networks in remote regions. We focus on a scenario where a community-run base station provides reliable local network connectivity while connecting only intermittently to the broader Internet. We take advantage of the distributed verification guarantees of Blockchain technology for financial transaction verification and leverage smart contracts for secure service management. In the proposed system, payment operators deploy multiple proxy nodes that are intermittently connected to remote communities where the local Ethereum blockchain networks are composed of miners, vendors and regular users. Through probabilistic modelling, we devise design parameters for the blockchain network to realise robust operation on top of an unreliable network. Furthermore, we show that transaction processing time will not be significantly impacted by network unreliability through extensive emulations on a private Ethereum network. Finally, we demonstrate the practical feasibility of the proposed system by developing NFC (Near Field Communication) enabled payment gateways on Raspberry Pis, a mobile wallet application and mining nodes on off-the-shelf computers.

Measuring, Characterizing, and Detecting Facebook Like Farms

Muhammad Ikram, Lucky Onwuzurike, Shehroze Farooqi, Emiliano De Cristofaro, Arik Friedman, Guillaume Jourjon, Mohammed Ali Kaafar, and M. Zubair Shafiq
ACM Transactions on Privacy and Security, Volume 20, Issue 4, September 2017, pp. 13:1-13:28
Publication year: 2017.09

Online Social Networks (OSNs) offer convenient ways to cheaply reach out to potentially large audiences. In particular, Facebook pages are increasingly used by businesses, brands, and organizations to connect with millions of users worldwide. As the number of likes of a page has become a de-facto measure of its popularity and profitability, alongside Facebook’s official targeted advertising platform, an underground market of services artificially inflating page likes, aka like farms, has also emerged. However, besides a few alarming media reports, there is very little work that systematically analyzes Facebook pages’ promotion methods. To fill this gap, this paper presents a honeypot-based comparative measurement study of page likes garnered via Facebook advertising and by four popular like farms. First, we analyze likes based on demographic, temporal, and social characteristics, and find that some farms seem to be operated by bots and do not really try to hide the nature of their operations, while others follow a stealthier approach, mimicking regular users’ behavior. Next, we look at fraud detection algorithms currently deployed by Facebook and show that they do not work well to detect stealthy farms which spread likes over longer timespans and like popular pages to mimic regular users. To overcome their limitations, we investigate the feasibility of timeline-based detection of like farm accounts, focusing on characterizing content generated by Facebook accounts on their timelines as an indicator of genuine versus fake social activity. We analyze a wide range of features extracted from timeline posts, which we group into two main categories: lexical and non-lexical. We find that like farm accounts tend to often re-share content, use fewer words and poorer vocabulary, and more often generate duplicate comments and likes compared to normal users. Using relevant lexical and non-lexical features, we built a classifier to detect like farm accounts that achieves a precision higher than 99% and a 93% recall.

Garbage Collection of Forwarding Rules in Software Defined Networks

MTI ul Huque, G. Jourjon, V. Gramoli
IEEE Communications Magazine, June 2017
Publication year: 2017.06

Abstract: Software defined networking (SDN) brought new interesting challenges by externalizing the task of controlling the network to some generic computer software. In particular, the controller software can modify the network routes by introducing new forwarding rules and deleting old ones at a distributed set of switches, a challenge that has received lots of attention in the last six years.

In this paper, we survey the different techniques to update rules, sometimes relying on redundant paths to reroute the traffic during the update, sometimes activating rules at distinct switches in a specific order, to avoid looping packets. This state of the art helps in understanding another, often overlooked, problem that consists of determining the appropriate point in the update where it is safe to garbage collect old rules.

To illustrate the difficulty of the problem we list the previously proposed assumptions, like the upper-bound on the transmission delay of every packet through the network. Finally, we propose a solution that alleviates these assumptions and speeds up the original two-phase rule update by about 80% through the use of dedicated clean-up packets that detect the absence of in-flight packets.

Large-Scale Dynamic Controller Placement

MTI ul Huque, W. Si, G. Jourjon, V. Gramoli
IEEE Transactions on Network and Service Management, Volume 14, Issue 1, March 2017
Publication year: 2017.03

Abstract: The controller placement problem (CPP) is one of the key challenges of software defined networks (SDN) to increase performance. Given the locations of n switches, CPP consists of choosing the controller locations that minimize the latency between switches and SDN controllers. In its current form, however, CPP assumes a fixed traffic and no existing solutions adapt the placement to the load. In this paper, we have addressed the dynamic controller placement problem that consists of (i) determining the locations of controller modules to bound communication latencies, and of (ii) determining the number of controllers per module to support the dynamic load. We propose an algorithm named LiDy+ that runs in O(n^2) and combines a controller module placement algorithm with a dynamic flow management algorithm.

We evaluate the number of controllers, the controller utilization, the power consumption and the maintenance cost of LiDy+ on both sparse and dense networks. Our comparison against a previous solution shows that LiDy+ not only achieves a smaller number of controllers and a higher controller utilization, but also incurs lower energy and maintenance costs than the previous solution. Finally, we run LiDy+ in a large-scale environment where the previous solution of time complexity (n^2 log n) is impractical.
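
As a rough illustration of the static controller placement problem described in this abstract, the sketch below exhaustively picks k controller sites that minimize the worst-case switch-to-nearest-controller latency. This is not LiDy+ and does not handle dynamic load; the function name, the choice of worst-case latency as the objective, and the latency matrix are all assumptions made for the example.

```python
from itertools import combinations


def place_controllers(latency, k):
    """Brute-force static placement (illustrative only, exponential in k).

    latency[s][c] is the latency between switch s and candidate site c;
    here every switch location is also a candidate controller site.
    Returns (sites, worst_case_latency).
    """
    n = len(latency)
    best_sites, best_cost = None, float("inf")
    for sites in combinations(range(n), k):
        # Each switch talks to its closest controller; score the worst switch.
        cost = max(min(latency[s][c] for c in sites) for s in range(n))
        if cost < best_cost:
            best_sites, best_cost = sites, cost
    return best_sites, best_cost


# Hypothetical symmetric latency matrix for four switches (arbitrary units).
latency = [
    [0, 1, 4, 6],
    [1, 0, 3, 5],
    [4, 3, 0, 2],
    [6, 5, 2, 0],
]
sites, worst = place_controllers(latency, k=2)
print(sites, worst)
```

Exhaustive search is only workable for tiny inputs, which is one reason placement algorithms with polynomial running time matter at large scale.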

FORGE Toolkit: Leveraging Distributed Systems in eLearning Platforms

Guillaume Jourjon, Johann M Marquez-Barja, Thierry Rakotoarivelo, Alexander Mikroyannidis, Kostas Lampropoulos, Spyros Denazis, Christos Tranoris, Daan Pareit, John Domingue, Luiz A DaSilva, Max Ott
IEEE Transactions on Emerging Topics in Computing, Vol. 5 (1), pp: 7 - 19
Publication year: 2017.01

Abstract: While more and more services become virtualized and always accessible in our society, laboratories supporting computer science (CS) lectures have mainly remained offline and class-based. This apparent abnormality is due to several limiting factors, discussed in the literature, such as the high cost of deploying and maintaining computer network testbeds and the lack of standardization for the presentation of eLearning platforms. In this paper, we present the FORGE toolkit, which leverages experimentation facilities currently deployed in international initiatives for the development of e-learning materials. Thus, we solve the institutional challenge mentioned in the ACM/IEEE 2013 CS curricula concerning the access and maintenance of specialized and heterogeneous hardware thanks to a seamless integration with the networking test-bed community. Moreover, this project builds an ecosystem, where teaching and educational materials, tools, and experiments are available under open scheme and policies. We demonstrate how it already meets most of the requirements from the network and communication component of CS 2013 and some of the labs of the Cisco academy. Finally, we present experience reports illustrating the potential benefits of this framework based on the first deployments in four post-graduate courses in prestigious institutions around the world.

Designing and Orchestrating Reproducible Experiments on Federated Networking Testbeds

Thierry Rakotoarivelo, Guillaume Jourjon and Max Ott
Elsevier Computer Networks, Special issue on Future Internet Testbeds, pp. 173-187
Publication year: 2014.04

Abstract: In addition to theoretical analysis and simulations, the evaluation of new networking technologies in a real-life context and scale is critical to their global adoption and deployment. Federations of experimental platforms (aka testbeds) offer a controlled and cost-effective solution to perform such an evaluation. Most recent efforts in that area focused on building those facilities and providing experimenters with tools to allow the discovery and provisioning of their shared resources. Many challenges remain in order to support the complete experiment life cycle in a federated environment.

We propose OMF-F, a framework which allows the definition of networking experiments and their execution over shared resources provided by different federated administrative domains. OMF-F provides a domain-specific language enabling rich event-based experiment descriptions. It defines a specific resource model and protocol, which together with its publish-subscribe messaging system allows automatic experiment orchestrations at a large scale. OMF-F further provides interfaces to operate with existing resource discovery and provisioning tools for federated testbeds.

Our contributions in this paper are threefold. First we provide detailed descriptions of OMF-F’s design, its architecture, and its involved entities. Then, we present a quantitative evaluation of its underlying messaging and event-handling systems. Finally, we discuss two real examples of OMF-F deployed and used on federated domains to define and execute experiments.

An Instrumentation Framework for the Critical Task of Measurement Collection in the Future Internet

Olivier Mehani, Guillaume Jourjon, Thierry Rakotoarivelo and Max Ott
Elsevier Computer Networks, Special issue on Future Internet Testbeds, pp. 68-83
Publication year: 2014.04

Abstract: Experimental research on future Internet technologies involves observing multiple metrics at various distributed points of the networks under study. Collecting these measurements is often a tedious, repetitive and error prone task, be it in a testbed or in an uncontrolled field experiment. The relevant experimental data is usually scattered across multiple hosts in potentially different formats, and sometimes buried amongst a trove of other measurements, irrelevant to the current study. Collecting, selecting and formatting the useful measurements is a time-consuming and error-prone manual operation.

In this paper, we present a conceptual Software-Defined Measurement (SDM) framework to facilitate this task. It includes a common representation for any type of experimental data, as well as the elements to process and collect the measurement samples and their associated metadata. We then present an implementation of this concept, which we built as a major extension and refactoring of the existing Orbit Measurement Library (OML). We outline its API, and how it can be used to instrument an experiment in only a few lines of code. We also evaluate the current implementation, and demonstrate that it efficiently allows measurement collection without interfering with the systems under observation.

Promoting the Use of Reliable Rate Based Transport Protocols: The Chameleon Protocol

Emmanuel Lochin, Guillaume Jourjon, Sebastien Ardon and Patrick Senac
International Journal of Internet Protocol Technology, Vol. 5, No. 4, pp.175-189
Publication year: 2010.03

Abstract: Rate-based congestion control, such as TFRC, has not been designed to enable reliability. Indeed, the birth of the TFRC protocol resulted from the need for a congestion-controlled transport protocol to carry multimedia traffic. However, certain applications still prefer the use of UDP in order to implement their own congestion control on top of it. The present contribution proposes to design and validate a reliable rate-based protocol based on the combined use of TFRC, SACK and an adapted flow control. We argue that rate-based congestion control is a perfect alternative to window-based congestion control as most of today's applications need to interact with the transport layer and should not be limited to unreliable services only. In this paper, we detail the implementation of a reliable rate-based protocol named Chameleon and provide the networking community with an ns-2 implementation for evaluation purposes.

OMF: A control and management framework for networking testbeds

Thierry Rakotoarivelo, Max Ott, Guillaume Jourjon, and Ivan Seskar
SIGOPS Operating Systems Review, 43(4):54–59
Publication year: 2010.01

Abstract: Networking testbeds are playing an increasingly important role in the development of new communication technologies. Testbeds are traditionally built for a particular project or to study a specific technology. An alternative approach is to federate existing testbeds to a) cater for experimenter needs which cannot be fulfilled by a single testbed, and b) provide a wider variety of environmental settings at different scales. These heterogeneous settings allow the study of new approaches in environments similar to what one finds in the real world.

This paper presents OMF, a control, measurement, and management framework for testbeds. It describes through some examples the versatility of OMF’s current architecture and gives directions for federation of testbeds through OMF. In addition, this paper introduces a comprehensive experiment description language that allows an experimenter to describe resource requirements and their configurations, as well as experiment orchestration. Researchers would thus be able to reproduce their experiment on the same testbed or in a different environment with little changes. Along with the efficient support for large scale experiments, the use of testbeds and support for repeatable experiments will allow the networking field to build a culture of cross verification and therefore strengthen its scientific approach.

Towards sender-based TFRC

Guillaume Jourjon, Emmanuel Lochin and Patrick Senac
Journal of Internet Engineering pp: 193-201, Vol 3, No 1
Publication year: 2009.01

Abstract: Pervasive communications are increasingly sent over mobile devices and personal digital assistants. This trend is currently observed by mobile phone service providers which have measured a significant increase in multimedia traffic. To better carry multimedia traffic, the IETF standardized a new TCP Friendly Rate Control (TFRC) protocol. However, the current receiver-based TFRC design is not well suited to resource limited end systems. In this paper, we propose a scheme to shift resource allocation and computation to the sender. This sender-based approach led us to develop a new algorithm for loss notification and loss-rate computation. We detail the complete implementation of a user-level prototype and demonstrate the gain obtained in terms of memory requirements and CPU processing compared to the current design. We also evaluate the performance obtained in terms of throughput smoothness and fairness with TCP, and we note that this shift solves security issues raised by classical TFRC implementations.

Design, Implementation and Evaluation of a QoS-aware Transport Protocol

Guillaume Jourjon, Emmanuel Lochin and Patrick Senac
Elsevier Computer Communications, volume 31, issue 9, pp 1713-1722
Publication year: 2008.06

Abstract: In the context of a reconfigurable transport protocol framework, we propose a QoS-aware Transport Protocol (QSTP), specifically designed to operate over QoS-enabled networks with bandwidth guarantee. QSTP combines a QoS-aware TFRC congestion control mechanism, which takes into account the network-level bandwidth reservations, with a Selective ACKnowledgment (SACK) mechanism in order to provide a QoS-aware transport service that fills the gap between QoS-enabled network services and QoS-constrained applications. We have developed a prototype of this protocol in the user-space and conducted a large range of measurements to evaluate this proposal under various network conditions. Our results show that QSTP allows applications to reach their negotiated QoS over bandwidth guaranteed networks, such as a DiffServ/AF network, where TCP fails. This protocol appears to be the first reliable protocol especially designed for QoS network architectures with bandwidth guarantee.

IREEL: Remote Experimentation with Real Protocols and Applications over Emulated Network

Laurent Dairaine, Guillaume Jourjon, Emmanuel Lochin and Sebastien Ardon
Inroads, the SIGCSE Bulletin, Volume 39, Issue 2, June 2007
Publication year: 2007.04

Abstract: This paper presents a novel e-learning platform called IREEL. IREEL is a virtual laboratory allowing students to drive experiments with real Internet applications and end-to-end protocols in the context of networking courses. This platform consists of a remote network emulator offering a set of predefined applications and protocol mechanisms. Experimenters configure and control the emulation and the end-systems' behavior in order to perform tests, measurements and observations on protocols or applications operating under controlled specific networking conditions. A set of end-to-end mechanisms, mainly focusing on transport and application level protocols, is currently available. IREEL is scalable and easy to use thanks to an ergonomic web interface.

Optimization of Loss History Initialization

Guillaume Jourjon, Emmanuel Lochin and Laurent Dairaine
IEEE Communications Letters, Volume 11, Number 3, March 2007, pp 276-278
Publication year: 2007.03

Abstract: This letter deals with the initialization of the loss history structure in the TFRC (TCP-friendly rate control) mechanism. This initialization occurs after the detection of the first loss event after every slow-start phase. The loss history is crucial for the algorithm since it returns the packet loss rate estimation. This estimation is used in the TFRC equation to compute the sending rate. In this letter, we propose a new method to compute the packet loss rate which is more computationally efficient and remains as accurate as the commonly used classical method. The motivation of this work is to reduce the computation time and formulate a unified computation scheme. This method is based on Newton's algorithm, derived from a numerical analysis of the TCP throughput equation. This proposal is evaluated analytically and the results show a significant improvement in terms of the computation time.
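
The idea of inverting the throughput equation with Newton's method can be sketched as follows. This is a generic illustration, not the letter's algorithm: it assumes the throughput equation as given in RFC 5348, approximates the derivative numerically, and uses hypothetical parameter values (segment size, RTT, starting point). The function names are likewise made up for the example.

```python
import math


def tfrc_rate(p, s=1460.0, rtt=0.1, b=1.0):
    """TCP throughput equation from RFC 5348, in bytes per second.

    p: loss event rate, s: segment size in bytes, rtt: round-trip time in
    seconds, b: packets acknowledged per ACK. t_RTO is set to 4 * rtt.
    """
    t_rto = 4.0 * rtt
    denom = (rtt * math.sqrt(2.0 * b * p / 3.0)
             + t_rto * 3.0 * math.sqrt(3.0 * b * p / 8.0) * p * (1.0 + 32.0 * p * p))
    return s / denom


def loss_rate_for(target_rate, p0=0.01, tol=1e-12, max_iter=100, **params):
    """Find p such that tfrc_rate(p) == target_rate using Newton's method."""
    p = p0
    for _ in range(max_iter):
        f = tfrc_rate(p, **params) - target_rate
        h = p * 1e-6  # step for a central-difference derivative (assumption)
        df = (tfrc_rate(p + h, **params) - tfrc_rate(p - h, **params)) / (2.0 * h)
        p_new = p - f / df
        if p_new <= 0.0:
            # Newton overshot below zero; fall back to halving the estimate.
            p_new = p / 2.0
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p


# Round trip: compute the rate for a known loss rate, then recover it.
rate = tfrc_rate(0.02)
p_est = loss_rate_for(rate)
print(f"rate={rate:.0f} B/s, recovered p={p_est:.6f}")
```

Because the throughput is a decreasing, convex function of p, the iteration approaches the solution from below once an estimate lands to its left, which is why a simple halving guard is enough here.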

gTFRC, a TCP Friendly QoS-aware Rate Control for Diffserv Assured Service

Emmanuel Lochin, Laurent Dairaine, Guillaume Jourjon
Springer Telecommunication Systems Journal, 10.1007/s11235-006-9004-2, ISSN : 1018-4864 (Print) 1572-9451 (Online), Volume 33, Numbers 1-3 / December, 2006, pp 3-21
Publication year: 2006.12

Abstract: This study addresses the end-to-end congestion control support over the DiffServ Assured Forwarding (AF) class. The resulting Assured Service (AS) provides a minimum level of throughput guarantee. In this context, this article describes a new end-to-end mechanism for continuous transfer based on TCP-Friendly Rate Control (TFRC). The proposed approach modifies TFRC to take into account the QoS negotiated. This mechanism, named gTFRC, is able to reach the minimum throughput guarantee whatever the flow’s RTT and target rate. Simulation measurements and implementation over a real QoS testbed demonstrate the efficiency of this mechanism in both over-provisioned and exactly-provisioned networks. In addition, we show that the gTFRC mechanism can be used in the same DiffServ/AF class with TCP or TFRC flows.