Thursday, February 6, 2020

Down the Rabbit Hole of Harvested Personal Data

In this blog post I will shine a light on dubious business practices around the trading of personal data. I describe the technical means by which personal data is first harvested, and how it is later sold via intermediaries. Based on a recent experience of leaked personal data, I will track down my own mobile phone number.

The story began when I received a contact request from a recruiter via WhatsApp. I was anything but happy about this aggressive tactic, to say the least. I therefore contacted the Swiss-based recruiting company and asked them to tell me where they had obtained my data. Their CEO replied within hours and let me know that my number was acquired «from a publicly available directory called» Lusha. Lusha, the data scraper in question, is based in Tel Aviv and was founded in 2016. Quoting their website, they «collect information concerning business profiles, including […] name, company name, job title, email address, phone numbers, business address, and social media links», access to which is of course sold. Unfortunately, only vague information is provided on their data sources: «Lusha collects information from publicly available sources and from its business partners which take part in building and improving the Lusha community.» We will soon see that the term "publicly available sources" is open to interpretation and that the contribution of their business partners is key to their success.

As a company aggressively aspiring to grow (I counted 7 sales/marketing employees versus 4 engineers), they are constrained by data protection laws such as GDPR and CCPA for the European and American markets respectively. They offer a contact for European citizens to exercise their different rights under the GDPR. I am not a lawyer, but the phrasing sometimes sounds as if they place their own interests well above the regulations, e.g. «[…] please note that these rights are not absolute, and may be subject to our own legitimate interests and regulatory requirements» and «Lusha’s lawful basis for processing is its legitimate interest in providing its services to its users» ¯\_(ツ)_/¯

Anyhow, let's get to the inner workings of Lusha. A free subscription is available with an offer of 5 credits per month, each credit allowing the retrieval of data on one requested person. As a user, you are supposed to use a browser extension that integrates with LinkedIn when you access user profiles on the platform. The extension is available for Chrome, Edge and Firefox; note that the extension is not recommended by Firefox.

For the dynamic analysis, I installed the Firefox extension directly from the Add-ons website. For a static analysis, there are two options: download the .xpi file (right click on the blue "+ Add to Firefox" button and select "Save as...") or retrieve it from the user profile folder (on a Mac: /Users/<username>/Library/Application Support/Firefox/Profiles/<profile>/extensions/<filename>.xpi).

File name: lusha_easily_find_b2b_contact_details-9.5.1-an+fx.xpi
SHA-256: 11e48be153a28adda35514e959844098f41d5606628c454efbcb7a5c683acab5
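
To check that a downloaded copy is the same file that was analysed here, its digest can be compared against the value above. The following is a minimal sketch using Node's built-in crypto module; the helper names sha256Hex and fileMatchesDigest are made up for this illustration.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Compute the SHA-256 digest of a string or Buffer as lowercase hex.
function sha256Hex(data) {
  return crypto.createHash('sha256').update(data).digest('hex');
}

// Compare a file on disk against an expected digest (case-insensitive).
// Hypothetical helper, not part of any tool mentioned in this post.
function fileMatchesDigest(path, expectedHex) {
  return sha256Hex(fs.readFileSync(path)) === expectedHex.toLowerCase();
}

// Usage, with the file name and digest listed above:
// fileMatchesDigest(
//   'lusha_easily_find_b2b_contact_details-9.5.1-an+fx.xpi',
//   '11e48be153a28adda35514e959844098f41d5606628c454efbcb7a5c683acab5'
// );
```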

The .xpi file is just a ZIP archive containing the different HTML and Javascript artefacts used by the extension. The file manifest.json contains the permissions and URLs for content scripts. This is the code that will be injected in the pages when visiting the corresponding URLs, in this case LinkedIn and Salesforce.

  "manifest_version": 2,
  "short_name": "Lusha",
  "author": "Lusha",
  "description": "Lusha is the easiest way to find B2B contact information with just one click.",
  "version": "9.5.1",
  "name": "Lusha - Easily find B2B contact information",
  "content_scripts": [
      "matches": [
      "exclude_matches": [
      "js": [
      "run_at": "document_idle"
  "permissions": [
  "optional_permissions": [

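Once manifest.json has been extracted from the archive, the interesting fields can be pulled out with a few lines of code. This is only a sketch: the function name summarizeManifest is my own, and the example manifest uses placeholder patterns and file names, not the values from the real extension.

```javascript
// Summarise which pages an extension injects code into and what it may access.
function summarizeManifest(manifest) {
  const scripts = manifest.content_scripts || [];
  return {
    name: manifest.name,
    version: manifest.version,
    // Every URL pattern on which at least one content script runs.
    injectedInto: scripts.flatMap((s) => s.matches || []),
    // JavaScript files injected by those content scripts.
    injectedFiles: scripts.flatMap((s) => s.js || []),
    permissions: manifest.permissions || [],
    optionalPermissions: manifest.optional_permissions || [],
  };
}

// Hypothetical manifest: patterns and file names are placeholders.
const exampleManifest = {
  manifest_version: 2,
  name: 'Example extension',
  version: '1.0.0',
  content_scripts: [
    {
      matches: ['https://*.example.com/*'],
      js: ['content.js'],
      run_at: 'document_idle',
    },
  ],
  permissions: ['storage'],
};

console.log(summarizeManifest(exampleManifest));
```

For a real extension, the object would come from JSON.parse(fs.readFileSync('manifest.json', 'utf8')) after unpacking the .xpi file.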
While logged in to LinkedIn, I visited my own profile to see whether, and what, data Lusha would provide:

An HTTP network trace captured with Burp Suite shows interesting behaviour: the entire HTML body is sent to Lusha's backend servers as an LZ-compressed, base64-encoded payload in the "html" value, about 18 kB in total (HTTP headers and payload truncated for better readability):

POST /v2/search HTTP/1.1


By decompressing the payload, we see that the contents include data that is only visible to logged-in users:

$ node
Welcome to Node.js v13.7.0.
Type ".help" for more information.
> var LZUTF8 = require('lzutf8');
> var compressed = fs.readFileSync('html_payload_compressed.bin', 'utf8')
> LZUTF8.decompress(compressed, {inputEncoding: "Base64"});
' <div class="artdeco-hoverable-content__content artdeco-hovercard-content-container">\n' +
' <p>See and edit how you look to people who are not signed in, and find you through search engines (ex: Google, Bing).</p>\n' +
'\n' +

In a second request, we do get access to the "enriched" data corresponding to the displayed LinkedIn profile:

POST /v2/show HTTP/1.1


HTTP/1.1 201 Created
Date: Tue, 04 Feb 2020 20:59:39 GMT
Content-Type: application/json; charset=utf-8
Connection: close

{"request":{"phones":["+41 7X XXX XX XX","+41 5X XXX XX
lists":"all contacts"},"company":{"address":"Worblaufen, Bern,
Technology","Telecommunications"],"description":"Swisscom, Switzerland’s
leading telecoms company and one of its leading IT companies, is headquartered

The browser extension states the following regarding the type of data submitted to their backend and required to identify a user profile:

From my observations, not only «certain words (such as full name and company name)» but the entire user profile data is sent to Lusha's servers. Also, the data is not only sent when needed, i.e. when a user requests enriched data of a single, chosen LinkedIn profile, but for each and every visited profile. So this extension essentially implements a crawler that scrapes every single LinkedIn profile in private view as users browse LinkedIn, which is actually a clear violation of LinkedIn's terms. So they are basically selling the data to the very customers who harvest it in the first place, brilliant!
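
One way to back up such an observation is to count requests in an exported proxy log. The sketch below assumes a simplified, hypothetical log format (objects with method and path), and it assumes that /v2/search carries the page upload while /v2/show is the explicit lookup; the sample entries are invented and are not my actual trace.

```javascript
// Hypothetical, simplified log entries as they might be exported from an
// intercepting proxy. Field names and sample data are assumptions.
const exampleLog = [
  { method: 'GET', path: '/in/alice/' },
  { method: 'POST', path: '/v2/search' },
  { method: 'GET', path: '/in/bob/' },
  { method: 'POST', path: '/v2/search' },
  { method: 'POST', path: '/v2/show' },
];

// Count profile page visits, page uploads and explicit lookups.
function countRequests(log) {
  const count = (method, test) =>
    log.filter((e) => e.method === method && test(e.path)).length;
  return {
    profileVisits: count('GET', (p) => p.startsWith('/in/')),
    uploads: count('POST', (p) => p === '/v2/search'),
    lookups: count('POST', (p) => p === '/v2/show'),
  };
}

console.log(countRequests(exampleLog));
// { profileVisits: 2, uploads: 2, lookups: 1 }
```

If uploads track profile visits rather than lookups, the extension is sending page content regardless of whether the user asked for enriched data.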

Meanwhile, I contacted Lusha with a GDPR request, to which they replied:
In order to process your request we need to identify your profile,
for that purpose the following information is needed:
First Name:
Last Name:
Company Name:
Public LinkedIn profile link:
Needless to say, this procedure is insufficient to properly identify a legitimate requestor. I am tempted to try and impersonate another person, but I will abstain from doing so. As a result of my request, Lusha provided me with both my phone numbers, as already returned by my own lookup. They also mentioned the data origin: «The information above originates from a database maintained by Simpler Apps Inc ("Simpler").»

Interesting! From its description, Simpler is an app that replaces the standard dialing and contacts functions on Android smartphones. And unsurprisingly, the app requires access to the users' contacts as well as full network access. Now, what could have happened? I assume that my mobile phone numbers were harvested from an installation of this app by someone who must have had me in their contacts. Perhaps I will try to investigate the Simpler app in a subsequent blog post...

edit - 20.02.2020: Nightwatch Cybersecurity wrote a follow-up blog post on the matter analysing the role of the Simpler app.

Thursday, January 23, 2020

Analysis of a Fake Threema App

A couple of days ago there were reports of an app on the Google Playstore that seemed to impersonate the Threema messenger app. Threema is a Swiss secure messaging service that uses end-to-end encryption to provide privacy to its users.

In the past, several fake apps targeting Swiss brands such as Bluewin have already been observed. In that case, the app's purpose was to steal user credentials (login/password) from users who inadvertently downloaded it from the wrong developer. A more detailed description of the modus operandi can be found in a blog post by SWITCH-CERT.

Unfortunately, I failed to take a screenshot of the app while it was still available on the Playstore and before it was taken down by Google. But I remember that the counter had already reached 100+ downloads. Currently the app can still be downloaded from alternative sites, which mirror all available apps from the Google Playstore. Each app in the Google Playstore is identified by a string in the form of a reverse domain name, in this case: com.wa.threema.

From the app description, we can see that the app was first published on January 9th 2020, meaning the app was available for more than ten days before it was reported to the Google abuse team and eventually removed.

So I went ahead and downloaded the APK file for analysis. First, I launched the emulator provided with the Android Studio development environment, dragged the APK into the virtual device and launched it. Meanwhile, I also started Burp Suite and changed the proxy settings of the emulator in order to intercept the network traffic. Unfortunately, this didn't work as expected because most network communication was destined for Google domains, which are protected by certificate pinning in the app. Therefore, I didn't follow up on the dynamic analysis, although it did allow me to take a couple of screenshots and to better understand the application logic:

I then used the JADX decompiler to open the APK file and recover its source code and other resources. The first step is to analyse the AndroidManifest.xml, which contains a listing of relevant activities, especially the one that is called after the app starts up: ar.codeslu.plax.MainActivity.
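
The startup activity is the one whose intent filter declares the MAIN action together with the LAUNCHER category. As a rough sketch, assuming the manifest has already been decoded to plain XML text (as JADX shows it), it can be located like this. The function name and the sample manifest are hypothetical, and the regular expression is a shortcut that would not cope with every valid XML document (for instance a ">" inside an attribute value).

```javascript
// Return the names of all activities that declare a MAIN/LAUNCHER intent filter.
function findLauncherActivities(manifestXml) {
  // Matches either a self-closing <activity .../> or a full
  // <activity ...> ... </activity> element (not <activity-alias>).
  const activityRe = /<activity\s[^>]*?(?:\/>|>[\s\S]*?<\/activity>)/g;
  const launchers = [];
  for (const block of manifestXml.match(activityRe) || []) {
    const isLauncher =
      block.includes('android.intent.action.MAIN') &&
      block.includes('android.intent.category.LAUNCHER');
    // Read android:name from the opening tag only, not from child elements.
    const openingTag = block.slice(0, block.indexOf('>') + 1);
    const name = /android:name="([^"]+)"/.exec(openingTag);
    if (isLauncher && name) launchers.push(name[1]);
  }
  return launchers;
}

// Hypothetical manifest for illustration; not the one from the analysed APK.
const sampleManifest = `
<manifest package="com.example.app">
  <application>
    <activity android:name="com.example.SettingsActivity"/>
    <activity android:name="com.example.MainActivity">
      <intent-filter>
        <action android:name="android.intent.action.MAIN"/>
        <category android:name="android.intent.category.LAUNCHER"/>
      </intent-filter>
    </activity>
  </application>
</manifest>`;

console.log(findLauncherActivities(sampleManifest));
```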

Looking at the code, we can see that the app makes use of Google's Firebase services, especially its NoSQL database component, and we can already see what kind of entities are persisted on the backend: Global.USERS, Global.CHATS, Global.GROUPS and Global.CALLS. Also, an encryption object is created and initialized with Global.keyE and Global.salt, which are actually hardcoded values found in the class (funny but irrelevant for the rest of the analysis):

A glimpse at the string resources gives us information about the URLs used to connect to the Firebase database backend:

Thanks to Elliot Alderson's blog post on hacking the Donald Daters App, I learned how to access the insecure Firebase backend associated with the app, which of course contained all user, chat, group and call records, as defined in the MainActivity class. At the time of writing, the database contained 286 registered users, 15 chats and 8 calls.
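
For readers who want to check their own Firebase project for this misconfiguration, here is a minimal sketch. It relies on the Realtime Database REST API, where appending .json to a path returns its contents and shallow=true limits the answer to the top-level keys. The function names and the example URL are placeholders of mine, and this is not necessarily the exact method from the blog post mentioned above. Only run it against a database you are authorised to test.

```javascript
// Build the REST URL for the root of a Realtime Database.
// shallow=true asks for top-level keys only instead of the full data.
function rootUrl(databaseUrl) {
  return databaseUrl.replace(/\/+$/, '') + '/.json?shallow=true';
}

// Interpret the HTTP status of an unauthenticated GET on that URL.
function classify(status) {
  if (status === 200) return 'publicly readable';
  if (status === 401) return 'read access denied';
  return 'inconclusive (HTTP ' + status + ')';
}

// Performs the actual request; needs the global fetch of Node 18 or later.
async function checkDatabase(databaseUrl) {
  const response = await fetch(rootUrl(databaseUrl));
  return classify(response.status);
}

// Usage with a placeholder URL (hypothetical project name):
// checkDatabase('https://example-project.firebaseio.com').then(console.log);
```

A result of 'publicly readable' means the security rules allow anyone to read the root of the database without authentication.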

Looking at the code, we can see that the app actually implements all functionality of a working messaging service, including audio and video calls. That's quite a lot of effort, assuming the app's intention is only phishing. Indeed, my assumption was that the app was attacking Threema's registration process, but I couldn't find evidence to back this claim. So what is this app intended for?

Based on the package name ar.codeslu.plax I figured out that a similar app was being sold on a marketplace. And by that I mean you can actually buy the source code of the app for as little as 35 USD and customize it to offer your own chat app on the Google Playstore:

It turns out you can even find free downloads of the code with a bit of googling:

There's also a more expensive license that allows the buyer to charge their users, and I assume that's the actual business model of the fake app:

By looking for other apps by the same developer (junemoney), we see almost a dozen other chat apps that were all released at approximately the same time and that also impersonate other popular messaging services like Discord, TextNow or Zalo, for which he has even written a corresponding privacy policy (I guess that's mandatory if you want to publish apps on the Playstore).

So in conclusion, from my point of view, the fake app's intention is not to steal user credentials, but rather to trick people into downloading the wrong app and have them pay subscriptions for using it. (Other ideas? Leave me a comment.)

Anyhow, such apps often slip through Google Playstore's "quality assurance" during publication and are then made available to everyone for download :-/ But since such apps clearly violate Google's Developer Policies, anyone can report them as abusive, either because they are malicious, as in the case of the phishing app, or because they infringe the intellectual property rights of others. To do so, log into your Google account, go to the app's Playstore page, scroll down and report the app based on one of the two violations described.

Indicators of Compromise:
Filename: Threema Private
SHA-256: a5422bc7f09c22a877f580119027ed83c6ba7ac12ae6647808b2ffddfcab7124