Thursday, February 6, 2020

Down the Rabbit Hole of Harvested Personal Data

In this blog post I will shine a light on dubious business practices around the trading of personal data. I describe the technical means by which personal data is harvested in the first place, and how it is later sold via intermediaries. Based on a recent experience of leaked personal data, I will track down my own mobile phone number.

The story began when I received a contact request from a recruiter via WhatsApp. I was anything but happy about this aggressive tactic, to say the least. Therefore I decided to contact that Swiss-based recruiting company and asked them to inform me about their data source. Their CEO replied within hours and let me know that my number was acquired «from a publicly available directory called Lusha».

Lusha, the data scraper, is based in Tel-Aviv and was founded in 2016. Quoting their website, they «collect information concerning business profiles, including […] name, company name, job title, email address, phone numbers, business address, and social media links», access to which is of course sold. Unfortunately, only vague information is provided on their data sources: «Lusha collects information from publicly available sources and from its business partners which take part in building and improving the Lusha community.» We will soon see that the term "publicly available sources" is open to interpretation and that the contribution of their business partners is key to their success.

As a company aggressively aspiring to grow (I counted 7 sales/marketing employees versus 4 engineers), they are constrained by data protection laws such as GDPR and CCPA in the European and American markets respectively. They offer a contact for European citizens to exercise their different rights under the GDPR. I am not a lawyer, but the phrasing sometimes sounds as if they place their own interests well above the regulations, e.g. «[…] please note that these rights are not absolute, and may be subject to our own legitimate interests and regulatory requirements» and «Lusha’s lawful basis for processing is its legitimate interest in providing its services to its users» ¯\_(ツ)_/¯

Anyhow, let's get to the inner workings of Lusha. A free subscription is available with 5 credits per month, each credit allowing the retrieval of data for one requested person. As a user, you are supposed to use a browser extension that integrates with LinkedIn when you access user profiles on the platform. The extension is available for Chrome, Edge and Firefox; note that the extension is not recommended by Firefox.

For the dynamic analysis, I installed the Firefox extension directly from the Add-ons website. For a static analysis, there are two options: download the .xpi file (right click on the blue "+ Add to Firefox" button and select "Save as...") or retrieve it from the user profile folder (on a Mac: /Users/<username>/Library/Application Support/Firefox/Profiles/<profile>/extensions/<filename>.xpi).

File name: lusha_easily_find_b2b_contact_details-9.5.1-an+fx.xpi
SHA-256: 11e48be153a28adda35514e959844098f41d5606628c454efbcb7a5c683acab5

The .xpi file is just a ZIP archive containing the different HTML and JavaScript artefacts used by the extension. The file manifest.json declares the permissions and the URL patterns for content scripts, i.e. the code that will be injected into pages when visiting matching URLs, in this case LinkedIn and Salesforce.

  "manifest_version": 2,
  "short_name": "Lusha",
  "author": "Lusha",
  "description": "Lusha is the easiest way to find B2B contact information with just one click.",
  "version": "9.5.1",
  "name": "Lusha - Easily find B2B contact information",
  "content_scripts": [
      "matches": [
      "exclude_matches": [
      "js": [
      "run_at": "document_idle"
  "permissions": [
  "optional_permissions": [
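Once manifest.json has been extracted from the archive, the fields relevant for this kind of review can be summarised with a few lines of Node. This is only a sketch: the manifest in the example is a made-up placeholder (the URL patterns, script name and permissions are not taken from the Lusha extension), and summarizeManifest is an illustrative name.

```javascript
// Sketch: summarise where a WebExtension injects code and what it may access.
// The manifest below is a made-up placeholder, NOT the real Lusha manifest.
const exampleManifest = JSON.stringify({
  manifest_version: 2,
  name: 'Example extension',
  content_scripts: [
    {
      matches: ['https://social.example/*', 'https://crm.example/*'],
      exclude_matches: ['https://social.example/help/*'],
      js: ['content.js'],
      run_at: 'document_idle',
    },
  ],
  permissions: ['storage', 'tabs'],
  optional_permissions: [],
});

// Collects URL patterns, script files and permissions from the manifest text.
function summarizeManifest(manifestText) {
  const manifest = JSON.parse(manifestText);
  const scripts = manifest.content_scripts || [];
  return {
    injectedInto: scripts.flatMap((s) => s.matches || []),
    excluded: scripts.flatMap((s) => s.exclude_matches || []),
    scriptFiles: scripts.flatMap((s) => s.js || []),
    permissions: manifest.permissions || [],
    optionalPermissions: manifest.optional_permissions || [],
  };
}

console.log(summarizeManifest(exampleManifest));
```

The "matches" patterns are the interesting part: they determine on which sites the content scripts get to read the page.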

While logged in to LinkedIn, I visited my own profile to see whether Lusha would provide any data, and if so, which:

An HTTP network trace using Burp Suite shows interesting behaviour: the entire HTML body is sent to Lusha's backend servers as an LZ-compressed, base64-encoded payload in the "html" value, 18 kB in total (HTTP headers and payload truncated for better readability):

POST /v2/search HTTP/1.1


By decompressing the payload, we see that the contents include data that is only visible to logged-in users:

$ node
Welcome to Node.js v13.7.0.
Type ".help" for more information.
> var LZUTF8 = require('lzutf8');
> var compressed = fs.readFileSync('html_payload_compressed.bin', 'utf8')
> LZUTF8.decompress(compressed, {inputEncoding: "Base64"});
' <div class="artdeco-hoverable-content__content artdeco-hovercard-content-container">\n' +
' <p>See and edit how you look to people who are not signed in, and find you through search engines (ex: Google, Bing).</p>\n' +
'\n' +
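To illustrate the general idea of shipping a whole page as a compressed, base64-encoded JSON value, here is a small round trip. Note that it swaps in Node's built-in zlib (deflate) instead of the LZ-UTF8 scheme seen in the trace, so the bytes differ from what the extension sends; the page content and the function names are placeholders.

```javascript
// Sketch: compress HTML, base64-encode it, wrap it in a JSON "html" field, and reverse it.
// Uses zlib deflate as a stand-in for LZ-UTF8; this is NOT the extension's actual codec.
const zlib = require('zlib');

// Returns a JSON request body carrying the compressed page in its "html" value.
function encodePayload(html) {
  const compressed = zlib.deflateSync(Buffer.from(html, 'utf8'));
  return JSON.stringify({ html: compressed.toString('base64') });
}

// Recovers the original HTML string from such a JSON body.
function decodePayload(body) {
  const { html } = JSON.parse(body);
  return zlib.inflateSync(Buffer.from(html, 'base64')).toString('utf8');
}

// Placeholder markup standing in for a full profile page.
const page = '<p>placeholder profile markup</p>'.repeat(200);
const body = encodePayload(page);
console.log(`${Buffer.byteLength(page)} bytes of HTML, ${Buffer.byteLength(body)} bytes in the request body`);
console.log('round trip intact:', decodePayload(body) === page);
```

The point is that a payload of a few kilobytes can easily hold an entire rendered page, and that anyone intercepting it can reverse the encoding.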

In a second request, we do get access to the "enriched" data corresponding to the displayed LinkedIn profile:

POST /v2/show HTTP/1.1


HTTP/1.1 201 Created
Date: Tue, 04 Feb 2020 20:59:39 GMT
Content-Type: application/json; charset=utf-8
Connection: close

{"request":{"phones":["+41 7X XXX XX XX","+41 5X XXX XX
lists":"all contacts"},"company":{"address":"Worblaufen, Bern,
Technology","Telecommunications"],"description":"Swisscom, Switzerland’s
leading telecoms company and one of its leading IT companies, is headquartered

The browser extension states the following regarding the type of data submitted to their backend and required to identify a user profile:

From my observations, it is not only «certain words (such as full name and company name)» but the entire user profile data that is sent to Lusha's servers. Moreover, the data is not only sent when needed, i.e. when a user requests enriched data for a single, chosen LinkedIn profile, but for each and every visited profile. So this extension essentially implements a crawler that scrapes every single LinkedIn profile in private view as the users browse LinkedIn, which is actually a clear violation of LinkedIn's terms. So they are basically selling the data to the very customers who harvest it in the first place, brilliant!
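One way to make the "every visited profile" observation reproducible is to tally requests per endpoint from an exported proxy trace. The sketch below assumes a simplified entry shape ({method, path}) and uses made-up sample entries, not my actual trace; only the two endpoint paths are taken from the requests shown earlier.

```javascript
// Sketch: count requests per "METHOD path" in a simplified proxy trace.
// The entry shape and the sample data are assumptions for illustration only.
function tallyRequests(entries) {
  const counts = {};
  for (const { method, path } of entries) {
    const key = `${method} ${path}`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Made-up example: three profile page views, one explicit lookup.
const sampleTrace = [
  { method: 'POST', path: '/v2/search' },
  { method: 'POST', path: '/v2/search' },
  { method: 'POST', path: '/v2/search' },
  { method: 'POST', path: '/v2/show' },
];

console.log(tallyRequests(sampleTrace));
```

If the number of /v2/search requests tracks the number of profiles viewed rather than the number of lookups requested, the page contents are being uploaded on every visit.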

Meanwhile, I contacted Lusha with a GDPR request, to which they replied:
In order to process your request we need to identify your profile,
for that purpose the following information is needed:
First Name:
Last Name:
Company Name:
Public LinkedIn profile link:
Needless to say, this procedure is insufficient to properly identify a legitimate requestor. I am tempted to try and impersonate another person, but I will abstain from doing so. As a result of my request, Lusha provided me with both my phone numbers, the same ones already returned by my own lookup. They also mentioned the data origin: «The information above originates from a database maintained by Simpler Apps Inc ("Simpler").»

Interesting! From its description, Simpler is an app that replaces the standard dialing and contacts functions on Android smartphones. And unsurprisingly, the app requires access to the users' contacts as well as full network access. Now, what could have happened? I assume that my mobile phone numbers were harvested from an installation of this app by someone who must have had me in their contacts. Perhaps I will try to investigate the Simpler app in a subsequent blog post...

edit - 20.02.2020: Nightwatch Cybersecurity wrote a follow-up blog post on the matter analysing the role of the Simpler app.


  1. Thank you for publishing this.

  2. Would be interesting to know what Linkedin thinks about this.

  3. Your work confirms what the CNIL (the french data cop) has revealed ~2 years ago about lusha's actions:

    It is said that Linkedin swears lusha's data isn't coming from them.
