Thursday, February 6, 2020

Down the Rabbit Hole of Harvested Personal Data

In this blog post I will shine a light on dubious business practices around the trading of personal data. I describe the technical means by which personal data is first harvested, and how it is later sold via intermediaries. Based on a recent experience with leaked personal data, I will trace the origin of my own leaked mobile phone number.

The story began when I received a contact request from a recruiter via WhatsApp. I was anything but happy about this aggressive tactic, to say the least. I therefore contacted the Swiss-based recruiting company (digitalent.ch) and asked them to disclose their data source. Their CEO replied within hours and let me know that my number was acquired «from a publicly available directory called lusha.co».

lusha.co – the data scrapers

lusha.co is based in Tel Aviv and was founded in 2016. Quoting their website, they «collect information concerning business profiles, including […] name, company name, job title, email address, phone numbers, business address, and social media links», access to which is, of course, sold. Unfortunately, only vague information is provided about their data sources: «Lusha collects information from publicly available sources and from its business partners which take part in building and improving the Lusha community.» We will soon see that the term publicly available sources is open to interpretation and that the contribution of their business partners is key to their success. As a company aggressively aspiring to grow (I counted 7 sales/marketing employees versus 4 engineers), they are constrained by data protection laws such as the GDPR and the CCPA in the European and American markets respectively. They offer a contact point for European citizens to exercise their various rights under the GDPR. I am not a lawyer, but the phrasing sometimes sounds as if they weigh their own interests well above the regulations, e.g. «[…] please note that these rights are not absolute, and may be subject to our own legitimate interests and regulatory requirements» and «Lusha’s lawful basis for processing is its legitimate interest in providing its services to its users» ¯\_(ツ)_/¯

Anyhow, let's get to the inner workings of lusha.co. A free subscription offers 5 credits per month, each credit allowing you to retrieve the data of one requested person. As a user, you are expected to install a browser extension that integrates with LinkedIn and is active while you view user profiles on the platform. The extension is available for Chrome, Edge and Firefox; note that the Firefox version is not among the extensions recommended by Mozilla.


For the dynamic analysis, I installed the Firefox extension directly from the Add-ons website. For a static analysis, there are two ways to obtain the package: download the .xpi file (right-click on the blue "+ Add to Firefox" button and select "Save as...") or retrieve it from the user profile folder (on a Mac: /Users/<username>/Library/Application Support/Firefox/Profiles/<profile>/extensions/<filename>.xpi).

File name: lusha_easily_find_b2b_contact_details-9.5.1-an+fx.xpi
SHA-256: 11e48be153a28adda35514e959844098f41d5606628c454efbcb7a5c683acab5

The .xpi file is just a ZIP archive containing the HTML and JavaScript artefacts used by the extension. The file manifest.json declares the requested permissions and the URL patterns for the content scripts, i.e. the code that is injected into pages whose URLs match those patterns, in this case LinkedIn, Salesforce and the lusha.co dashboard:

{
  "manifest_version": 2,
  "short_name": "Lusha",
  "author": "Lusha",
  "description": "Lusha is the easiest way to find B2B contact information with just one click.",
  "version": "9.5.1",
  "name": "Lusha - Easily find B2B contact information",
  "content_scripts": [
    {
      "matches": [
        "https://dashboard.lusha.co/*",
        "https://*.linkedin.com/*",
        "https://*.salesforce.com/*"
      ],
      "exclude_matches": [
        "https://www.lusha.co/*",
        "https://www.salesforce.com/*",
        "https://*.lightning.force.com/*"
      ],
      "js": [
        "content.js",
        "assets.js"
      ],
      "run_at": "document_idle"
    }
  ],
  "permissions": [
    "tabs",
    "https://*.lusha.co/*",
    "storage"
  ],
  "optional_permissions": [
    "https://*.lightning.force.com/*"
  ]
...
  }
}
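The "matches" and "exclude_matches" lists decide where content.js and assets.js run. As a rough way to reason about them, here is a simplified re-implementation of WebExtension match patterns. It is my own approximation (no `<all_urls>`, no `file://` or `ws://` schemes), not the browser's actual matching code; `isInjected` and `matchesPattern` are hypothetical names.

```javascript
// Simplified match-pattern check (own approximation, not browser code).
// Supports "<scheme>://<host>/<path>" where scheme is "*", "http" or "https",
// host is "*", "*.<domain>" or an exact host, and "*" in the path matches anything.
const contentScript = {
  matches: ["https://dashboard.lusha.co/*", "https://*.linkedin.com/*", "https://*.salesforce.com/*"],
  exclude_matches: ["https://www.lusha.co/*", "https://www.salesforce.com/*", "https://*.lightning.force.com/*"],
};

function escapeRegExp(text) {
  return text.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
}

function matchesPattern(pattern, url) {
  const parts = /^(\*|https?):\/\/([^\/]+)(\/.*)$/.exec(pattern);
  if (!parts) throw new Error(`unsupported match pattern: ${pattern}`);
  const [, scheme, host, pathPattern] = parts;
  const u = new URL(url);
  const urlScheme = u.protocol.slice(0, -1); // "https:" -> "https"

  // In this sketch, "*" as scheme stands for http or https only.
  const schemeOk = scheme === "*" ? urlScheme === "http" || urlScheme === "https" : scheme === urlScheme;
  if (!schemeOk) return false;

  // "*.example.com" matches example.com itself and any of its subdomains.
  if (host.startsWith("*.")) {
    const base = host.slice(2);
    if (u.hostname !== base && !u.hostname.endsWith("." + base)) return false;
  } else if (host !== "*" && u.hostname !== host) {
    return false;
  }

  // Each "*" in the path becomes ".*"; the rest is matched literally against
  // the path plus query string (the fragment is ignored).
  const pathRegex = new RegExp("^" + pathPattern.split("*").map(escapeRegExp).join(".*") + "$");
  return pathRegex.test(u.pathname + u.search);
}

// True if the URL is matched by at least one include and by no exclude pattern.
function isInjected(url, script = contentScript) {
  const included = script.matches.some((p) => matchesPattern(p, url));
  const excluded = (script.exclude_matches || []).some((p) => matchesPattern(p, url));
  return included && !excluded;
}
```

For instance, `isInjected("https://ch.linkedin.com/in/antoineneuenschwander")` is true, whereas `https://www.salesforce.com/` is first matched by `*.salesforce.com` and then excluded.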

While logged in to LinkedIn, I visited my own profile to see whether lusha.co would provide any data, and if so, which:


An HTTP network trace captured with Burp Suite reveals interesting behaviour: the entire HTML body of the page is sent to lusha.co's backend servers as an LZ-compressed, Base64-encoded payload in the "html" field, 18 kB in total (HTTP headers and payload truncated for readability):


POST /v2/search HTTP/1.1
Host: plugin-services.lusha.co


{"html":"PGRpdiBjbGFzcz0iZ2xvYmFsLWFsZXJ0IMwNLS15aWVsZM8UaXNFeHBhbmRlZCIgZGF0YS1
pZD0iY29va2llLXBvbGljeSI+PGxpLWljb27UY19fxRrSE8hvb25sb2FkIGxhenktxAplZCI+PC/HUj4
8cNZVbWVzc2FnZS1jb250ZW50Ij5UaGlzIHdlYnNpdGUgdXNlcyDmAKVzIHRvIGltcHJvdmUgc2Vydml
jZSBhbmQgxBJpZGUgdGFpbG9yZWQgYWRzLiBCeSB1c2luZyB0xFDETSwgeW91IGFncmVlxEvFGHVzZS4
gU2VlIG91ciA8YSBocmVmPSJodHRwczovL3d3dy5saW5rZWRpbi5jb20vbGVnYWwv7QE1P3Ryaz1wdWJ
saWNfcHJvZmlsZV/GIV/GIV9jbGlja+cBdHRyYWNraW5n5QDvcm9sLW5hbWU9It9A00B3aWzEPXZpZ2F
0ZT0iIj5DxTQgUMU0PC9hPi48L3A+PGJ1dHRvbiB0eXBlPSLGDSL2AYnIHf8AqOwAqGRpc21pc3MiIGF
yaWEtbGFiZWw9IkTHFf8CTcQaLS3HQvQCPMZVaGlkZGVuPSJ0cnVl7QJPL+YAsD48L2Rpdj48aGVhZGV
yyGvGDiI+PG5h6QMvbmF2Ij7pAeMv9AG4bmF2LcY7LWxvZ2/pASluYXZfX8QSLeQCDv8BI99T/wHHPHN
wYekBMsp4dGV4dCI+TOUCjEluPC/EJvEBYckv/wFI7gFIYT48c2VjdGnKSHNlYXJjaC1iYXLnAK9jdXJ
yZW50LccZ5gJNUEVPUExFIukCY9E9X19wbGFjZWhvbOQBmnNob3ctb24tbW9iaWxlIGhpZGXED2Rlc2t
0b3D/AVTuAVTHY3N3aXRjaGVyLW9wZW5lcu4ChuYE7CDGKyI+7AU47ACiZnVsbC3rAKciPkFudG9pbmU
gTmV1ZW5zY2h35AUzcucCZukCdcxP6ACHdGFic19fdHJpZ2dlci1hbmTFEvIBIc4wy33pARXmAST/ART
vARTMVu4A0u0CU9p+5wJkUGVvcGxl9wJizz5jYXJldC1kb3duLWZpbGxl/wYE/wFAxFTwAhjwARM+PGg
...
A6BCWzxbpCs9NRkFIU2xTVllETmZ0akRFanlNZmFLVnV6YmpZQ3hITOgA5M9XzQ1B5AcPb3Bl5gsq7wC
95Q2e7gNm6Qe1xQY=","url":"https://ch.linkedin.com/in/antoineneuenschwander",
"request_id":"74924a15-ef0c-4bed-97a9-498831865d47","firstSearch":true}


Decompressing the payload shows that it includes content that is only visible to logged-in users:


$ node
Welcome to Node.js v13.7.0.
Type ".help" for more information.
> var LZUTF8 = require('lzutf8');
undefined
> var compressed = fs.readFileSync('html_payload_compressed.bin', 'utf8')
undefined
> LZUTF8.decompress(compressed, {inputEncoding: "Base64"});
...
' <div class="artdeco-hoverable-content__content artdeco-hovercard-content-container">\n' +
' <p>See and edit how you look to people who are not signed in, and find you through search engines (ex: Google, Bing).</p>\n' +
'\n' +
...
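The hovercard text above is only rendered for a signed-in visitor, which is what gives the logged-in view away. When going through more captures, a quick check for such strings can help. The marker list below contains only the string visible above; any further markers would have to be collected manually, and `signedInMarkersFound` is a hypothetical helper.

```javascript
// Sketch: report which "signed-in only" strings occur in decompressed page HTML.
// The marker list holds only the string seen in the REPL output above.
const SIGNED_IN_MARKERS = [
  "See and edit how you look to people who are not signed in",
];

function signedInMarkersFound(html, markers = SIGNED_IN_MARKERS) {
  // Collapse runs of whitespace so markers still match across line breaks.
  const text = html.replace(/\s+/g, " ");
  return markers.filter((marker) => text.includes(marker));
}
```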


A second request, carrying the same request_id, returns the "enriched" data for the displayed LinkedIn profile:


POST /v2/show HTTP/1.1
Host: plugin-services.lusha.co

{"request_id":"74924a15-ef0c-4bed-97a9-498831865d47"}


HTTP/1.1 201 Created
Date: Tue, 04 Feb 2020 20:59:39 GMT
Content-Type: application/json; charset=utf-8
Connection: close

{
  "request": {
    "phones": ["+41 7X XXX XX XX", "+41 5X XXX XX XX"],
    "emails": [],
    "name": "Antoine Neuenschwander",
    "link": "https://ch.linkedin.com/in/antoineneuenschwander",
    "platform": "linkedin",
    "contact": {
      "id": "42127f80-4791-11ea-a6c7-4fac905d8f3e",
      "firstName": "Antoine",
      "lastName": "Neuenschwander",
      "showDate": "2020-02-04T20:59:39.256Z",
      "lists": "all contacts"
    },
    "company": {
      "address": "Worblaufen, Bern, Switzerland",
      "categories": ["Information Technology", "Telecommunications"],
      "description": "Swisscom, Switzerland’s leading telecoms company and one of its leading IT companies, is headquartered in Ittigen.",
      "domain": "swisscom.ch",
      "employees": "+10,000",
      "founded": "1998",
      "logo": "https://logo.lusha.co/a/4f4/4f479b5d-343e-49c2-822c-edd920333dd0.jpg",
      "name": "Swisscom",
      "social": {
        "facebook": {"url": "https://www.facebook.com/Swisscom"},
        "linkedin": {"url": "https://www.linkedin.com/company-beta/2715"},
        "twitter": {"url": "https://twitter.com/swisscom_de"}
      },
      "website": "swisscom.ch"
    }
  },
  "user": {"email": "XXXXX", "isOnBoarding": true},
  "trialExpired": false
}

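Put together, the observed protocol consists of two calls tied together by the request_id. The following sketch reconstructs that flow from the traces only; it is not Lusha's code. Assumptions: the request_id is generated by the client (it has the format of a random version 4 UUID), `firstSearch` is copied verbatim from the trace with unknown meaning, and both authentication headers and the LZ compression step are left out. The HTTP transport is passed in as a function (`postJson`, hypothetical), so the flow can be exercised without contacting lusha.co.

```javascript
// Sketch of the two-step lookup seen in the traces:
//   1. POST /v2/search  with the compressed page HTML and a request_id
//   2. POST /v2/show    with the same request_id to obtain the enriched record
const crypto = require("crypto");

const BASE_URL = "https://plugin-services.lusha.co";

// postJson(url, body) -> Promise<parsed JSON response>; injected by the caller.
async function lookupProfile(postJson, compressedHtml, profileUrl, requestId = crypto.randomUUID()) {
  await postJson(`${BASE_URL}/v2/search`, {
    html: compressedHtml,   // assumed to be LZ-compressed and Base64-encoded already
    url: profileUrl,
    request_id: requestId,
    firstSearch: true,      // copied from the trace; meaning unknown
  });
  const res = await postJson(`${BASE_URL}/v2/show`, { request_id: requestId });
  const r = res.request;
  return {
    name: r.name,
    phones: r.phones,
    emails: r.emails,
    company: r.company ? r.company.name : undefined,
  };
}
```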

The lusha.co browser extension states the following regarding the type of data submitted to their backend and required to identify a user profile:


From my observations, it is not just «certain words (such as full name and company name)» but the entire profile page that is sent to lusha.co's servers. Moreover, the data is not only sent when needed, i.e. when a user requests enriched data for a single, chosen LinkedIn profile, but for each and every visited profile. This extension thus essentially implements a crawler that scrapes every LinkedIn profile, in its logged-in view, as users browse LinkedIn, which is a clear violation of LinkedIn's terms. So lusha.co is basically selling the data to the very customers who are harvesting it in the first place, brilliant!
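This claim can be checked against a proxy log: if page data is uploaded for every visited profile, there will be /v2/search requests for profiles that never got a matching /v2/show. A sketch, assuming a hypothetical log format (an array of `{path, body}` objects exported from the proxy); `summarizeUploads` is likewise a hypothetical name.

```javascript
// Sketch: compare profile uploads (/v2/search) with explicit lookups (/v2/show).
// Hypothetical log format: [{ path: "/v2/search", body: {...} }, ...]
function summarizeUploads(entries) {
  const uploaded = new Set();        // profile URLs sent via /v2/search
  const lookedUp = new Set();        // profile URLs actually revealed via /v2/show
  const urlByRequestId = new Map();  // request_id -> profile URL
  for (const { path, body } of entries) {
    if (path === "/v2/search") {
      uploaded.add(body.url);
      urlByRequestId.set(body.request_id, body.url);
    } else if (path === "/v2/show") {
      const url = urlByRequestId.get(body.request_id);
      if (url !== undefined) lookedUp.add(url);
    }
  }
  return {
    profilesUploaded: uploaded.size,
    profilesLookedUp: lookedUp.size,
    uploadedWithoutLookup: [...uploaded].filter((url) => !lookedUp.has(url)),
  };
}
```

A large `uploadedWithoutLookup` list relative to `profilesLookedUp` is what the crawler hypothesis predicts.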


Meanwhile, I contacted lusha.co with a GDPR request, to which they replied:
In order to process your request we need to identify your profile,
for that purpose the following information is needed:
First Name:
Last Name:
Company Name:
Nationality:
Public LinkedIn profile link:
Needless to say, this procedure is insufficient to reliably verify that a request comes from the legitimate data subject. I am tempted to try to impersonate another person, but I will refrain from doing so. In response to my request, lusha.co provided me with both of my phone numbers, the same ones my own lookup had already returned. They also disclosed the origin of the data: «The information above originates from a database maintained by Simpler Apps Inc ("Simpler").»

Interesting! According to its description, Simpler is an app that replaces the standard dialer and contacts functions on Android smartphones. Unsurprisingly, the app requires access to the user's contacts as well as full network access. So what could have happened? I assume that my mobile phone numbers were harvested from an installation of this app by someone who must have had me in their contacts. Perhaps I will investigate the Simpler app in a subsequent blog post...


edit - 20.02.2020: Nightwatch Cybersecurity wrote a follow-up blog post on the matter analysing the role of the Simpler app.