
Steps to create Azure Databricks Cluster


Last updated 1 year ago

  1. Log in to the Azure Databricks workspace.

  2. Select Data Science & Engineering from the sidebar.

  3. Select Create -> New cluster and enter the following details:

    • Policy - Unrestricted

    • Cluster name - protecto

    • Cluster mode - Standard

    • Databricks runtime version - latest LTS version

    • Enable table access control and only allow Python and SQL commands

    • Worker type - Node Size - 4 Core, 14 GB RAM (Standard_DS3_v2)

    • Driver Type - same as worker

    • Under Advanced options, add the following to the Spark config:

spark.databricks.acl.dfAclsEnabled true

spark.databricks.repl.allowedLanguages python,sql

spark.databricks.delta.preview.enabled true
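The cluster settings above can also be written down as a single definition, which is handy for review or for scripting. The following is a minimal Python sketch, not part of the Protecto installation: the function names `build_cluster_spec` and `to_spark_config_text` are hypothetical, the field names follow the Databricks Clusters API request body, and the runtime version and worker count are left as parameters because this guide does not fix them. The Policy and Cluster mode choices are UI settings and are not represented here.

```python
# Spark config entries from the cluster setup steps, in dotted-key form.
SPARK_CONF = {
    "spark.databricks.acl.dfAclsEnabled": "true",
    "spark.databricks.repl.allowedLanguages": "python,sql",
    "spark.databricks.delta.preview.enabled": "true",
}


def build_cluster_spec(spark_version: str, num_workers: int) -> dict:
    """Return a cluster definition mirroring the settings in this guide.

    spark_version and num_workers are assumptions supplied by the caller:
    the guide only says to pick the latest LTS runtime and does not state
    a worker count.
    """
    if not spark_version:
        raise ValueError("spark_version is required (choose the latest LTS runtime)")
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return {
        "cluster_name": "protecto",
        "spark_version": spark_version,
        "node_type_id": "Standard_DS3_v2",         # worker: 4 cores, 14 GB RAM
        "driver_node_type_id": "Standard_DS3_v2",  # driver: same as worker
        "num_workers": num_workers,
        "spark_conf": dict(SPARK_CONF),            # copy so callers cannot mutate the constant
    }


def to_spark_config_text(conf: dict) -> str:
    """Render entries one per line as 'key value', the form pasted into the Spark config box."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    # Prints the three lines to paste into the Spark config field.
    print(to_spark_config_text(SPARK_CONF))
```

Keeping the Spark config in one dictionary makes it easy to check that all three entries are present before the cluster is created.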

Note: The Python notebook and job creation steps will be shared during Protecto product installation. See the attached file (protecto_python_notebook.py).

Credentials needed to connect to Databricks:

  • Service principal application id (client id)

  • Service principal directory id (tenant id)

  • Service principal application secret (client secret)

  • Server hostname

  • Port

  • SQL endpoint HTTP path

  • Catalog name (e.g., hive_metastore)
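Before entering these values in Protecto, it can help to collect them in one place and confirm that none is missing. The sketch below is an illustration only and makes several assumptions: the `DatabricksConnection` class, the `from_env` helper, and the `PROTECTO_DBX_*` environment variable names are all hypothetical and are not defined by Protecto or Databricks.

```python
import os
from dataclasses import dataclass, field, fields


@dataclass(frozen=True)
class DatabricksConnection:
    """The seven values listed above, gathered in one record."""
    client_id: str                           # service principal application id
    tenant_id: str                           # service principal directory id
    client_secret: str = field(repr=False)   # kept out of repr() so it is not logged
    server_hostname: str = ""
    port: int = 0
    http_path: str = ""                      # SQL endpoint HTTP path
    catalog: str = ""                        # e.g. hive_metastore

    def missing_fields(self) -> list:
        """Return the names of fields that are empty (or 0 for the port)."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


def from_env(env=os.environ) -> DatabricksConnection:
    """Build a record from hypothetical PROTECTO_DBX_* environment variables."""
    port_raw = env.get("PROTECTO_DBX_PORT", "")
    return DatabricksConnection(
        client_id=env.get("PROTECTO_DBX_CLIENT_ID", ""),
        tenant_id=env.get("PROTECTO_DBX_TENANT_ID", ""),
        client_secret=env.get("PROTECTO_DBX_CLIENT_SECRET", ""),
        server_hostname=env.get("PROTECTO_DBX_SERVER_HOSTNAME", ""),
        port=int(port_raw) if port_raw else 0,
        http_path=env.get("PROTECTO_DBX_HTTP_PATH", ""),
        catalog=env.get("PROTECTO_DBX_CATALOG", ""),
    )


if __name__ == "__main__":
    conn = from_env()
    missing = conn.missing_fields()
    if missing:
        print("Missing values:", ", ".join(missing))
    else:
        print("All connection values are present.")
```

Reading the secret from the environment and excluding it from `repr()` keeps it out of source files and log output.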

Attachment: protecto_python_notebook.py (5KB)