Getting Started

Steps to create an Azure Databricks cluster

  1. Log in to the Azure Databricks workspace.
  2. Select Data Science & Engineering from the sidebar.
  3. Select Create -> New cluster and enter the following details:
    • Policy - Unrestricted
    • Cluster name - protecto
    • Cluster mode - Standard
    • Databricks runtime version - latest LTS release
    • Enable table access control and only allow Python and SQL commands
    • Worker type - Standard_DS3_v2 (4 cores, 14 GB RAM)
    • Driver type - same as worker
    • Under Advanced options, add the following to the Spark config:
      spark.databricks.acl.dfAclsEnabled true
      spark.databricks.repl.allowedLanguages python,sql
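The cluster settings above can also be written down as a cluster definition, which is handy for review or for keeping the configuration under version control. The following is a minimal sketch, not part of the Protecto instructions: it assumes the field names used by the Databricks Clusters API (cluster_name, spark_version, node_type_id, driver_node_type_id, num_workers, spark_conf). The function name build_cluster_spec, the worker count, and the runtime placeholder are hypothetical; this guide does not specify a worker count or an exact runtime version.

```python
import json


def build_cluster_spec(spark_version: str, num_workers: int = 1) -> dict:
    """Return a cluster definition mirroring the settings listed in this guide.

    spark_version must be the runtime key of the latest LTS release available
    in your workspace; it is deliberately left as a parameter here.
    """
    return {
        "cluster_name": "protecto",
        "spark_version": spark_version,
        # Worker and driver use the same node type (4 cores, 14 GB RAM).
        "node_type_id": "Standard_DS3_v2",
        "driver_node_type_id": "Standard_DS3_v2",
        # Assumption: the guide does not state how many workers to use.
        "num_workers": num_workers,
        "spark_conf": {
            # Enable table access control.
            "spark.databricks.acl.dfAclsEnabled": "true",
            # Restrict notebooks to Python and SQL commands.
            "spark.databricks.repl.allowedLanguages": "python,sql",
        },
    }


if __name__ == "__main__":
    # Placeholder value: replace with the actual LTS runtime key.
    spec = build_cluster_spec(spark_version="<latest-LTS-runtime-key>")
    print(json.dumps(spec, indent=2))
```

Note that the allowedLanguages key takes only the comma-separated language list as its value.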
Note: The Python notebook and job creation steps will be shared during Protecto product installation. Please see the attached files (protecto_python_notebook).
Credentials needed to connect to Databricks:
  • Service principal application ID (client ID)
  • Service principal directory ID (tenant ID)
  • Service principal application secret (client secret)
  • Server hostname
  • Port
  • SQL endpoint HTTP path
  • Catalog name (e.g., hive_metastore)
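
Before handing these values over, it can help to collect them in one place and check that none is missing. The sketch below is illustrative only and assumes nothing about how Protecto consumes the credentials: the class name DatabricksConnectionInfo, its field names, and the missing_fields helper are hypothetical. The client secret is excluded from the printed representation so it does not leak into logs.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass(frozen=True)
class DatabricksConnectionInfo:
    """Hypothetical container for the connection details listed above."""

    client_id: str                          # service principal application ID
    tenant_id: str                          # service principal directory ID
    client_secret: str = field(repr=False)  # hidden from repr() to avoid leaking it
    server_hostname: str = ""
    port: int = 0
    http_path: str = ""                     # SQL endpoint HTTP path
    catalog: str = "hive_metastore"

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty or unset."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or value == "" or value == 0:
                missing.append(f.name)
        return missing


if __name__ == "__main__":
    # All values below are placeholders, not real credentials.
    info = DatabricksConnectionInfo(
        client_id="<client-id>",
        tenant_id="<tenant-id>",
        client_secret="<client-secret>",
        server_hostname="<server-hostname>",
        port=443,
        http_path="",
    )
    print(info)
    print("Still missing:", info.missing_fields())
```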