Steps to create an Azure Databricks cluster

  1. Log in to the Azure Databricks workspace.

  2. Select Data Science & Engineering from the sidebar.

  3. Select Create -> New cluster and enter the following details:

    • Policy - Unrestricted

    • Cluster name - protecto

    • Cluster mode - Standard

    • Databricks runtime version - latest LTS release

    • Enable table access control and only allow Python and SQL commands

    • Worker type - Standard_DS3_v2 (4 cores, 14 GB RAM)

    • Driver type - same as worker

    • Advanced options - add the following to the Spark config:

spark.databricks.acl.dfAclsEnabled true

spark.databricks.repl.allowedLanguages python,sql
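For teams that script cluster creation instead of using the UI, the settings above could be expressed as a request body for the Databricks Clusters API (`POST /api/2.0/clusters/create`). The following is a minimal sketch, not part of the Protecto installation itself. The function name `build_cluster_spec` is hypothetical, and the runtime key and worker count are not specified in this guide, so they are left as parameters to fill in from your own workspace.

```python
import json

# Values taken from the cluster settings listed above.
CLUSTER_NAME = "protecto"
NODE_TYPE = "Standard_DS3_v2"  # 4 cores, 14 GB RAM


def build_cluster_spec(spark_version: str, num_workers: int = 1) -> dict:
    """Build a request body for the Clusters API create call.

    Assumptions: spark_version must be the runtime key of the latest LTS
    release shown in your workspace, and num_workers is a placeholder
    because this guide does not state a worker count.
    """
    return {
        "cluster_name": CLUSTER_NAME,
        "spark_version": spark_version,
        "node_type_id": NODE_TYPE,
        "driver_node_type_id": NODE_TYPE,  # driver type: same as worker
        "num_workers": num_workers,
        "spark_conf": {
            # Enables table access control.
            "spark.databricks.acl.dfAclsEnabled": "true",
            # Restricts notebooks to Python and SQL commands.
            "spark.databricks.repl.allowedLanguages": "python,sql",
        },
    }


if __name__ == "__main__":
    # "<latest-lts-runtime-key>" is a placeholder, not a real runtime key.
    print(json.dumps(build_cluster_spec("<latest-lts-runtime-key>"), indent=2))
```

The printed JSON can be reviewed against the UI values before it is sent to the API with an authenticated request.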

Note: The Python notebook and job creation steps will be shared during Protecto product installation. See the attached files (protecto_python_notebook).

Credentials needed to connect to Databricks:

  • Service principal application ID (client ID)

  • Service principal directory ID (tenant ID)

  • Service principal application secret (client secret)

  • Server hostname

  • Port

  • SQL endpoint HTTP path

  • Catalog name (e.g., hive_metastore)
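Before handing these values over, it can help to check that none of them is missing. The sketch below collects the seven items from environment variables and fails early if any is empty. The class name, function name, and the `DATABRICKS_` variable prefix are all hypothetical choices for illustration, not names that Protecto or Databricks require.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DatabricksCredentials:
    client_id: str        # service principal application ID
    tenant_id: str        # service principal directory ID
    client_secret: str    # service principal application secret
    server_hostname: str
    port: str
    http_path: str        # SQL endpoint HTTP path
    catalog: str          # e.g. hive_metastore


def load_credentials(env=None, prefix="DATABRICKS_"):
    """Read each field from <prefix><FIELD_NAME>; raise if any is missing."""
    env = os.environ if env is None else env
    values = {
        f.name: env.get(prefix + f.name.upper(), "").strip()
        for f in fields(DatabricksCredentials)
    }
    missing = [prefix + name.upper() for name, value in values.items() if not value]
    if missing:
        raise ValueError("Missing Databricks settings: " + ", ".join(missing))
    return DatabricksCredentials(**values)
```

For example, with `DATABRICKS_CATALOG=hive_metastore` and the other six variables set, `load_credentials()` returns a populated object. If any variable is unset, the error message names each missing one.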
