Steps to create Azure Databricks Cluster
Last updated
Log in to the Azure Databricks workspace.
Select Data Science & Engineering from the sidebar.
Select Create -> New cluster and enter the following details:
Policy - Unrestricted
Cluster name - protecto
Cluster mode - Standard
Databricks runtime version - latest LTS version
Enable table access control and only allow Python and SQL commands
Worker type - Standard_DS3_v2 (4 cores, 14 GB RAM)
Driver type - same as worker
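The same cluster can also be described as a request body for the Databricks Clusters API (`POST /api/2.0/clusters/create`). The sketch below is not part of the Protecto installation steps. It only maps the basic details above onto the API's `cluster_name`, `spark_version`, `node_type_id`, `driver_node_type_id` and `num_workers` fields. Two values are assumptions. `RUNTIME_VERSION` is a hypothetical placeholder for the exact LTS runtime key, which you can look up with `GET /api/2.0/clusters/spark-versions`. The worker count is a hypothetical default, because the guide does not specify one. Table access control and the Spark config entries from Advanced options would go in the request's `spark_conf` field and are not included here.

```python
import json

# Hypothetical placeholder: replace with the latest LTS runtime key listed by
# GET /api/2.0/clusters/spark-versions in your workspace.
RUNTIME_VERSION = "<latest-lts-runtime-key>"


def build_cluster_payload(num_workers: int = 1) -> dict:
    """Build a Clusters API create request body mirroring the UI settings.

    num_workers is a hypothetical default; the setup guide does not state it.
    """
    node_type = "Standard_DS3_v2"  # 4 cores, 14 GB RAM
    return {
        "cluster_name": "protecto",
        "spark_version": RUNTIME_VERSION,
        "node_type_id": node_type,
        "driver_node_type_id": node_type,  # driver type same as worker
        "num_workers": num_workers,  # more than zero workers: a multi-node (Standard) cluster
    }


# Pretty-print the body, for example to paste into a request tool.
payload_json = json.dumps(build_cluster_payload(), indent=2)
```

Leaving out `policy_id` matches the Unrestricted policy. Standard cluster mode needs no extra field, since it is the default multi-node mode.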
Under Advanced options, add the following to the Spark config:
spark.databricks.acl.dfAclsEnabled true
spark.databricks.repl.allowedLanguages python,sql
spark.databricks.delta.preview.enabled true
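In the UI, each Spark config line is written as a key and a value separated by whitespace. The Clusters API instead expects a `spark_conf` object with string values. The sketch below converts the three lines above into that form. `parse_spark_conf` is a hypothetical helper written for illustration and is not part of any Databricks library.

```python
def parse_spark_conf(text: str) -> dict:
    """Turn 'key value' lines (the Spark config box format) into a dict.

    Each non-empty line is split on its first run of whitespace; everything
    after it is kept as the value, so 'python,sql' stays one string.
    """
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1] if len(parts) > 1 else ""
    return conf


SPARK_CONF_TEXT = """
spark.databricks.acl.dfAclsEnabled true
spark.databricks.repl.allowedLanguages python,sql
spark.databricks.delta.preview.enabled true
"""

# Suitable for the "spark_conf" field of a Clusters API create request.
spark_conf = parse_spark_conf(SPARK_CONF_TEXT)
```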
Note: The Python notebook and job creation steps will be shared during the Protecto product installation. See the attached file (protecto_python_notebook).
Credentials needed to connect to Databricks:
Service principal application id (client id)
Service principal directory id (tenant id)
Service principal application secret (client secret)
Server hostname
Port
SQL endpoint HTTP path
Catalog name (e.g. hive_metastore)
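The credentials above can be grouped into a single object. Below is a minimal sketch, assuming the service principal signs in to Microsoft Entra ID (Azure AD) with the OAuth 2.0 client-credentials flow and the resulting token is passed to the Databricks SQL Connector for Python. This is an illustration of one common way to connect, not Protecto's own connection code. `DatabricksCredentials`, `token_request` and `connect_kwargs` are hypothetical names. The default port of 443 is an assumption (the usual HTTPS port). The scope uses the well-known application ID of the Azure Databricks resource.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

# Application ID of the Azure Databricks first-party resource in Microsoft Entra ID.
AZURE_DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


@dataclass
class DatabricksCredentials:
    client_id: str        # service principal application id
    tenant_id: str        # service principal directory id
    client_secret: str    # service principal application secret
    server_hostname: str  # e.g. adb-<workspace-id>.<n>.azuredatabricks.net
    http_path: str        # SQL endpoint HTTP path
    catalog: str = "hive_metastore"
    port: int = 443       # assumed HTTPS port; not passed to connect() below

    def token_request(self) -> tuple:
        """Return (url, form body) for a client-credentials token request."""
        url = f"https://login.microsoftonline.com/{self.tenant_id}/oauth2/v2.0/token"
        body = urlencode({
            "grant_type": "client_credentials",
            "client_id": self.client_id,
            "client_secret": self.client_secret,
            "scope": f"{AZURE_DATABRICKS_RESOURCE_ID}/.default",
        }).encode()
        return url, body

    def connect_kwargs(self, access_token: str) -> dict:
        """Keyword arguments for databricks.sql.connect (databricks-sql-connector)."""
        return {
            "server_hostname": self.server_hostname,
            "http_path": self.http_path,
            "access_token": access_token,
            "catalog": self.catalog,
        }
```

To use it, POST the form body to the returned URL with `Content-Type: application/x-www-form-urlencoded` and read `access_token` from the JSON response. Then call `from databricks import sql` followed by `sql.connect(**creds.connect_kwargs(token))`. Tokens expire, so a long-running job would need to request a new one.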