Steps to create a Databricks Python notebook and schedule a job
Step I: Create a Databricks Python notebook
Log in to Databricks as an Admin user.
In the Data Science and Engineering section, click Create -> Notebook.
Provide the following details for the notebook:
Default language - Python
Cluster - the cluster created for Protecto in the previous step
Paste the Python code snippet shared with this document into the notebook.
Click Schedule in the top right corner.
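The Python snippet shared with this document is not reproduced here. The sketch below is a rough illustration, under stated assumptions, of what a notebook like this might do with the `host` and `token` job parameters configured in Step II. It reads them as notebook widgets and then builds a user-to-group mapping from the Databricks SCIM Groups endpoint (`/api/2.0/preview/scim/v2/Groups`). It has three limitations: it assumes the token belongs to an admin user, it ignores SCIM pagination, and its function names are hypothetical. It is not Protecto's actual code.

```python
import json
import urllib.request


def get_job_params(defaults=None):
    """Read the `host` and `token` job parameters.

    In a Databricks notebook, job parameters are exposed as widgets and read
    with dbutils.widgets.get(). Outside Databricks `dbutils` is undefined, so
    fall back to the supplied defaults (useful for local testing).
    """
    try:
        return {
            "host": dbutils.widgets.get("host"),  # noqa: F821 (Databricks global)
            "token": dbutils.widgets.get("token"),  # noqa: F821
        }
    except NameError:
        return dict(defaults or {})


def groups_url(host):
    """Build the SCIM Groups URL from the workspace domain (scheme optional)."""
    base = host if host.startswith("https://") else f"https://{host}"
    return base.rstrip("/") + "/api/2.0/preview/scim/v2/Groups"


def fetch_groups(host, token):
    """GET the SCIM Groups list; listing members requires an admin token."""
    req = urllib.request.Request(
        groups_url(host), headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def user_group_mapping(scim_groups):
    """Map each user (by display name) to the names of groups they belong to.

    Members whose $ref points at something other than a user (for example a
    nested group "Groups/<id>") are skipped.
    """
    mapping = {}
    for group in scim_groups.get("Resources", []):
        for member in group.get("members", []):
            if not member.get("$ref", "Users/").startswith("Users/"):
                continue
            user = member.get("display") or member.get("value")
            mapping.setdefault(user, []).append(group.get("displayName"))
    return mapping


params = get_job_params()
if params.get("host") and params.get("token"):
    mapping = user_group_mapping(fetch_groups(params["host"], params["token"]))
```

Inside a Databricks notebook `dbutils` is predefined, so `get_job_params()` picks up the job parameters directly. Elsewhere the function returns the dictionary you pass in, which keeps the parsing logic testable without a workspace.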
Step II: Schedule a job to run the notebook
Fill in the following details:
Job name
Schedule - Choose Scheduled and configure the job to run once every day at 1:00 in the (UTC+00:00) UTC time zone.
Cluster - Choose the cluster created for Protecto in the previous steps.
Parameters - Add two parameters:
Key: host, value: <Databricks instance domain>
Key: token, value: <Admin Personal Access Token> (the PAT must belong to a user with admin access). This token is used to retrieve the mapping between users and groups, which can be extracted only with an admin user's access token.
Alerts - Configure alerts for both success and failure.
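The same job could also be defined programmatically. The sketch below assumes the Databricks Jobs API 2.1 (`POST /api/2.1/jobs/create`) and builds a request body equivalent to the settings above:

- a Quartz cron schedule for 1:00 every day in UTC
- the existing Protecto cluster
- the `host` and `token` notebook parameters
- email alerts on success and failure

The job name, notebook path, cluster ID, task key, and email addresses are hypothetical placeholders.

```python
import json
import urllib.request


def build_job_spec(job_name, notebook_path, cluster_id, host, token, alert_emails):
    """Return a Jobs API 2.1 jobs/create body mirroring the Step II settings."""
    return {
        "name": job_name,
        "tasks": [
            {
                "task_key": "protecto_notebook",  # hypothetical task key
                "existing_cluster_id": cluster_id,  # the cluster created for Protecto
                "notebook_task": {
                    "notebook_path": notebook_path,
                    # Exposed to the notebook as widgets named "host" and "token"
                    "base_parameters": {"host": host, "token": token},
                },
            }
        ],
        "schedule": {
            # Quartz fields: second minute hour day-of-month month day-of-week,
            # i.e. 01:00:00 every day
            "quartz_cron_expression": "0 0 1 * * ?",
            "timezone_id": "UTC",
            "pause_status": "UNPAUSED",
        },
        "email_notifications": {
            "on_success": list(alert_emails),
            "on_failure": list(alert_emails),
        },
    }


def create_job(host, admin_token, spec):
    """POST the spec to jobs/create and return the new job_id."""
    req = urllib.request.Request(
        f"https://{host}/api/2.1/jobs/create",
        data=json.dumps(spec).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_id"]
```

For example, `create_job(host, admin_token, build_job_spec(...))` would return the ID of the new job. Passing the token through `base_parameters` mirrors the UI step, but the value is then stored in the job definition. Reading it in the notebook from a Databricks secret scope with `dbutils.secrets.get` is a common alternative.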