Steps to create a Databricks Python notebook and schedule a job

Step I: Create a Databricks Python notebook
  1. Log in to Databricks as an admin user.
  2. In the Data Science and Engineering section, click Create -> Notebook.
  3. Provide the following details for the notebook:
    • Name
    • Default language - Python
    • Cluster - the cluster created for Protecto in the previous step
  4. Copy the Python code snippet shared with this document and paste it into the notebook.
  5. Click Schedule in the top right corner.
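The notebook code itself is the snippet shared with this document and is not reproduced here. To give a rough idea of what such a notebook does with the `host` and `token` parameters configured in Step II, the minimal sketch below is an assumption, not Protecto's code. It builds a request to the Databricks SCIM Groups endpoint (`/api/2.0/preview/scim/v2/Groups`), authenticated with a personal access token, and converts the response into a group-to-members mapping. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List

# Databricks SCIM endpoint that lists workspace groups and their members.
SCIM_GROUPS_PATH = "/api/2.0/preview/scim/v2/Groups"


def build_groups_request(host: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for the SCIM Groups endpoint (hypothetical helper)."""
    # Accept the instance domain with or without a scheme.
    base = host if "://" in host else f"https://{host}"
    return urllib.request.Request(
        base.rstrip("/") + SCIM_GROUPS_PATH,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def group_membership(scim_response: dict) -> Dict[str, List[str]]:
    """Map each group's displayName to its members' display names (falling back to their IDs)."""
    mapping: Dict[str, List[str]] = {}
    for group in scim_response.get("Resources", []):
        members = group.get("members", [])
        mapping[group["displayName"]] = [m.get("display", m.get("value")) for m in members]
    return mapping


def fetch_group_membership(host: str, token: str) -> Dict[str, List[str]]:
    """Call the SCIM Groups endpoint and return the group -> members mapping."""
    with urllib.request.urlopen(build_groups_request(host, token)) as resp:
        return group_membership(json.load(resp))
```

Inside a Databricks notebook, job parameters are exposed as widgets, so the values could be read with `dbutils.widgets.get("host")` and `dbutils.widgets.get("token")` before calling `fetch_group_membership`. This sketch does not handle paginated SCIM responses, which large workspaces may return.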
Step II: Schedule a job to run the notebook
Fill in the following details:
  1. Job name
  2. Schedule - choose Scheduled and configure the job to run once every day at 1:00 in the (UTC+00:00) UTC time zone.
  3. Cluster - choose the cluster created for Protecto in the previous steps.
  4. Parameters - add two parameters:
    • Key: host, value: <Databricks instance domain>
    • Key: token, value: <Admin Personal Access Token>. The personal access token (PAT) must have admin access: it is used to fetch the user and group mapping, which can be extracted only with an admin user's access token.
  5. Alerts - provide [email protected] for both Success and Failure alerts.