Steps to create a Databricks Python notebook and schedule a job

Last updated 1 year ago

Step I: Create a Databricks Python notebook

  1. Log in to Databricks as an admin user.

  2. In the Data Science and Engineering section, click Create -> Notebook.

  3. Provide the following details for the notebook:

    • Name

    • Default language - set to Python

    • Cluster - the one that was created for Protecto in the previous step

  4. Inside the notebook, copy and paste the Python code snippet shared with this document.

  5. Click Schedule in the top-right corner.
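
The job parameters configured in Step II (host and token) reach the notebook as widget values. The sketch below is not the Protecto snippet; it only illustrates, under that assumption, how a Python notebook can read and validate the two parameters. The helper name read_job_parameters is hypothetical, while dbutils.widgets.get is the standard Databricks call for reading a widget value.

```python
def read_job_parameters(get_widget):
    """Return the 'host' and 'token' job parameters as a dict.

    get_widget is any callable that maps a parameter name to its string
    value; inside a Databricks notebook, pass dbutils.widgets.get.
    """
    params = {}
    for key in ("host", "token"):
        value = get_widget(key).strip()
        if not value:
            # Fail early so a misconfigured job is easy to diagnose.
            raise ValueError(f"Job parameter '{key}' is empty")
        params[key] = value
    return params


# Inside a Databricks notebook (dbutils is provided by the runtime):
# params = read_job_parameters(dbutils.widgets.get)
```

Passing the lookup function as an argument keeps the helper usable outside Databricks, for example with a plain dictionary's get method when checking it locally.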

Step II: Schedule a job to run the notebook

Fill in the following details:

  1. Job name

  2. Schedule - Choose Scheduled and configure it to run once every day at 1:00 in the (UTC+00:00) UTC time zone.

  3. Cluster - Choose the cluster created for Protecto in the previous steps.

  4. Parameters - Add two parameters:

    • Key: host, value: <Databricks instance domain>

    • Key: token, value: <Admin Personal Access Token>. The PAT must belong to a user with admin access. Protecto uses this token to retrieve users and their group mapping, which can be extracted only with the access token of an admin user.

  5. Alerts - Provide help@protecto.ai for both Success and Failure alerts.
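
For reference, the same settings can be written as a Databricks Jobs API 2.1 job definition. This is a minimal sketch, not part of the Protecto setup itself: the function name build_job_settings, the task key, and the notebook path and cluster ID passed in are hypothetical placeholders. The schedule, parameters, and alert address mirror the values listed above, with daily at 1:00 UTC expressed as a Quartz cron expression.

```python
import json


def build_job_settings(job_name, notebook_path, cluster_id, host, token,
                       alert_email="help@protecto.ai"):
    """Build a Jobs API 2.1 style settings dict matching the Step II fields."""
    return {
        "name": job_name,
        "schedule": {
            # Quartz fields: second minute hour day-of-month month day-of-week
            "quartz_cron_expression": "0 0 1 * * ?",  # every day at 01:00
            "timezone_id": "UTC",
            "pause_status": "UNPAUSED",
        },
        "tasks": [
            {
                "task_key": "protecto_notebook",  # hypothetical task name
                "existing_cluster_id": cluster_id,
                "notebook_task": {
                    "notebook_path": notebook_path,
                    # These become the notebook's 'host' and 'token' parameters.
                    "base_parameters": {"host": host, "token": token},
                },
            }
        ],
        "email_notifications": {
            "on_success": [alert_email],
            "on_failure": [alert_email],
        },
    }


if __name__ == "__main__":
    settings = build_job_settings(
        job_name="protecto-daily",
        notebook_path="/Users/admin@example.com/protecto",  # placeholder
        cluster_id="<cluster-id>",                           # placeholder
        host="<Databricks instance domain>",
        token="<Admin Personal Access Token>",
    )
    print(json.dumps(settings, indent=2))
```

The function only builds and prints the definition; it does not call Databricks. Comparing the printed JSON with the job's JSON view in the Databricks UI is one way to confirm that the schedule and parameters were entered as intended.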