Databricks Integration
Updated
by Billy Dowell

Sync candidate data between Great Question and your Databricks warehouse.
Part 1: Databricks Setup
Assumptions:
- There is a single table where candidate data resides
- You have a SQL Warehouse compute resource which we can use to query the table at least once a day
To communicate with Databricks, we need a valid OAuth client ID and client secret, along with permission to execute SQL statements on your table. There are multiple ways to set this up in Databricks. Two options are given below:
Option A. Enable Great Question as a partner OAuth application
Additional documentation on this option is available on the Databricks site: Enable or disable partner OAuth applications
- Log in to the Databricks account console and click “Settings” in the toolbar.
- On the App Connections page, click “Add Connection.”
- Enter the following values:
- Application name - GreatQuestion
- Redirect URLs
https://greatquestion.co/users/auth/databricks/callback
https://staging.greatquestion.co/users/auth/databricks/callback
- Access Scopes - SQL
- Generate a Client Secret - checked
- Access token TTL - 60
- Refresh token TTL - 10080
- Click Save, then copy the Client ID and Client Secret. You will enter these into Great Question in Part 2.
Option B. Set up a service principal user account
More information on this option is available on the Databricks site: Authorize unattended access to Databricks resources with a service principal using OAuth
- (As an admin with permissions to create service principal users) Navigate to your workspace, click on your user profile in the top-right corner, and go to Settings.
- Click on Identity and access from the Settings menu.
- Click Manage on Service principals.
- Click Add service principal.
- Click Add new. Give the new user a name like GreatQuestion.
- Once created, find the new user in the list, click on the three-dot menu, and click Edit.
- In the Configuration tab, under Entitlements, ensure Databricks SQL access is checked.
- Under the Secrets tab, click Generate Secret.
- For Lifetime, we recommend the maximum of 730 days for uninterrupted access, but it is up to you. Note that you will need to update Great Question each time a new secret is generated.
- Copy the Client ID and Client Secret. You will enter these into Great Question in Part 2.
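If you chose Option B, you may want to sanity-check the new credentials before entering them in Great Question. The sketch below builds (but does not send) an OAuth client-credentials token request. It assumes the workspace token endpoint `/oidc/v1/token` and the `all-apis` scope that Databricks documents for service principals; the function name `build_token_request` and the placeholder credentials are illustrative only, and this is not a description of how Great Question itself requests tokens.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request.

    Assumption: the workspace exposes the token endpoint at /oidc/v1/token
    and accepts the "all-apis" scope for service principals.
    """
    base = host.rstrip("/")  # tolerate a trailing slash in the host URL
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    # The client ID and secret are sent as HTTP Basic credentials.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base}/oidc/v1/token",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own host, Client ID, and Client Secret.
    req = build_token_request(
        "https://dbc-123456.cloud.databricks.com/", "my-client-id", "my-client-secret"
    )
    print(req.get_method(), req.full_url)
```

To actually send the request, pass the returned object to `urllib.request.urlopen`. A JSON response containing an `access_token` field indicates that the Client ID and Client Secret are valid.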
Part 2: Great Question Setup
You will now need to configure the integration within Great Question. As an admin user, click on your name in the top-left corner of Great Question, and go to Settings.
- In Settings, go to Integrations.
- Scroll down until you see the Databricks section.
- Enter your host URL.
This might be something like https://dbc-123456.cloud.databricks.com/ if hosted via Databricks, or your own URL if self-hosted.
- Enter the Client ID and Client Secret you generated in Part 1 of this guide.
- Click the Connect button. If everything goes well, you should see a confirmation message at the top of the screen, and the Databricks section will look like the following:
- Enter your Warehouse ID. This can be found in your Databricks workspace, under SQL Warehouses. Select an instance which we will use to query your Databricks data. See this screenshot for an example:
- Enter the Catalog, Schema, and Table name in the remaining fields. For example, if your setup looks like this in the Catalog Explorer:
Your information would be:
Catalog - workspace
Schema - default
Table - brix_candidates
- To enable automatic daily import, check Enable pull candidates. To enable automatic daily export, check Enable push candidates.
Note that if you leave these boxes unchecked, you will only be able to import and export data manually.
- Finally, hit Save.
All of the columns in the table will be available to Great Question. Set up a Custom Candidate Attribute for every column you want to see in Great Question. Make sure to select Databricks as the Integration Provider, and enter the column name as the Integration Field Name. The Integration Field Name must match the column name from Databricks exactly.
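Because the match must be exact, a stray capital letter or trailing space is enough to break the mapping. The following is a minimal sketch of a check you could run yourself before creating attributes. The function `find_mismatched_fields` and the sample column names are hypothetical and are not part of Great Question or Databricks.

```python
from typing import Dict, Iterable, Optional


def find_mismatched_fields(
    columns: Iterable[str], field_names: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Return field names that do not exactly match any column.

    Each problem name maps to the column it probably meant (found by
    ignoring case and surrounding whitespace), or None if nothing is close.
    """
    columns = list(columns)
    exact = set(columns)
    by_folded = {c.strip().casefold(): c for c in columns}
    problems: Dict[str, Optional[str]] = {}
    for name in field_names:
        if name in exact:
            continue  # exact match, nothing to report
        problems[name] = by_folded.get(name.strip().casefold())
    return problems


if __name__ == "__main__":
    # Hypothetical column names from the Databricks table.
    table_columns = ["email", "first_name", "Plan_Tier"]
    # Hypothetical Integration Field Names entered in Great Question.
    entered = ["email", "First_Name", "plan_tier ", "signup_date"]
    for bad, suggestion in find_mismatched_fields(table_columns, entered).items():
        hint = f"did you mean {suggestion!r}?" if suggestion else "no such column"
        print(f"{bad!r}: {hint}")
```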
Frequently Asked Questions
- How often do you query our database?
Typically, we perform a sync operation once every 24 hours. However, in some circumstances, such as sync failures or when certain features get updated, we may trigger additional syncs throughout the day.
- How do you connect to our system?
We authenticate using OAuth2 access tokens, and then perform SQL queries using Databricks’ REST API. All communication is encrypted in transit using SSL.
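To illustrate the kind of REST call involved, here is a rough sketch of a request to the Databricks SQL Statement Execution API (`POST /api/2.0/sql/statements`), using the example catalog, schema, and table from Part 2. The function name `build_statement_request`, the `SELECT` statement, the `LIMIT`, and the placeholder token and warehouse ID are assumptions made for illustration; the queries Great Question actually runs are not documented here.

```python
import json
import urllib.request


def build_statement_request(
    host: str,
    access_token: str,
    warehouse_id: str,
    catalog: str,
    schema: str,
    table: str,
    limit: int = 10,
) -> urllib.request.Request:
    """Build (but do not send) a SQL Statement Execution API request."""
    # Backtick-quote the table name, doubling any embedded backticks.
    quoted = "`" + table.replace("`", "``") + "`"
    payload = {
        "warehouse_id": warehouse_id,  # the Warehouse ID from SQL Warehouses
        "catalog": catalog,            # default catalog for the statement
        "schema": schema,              # default schema for the statement
        "statement": f"SELECT * FROM {quoted} LIMIT {int(limit)}",
        "wait_timeout": "30s",         # wait up to 30 seconds for results
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/sql/statements",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder token and warehouse ID; substitute real values to try it.
    req = build_statement_request(
        "https://dbc-123456.cloud.databricks.com/",
        "example-access-token",
        "example-warehouse-id",
        "workspace",
        "default",
        "brix_candidates",
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
```

Sending this request with a valid access token is a quick way to confirm that the credentials from Part 1 have permission to query the table through the selected SQL Warehouse.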