Databricks + Fount

This guide explains how to connect to a Databricks SQL Warehouse from Python, read table data into a pandas DataFrame, and train a forecasting model with Fount. It covers Databricks workspace setup, SQL Warehouse configuration, access token generation, required package installation, and the Python code for connecting and processing the data in a Jupyter Notebook environment.

Prerequisites

You need:

  • Databricks Workspace
  • SQL Warehouse
  • Personal Access Token
  • A table uploaded to the workspace (created from a CSV file in the setup steps)

Step-by-Step Setup

Create Databricks Account

Sign up or log in at https://www.databricks.com/

Setup Steps

  • Log in to the Databricks Workspace
  • Go to Catalog → Create Table
  • Upload the CSV file
  • Create the table
  • Go to SQL Warehouses
  • Start the Serverless Starter Warehouse
  • Open the warehouse's Connection Details
  • Copy the Server Hostname and HTTP Path
  • Go to Settings → Developer → Access Tokens
  • Generate a token and copy it immediately (it is shown only once)
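
The three values collected in these steps (Server Hostname, HTTP Path, and access token) can be kept out of the notebook itself. The following is a minimal sketch, not part of the original script, that reads them from environment variables. The variable names and the helper `load_databricks_settings` are assumptions made for illustration; use whatever names suit your environment.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DatabricksSettings(NamedTuple):
    """Connection values copied from the Databricks UI."""
    server_hostname: str
    http_path: str
    access_token: str


# Hypothetical environment variable names (an assumption, not a Databricks requirement).
ENV_NAMES = {
    "server_hostname": "DATABRICKS_SERVER_HOSTNAME",
    "http_path": "DATABRICKS_HTTP_PATH",
    "access_token": "DATABRICKS_TOKEN",
}


def load_databricks_settings(
    env: Optional[Mapping[str, str]] = None,
) -> DatabricksSettings:
    """Read the connection values from `env` (defaults to os.environ).

    Raises RuntimeError listing every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return DatabricksSettings(
        **{field: env[name] for field, name in ENV_NAMES.items()}
    )
```

With the variables set, `settings = load_databricks_settings()` returns the values, and `settings._asdict()` yields the same three keyword arguments that the script passes to `sql.connect`.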

Required Packages

pip install pandas databricks-sql-connector fount_core aiohttp
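
To confirm that the install succeeded in the active notebook kernel, a short check such as the one below can help. It is a sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package names are taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they appear in the pip install command.
REQUIRED = ["pandas", "databricks-sql-connector", "fount_core", "aiohttp"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package so a missing install is easy to spot.
for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```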

Python Code

"""
Databricks + Fount Integration

This script:
1. Connects to Databricks SQL Warehouse
2. Reads a Databricks table into pandas
3. Uploads the DataFrame to Fount
4. Trains a forecasting model using Fount
"""

from databricks import sql
import pandas as pd
from fount import Fount

# =========================
# DATABRICKS CONFIG
# =========================

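# Server Hostname and HTTP Path come from the SQL Warehouse's Connection Details.
server_hostname = "YOUR_SERVER_HOSTNAME"
http_path = "YOUR_HTTP_PATH"

# Personal access token generated under Settings → Developer → Access Tokens.
# Replace the placeholder locally; do not commit a real token to version control.
access_token = "YOUR_ACCESS_TOKEN"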

# =========================
# READ DATA FROM DATABRICKS
# =========================

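query = """
SELECT *
FROM workspace.default.inventory_optimization_for_retail_with_aws
"""

# Open the connection in a `with` block so it is closed once the data is loaded.
# pandas may warn that it officially supports only SQLAlchemy connections here;
# the query still runs through the connector's DB-API cursor.
with sql.connect(
    server_hostname=server_hostname,
    http_path=http_path,
    access_token=access_token,
) as conn:
    df = pd.read_sql(query, conn)

print(df.head())   # first five rows
print(df.shape)    # (row count, column count)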

# =========================
# UPLOAD TO FOUNT AND TRAIN
# =========================

client = Fount()

dataset = client.upload_dataframe(
    df,
    name="Databricks_Dataset"
)

train_job = client.train(
    dataset=dataset,
    series_id_cols=["category"],
    categorical_cols=[
        "Seasonality Factors",
        "External Factors",
        "Demand Trend",
        "Customer Segments",
    ],
    model_name="databricks_inventory_model",
    date_column="Date",
    target_columns=["Sales Quantity"],
    validation_data_required=True,
    validation_split=0.2,
    time_granularity="daily",
    machine="ml.g5.12xlarge",
)

print(train_job)
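
Because the `train` call refers to columns by name, it can be worth checking that each of those names exists in the DataFrame before a job is submitted. The sketch below is an optional addition rather than part of the Fount API. The helper `missing_columns` is hypothetical, and the column list is copied from the arguments used in the script above.

```python
from typing import Iterable, List

# Column names referenced by the train() call in the script above.
REQUIRED_COLUMNS = [
    "category",
    "Date",
    "Sales Quantity",
    "Seasonality Factors",
    "External Factors",
    "Demand Trend",
    "Customer Segments",
]


def missing_columns(available: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required names absent from `available`, in their given order.

    The comparison is exact, so differences in case or spacing count as missing.
    """
    available_set = set(available)
    return [name for name in required if name not in available_set]
```

In the notebook, `missing_columns(df.columns, REQUIRED_COLUMNS)` returns an empty list when every column is present; a non-empty result points to a spelling or case mismatch between the table and the training arguments.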