Snowflake + Fount
This guide explains how to connect to Snowflake from Python, read a Snowflake table into a pandas DataFrame, and train a machine learning model on that data with Fount. It covers Snowflake account setup, warehouse and database configuration, table creation, package installation, authentication, and the Python code for accessing and processing the data in a Jupyter Notebook.
Prerequisites
You need:
- A Snowflake account
- A warehouse
- A database
- A schema
- A table containing your data (for example, uploaded from a CSV file)
Step-by-Step Setup
Create Snowflake Account
https://signup.snowflake.com/
Setup Steps
- Create a database
- Create a schema
- Create a warehouse
- Upload a CSV file as a table
- Copy the account identifier from your Snowflake account URL
- Use your Snowflake login credentials in the Python connection settings
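The database, schema, warehouse, and table steps above can also be run from Python once you can log in. The sketch below is a minimal example, assuming your Snowflake role is allowed to create these objects; the XSMALL warehouse size and 60-second auto-suspend are illustrative choices, and `setup_statements`, `run_setup`, and `upload_csv` are hypothetical helper names. `write_pandas` comes from `snowflake.connector.pandas_tools` and needs the connector's pandas extra (`pip install "snowflake-connector-python[pandas]"`).

```python
# A minimal sketch of the setup steps, run through snowflake-connector-python.
# Helper names and the warehouse size/auto-suspend values are illustrative.


def setup_statements(warehouse: str, database: str, schema: str) -> list[str]:
    """Return idempotent DDL for the warehouse, database, and schema."""
    return [
        f"CREATE WAREHOUSE IF NOT EXISTS {warehouse} "
        "WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE",
        f"CREATE DATABASE IF NOT EXISTS {database}",
        f"CREATE SCHEMA IF NOT EXISTS {database}.{schema}",
    ]


def run_setup(conn, warehouse="COMPUTE_WH", database="TEST_DB", schema="PUBLIC"):
    """Execute the DDL on an open connection and switch the session to it."""
    with conn.cursor() as cur:
        for stmt in setup_statements(warehouse, database, schema):
            cur.execute(stmt)
        # Make the new objects the session defaults for later statements.
        cur.execute(f"USE WAREHOUSE {warehouse}")
        cur.execute(f"USE SCHEMA {database}.{schema}")


def upload_csv(conn, csv_path: str, table_name: str = "INVENTORY_DATA"):
    """Load a local CSV into a Snowflake table, creating the table if needed."""
    # Imported here so the helpers above work without pandas installed.
    import pandas as pd
    from snowflake.connector.pandas_tools import write_pandas

    df = pd.read_csv(csv_path)
    # quote_identifiers=True (the default) keeps headers such as
    # "Sales Quantity" spelled exactly as they appear in the CSV.
    success, _n_chunks, n_rows, _output = write_pandas(
        conn, df, table_name, auto_create_table=True
    )
    return success, n_rows
```

For example, after opening `conn` with `snowflake.connector.connect(...)`, call `run_setup(conn)` and then `upload_csv(conn, "inventory.csv")`; the file name is only a placeholder. Because the `IF NOT EXISTS` clauses make the DDL idempotent, rerunning the setup is harmless.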
Required Packages
pip install pandas snowflake-connector-python fount_core aiohttp
Python Code
"""
Snowflake + Fount Integration
This script:
1. Connects to a Snowflake warehouse
2. Reads a Snowflake table into a pandas DataFrame
3. Uploads the DataFrame to Fount
4. Trains a forecasting model using Fount
"""
import snowflake.connector
import pandas as pd
from fount import Fount
# =========================
# SNOWFLAKE CONFIG
# =========================
# Replace the placeholders with your own credentials and account identifier.
conn = snowflake.connector.connect(
    user='YOUR_USERNAME',
    password='YOUR_PASSWORD',
    account='YOUR_ACCOUNT_IDENTIFIER',
    warehouse='COMPUTE_WH',
    database='TEST_DB',
    schema='PUBLIC'
)
# =========================
# READ DATA FROM SNOWFLAKE
# =========================
query = """
SELECT *
FROM INVENTORY_DATA
"""
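# Recent pandas versions warn that this DBAPI connection is untested;
# the query still runs through the Snowflake connector.
df = pd.read_sql(query, conn)
conn.close()  # the data is now in memory, so release the Snowflake session
print(df.head())
print(df.shape)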
# =========================
# UPLOAD AND TRAIN WITH FOUNT
# =========================
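client = Fount()
# Upload the DataFrame as a Fount dataset
dataset = client.upload_dataframe(
    df,
    name="Snowflake_Dataset"
)
# Start a training job on the uploaded dataset
train_job = client.train(
    dataset=dataset,
    series_id_cols=["category"],        # column(s) identifying each time series
    categorical_cols=[
        'Seasonality Factors',
        'External Factors',
        'Demand Trend',
        'Customer Segments'
    ],
    model_name="snowflake_inventory_model",
    date_column="Date",
    target_columns=["Sales Quantity"],  # value(s) to forecast
    validation_data_required=True,
    validation_split=0.2,               # fraction of the data held out for validation
    time_granularity="daily",
    machine="ml.g5.12xlarge"            # instance type used for training
)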
print(train_job)
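The script hard-codes the user name and password. A common alternative, sketched here under our own assumptions, is to read them from environment variables set outside the notebook. The `SNOWFLAKE_*` variable names and the helpers `account_from_url` and `connection_params` are hypothetical; the connector does not read these variables by itself. `account_from_url` assumes an account URL of the form `https://<account_identifier>.snowflakecomputing.com` and rejects anything else, such as a Snowsight `app.snowflake.com` address.

```python
# Sketch: keep credentials out of the notebook and derive the account
# identifier from the account URL. Variable and helper names are hypothetical.
import os
from collections.abc import Mapping
from urllib.parse import urlparse

ACCOUNT_SUFFIX = ".snowflakecomputing.com"


def account_from_url(url: str) -> str:
    """Return the account identifier from https://<id>.snowflakecomputing.com."""
    host = urlparse(url).hostname or ""  # hostname is already lower-cased
    if not host.endswith(ACCOUNT_SUFFIX):
        raise ValueError(f"expected a *{ACCOUNT_SUFFIX} URL, got {url!r}")
    return host[: -len(ACCOUNT_SUFFIX)]


def connection_params(env: Mapping[str, str] = os.environ) -> dict:
    """Collect snowflake.connector.connect() keyword arguments from env vars."""
    return {
        "user": env["SNOWFLAKE_USER"],
        "password": env["SNOWFLAKE_PASSWORD"],
        "account": env["SNOWFLAKE_ACCOUNT"],
        # Defaults match the values used in the script above.
        "warehouse": env.get("SNOWFLAKE_WAREHOUSE", "COMPUTE_WH"),
        "database": env.get("SNOWFLAKE_DATABASE", "TEST_DB"),
        "schema": env.get("SNOWFLAKE_SCHEMA", "PUBLIC"),
    }
```

With the variables set, the connection becomes `conn = snowflake.connector.connect(**connection_params())`. A missing required variable raises `KeyError` before any network call is made, which makes misconfiguration easy to spot.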
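`pd.read_sql` runs on the raw Snowflake connection, but pandas officially supports only SQLAlchemy and sqlite3 connections there. The connector's own cursor method `fetch_pandas_all()` returns a DataFrame directly; it requires the pandas extra (`pip install "snowflake-connector-python[pandas]"`). The sketch below also selects only the columns the training call uses, double-quoting each name because Snowflake treats quoted identifiers as case-sensitive and requires quotes around names containing spaces. It assumes the table stores the column names exactly as the script spells them; `quote_ident`, `build_query`, and `read_table` are hypothetical helpers.

```python
# Sketch: read only the columns used for training, via the connector's
# native pandas support. Helper names are hypothetical; the column list
# assumes the table keeps the spellings used in the script above.
COLUMNS = [
    "Date",
    "category",
    "Sales Quantity",
    "Seasonality Factors",
    "External Factors",
    "Demand Trend",
    "Customer Segments",
]


def quote_ident(name: str) -> str:
    """Double-quote a Snowflake identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_query(table: str, columns: list[str]) -> str:
    """Build a SELECT that keeps mixed-case and spaced column names intact."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return f"SELECT {cols} FROM {table}"


def read_table(params: dict, table: str = "INVENTORY_DATA", columns=COLUMNS):
    """Fetch the selected columns into a DataFrame, then close the session."""
    # Imported lazily so the query helpers above can be used without it.
    import snowflake.connector

    conn = snowflake.connector.connect(**params)
    try:
        with conn.cursor() as cur:
            cur.execute(build_query(table, columns))
            return cur.fetch_pandas_all()
    finally:
        conn.close()
```

Calling `df = read_table(params)` with a dict of `connect()` keyword arguments replaces the `pd.read_sql` call, and the `finally` block closes the session even if the query fails.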
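It can be worth checking the DataFrame locally before starting a training job. The sketch below uses only pandas and is not part of the Fount API: `missing_columns` reports which names passed to `train()` are absent from the DataFrame (for example, because Snowflake returned them upper-cased), and `non_daily_series` lists series whose dates are not consecutive, unique days. It assumes this is what `time_granularity="daily"` expects; both helper names are hypothetical.

```python
# Sketch: local pre-flight checks before client.train(). These helpers are
# hypothetical and only assume the column names used in the script above.
import pandas as pd


def missing_columns(df: pd.DataFrame, required: list[str]) -> list[str]:
    """Return the required column names that the DataFrame lacks."""
    return [c for c in required if c not in df.columns]


def non_daily_series(
    df: pd.DataFrame, date_col: str = "Date", series_col: str = "category"
) -> list:
    """Return series ids whose dates are not consecutive, unique days."""
    dates = pd.to_datetime(df[date_col])
    bad = []
    for key, series_dates in dates.groupby(df[series_col]):
        gaps = series_dates.sort_values().diff().dropna()
        # A gap of 0 means a duplicate date; more than 1 day means a hole.
        if not (gaps == pd.Timedelta(days=1)).all():
            bad.append(key)
    return bad
```

For example, `missing_columns(df, ["Date", "category", "Sales Quantity", "Seasonality Factors", "External Factors", "Demand Trend", "Customer Segments"])` should return an empty list, and `non_daily_series(df)` should return no series ids, before you call `client.train`.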