GCP BigQuery + Fount
This guide explains how to read Google Cloud Platform (GCP) BigQuery table data from Python, load it into a pandas DataFrame, and train a machine learning model with Fount. It covers GCP project setup, BigQuery dataset and table creation, service account configuration, JSON key authentication, package installation, and the Python code for securely accessing and processing BigQuery data in a Jupyter Notebook environment.
Prerequisites
You need:
- GCP Project
- BigQuery Dataset
- BigQuery Table
- Service Account JSON key
Step-by-Step Setup
Create GCP Account
https://cloud.google.com/free
Setup Steps
- Open BigQuery Console
- Create Project
- Create Dataset
- Create Table and upload CSV
- Go to IAM & Admin → Service Accounts
- Create Service Account
- Add BigQuery Admin role
- Create JSON key
- Download service_account.json
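Once the key is downloaded, a quick sanity check can catch a wrong or truncated file before any BigQuery call is made. The following is a minimal sketch, not part of the original setup: the helper name check_service_account_key and the list of fields it checks are illustrative assumptions, based on the fields a downloaded service account key file normally contains.

```python
import json
from pathlib import Path

# Fields assumed to be present in a downloaded service account key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return a list of problems found in the key file (empty list means OK)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"file not found: {key_path}"]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    # Report every required field that is absent or empty.
    problems = [f"missing field: {f}" for f in REQUIRED_KEY_FIELDS if not data.get(f)]
    # A service account key should declare its type explicitly.
    if data.get("type") not in (None, "service_account"):
        problems.append(f"unexpected type: {data['type']!r}")
    return problems


if __name__ == "__main__":
    issues = check_service_account_key("service_account.json")
    print("key file looks usable" if not issues else "\n".join(issues))
```

The check only inspects the file locally and does not contact GCP. Because the key grants access to the project, it is safer to keep service_account.json out of version control.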
Required Packages
pip install pandas google-cloud-bigquery pyarrow db-dtypes fount_core aiohttp
Python Code
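Before running the script, it can help to confirm that the packages from the pip command are actually installed in the notebook's environment. This is a small sketch using only the standard library; the helper name find_missing is illustrative, and the distribution names are taken from the pip command as written.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the pip install command.
REQUIRED_PACKAGES = [
    "pandas",
    "google-cloud-bigquery",
    "pyarrow",
    "db-dtypes",
    "fount_core",
    "aiohttp",
]


def find_missing(packages):
    """Return the distribution names that are not installed."""
    missing = []
    for name in packages:
        try:
            version(name)  # raises PackageNotFoundError if not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

In Jupyter, the check should run in the same kernel as the main script, since a notebook kernel can use a different environment from the shell where pip was run.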
"""
GCP BigQuery + Fount Integration
This script:
1. Connects to Google BigQuery
2. Reads a BigQuery table into pandas
3. Uploads the DataFrame to Fount
4. Trains a forecasting model using Fount
"""
from google.cloud import bigquery
import pandas as pd
from fount import Fount
# =========================
# GCP CONFIG
# =========================
client_bq = bigquery.Client.from_service_account_json(
    "service_account.json"
)
# =========================
# READ DATA FROM BIGQUERY
# =========================
query = """
SELECT *
FROM `inventory-project.inventory_dataset.inventory_table`
"""
df = client_bq.query(query).to_dataframe()
print(df.head())
print(df.shape)
# =========================
# FOUNT TESTING
# =========================
client = Fount()
dataset = client.upload_dataframe(
    df,
    name="GCP_BigQuery_Dataset"
)
train_job = client.train(
    dataset=dataset,
    series_id_cols=["category"],
    categorical_cols=[
        'Seasonality Factors',
        'External Factors',
        'Demand Trend',
        'Customer Segments'
    ],
    model_name="gcp_inventory_model",
    date_column="Date",
    target_columns=["Sales Quantity"],
    validation_data_required=True,
    validation_split=0.2,
    time_granularity="daily",
    machine="ml.g5.12xlarge"
)
print(train_job)
BigQuery Invalid Column Names
Avoid:
- spaces
- brackets
- special characters
Recommended:
snake_case
Examples:
sales_quantity
supplier_lead_time_days
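The naming advice above can be applied automatically before a CSV is uploaded. The following is a minimal sketch under stated assumptions: the helpers to_snake_case and rename_map are hypothetical names, and the rule of prefixing an underscore to names that start with a digit is an added assumption so that every result begins with a letter or underscore.

```python
import re


def to_snake_case(name):
    """Convert a column name to snake_case using only letters, digits and underscores."""
    # Replace every run of spaces, brackets, or other special characters with "_".
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    cleaned = cleaned.strip("_").lower()
    # Assumption: prefix names that start with a digit so they begin with "_".
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def rename_map(columns):
    """Build an {old_name: new_name} mapping for a sequence of column names."""
    return {col: to_snake_case(col) for col in columns}


if __name__ == "__main__":
    for original in ["Sales Quantity", "Supplier Lead Time (days)"]:
        print(original, "->", to_snake_case(original))
```

With pandas, the mapping can be applied as df = df.rename(columns=rename_map(df.columns)) before the CSV is written. If the table's columns are renamed this way, the column names passed to client.train (for example "Sales Quantity") would need to be updated to match. The sketch does not handle two different names collapsing to the same result.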