GCP BigQuery + Fount

This guide shows how to read a Google Cloud Platform (GCP) BigQuery table from Python, load it into a pandas DataFrame, and train a machine learning model with Fount. It covers GCP project setup, BigQuery dataset and table creation, service account configuration, JSON key authentication, package installation, and the Python code for accessing and processing BigQuery data in a Jupyter Notebook.

Prerequisites

You need:

  • GCP Project
  • BigQuery Dataset
  • BigQuery Table
  • Service Account JSON key

Step-by-Step Setup

Create GCP Account

https://cloud.google.com/free

Setup Steps

  • Open BigQuery Console
  • Create Project
  • Create Dataset
  • Create Table and upload CSV
  • Go to IAM & Admin → Service Accounts
  • Create Service Account
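  • Add BigQuery Admin role (for read-only use, the narrower BigQuery Data Viewer and BigQuery Job User roles are usually enough)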
  • Create JSON key
  • Download the key and save it as service_account.json (keep it out of version control)
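
Before using the downloaded key, it can be worth checking that the file really is a service account key. The sketch below is one way to do that with the standard library only. The helper names load_key_info and key_problems are hypothetical, and the field list is a minimal subset of what a service account key file normally contains, not an official schema.

```python
import json

# Fields a service account key file is expected to carry (minimal subset,
# chosen for this sketch).
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")


def load_key_info(path="service_account.json"):
    """Read the JSON key file and return its contents as a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def key_problems(info):
    """Return a list of human-readable problems; an empty list means the key looks usable."""
    problems = [f"missing field: {field}" for field in REQUIRED_KEY_FIELDS if not info.get(field)]
    # A key of another credential type (for example a user credential) will not
    # work with Client.from_service_account_json.
    if info.get("type") not in (None, "service_account"):
        problems.append(f"unexpected type: {info['type']!r}")
    return problems
```

A typical call would be key_problems(load_key_info("service_account.json")), run once before creating the BigQuery client.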

Required Packages

pip install pandas google-cloud-bigquery pyarrow db-dtypes fount_core aiohttp
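
After installation, a quick check inside the notebook can confirm that the packages are importable. This is a minimal sketch: the helper missing_modules is hypothetical, and the import names are assumptions based on the script in this guide (for example, the script imports Fount from a module named fount, while the pip package is fount_core).

```python
from importlib.util import find_spec

# Import names (not pip names) assumed for the packages listed above.
REQUIRED_MODULES = [
    "pandas",
    "google.cloud.bigquery",
    "pyarrow",
    "db_dtypes",
    "fount",
    "aiohttp",
]


def missing_modules(modules):
    """Return the module names from `modules` that cannot be located."""
    missing = []
    for name in modules:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is itself absent
            # or a module has no usable spec.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Calling missing_modules(REQUIRED_MODULES) should return an empty list when everything is installed in the active environment.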

Python Code

"""
GCP BigQuery + Fount Integration

This script:
1. Connects to Google BigQuery
2. Reads a BigQuery table into pandas
3. Uploads the DataFrame to Fount
4. Trains a forecasting model using Fount
"""

from google.cloud import bigquery
import pandas as pd
from fount import Fount

# =========================
# GCP CONFIG
# =========================

client_bq = bigquery.Client.from_service_account_json(
    "service_account.json"
)

# =========================
# READ DATA FROM BIGQUERY
# =========================

query = """
SELECT *
FROM `inventory-project.inventory_dataset.inventory_table`
"""

df = client_bq.query(query).to_dataframe()

print(df.head())
print(df.shape)

# =========================
# UPLOAD TO FOUNT AND TRAIN
# =========================

client = Fount()

dataset = client.upload_dataframe(
    df,
    name="GCP_BigQuery_Dataset"
)

train_job = client.train(
    dataset=dataset,
    series_id_cols=["category"],
    categorical_cols=[
        "Seasonality Factors",
        "External Factors",
        "Demand Trend",
        "Customer Segments",
    ],
    model_name="gcp_inventory_model",
    date_column="Date",
    target_columns=["Sales Quantity"],
    validation_data_required=True,
    validation_split=0.2,
    time_granularity="daily",
    machine="ml.g5.12xlarge",
)

print(train_job)
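
Before starting a training job, it can help to confirm that every column named in the client.train call actually exists in the DataFrame returned by BigQuery. The sketch below does that check on plain column names. The helper missing_columns is hypothetical and is not part of Fount or pandas; its parameter names simply mirror the arguments used in the script above.

```python
def missing_columns(available, date_column, target_columns,
                    series_id_cols=(), categorical_cols=()):
    """Return the required column names that are absent from `available`.

    `available` is any iterable of column names, for example df.columns.
    """
    required = [date_column, *target_columns, *series_id_cols, *categorical_cols]
    present = set(available)
    # dict.fromkeys keeps the first occurrence of each name and preserves order.
    return [name for name in dict.fromkeys(required) if name not in present]
```

In the script above this could be called as missing_columns(df.columns, "Date", ["Sales Quantity"], ["category"], [...]) right after the query; a non-empty result suggests the table's column names differ from the ones passed to the training call.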

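BigQuery Invalid Column Names

Column names that are fine in a CSV file can be rejected or altered when the file is loaded into BigQuery. If names change during loading, update the column names in the training call (for example "Sales Quantity") to match the columns of the DataFrame. Avoid the following in column names: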

  • spaces
  • brackets
  • special characters

Recommended:

snake_case

Examples:

sales_quantity
supplier_lead_time_days
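
The naming advice above can be applied programmatically before uploading a CSV. The following is a minimal sketch, assuming simple headers made of words separated by spaces or punctuation. The function to_snake_case is a hypothetical helper, and it does not split CamelCase words.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a column header to a lowercase snake_case identifier."""
    # Collapse every run of characters other than ASCII letters and digits
    # into a single underscore.
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    cleaned = cleaned.strip("_").lower()
    # Prefix an underscore when the result is empty or starts with a digit,
    # so the name begins with a letter or underscore.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned
```

With pandas, the same function can rename a whole DataFrame in one step: df = df.rename(columns=to_snake_case).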