Azure Blob Storage + Fount

This guide shows how to read a CSV file from a Microsoft Azure Blob Storage container in Python, load it into a pandas DataFrame, and train a forecasting model with Fount. It covers Azure Storage Account setup, container creation, connection string configuration, package installation, and the Python code for downloading and processing the data in a Jupyter Notebook environment.

Prerequisites

You need:

  • Azure Subscription
  • Storage Account
  • Blob Container
  • Connection String
  • Uploaded CSV file

Step-by-Step Setup

Create Azure Account

https://azure.microsoft.com/free/

Setup Steps

  • Log in to the Azure Portal
  • Search for "Storage accounts"
  • Create a storage account
  • Open the storage account and go to Containers → Create Container
  • Upload the CSV file to the container
  • Go to Access keys (under Security + networking)
  • Copy the connection string
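
The connection string copied in the last step is a semicolon-separated list of key=value pairs; the key-based string shown under Access keys includes AccountName and AccountKey. As a minimal sketch, assuming you keep the string in an environment variable instead of pasting it into the notebook, the helpers below read it and check that those two keys are present. The variable name AZURE_STORAGE_CONNECTION_STRING follows a common Azure convention, and both function names are illustrative, not part of any library.

```python
import os


def parse_connection_string(conn_str):
    """Split an Azure Storage connection string into a dict of its parts."""
    parts = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only: base64 account keys often end in "=".
        key, sep, value = segment.partition("=")
        if not sep:
            # Do not echo the segment itself; it may contain the account key.
            raise ValueError("Malformed segment in connection string")
        parts[key] = value
    return parts


def load_connection_string(env_var="AZURE_STORAGE_CONNECTION_STRING"):
    """Read the connection string from the environment and sanity-check it."""
    conn_str = os.environ.get(env_var)
    if not conn_str:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    parts = parse_connection_string(conn_str)
    missing = [k for k in ("AccountName", "AccountKey") if k not in parts]
    if missing:
        raise ValueError(f"Connection string is missing: {missing}")
    return conn_str
```

With the variable set in the shell or notebook environment, the returned string can be passed to BlobServiceClient.from_connection_string in place of a hardcoded value.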

Required Packages

pip install pandas azure-storage-blob fount_core aiohttp

Python Code

"""
Azure Blob Storage + Fount Integration

This script:
1. Connects to Azure Blob Storage
2. Downloads a CSV file from Blob Storage
3. Converts the file into a pandas DataFrame
4. Uploads the DataFrame to Fount
5. Trains a forecasting model using Fount
"""

from azure.storage.blob import BlobServiceClient
import pandas as pd
from io import StringIO
from fount import Fount

# =========================
# AZURE CONFIG
# =========================

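# Connection string copied from the storage account's Access keys page.
# Avoid committing a real key to version control.
connection_string = "YOUR_CONNECTION_STRING"

# Container that holds the uploaded CSV file
container_name = "inventory-data"

# Name of the CSV blob inside the container
blob_name = "inventory_data.csv"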

# =========================
# READ DATA FROM AZURE
# =========================

blob_service_client = BlobServiceClient.from_connection_string(
    connection_string
)

blob_client = blob_service_client.get_blob_client(
    container=container_name,
    blob=blob_name
)

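# download_blob() returns a downloader; readall() reads the whole blob into memory as bytes
data = blob_client.download_blob().readall()

# Decode the bytes as UTF-8 text and parse the CSV into a DataFrame
df = pd.read_csv(
    StringIO(data.decode('utf-8'))
)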

print(df.head())
print(df.shape)

# =========================
# FOUNT UPLOAD AND TRAINING
# =========================

client = Fount()

dataset = client.upload_dataframe(
    df,
    name="Azure_Dataset"
)

train_job = client.train(
    dataset=dataset,
    series_id_cols=["category"],
    categorical_cols=[
        'Seasonality Factors',
        'External Factors',
        'Demand Trend',
        'Customer Segments'
    ],
    model_name="azure_inventory_model",
    date_column="Date",
    target_columns=["Sales Quantity"],
    validation_data_required=True,
    validation_split=0.2,
    time_granularity="daily",
    machine="ml.g5.12xlarge"
)

print(train_job)
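
The train call above refers to several columns by name: the date column, the target, the series ID, and the categorical columns. A misspelled or missing column would presumably only surface after the job is submitted, so a quick local check can help. The following is a minimal sketch and not part of the Fount API; missing_columns and check_columns are hypothetical helpers, and the column names are copied from the script above.

```python
def missing_columns(available, required):
    """Return the required column names that are absent, in their given order."""
    available = set(available)
    return [name for name in required if name not in available]


# Column names taken from the train() call in the script above.
REQUIRED_COLUMNS = [
    "Date",
    "Sales Quantity",
    "category",
    "Seasonality Factors",
    "External Factors",
    "Demand Trend",
    "Customer Segments",
]


def check_columns(available, required=REQUIRED_COLUMNS):
    """Raise a ValueError listing any required columns that are missing."""
    missing = missing_columns(available, required)
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
```

In the notebook, calling check_columns(df.columns) before client.upload_dataframe would stop early with the list of missing names.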