Azure Blob Storage + Fount
This documentation explains how to connect to Microsoft Azure Blob Storage from Python, read a CSV file stored in a Blob Container into a pandas DataFrame, and train a machine learning model with Fount. It covers Azure Storage Account setup, container creation, connection string configuration, package installation, and Python code for downloading and processing the data in a Jupyter Notebook environment.
Prerequisites
You need:
- Azure Subscription
- Storage Account
- Blob Container
- Connection String
- Uploaded CSV file
Step-by-Step Setup
Create Azure Account
https://azure.microsoft.com/free/
Setup Steps
- Log in to the Azure Portal
- Search for Storage Accounts
- Create a Storage Account
- Go to Containers → Create Container
- Upload the CSV file
- Go to Access Keys
- Copy the Connection String
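Once the connection string is copied, it is usually safer to keep it out of the notebook itself. The following is a minimal sketch, not part of the original script, that reads the string from an environment variable and checks that it contains the parts an Access Keys connection string normally carries. The helper names `parse_connection_string` and `load_connection_string` are hypothetical, and the variable name `AZURE_STORAGE_CONNECTION_STRING` is an assumed convention you can change.

```python
import os


def parse_connection_string(conn_str: str) -> dict:
    """Split an Azure Storage connection string into its key/value parts."""
    parts = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only: account keys are base64 and may end in "=".
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError("Malformed segment in connection string")
        parts[key] = value
    return parts


def load_connection_string(env_var: str = "AZURE_STORAGE_CONNECTION_STRING") -> str:
    """Read the connection string from the environment and sanity-check it.

    Assumes a string copied from Access Keys, which includes AccountName
    and AccountKey. The env_var default is an assumption, not a requirement.
    """
    conn_str = os.environ.get(env_var)
    if not conn_str:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    parts = parse_connection_string(conn_str)
    missing = {"AccountName", "AccountKey"} - parts.keys()
    if missing:
        raise ValueError(f"Connection string is missing: {sorted(missing)}")
    return conn_str
```

With the variable set in the shell that launches Jupyter, `connection_string = load_connection_string()` can stand in for a hard-coded value.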
Required Packages
pip install pandas azure-storage-blob fount_core aiohttp
Python Code
"""
Azure Blob Storage + Fount Integration
This script:
1. Connects to Azure Blob Storage
2. Downloads a CSV file from Blob Storage
3. Converts the file into a pandas DataFrame
4. Uploads the DataFrame to Fount
5. Trains a forecasting model using Fount
"""
from azure.storage.blob import BlobServiceClient
import pandas as pd
from io import StringIO
from fount import Fount
# =========================
# AZURE CONFIG
# =========================
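# Replace with the connection string copied from Access Keys.
# Avoid committing the real value to source control.
connection_string = "YOUR_CONNECTION_STRING"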
container_name = "inventory-data"
blob_name = "inventory_data.csv"
# =========================
# READ DATA FROM AZURE
# =========================
blob_service_client = BlobServiceClient.from_connection_string(
    connection_string
)
blob_client = blob_service_client.get_blob_client(
    container=container_name,
    blob=blob_name
)
# Download the blob as bytes, then parse the decoded text as CSV
data = blob_client.download_blob().readall()
df = pd.read_csv(
    StringIO(data.decode('utf-8'))
)
print(df.head())
print(df.shape)
# =========================
# UPLOAD TO FOUNT AND TRAIN
# =========================
client = Fount()
dataset = client.upload_dataframe(
    df,
    name="Azure_Dataset"
)
train_job = client.train(
    dataset=dataset,
    series_id_cols=["category"],
    categorical_cols=[
        'Seasonality Factors',
        'External Factors',
        'Demand Trend',
        'Customer Segments'
    ],
    model_name="azure_inventory_model",
    date_column="Date",
    target_columns=["Sales Quantity"],
    validation_data_required=True,
    validation_split=0.2,
    time_granularity="daily",
    machine="ml.g5.12xlarge"
)
print(train_job)
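
The training call above refers to several columns by name, so a mismatch between the CSV header and those names would likely only surface after the upload. As a hedged sketch, a small check like the one below could run right after the DataFrame is loaded. The helper `missing_columns` and the `REQUIRED_COLUMNS` list are hypothetical names introduced here. The column names are taken from the script's `client.train` arguments.

```python
# Column names referenced by the train call in the script above.
REQUIRED_COLUMNS = [
    "category",
    "Date",
    "Sales Quantity",
    "Seasonality Factors",
    "External Factors",
    "Demand Trend",
    "Customer Segments",
]


def missing_columns(available, required=REQUIRED_COLUMNS):
    """Return the required column names absent from `available`, in order.

    The comparison is exact: case and surrounding whitespace must match.
    """
    present = set(available)
    return [name for name in required if name not in present]
```

In the notebook this might be used as `missing = missing_columns(df.columns)`, raising an error or printing the list if it is not empty before calling `client.upload_dataframe`.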