AWS S3 + Fount

This guide explains how to connect Python to Amazon Web Services (AWS) S3 storage, read a CSV file from an S3 bucket into a pandas DataFrame, and pass that data to Fount to train a forecasting model. It covers AWS account setup, S3 bucket creation, IAM user configuration, package installation, and Python code that authenticates to S3 with an IAM Access Key ID and Secret Access Key.

Prerequisites

You need:

  • AWS Account
  • S3 Bucket
  • Access Key ID
  • Secret Access Key
  • CSV file uploaded to the S3 bucket

Step-by-Step Setup

Create AWS Account

https://aws.amazon.com/

Setup Steps

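  • Log in to the AWS Management Console
  • Search for S3 in the search bar and open it
  • Create bucket → enter a globally unique bucket name → Create bucket
  • Upload the CSV file into the bucket
  • Search for IAM → Users → Create user
  • Attach a permissions policy: AmazonS3FullAccess (AmazonS3ReadOnlyAccess is enough for the script below, which only reads from S3)
  • Open the user's Security credentials tab → Create access key
  • Copy the Access Key ID and Secret Access Key (the secret is shown only once, when the key is created)
  • Use them in the Python code below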
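Using the keys in Python does not have to mean pasting them into the script. A common alternative is to export them as the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, which boto3 also reads on its own when no keys are passed to boto3.client(). The minimal sketch below uses only the standard library to fetch both values and fail early with a clear message if either is missing; the helper name load_aws_keys is hypothetical and not part of boto3 or Fount.

```python
import os


def load_aws_keys(environ=None):
    """Return (access_key_id, secret_access_key) from environment variables.

    Raises RuntimeError naming every variable that is missing or empty.
    """
    env = os.environ if environ is None else environ
    names = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing AWS credentials: " + ", ".join(missing))
    return env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"]


if __name__ == "__main__":
    try:
        key_id, _secret = load_aws_keys()
        # Never print the secret; the last 4 characters of the key ID
        # are enough to confirm which key was loaded.
        print("Loaded access key ending in", key_id[-4:])
    except RuntimeError as err:
        print(err)
```

In a Unix shell, for example, run export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY" and export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY" before starting Python. Then either pass the returned values to boto3.client('s3', aws_access_key_id=..., aws_secret_access_key=...) or leave both arguments out and let boto3 pick the variables up itself.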

Required Packages

pip install pandas boto3 fount_core aiohttp
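To confirm the installation worked before running the full script, a short standard-library check can try to locate each module without importing it. This is a sketch under one stated assumption: the pip package fount_core is expected to provide the fount module, because the script imports from fount rather than fount_core. The helper name missing_modules is hypothetical.

```python
import importlib.util

# Import names used by the script. "fount" is assumed to be provided by
# the "fount_core" pip package, since the script does `from fount import Fount`.
REQUIRED_MODULES = ["pandas", "boto3", "fount", "aiohttp"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```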

Python Code

"""
AWS S3 + Fount Integration

This script:
1. Connects to AWS S3
2. Reads a CSV file from an S3 bucket
3. Converts the data into a pandas DataFrame
4. Uploads the DataFrame to Fount
5. Trains a forecasting model using Fount
"""

import boto3
import pandas as pd
from fount import Fount

# =========================
# AWS CONFIG
# =========================

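# Replace these placeholders with your own values.
# Do not commit real keys to version control.
aws_access_key_id = "YOUR_ACCESS_KEY"
aws_secret_access_key = "YOUR_SECRET_KEY"
bucket_name = "your-bucket-name"
file_key = "inventory_data.csv"  # object key (path) of the CSV inside the bucket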

# =========================
# READ DATA FROM AWS S3
# =========================

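# Create an S3 client authenticated with the IAM user's access key
s3 = boto3.client(
    's3',
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key
)

# Fetch the object; obj['Body'] is a file-like streaming body
obj = s3.get_object(
    Bucket=bucket_name,
    Key=file_key
)

# pandas reads the CSV directly from the streaming body
df = pd.read_csv(obj['Body'])

# Sanity check: first five rows and (rows, columns)
print(df.head())
print(df.shape)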

# =========================
# FOUNT: UPLOAD DATA AND TRAIN MODEL
# =========================

# Create a Fount client
client = Fount()

# Upload the DataFrame to Fount as a named dataset
dataset = client.upload_dataframe(
    df,
    name="AWS_S3_Dataset"
)

# Start a training job. Every column name below must match
# a CSV header exactly, including capitalization.
train_job = client.train(
    dataset=dataset,
    series_id_cols=["category"],
    categorical_cols=[
        "Seasonality Factors",
        "External Factors",
        "Demand Trend",
        "Customer Segments",
    ],
    model_name="aws_inventory_model",
    date_column="Date",
    target_columns=["Sales Quantity"],
    validation_data_required=True,
    validation_split=0.2,
    time_granularity="daily",
    machine="ml.g5.12xlarge",
)

print(train_job)
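Because client.train() refers to columns by name, a typo or a capitalization difference in the CSV header only surfaces once the training job is submitted. The minimal standard-library sketch below reads just the header row from the raw CSV bytes, the same bytes that obj['Body'].read() returns, and reports every column the train() call expects but the file lacks. The column list is copied from the train() call in the script; the helper name missing_columns is hypothetical.

```python
import csv
import io

# Column names referenced in the client.train(...) call in the script.
EXPECTED_COLUMNS = [
    "category",
    "Seasonality Factors",
    "External Factors",
    "Demand Trend",
    "Customer Segments",
    "Date",
    "Sales Quantity",
]


def missing_columns(csv_bytes, expected, encoding="utf-8-sig"):
    """Return the expected column names absent from the CSV header row.

    "utf-8-sig" decodes plain UTF-8 and also strips a leading
    byte-order mark if the file has one.
    """
    reader = csv.reader(io.StringIO(csv_bytes.decode(encoding)))
    header = next(reader, [])  # empty file -> no columns
    present = set(header)      # exact match, as the training call requires
    return [name for name in expected if name not in present]
```

To use it with S3, read the body once into a variable, for example data = s3.get_object(Bucket=bucket_name, Key=file_key)['Body'].read(), and call missing_columns(data, EXPECTED_COLUMNS); an empty list means every column is present. Since the streaming body can only be read once, build the DataFrame from the same bytes with pd.read_csv(io.BytesIO(data)) instead of passing obj['Body'] again.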