Training

Complete guide to training machine learning models using our SDK. Learn how to configure training parameters, execute jobs, monitor progress, and retrieve results.

Overview

The SDK provides a complete workflow for model training:

  • Configure training parameters programmatically
  • Execute training jobs synchronously or asynchronously
  • Monitor job progress in real-time
  • Retrieve training metrics and predictions

Key Features

  • Support for series identification
  • Support for categorical and time-series data
  • Flexible validation strategies
  • Real-time progress monitoring
  • Comprehensive metrics reporting

Common Use Cases

  • Multi-time-series forecasting
  • Sales forecasting
  • Demand prediction
  • Time-series analysis
  • Multi-target training

train() Method

Creates and configures a new training job using the SDK client.

Syntax

train_job = client.train(
  dataset=dataset,
  model_name=model_name,
  series_id_cols=series_id_cols,
  categorical_cols=categorical_cols,
  date_column=date_column,
  target_columns=target_columns,
  validation_data_required=validation_data_required,
  validation_split=validation_split,
  time_granularity=time_granularity
)

Parameters

Required Parameters

dataset (Dataset)
Dataset object returned from the SDK's upload() method

model_name (str)
Descriptive name for your model

series_id_cols (List[str])
List of categorical columns that together identify each unique time series. For example, if you forecast 500 different SKUs, this is the combination of columns that uniquely identifies a SKU.

categorical_cols (List[str])
List of all categorical column names, including those in series_id_cols. Columns already listed in series_id_cols must be repeated here.

date_column (str)
Name of the date/time column

target_columns (List[str])
List of target columns to predict

validation_data_required (bool)
Whether to create a validation set

validation_split (float)
Proportion of data held out for validation, as a fraction strictly between 0.0 and 1.0 (for example 0.2)

time_granularity (TIME_GRANULARITY)
Time series granularity, for example "daily", "weekly", or "monthly"
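
The parameter rules above can be checked locally before a job is created. The following is a minimal sketch, assuming the dataset's column names are available as a plain list; check_train_config is a hypothetical helper written for illustration and is not part of the SDK, and the strict validation_split bounds are an assumption about how the service validates that value.

```python
from typing import List, Sequence


def check_train_config(
    columns: Sequence[str],
    series_id_cols: List[str],
    categorical_cols: List[str],
    date_column: str,
    target_columns: List[str],
    validation_split: float,
) -> List[str]:
    """Return a list of problems found; an empty list means none were found.

    Hypothetical pre-flight helper, not part of the SDK.
    """
    problems = []

    # List-typed parameters must be real lists, not bare strings.
    for name, value in [
        ("series_id_cols", series_id_cols),
        ("categorical_cols", categorical_cols),
        ("target_columns", target_columns),
    ]:
        if not isinstance(value, list):
            problems.append(f"{name} must be a list of column names")
    if problems:
        return problems

    # Every referenced column must exist in the dataset.
    referenced = (
        set(series_id_cols) | set(categorical_cols) | set(target_columns) | {date_column}
    )
    missing = sorted(referenced - set(columns))
    if missing:
        problems.append(f"columns not found in data: {missing}")

    # series_id_cols must also be listed in categorical_cols.
    not_categorical = sorted(set(series_id_cols) - set(categorical_cols))
    if not_categorical:
        problems.append(f"series_id_cols missing from categorical_cols: {not_categorical}")

    if not target_columns:
        problems.append("target_columns must contain at least one column")

    # Assumption: the split must leave rows on both sides, so 0.0 and 1.0 are rejected.
    if not 0.0 < validation_split < 1.0:
        problems.append("validation_split must be strictly between 0.0 and 1.0")

    return problems


columns = ["StoreID", "Region", "Week", "WeeklySales"]
print(check_train_config(columns, ["StoreID"], ["StoreID", "Region"], "Week", ["WeeklySales"], 0.2))
```

With a pandas DataFrame, passing df.columns.tolist() as the first argument gives the same check against the real data.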

Returns

Returns a TrainingJob object that provides methods to execute and monitor the training process.

TrainingJob Methods

Once you create a training job using the SDK, use these methods to control and monitor it:

run() - Execute Training

Starts the training job execution.

# Synchronous execution (blocks until complete)
success = train_job.run(wait=True, poll_interval=10)

# Asynchronous execution (returns immediately)
train_job.run(wait=False)

Parameters:

  • wait (bool): If True, blocks until job completes. Default: False
  • poll_interval (int): Seconds between status checks when waiting. Default: 30

Returns: Boolean indicating success status when wait=True
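
When a job is started with wait=False, the caller has to poll for completion itself. The sketch below shows one way to do that with an upper time limit, assuming status() returns a dictionary whose "status" value ends up as "Completed" or "Failed". wait_for_job and its timeout parameter are hypothetical and not part of the SDK.

```python
import time


def wait_for_job(job, poll_interval=30, timeout=3600):
    """Poll job.status() until a terminal state, or raise after `timeout` seconds.

    Hypothetical helper, not part of the SDK. Returns the last status dictionary.
    """
    deadline = time.monotonic() + timeout
    while True:
        status_info = job.status()
        # Stop as soon as the job reaches a terminal state.
        if status_info["status"] in ("Completed", "Failed"):
            return status_info
        # Give up if the time budget is exhausted.
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job still {status_info['status']} after {timeout} seconds"
            )
        time.sleep(poll_interval)
```

A typical call would be train_job.run(wait=False) followed by wait_for_job(train_job, poll_interval=10, timeout=1800). Using time.monotonic() keeps the deadline unaffected by system clock changes.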

status() - Check Progress

Retrieves the current job status and progress.

status_info = train_job.status()
print(f"Status: {status_info['status']}")
print(f"Progress: {status_info['progress']}%")

Returns: Dictionary containing:

  • status: Current state ("Pending", "Running", "Completed", "Failed")
  • progress: Completion percentage (0-100)
  • Additional job-specific fields

metrics() - Get Training Metrics

Retrieves training metrics as a pandas DataFrame.

metrics_df = train_job.metrics()
print(metrics_df.head())

Returns: DataFrame with columns such as:

  • Training loss
  • Validation loss
  • Accuracy metrics
  • Model-specific metrics

predictions() - Retrieve Predictions

Gets model predictions (available after successful training).

predictions_df = train_job.predictions()

Returns: DataFrame containing predicted values

stop() - Cancel Training

Requests cancellation of a running job.

response = train_job.stop()

Returns: Dictionary with cancellation status

Complete Example

Here's a full workflow demonstrating how to train a sales forecasting model using the SDK:

Step 1: Initialize SDK Client
   from fount import Fount
   import pandas as pd

   # Initialize the SDK client
   client = Fount(api_key="your-api-key")
Step 2: Upload Data
   # Upload your dataset using the SDK
   df = pd.read_csv("data.csv")
   dataset = client.upload_dataframe(
       df,
       name="Q4_Sales_Data"
   )
Step 3: Configure Training
   # Create training job using SDK
   train_job = client.train(
       dataset=dataset,
       model_name="Q4_Sales_Forecast",
       series_id_cols=["StoreID", "ProductCategory", "Region"],
       categorical_cols=["StoreID", "ProductCategory", "Region","Year", "Month"],
       date_column="Week",
       target_columns=["WeeklySales", "UnitsSold"],
       validation_data_required=True,
       validation_split=0.2,
       time_granularity="weekly"
   )
Step 4: Execute & Monitor
   # Run training and block until the job finishes
   print("Starting training...")
   success = train_job.run(wait=True, poll_interval=30)

   if success:
       print("Training completed successfully!")

       # Get training metrics
       metrics = train_job.metrics()
       print("\nTraining Metrics:")
       print(metrics.describe())

       # Retrieve predictions
       predictions = train_job.predictions()
       print(f"\nGenerated {len(predictions)} predictions")
   else:
       # Inspect the job status for error details
       status_info = train_job.status()
       print(f"Training failed: {status_info.get('error_message', 'Unknown error')}")

Best Practices

SDK Usage

  • Always handle SDK exceptions appropriately
  • Use context managers when available
  • Store credentials securely
  • Implement proper logging

Data Preparation

  • Ensure date columns use consistent formats
  • Handle missing values before training
  • Verify categorical columns contain valid categories

Performance Tips

  • Start with smaller datasets for testing
  • Use asynchronous execution for long-running jobs
  • Set poll intervals that balance responsiveness against the number of status requests

Error Handling

Implement robust error handling when using the SDK:

try:
    # Create and run training job using SDK
    train_job = client.train(
        dataset=dataset,
        model_name="Sales_Model",
        series_id_cols=["Store", "Product"],
        categorical_cols=["Store", "Product","Year","Month"],
        date_column="Date",
        target_columns=["Sales"],
        validation_data_required=True,
        validation_split=0.15,
        time_granularity="daily"
    )
    
    success = train_job.run(wait=True, poll_interval=10)
    
    if not success:
        # Check status for error details
        status = train_job.status()
        print(f"Training failed: {status.get('error_message', 'Unknown error')}")
        
except Exception as e:
    print(f"SDK error during training: {e}")
    # Attempt to stop the job if it was created and may still be running
    try:
        train_job.stop()
    except Exception:
        # train_job may not exist yet, or the job may already have finished
        pass

SDK Integration Tips

Working with DataFrames

The SDK seamlessly integrates with pandas DataFrames:

import pandas as pd

# Load data into DataFrame
df = pd.read_csv("your_data.csv")

# Upload DataFrame using SDK
dataset = client.upload(
    data=df,  # Direct DataFrame upload
    dataset_name="Processed_Data"
)

# After training, work with results as DataFrames
metrics_df = train_job.metrics()
predictions_df = train_job.predictions()

Asynchronous Operations

For production environments, leverage the SDK's async capabilities:

import time

# Start multiple training jobs (training_configs is a list of train() keyword-argument dicts)
jobs = []
for config in training_configs:
    job = client.train(**config)
    job.run(wait=False)  # Non-blocking
    jobs.append(job)

# Monitor all jobs
for job in jobs:
    while True:
        status = job.status()
        if status['status'] in ['Completed', 'Failed']:
            break
        time.sleep(30)
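
The monitoring loop above waits for the jobs one after another, so a job that finishes early is not noticed until every job ahead of it in the list is done. A possible alternative is to check every unfinished job once per cycle. This is a sketch only: wait_for_all is a hypothetical helper, not part of the SDK, and it assumes only that each job exposes the status() method described earlier.

```python
import time


def wait_for_all(jobs, poll_interval=30):
    """Poll every unfinished job once per cycle; return final statuses by job index.

    Hypothetical helper, not part of the SDK.
    """
    final = {}
    pending = dict(enumerate(jobs))
    while pending:
        # Copy the items so finished jobs can be removed while iterating.
        for index, job in list(pending.items()):
            status = job.status()
            if status["status"] in ("Completed", "Failed"):
                final[index] = status
                del pending[index]
        # Sleep only if at least one job is still running.
        if pending:
            time.sleep(poll_interval)
    return final
```

Calling wait_for_all(jobs) after the start-up loop returns a dictionary keyed by each job's position in the list, which makes it easy to report which configurations failed.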

Troubleshooting

Configuration Errors

ValidationError: Field required: dataset

dataset was not passed or the variable is None. Upload or reuse a Dataset object first, then pass it to train().

dataset = client.upload_dataframe(df, name="my_data")
print(dataset.id)  # confirm not None before calling train()

ValidationError: Field required: model_name / model_name must be string

model_name is missing, None, or not a string. Pass a descriptive string:

job = client.train(dataset=dataset, model_name="q4_sales_forecast", ...)

ValidationError: extra fields not permitted

The call contains a parameter name that Fount does not recognise. Use only documented SDK parameter names (dataset, model_name, categorical_cols, date_column, target_columns, validation_data_required, validation_split, time_granularity, series_id_cols).

TypeError: train() got an unexpected keyword argument ...

An internal backend field (e.g. s3_bucket_name, device) was passed to the SDK method. Remove any backend-only fields from the call.

Column & Parameter Errors

categorical_cols must be List[str]

Always pass a list, even for a single column:

categorical_cols=["Region"]   # correct
categorical_cols="Region"     # wrong

target_columns must be List[str]

Same rule applies to target columns:

target_columns=["Sales"]      # correct
target_columns="Sales"        # wrong

validation_data_required must be bool

Use Python booleans, not strings:

validation_data_required=True   # correct
validation_data_required="yes"  # wrong

validation_split must be between 0.0 and 1.0 / train or validation set is empty

Use a fraction such as 0.1 or 0.2. Values of 0, 1, or greater than 1 are invalid. For small datasets, reduce the split to avoid empty splits:

# Check you'll have enough rows
print(f"Train rows: {int(len(df) * (1 - 0.2))}")
print(f"Val rows:   {int(len(df) * 0.2)}")

time_granularity must be one of {...}

Use the exact lowercase strings: second, minute, half_hour, hour, daily, weekly, monthly, quarterly, half_yearly, yearly, decade, non-timeseries. Common mistakes: "day", "week", "month", "none", uppercase variants.
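
If granularity values come from user input or configuration files, they can be normalized before the call to train(). The sketch below uses the allowed values listed above; normalize_granularity is a hypothetical helper, not part of the SDK, and the alias mapping (for example "none" to "non-timeseries") is an assumption about what the caller most likely meant.

```python
ALLOWED_GRANULARITIES = {
    "second", "minute", "half_hour", "hour", "daily", "weekly", "monthly",
    "quarterly", "half_yearly", "yearly", "decade", "non-timeseries",
}

# Assumed intent behind the common mistakes listed in the text.
ALIASES = {
    "day": "daily",
    "week": "weekly",
    "month": "monthly",
    "none": "non-timeseries",
}


def normalize_granularity(value: str) -> str:
    """Lowercase, trim, and map common aliases; raise if the result is not allowed."""
    cleaned = value.strip().lower()
    cleaned = ALIASES.get(cleaned, cleaned)
    if cleaned not in ALLOWED_GRANULARITIES:
        raise ValueError(
            f"unsupported time_granularity {value!r}; "
            f"expected one of {sorted(ALLOWED_GRANULARITIES)}"
        )
    return cleaned


print(normalize_granularity("Weekly"))
print(normalize_granularity("day"))
```

Raising on unknown values instead of guessing keeps a typo from silently selecting the wrong granularity.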

Date column '...' not found in the data

The value passed to date_column does not exactly match a column name in the dataset (case-sensitive, no extra spaces).

print(df.columns.tolist())            # find exact column name
assert "Date" in df.columns           # verify before training

Target columns [...] not found in the data

One or more values in target_columns are missing from the dataset.

missing = set(target_columns) - set(df.columns)
print("Missing targets:", missing)    # must be empty set

Categorical columns [...] not found in the data

One or more values in categorical_cols are missing from the dataset.

missing = set(categorical_cols) - set(df.columns)
print("Missing categoricals:", missing)

time series granularity but no date column

time_granularity is set to a time-series value (e.g. "daily") but date_column is missing or not usable. Either provide a valid date_column or set time_granularity="non-timeseries" for cross-sectional data.

Target columns list is empty

target_columns=[] was passed. At least one target column is required:

assert len(target_columns) >= 1

Data Quality Errors

No input features found in the data

After removing target, date, and categorical columns, there are no remaining model inputs. Add at least one numeric feature column, or check that target_columns and date_column are not accidentally consuming all columns.

date parsing failed or too few valid dates

The date column exists but Fount cannot parse enough valid dates. Standardise the date format before upload (ISO YYYY-MM-DD recommended):

pd.to_datetime(df["Date"], errors="coerce").notna().mean()
# should be close to 1.0

Target column must be numeric for regression

The target column contains strings, currency symbols, percentage signs, or blanks. Clean the column before upload:

df["Sales"] = pd.to_numeric(df["Sales"].str.replace(",", "", regex=False).str.replace("$", "", regex=False), errors="coerce")

could not convert string to float / can't convert np.ndarray of type numpy.object_

A non-categorical feature contains text values. Add it to categorical_cols if it is truly categorical, or remove it if it is an ID/comment column.

print(df.drop(columns=target_columns + [date_column] + categorical_cols).dtypes)
# all remaining columns should be numeric

Input contains NaN, infinity or a value too large / loss is nan

Training data has missing or infinite values. Impute or drop them before upload:

print(df.isna().sum())
df = df.fillna(df.median(numeric_only=True))

Missing required columns in data chunk

A required column is absent from the uploaded dataset. Re-upload the correct file and ensure all configured columns exist.

No categorical or numerical features provided / empty feature set

The same column was used as both a target and a feature, or all columns were marked as target/date. Keep target_columns, date_column, and feature columns separate:

overlap = set(categorical_cols) & set(target_columns)
print("Overlap:", overlap)  # must be empty

Training very slow / high cardinality categorical feature

A categorical column has too many unique values (e.g. transaction IDs, customer IDs). Remove or bin it:

print(df[categorical_cols].nunique())  # flag columns with thousands of unique values
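
One way to bin a high-cardinality column is to merge rarely seen categories into a single label. The following is a minimal pandas sketch of that idea; bin_rare_categories, the min_count threshold, and the "Other" label are illustrative choices, not SDK features or recommended values.

```python
import pandas as pd


def bin_rare_categories(
    series: pd.Series, min_count: int = 50, other_label: str = "Other"
) -> pd.Series:
    """Replace categories seen fewer than `min_count` times with `other_label`."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep values that are not rare; replace the rest with the catch-all label.
    return series.where(~series.isin(rare), other_label)


df = pd.DataFrame({"Region": ["North"] * 3 + ["South"] * 3 + ["East", "West"]})
df["Region"] = bin_rare_categories(df["Region"], min_count=2)
print(df["Region"].nunique())  # 3: North, South, Other
```

Columns that are pure identifiers, such as transaction IDs, carry no reusable signal and are usually better removed than binned.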

Training failed: Data error / Model did not improve / validation metrics unavailable

Input data quality is insufficient: too few rows, constant target, or no meaningful variation in features. Check data volume and target variance before training.

Found array with 0 sample(s) / num_samples should be a positive integer value

After filtering and splitting, the model receives zero samples. Upload a non-empty dataset and use a validation_split that leaves rows in both train and validation sets.

Job Execution & Monitoring Errors

TypeError: sleep length must be non-negative / poll_interval must be int

poll_interval is invalid. Use a positive integer:

job.run(wait=True, poll_interval=30)  # correct — 30 seconds

Results not available / Job is not completed / 404 predictions file not found

metrics() or predictions() was called before the job finished. Check status first:

job.run(wait=True)   # simplest approach — blocks until done
# or poll manually (needs `import time`):
while job.status()["status"] not in ["Completed", "Failed"]:
    time.sleep(30)
metrics_df = job.metrics()

KeyError: 'status' / status_info is None

The job variable was overwritten with a dictionary or boolean. Keep the SDK job object separate from status dictionaries:

status = job.status()   # read into a separate variable
print(status["status"]) # job object itself is unchanged

AttributeError: 'bool' object has no attribute 'metrics' / 'dict' object has no attribute 'metrics'

job.run() returns a boolean, not a job object. Do not overwrite job with the return value of run():

success = job.run(wait=True)   # correct — store result separately
metrics_df = job.metrics()      # call on original job object

Cannot stop job in Completed/Failed state / Job already stopped

stop() was called on a job that is no longer running. Only stop Pending or Running jobs:

if job.status()["status"] in ["Pending", "Running"]:
    job.stop()

Next Steps

After training your model using the SDK:

  1. Evaluate Performance: Use the SDK's metrics methods to analyze model quality
  2. Deploy Model: Use the SDK's deployment features for production use
  3. Monitor Performance: Leverage SDK monitoring capabilities
  4. Retrain Periodically: Automate retraining workflows with the SDK