Training
Complete guide to training machine learning models using our SDK. Learn how to configure training parameters, execute jobs, monitor progress, and retrieve results.
Overview
The SDK provides a complete workflow for model training:
- Configure training parameters programmatically
- Execute training jobs synchronously or asynchronously
- Monitor job progress in real-time
- Retrieve training metrics and predictions
train() Method
Creates and configures a new training job using the SDK client.
Syntax
train_job = client.train(
    dataset=dataset,
    model_name=model_name,
    series_id_cols=series_id_cols,
    categorical_cols=categorical_cols,
    date_column=date_column,
    target_columns=target_columns,
    validation_data_required=validation_data_required,
    validation_split=validation_split,
    time_granularity=time_granularity
)
Parameters
Required Parameters
dataset (Dataset)
Dataset object returned from the SDK's upload() method
model_name (str)
Descriptive name for your model
series_id_cols (List[str])
List of categorical columns that together identify each unique time series. For example, if you forecast 500 different SKUs, this is the combination of columns that uniquely identifies one SKU.
categorical_cols (List[str])
List of all categorical column names, including those in series_id_cols. Every categorical column must appear here, even if it is already listed as a series identifier.
date_column (str)
Name of the date/time column
target_columns (List[str])
List of target columns to predict
validation_data_required (bool)
Whether to create a validation set
validation_split (float)
Proportion of data held out for validation, as a fraction strictly between 0.0 and 1.0 (for example 0.2)
time_granularity (TIME_GRANULARITY)
Time series granularity, for example "daily", "weekly", or "monthly"
Returns
Returns a TrainingJob object that provides methods to execute and monitor the training process.
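The parameter rules above can be checked locally before a job is submitted. The following is a minimal sketch, not part of the SDK: check_train_config is a hypothetical helper that uses only the standard library, takes the dataset's column names plus the keyword arguments intended for train(), and returns a list of problems. The granularity strings are the ones listed in the troubleshooting section of this guide.

```python
from typing import Any, Dict, List, Sequence

# Granularity strings as listed in the troubleshooting section of this guide.
VALID_GRANULARITIES = frozenset({
    "second", "minute", "half_hour", "hour", "daily", "weekly", "monthly",
    "quarterly", "half_yearly", "yearly", "decade", "non-timeseries",
})


def _is_str_list(value: Any) -> bool:
    """True only for a real list whose items are all strings."""
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def check_train_config(columns: Sequence[str], config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a train() config (empty means OK).

    Hypothetical helper for illustration; it never calls the SDK.
    """
    problems: List[str] = []

    # Type checks on each documented parameter
    for key in ("model_name", "date_column"):
        if not isinstance(config.get(key), str):
            problems.append(f"{key} must be a string")
    for key in ("series_id_cols", "categorical_cols", "target_columns"):
        if not _is_str_list(config.get(key)):
            problems.append(f"{key} must be a list of strings")
    if not isinstance(config.get("validation_data_required"), bool):
        problems.append("validation_data_required must be a bool")

    split = config.get("validation_split")
    is_number = isinstance(split, (int, float)) and not isinstance(split, bool)
    if not is_number or not 0 < split < 1:
        problems.append("validation_split must be strictly between 0 and 1")

    granularity = config.get("time_granularity")
    if not isinstance(granularity, str) or granularity not in VALID_GRANULARITIES:
        problems.append("time_granularity is not one of the supported strings")

    if problems:
        return problems  # column checks below need well-formed values

    # series_id_cols must also be listed in categorical_cols
    not_categorical = [
        c for c in config["series_id_cols"] if c not in config["categorical_cols"]
    ]
    if not_categorical:
        problems.append(f"series_id_cols missing from categorical_cols: {not_categorical}")

    # Every referenced column must exist in the data
    referenced = config["categorical_cols"] + config["target_columns"] + [config["date_column"]]
    missing = sorted(set(referenced) - set(columns))
    if missing:
        problems.append(f"columns not found in the data: {missing}")
    return problems


columns = ["StoreID", "Region", "Week", "WeeklySales", "Price"]
config = {
    "model_name": "Q4_Sales_Forecast",
    "series_id_cols": ["StoreID", "Region"],
    "categorical_cols": ["StoreID", "Region"],
    "date_column": "Week",
    "target_columns": ["WeeklySales"],
    "validation_data_required": True,
    "validation_split": 0.2,
    "time_granularity": "weekly",
}
print(check_train_config(columns, config))  # prints []
```

With a pandas DataFrame, df.columns.tolist() can be passed as the columns argument. The dataset argument is left out deliberately because it is an SDK object rather than a plain value.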
TrainingJob Methods
Once you create a training job using the SDK, use these methods to control and monitor it:
run() - Execute Training
Starts the training job execution.
# Synchronous execution (blocks until complete)
success = train_job.run(wait=True, poll_interval=10)
# Asynchronous execution (returns immediately)
train_job.run(wait=False)
Parameters:
- wait (bool): If True, blocks until the job completes. Default: False
- poll_interval (int): Seconds between status checks when waiting. Default: 30
Returns: Boolean indicating success status when wait=True
status() - Check Progress
Retrieves the current job status and progress.
status_info = train_job.status()
print(f"Status: {status_info['status']}")
print(f"Progress: {status_info['progress']}%")
Returns: Dictionary containing:
- status: Current state ("Pending", "Running", "Completed", "Failed")
- progress: Completion percentage (0-100)
- Additional job-specific fields
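One way to wrap status() in a reusable polling loop with a timeout is sketched below. The names wait_for_job and FakeJob and the timeout parameter are hypothetical and not part of the SDK. The sketch assumes only that status() returns a dictionary with a "status" key, as described above. FakeJob stands in for a real TrainingJob so the example runs on its own.

```python
import time
from typing import Any, Callable, Dict, List, Optional

TERMINAL_STATES = {"Completed", "Failed"}


def wait_for_job(
    job: Any,
    poll_interval: float = 30,
    timeout: Optional[float] = None,
    on_update: Optional[Callable[[Dict[str, Any]], None]] = None,
) -> Dict[str, Any]:
    """Poll job.status() until a terminal state is reached; return the last status dict.

    Raises TimeoutError if `timeout` seconds pass without reaching a terminal state.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        info = job.status()
        if on_update is not None:
            on_update(info)  # e.g. print or log progress
        if info["status"] in TERMINAL_STATES:
            return info
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"job still {info['status']} after {timeout} seconds")
        time.sleep(poll_interval)


class FakeJob:
    """Stand-in for a TrainingJob; replays a fixed list of states (fake data)."""

    def __init__(self, states: List[str]) -> None:
        self._states = list(states)

    def status(self) -> Dict[str, Any]:
        # Advance through the states, then keep returning the last one
        state = self._states.pop(0) if len(self._states) > 1 else self._states[0]
        return {"status": state, "progress": 100 if state == "Completed" else 50}


final = wait_for_job(FakeJob(["Pending", "Running", "Completed"]), poll_interval=0)
print(final["status"])  # prints Completed
```

A timeout keeps a script from waiting forever on a job that never leaves the "Pending" or "Running" state. The caller still decides whether to call stop() when TimeoutError is raised.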
metrics() - Get Training Metrics
Retrieves training metrics as a pandas DataFrame.
metrics_df = train_job.metrics()
print(metrics_df.head())
Returns: DataFrame with columns such as:
- Training loss
- Validation loss
- Accuracy metrics
- Model-specific metrics
predictions() - Retrieve Predictions
Gets model predictions (available after successful training).
predictions_df = train_job.predictions()
Returns: DataFrame containing predicted values
stop() - Cancel Training
Requests cancellation of a running job.
response = train_job.stop()
Returns: Dictionary with cancellation status
Complete Example
Here's a full workflow demonstrating how to train a sales forecasting model using the SDK:
from fount import Fount
import pandas as pd
# Initialize the SDK client
client = Fount(api_key="your-api-key")
# Upload your dataset using the SDK
df = pd.read_csv("data.csv")
dataset = client.upload_dataframe(
    df,
    name="Q4_Sales_Data"
)
# Create training job using SDK
train_job = client.train(
    dataset=dataset,
    model_name="Q4_Sales_Forecast",
    series_id_cols=["StoreID", "ProductCategory", "Region"],
    categorical_cols=["StoreID", "ProductCategory", "Region", "Year", "Month"],
    date_column="Week",
    target_columns=["WeeklySales", "UnitsSold"],
    validation_data_required=True,
    validation_split=0.2,
    time_granularity="weekly"
)
# Run training and block until the job finishes.
# run() returns a boolean, so store it separately from the job object.
print("Starting training...")
success = train_job.run(wait=True, poll_interval=30)
if success:
    print("Training completed successfully!")
    # Get training metrics
    metrics = train_job.metrics()
    print("\nTraining Metrics:")
    print(metrics.describe())
    # Retrieve predictions
    predictions = train_job.predictions()
    print(f"\nGenerated {len(predictions)} predictions")
else:
    print("Training failed.")
Best Practices
Error Handling
Implement robust error handling when using the SDK:
train_job = None  # lets the except block tell whether a job was created
try:
    # Create and run training job using SDK
    train_job = client.train(
        dataset=dataset,
        model_name="Sales_Model",
        series_id_cols=["Store", "Product"],
        categorical_cols=["Store", "Product", "Year", "Month"],
        date_column="Date",
        target_columns=["Sales"],
        validation_data_required=True,
        validation_split=0.15,
        time_granularity="daily"
    )
    success = train_job.run(wait=True, poll_interval=10)
    if not success:
        # Check status for error details
        status = train_job.status()
        print(f"Training failed: {status.get('error_message', 'Unknown error')}")
except Exception as e:
    print(f"SDK error during training: {e}")
    # Attempt to stop the job if it was created and may still be running
    if train_job is not None:
        try:
            train_job.stop()
        except Exception:
            pass
SDK Integration Tips
Working with DataFrames
The SDK seamlessly integrates with pandas DataFrames:
import pandas as pd
# Load data into DataFrame
df = pd.read_csv("your_data.csv")
# Upload DataFrame using SDK
dataset = client.upload(
data=df, # Direct DataFrame upload
dataset_name="Processed_Data"
)
# After training, work with results as DataFrames
metrics_df = train_job.metrics()
predictions_df = train_job.predictions()
Asynchronous Operations
For production environments, leverage the SDK's async capabilities:
import time
# training_configs is a list of keyword-argument dicts for client.train()
# Start multiple training jobs
jobs = []
for config in training_configs:
    job = client.train(**config)
    job.run(wait=False)  # Non-blocking
    jobs.append(job)
# Monitor all jobs
for job in jobs:
    while True:
        status = job.status()
        if status['status'] in ['Completed', 'Failed']:
            break
        time.sleep(30)
Troubleshooting
Configuration Errors
ValidationError: Field required: dataset
dataset was not passed or the variable is None. Upload or reuse a Dataset object first, then pass it to train().
dataset = client.upload_dataframe(df, name="my_data")
print(dataset.id)  # confirm not None before calling train()
ValidationError: Field required: model_name / model_name must be string
model_name is missing, None, or not a string. Pass a descriptive string:
job = client.train(dataset=dataset, model_name="q4_sales_forecast", ...)
ValidationError: extra fields not permitted
The call contains a parameter name that Fount does not recognise. Use only documented SDK parameter names (dataset, model_name, categorical_cols, date_column, target_columns, validation_data_required, validation_split, time_granularity, series_id_cols).
TypeError: train() got an unexpected keyword argument ...
An internal backend field (e.g. s3_bucket_name, device) was passed to the SDK method. Remove any backend-only fields from the call.
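When a configuration dictionary comes from a file or another system, both errors above can be avoided by separating documented parameters from everything else before calling train(). The sketch below is a hypothetical helper, not an SDK function. The allowed names are exactly the ones listed above.

```python
from typing import Any, Dict, Tuple

# Parameter names documented for train() in this guide.
DOCUMENTED_TRAIN_PARAMS = frozenset({
    "dataset", "model_name", "series_id_cols", "categorical_cols", "date_column",
    "target_columns", "validation_data_required", "validation_split",
    "time_granularity",
})


def split_train_kwargs(config: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a config into (documented train() arguments, everything else)."""
    accepted = {k: v for k, v in config.items() if k in DOCUMENTED_TRAIN_PARAMS}
    rejected = {k: v for k, v in config.items() if k not in DOCUMENTED_TRAIN_PARAMS}
    return accepted, rejected


raw_config = {
    "model_name": "Sales_Model",
    "target_columns": ["Sales"],
    "s3_bucket_name": "internal-bucket",  # backend-only field, example value
    "device": "gpu",                      # backend-only field, example value
}
accepted, rejected = split_train_kwargs(raw_config)
print(sorted(rejected))  # prints ['device', 's3_bucket_name']
```

Printing or logging the rejected keys, rather than dropping them silently, makes it easier to spot a misspelled parameter name such as target_cols.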
Column & Parameter Errors
categorical_cols must be List[str]
Always pass a list, even for a single column:
categorical_cols=["Region"]  # correct
categorical_cols="Region"  # wrong
target_columns must be List[str]
Same rule applies to target columns:
target_columns=["Sales"]  # correct
target_columns="Sales"  # wrong
validation_data_required must be bool
Use Python booleans, not strings:
validation_data_required=True  # correct
validation_data_required="yes"  # wrong
validation_split must be between 0.0 and 1.0 / train or validation set is empty
Use a fraction such as 0.1 or 0.2. Values of 0, 1, or greater than 1 are invalid. For small datasets, reduce the split to avoid empty splits:
# Check you'll have enough rows
print(f"Train rows: {int(len(df) * (1 - 0.2))}")
print(f"Val rows: {int(len(df) * 0.2)}")
time_granularity must be one of {...}
Use the exact lowercase strings: second, minute, half_hour, hour, daily, weekly, monthly, quarterly, half_yearly, yearly, decade, non-timeseries. Common mistakes: "day", "week", "month", "none", uppercase variants.
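If granularity values come from user input or configuration files, they can be normalised before reaching train(). The sketch below is a hypothetical helper, not part of the SDK. The valid strings are copied from the list above, while the alias mapping for the common mistakes is an assumption about what the caller most likely meant.

```python
VALID_GRANULARITIES = (
    "second", "minute", "half_hour", "hour", "daily", "weekly", "monthly",
    "quarterly", "half_yearly", "yearly", "decade", "non-timeseries",
)

# Assumed corrections for the common mistakes mentioned above.
ALIASES = {
    "day": "daily",
    "week": "weekly",
    "month": "monthly",
    "none": "non-timeseries",
}


def normalize_granularity(value: str) -> str:
    """Return a valid granularity string or raise ValueError."""
    cleaned = value.strip().lower()          # handles "Weekly", " DAILY "
    cleaned = ALIASES.get(cleaned, cleaned)  # handles "day", "none"
    if cleaned not in VALID_GRANULARITIES:
        raise ValueError(
            f"unsupported time_granularity {value!r}; "
            f"expected one of {', '.join(VALID_GRANULARITIES)}"
        )
    return cleaned


print(normalize_granularity("Weekly"))  # prints weekly
print(normalize_granularity("day"))     # prints daily
```

Raising on unknown values instead of falling back to a default keeps a typo from silently training a model at the wrong granularity.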
Date column '...' not found in the data
The value passed to date_column does not exactly match a column name in the dataset (case-sensitive, no extra spaces).
print(df.columns.tolist())  # find exact column name
assert "Date" in df.columns  # verify before training
Target columns [...] not found in the data
One or more values in target_columns are missing from the dataset.
missing = set(target_columns) - set(df.columns)
print("Missing targets:", missing)  # must be empty set
Categorical columns [...] not found in the data
One or more values in categorical_cols are missing from the dataset.
missing = set(categorical_cols) - set(df.columns)
print("Missing categoricals:", missing)
time series granularity but no date column
time_granularity is set to a time-series value (e.g. "daily") but date_column is missing or not usable. Either provide a valid date_column or set time_granularity="non-timeseries" for cross-sectional data.
Target columns list is empty
target_columns=[] was passed. At least one target column is required:
assert len(target_columns) >= 1
Data Quality Errors
No input features found in the data
After removing target, date, and categorical columns, there are no remaining model inputs. Add at least one numeric feature column, or check that target_columns and date_column are not accidentally consuming all columns.
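The subtraction described above can be reproduced locally to see which columns would remain as model inputs. This is a minimal sketch with a hypothetical helper name; it follows the rule as stated in this guide and does not inspect how the backend actually selects features.

```python
from typing import List, Optional, Sequence


def remaining_feature_columns(
    all_columns: Sequence[str],
    target_columns: Sequence[str],
    categorical_cols: Sequence[str],
    date_column: Optional[str] = None,
) -> List[str]:
    """Columns left after removing target, date, and categorical columns."""
    excluded = set(target_columns) | set(categorical_cols)
    if date_column is not None:
        excluded.add(date_column)
    # Preserve the original column order
    return [name for name in all_columns if name not in excluded]


columns = ["Store", "Date", "Sales", "Price", "Promo"]
features = remaining_feature_columns(columns, ["Sales"], ["Store"], "Date")
print(features)  # prints ['Price', 'Promo']
```

With a DataFrame, pass df.columns.tolist() as all_columns. An empty result means the configuration would trigger this error.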
date parsing failed or too few valid dates
The date column exists but Fount cannot parse enough valid dates. Standardise the date format before upload (ISO YYYY-MM-DD recommended):
pd.to_datetime(df["Date"], errors="coerce").notna().mean()
# should be close to 1.0
Target column must be numeric for regression
The target column contains strings, currency symbols, percentage signs, or blanks. Clean the column before upload:
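# regex=False treats "$" as a literal character rather than a regex anchor
df["Sales"] = pd.to_numeric(df["Sales"].str.replace(",", "", regex=False).str.replace("$", "", regex=False), errors="coerce")
could not convert string to float / can't convert np.ndarray of type numpy.object_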
A non-categorical feature contains text values. Add it to categorical_cols if it is truly categorical, or remove it if it is an ID/comment column.
print(df.drop(columns=target_columns + [date_column] + categorical_cols).dtypes)
# all remaining columns should be numeric
Input contains NaN, infinity or a value too large / loss is nan
Training data has missing or infinite values. Impute or drop them before upload:
print(df.isna().sum())
df = df.fillna(df.median(numeric_only=True))
Missing required columns in data chunk
A required column is absent from the uploaded dataset. Re-upload the correct file and ensure all configured columns exist.
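Before re-uploading, it can help to list every configured column that is absent and to see whether a near-identical name exists (different case, stray spaces, small typos). The sketch below uses the standard library's difflib. The helper name find_missing_columns is hypothetical and not part of the SDK.

```python
import difflib
from typing import Dict, Iterable, List


def find_missing_columns(
    available: Iterable[str], configured: Iterable[str]
) -> Dict[str, List[str]]:
    """Map each configured column that is absent to a list of likely intended names."""
    available = list(available)
    # Case- and whitespace-insensitive lookup of the real column names
    normalized = {name.strip().lower(): name for name in available}
    report: Dict[str, List[str]] = {}
    for name in configured:
        if name in available:
            continue
        same_ignoring_case = normalized.get(name.strip().lower())
        if same_ignoring_case is not None:
            report[name] = [same_ignoring_case]
        else:
            report[name] = difflib.get_close_matches(name, available, n=3, cutoff=0.6)
    return report


data_columns = ["Date ", "Store", "weekly_sales"]  # note the trailing space in "Date "
report = find_missing_columns(data_columns, ["Date", "Store", "WeeklySales"])
for wanted, suggestions in report.items():
    print(f"{wanted!r} not found; closest names: {suggestions}")
```

An empty list of suggestions usually means the column is absent from the file, not misspelled, in which case the correct file needs to be uploaded.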
No categorical or numerical features provided / empty feature set
The same column was used as both a target and a feature, or all columns were marked as target/date. Keep target_columns, date_column, and feature columns separate:
overlap = set(categorical_cols) & set(target_columns)
print("Overlap:", overlap)  # must be empty
Training very slow / high cardinality categorical feature
A categorical column has too many unique values (e.g. transaction IDs, customer IDs). Remove or bin it:
print(df[categorical_cols].nunique())  # flag columns with thousands of unique values
Training failed: Data error / Model did not improve / validation metrics unavailable
Input data quality is insufficient: too few rows, constant target, or no meaningful variation in features. Check data volume and target variance before training.
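A quick pre-check of data volume and target variation might look like the sketch below. The helper name and the min_rows threshold of 30 are illustrative assumptions, not SDK limits. The function only counts usable numeric values and tests whether they are all identical.

```python
import math
from typing import Any, Dict, Iterable


def summarize_target(values: Iterable[Any], min_rows: int = 30) -> Dict[str, Any]:
    """Count usable numeric target values and flag too-small or constant targets.

    min_rows is an arbitrary illustrative threshold, not a documented SDK limit.
    """
    usable = [
        v for v in values
        if isinstance(v, (int, float))
        and not isinstance(v, bool)
        and math.isfinite(v)  # drops NaN and infinity
    ]
    return {
        "usable_rows": len(usable),
        "too_few_rows": len(usable) < min_rows,
        "constant": len(set(usable)) <= 1,
    }


print(summarize_target([10.0, 10.0, 10.0]))
# prints {'usable_rows': 3, 'too_few_rows': True, 'constant': True}
```

With pandas, df["Sales"].tolist() converts a column into plain Python values that this helper accepts.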
Found array with 0 sample(s) / num_samples should be a positive integer value
After filtering and splitting, the model receives zero samples. Upload a non-empty dataset and use a validation_split that leaves rows in both train and validation sets.
Job Execution & Monitoring Errors
TypeError: sleep length must be non-negative / poll_interval must be int
poll_interval is invalid. Use a positive integer:
job.run(wait=True, poll_interval=30)  # correct: 30 seconds
Results not available / Job is not completed / 404 predictions file not found
metrics() or predictions() was called before the job finished. Check status first:
job.run(wait=True)  # simplest approach: blocks until done
# or poll manually:
while job.status()["status"] not in ["Completed", "Failed"]:
    time.sleep(30)
metrics_df = job.metrics()
KeyError: 'status' / status_info is None
The job variable was overwritten with a dictionary or boolean. Keep the SDK job object separate from status dictionaries:
status = job.status()  # read into a separate variable
print(status["status"])  # job object itself is unchanged
AttributeError: 'bool' object has no attribute 'metrics' / 'dict' object has no attribute 'metrics'
job.run() returns a boolean, not a job object. Do not overwrite job with the return value of run():
success = job.run(wait=True)  # correct: store result separately
metrics_df = job.metrics()  # call on original job object
Cannot stop job in Completed/Failed state / Job already stopped
stop() was called on a job that is no longer running. Only stop Pending or Running jobs:
if job.status()["status"] in ["Pending", "Running"]:
    job.stop()
Next Steps
After training your model using the SDK:
- Evaluate Performance: Use the SDK's metrics methods to analyze model quality
- Deploy Model: Use the SDK's deployment features for production use
- Monitor Performance: Leverage SDK monitoring capabilities
- Retrain Periodically: Automate retraining workflows with the SDK