EDA References

Tips, patterns, cheat sheets, and a full copy-paste script for Fount EDA.

Tips and FAQs

Use everything via fount_eda.*

Call functions directly on your fount_eda instance, for example fount_eda.read_csv, fount_eda.DataFrame, fount_eda.array, or fount_eda.train_test_split. There is no need to go through a library alias such as fount_eda.pd.read_csv or to reach into submodules manually.

Finding functions quickly

Once fount_eda is initialized, autocomplete in your IDE will surface the available functions. You can also search for a function by pattern:

fount_eda.list("pattern")
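To illustrate what a pattern search over a function namespace does, the sketch below matches a shell-style pattern against the public names of Python's standard math module. The helper find_names is hypothetical and not part of Fount EDA; fount_eda.list may interpret patterns differently (for example as substrings or regular expressions).

```python
import fnmatch
import math


def find_names(namespace, pattern):
    """Return public callable names in `namespace` that match a shell-style pattern."""
    names = [
        name
        for name in dir(namespace)
        if not name.startswith("_") and callable(getattr(namespace, name))
    ]
    return sorted(fnmatch.filter(names, pattern))


# Stand-in namespace: the stdlib math module instead of a fount_eda instance.
log_functions = find_names(math, "log*")
print(log_functions)
```

The same idea applies to any object with attributes, so an equivalent search over a fount_eda instance would only need a different first argument.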

Priority and naming collisions

Fount EDA resolves naming conflicts by preferring pandas, then NumPy, then scikit-learn by default. If you want NumPy names to take priority, set this explicitly:

fount_eda.set_priority(["numpy", "pandas", "sklearn"])
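The following is a minimal sketch of priority-ordered name resolution in general, not Fount EDA's actual implementation. The class PriorityNamespace is hypothetical, and the standard-library modules math and cmath stand in for pandas, NumPy, and scikit-learn, because both define a function called sqrt.

```python
import cmath
import math


class PriorityNamespace:
    """Resolve attribute names against several modules in priority order."""

    def __init__(self, modules, priority):
        self._modules = dict(modules)
        self.set_priority(priority)

    def set_priority(self, priority):
        unknown = [name for name in priority if name not in self._modules]
        if unknown:
            raise ValueError(f"unknown sources: {unknown}")
        self._priority = list(priority)

    def __getattr__(self, name):
        # Only called when normal lookup fails; never resolve private names here.
        if name.startswith("_"):
            raise AttributeError(name)
        for source in self._priority:
            module = self._modules[source]
            if hasattr(module, name):
                return getattr(module, name)  # first source in the order wins
        raise AttributeError(f"Function not found: {name}")


ns = PriorityNamespace({"math": math, "cmath": cmath}, ["math", "cmath"])
real_first = ns.sqrt(4)       # resolved from math, returns a float

ns.set_priority(["cmath", "math"])
complex_first = ns.sqrt(4)    # resolved from cmath, returns a complex number

print(type(real_first).__name__, type(complex_first).__name__)
```

Changing the order changes which implementation a colliding name resolves to, which is why it is worth setting the priority once, at the top of a script.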

Common error messages

  • "Tool not installed": install the missing dependency (pandas, numpy, or scikit-learn) and retry
  • "Function not found": check for typos; the error message may suggest the closest valid function name
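Both errors can be checked for by hand before calling into the library. The sketch below uses only the Python standard library: importlib to see which dependencies are importable, and difflib to find the closest known name to a typo. The list KNOWN_FUNCTIONS and the helper suggest are hypothetical; how Fount EDA produces its own suggestions is not documented here.

```python
import difflib
import importlib.util

# "Tool not installed": list required packages that cannot be imported.
# Note that scikit-learn is imported under the module name "sklearn".
REQUIRED = ("pandas", "numpy", "sklearn")
missing = [pkg for pkg in REQUIRED if importlib.util.find_spec(pkg) is None]
print("Missing packages:", missing)

# "Function not found": suggest the closest known name for a typo.
KNOWN_FUNCTIONS = ["read_csv", "read_excel", "read_json", "train_test_split"]


def suggest(name, known=KNOWN_FUNCTIONS):
    """Return the closest known function name, or None if nothing is similar."""
    matches = difflib.get_close_matches(name, known, n=1, cutoff=0.6)
    return matches[0] if matches else None


suggestion = suggest("read_cvs")
print(suggestion)  # read_csv
```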

Keeping things simple

Use small, obvious steps: read → inspect → clean lightly → EDA → split → baseline model. Only add complexity if the EDA results highlight a specific need, such as heavy class imbalance or a high proportion of rare categories.
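The two warning signs named above can be quantified with a few lines of plain Python. This is a sketch on made-up toy data; the helper names and the min_count threshold of 5 are illustrative assumptions, not values used by Fount EDA.

```python
from collections import Counter


def minority_share(labels):
    """Fraction of rows that belong to the least frequent class."""
    counts = Counter(labels)
    return min(counts.values()) / len(labels)


def rare_category_share(values, min_count=5):
    """Fraction of rows whose category appears fewer than `min_count` times."""
    counts = Counter(values)
    rare_rows = sum(count for count in counts.values() if count < min_count)
    return rare_rows / len(values)


# Toy data for illustration only.
labels = ["no"] * 95 + ["yes"] * 5
cities = ["Oslo"] * 50 + ["Rome"] * 46 + ["Bern", "Kyiv", "Riga", "Lima"]

print(minority_share(labels))       # 0.05
print(rare_category_share(cities))  # 0.04
```

A very small minority share points toward resampling or class weights, while a large rare-category share points toward grouping infrequent values before encoding. If neither shows up, the simple pipeline is usually enough.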

Cheat sheets

DataFrames

Operation         Code
Read CSV          fount_eda.read_csv("file.csv")
Read TSV          fount_eda.read_csv("file.tsv", sep="\t")
Read Excel        fount_eda.read_excel("file.xlsx")
Read JSON         fount_eda.read_json("file.json", orient="records")
Create DataFrame  fount_eda.DataFrame({...})
Select columns    df[["col1", "col2"]]
Filter rows       df[df["col"] > value]
Drop column       df.drop(columns=["col"], errors="ignore")
Sort              df.sort_values("col")
Merge             fount_eda.merge(left, right, how="left", on="key")
Concat rows       fount_eda.concat([df1, df2], axis=0)
Concat cols       fount_eda.concat([df1, df2], axis=1)
Describe          df.describe()
Value counts      df["col"].value_counts()
Parse dates       fount_eda.to_datetime(df["date"], errors="coerce")
Save CSV          df.to_csv("out.csv", index=False)
Save Excel        df.to_excel("out.xlsx", index=False)

Arrays

Operation        Code
Create           fount_eda.array([...])
Range            fount_eda.arange(start, stop, step)
Linspace         fount_eda.linspace(start, stop, n)
Logspace         fount_eda.logspace(start, stop, n)
Zeros / Ones     fount_eda.zeros(shape) / fount_eda.ones(shape)
Eye / Diag       fount_eda.eye(n) / fount_eda.diag([...])
Mean / Median    fount_eda.mean(x) / fount_eda.median(x)
Std / Var        fount_eda.std(x) / fount_eda.var(x)
Percentile       fount_eda.percentile(x, 90)
Quantile         fount_eda.quantile(x, 0.9)
Clip             fount_eda.clip(x, 0, 1)
Concatenate      fount_eda.concatenate([x, y])
Vstack / Hstack  fount_eda.vstack([a, b]) / fount_eda.hstack([a, b])
Random floats    fount_eda.random.random(n)
Random ints      fount_eda.random.integers(low, high, size=n)
Norm             fount_eda.linalg.norm(v)
FFT              fount_eda.fft.fft(signal)

Machine learning

Operation           Code
Train/test split    fount_eda.train_test_split(X, y, test_size=0.2, random_state=42)
One-hot encoder     fount_eda.OneHotEncoder(handle_unknown="ignore", sparse_output=False)
Column transformer  fount_eda.ColumnTransformer([...])
Pipeline            fount_eda.Pipeline([...])
Random Forest       fount_eda.ensemble.RandomForestRegressor(n_estimators=300, random_state=0)
Evaluate (R²)       reg.score(X_te, y_te)

EDA suite

Method                                                 Use case
fount_eda.eda(df, target=..., datetime_col=...)        Programmatic result object
fount_eda.eda_md(df, target=..., datetime_col=...)     Markdown string
fount_eda.eda_print(df, target=..., datetime_col=...)  Console print
fount_eda.eda_report_and_score(df)                     Full report and score
rep.suitability_score                                  0–100 model-readiness score
rep.recommendations                                    Suggested improvements

End-to-end script

Copy this into example_fount_eda.py and run it against your own data.csv. Assumes a numeric target column "Units" and an optional "date" column.

# example_fount_eda.py
from fount_eda import EDA

fount_eda = EDA()

# 1. Load data
df = fount_eda.read_csv("data.csv")

# 2. Light cleanup
df = df.drop(columns=["unused"], errors="ignore")
df = df[df["Units"].notna()]

# 3. EDA: choose one, or run all three for comparison
rep = fount_eda.eda(df, target="Units", datetime_col="date")
print("Suitability:", rep.suitability_score)

md = fount_eda.eda_md(df, target="Units", datetime_col="date")
with open("eda_report.md", "w", encoding="utf-8") as f:
    f.write(md)

print("\n--- Printed EDA ---")
fount_eda.eda_print(df, target="Units", datetime_col="date")

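# 4. Baseline model: keep numeric features only, because the forest cannot fit on
#    strings or unparsed dates, and fill missing values with each column's median
X = df.drop(columns=["Units"]).select_dtypes(include="number")
X = X.fillna(X.median())
y = df["Units"]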

X_tr, X_te, y_tr, y_te = fount_eda.train_test_split(X, y, test_size=0.2, random_state=42)
reg = fount_eda.ensemble.RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("Test R²:", reg.score(X_te, y_te))

# 5. Save cleaned data
df.to_csv("cleaned.csv", index=False)

Practical notes

  • Use clear, descriptive column names; they make EDA output and model results easier to interpret
  • Keep your target numeric; if it's categorical, consider a classification approach (outside the scope of Fount EDA)
  • Parse date columns once with fount_eda.to_datetime and keep them consistent throughout your script
  • When comparing dataset variants, re-run EDA on each and compare suitability scores
  • Start with a simple baseline model, and optimize only after the EDA results give you a clear direction
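Comparing dataset variants by suitability score can be wrapped in a small helper. In this sketch, rank_variants is a hypothetical function, and fake_eda is a stub that stands in for fount_eda.eda so the example runs without the library; its scoring rule (share of non-missing values) is invented for illustration and is not how Fount EDA computes the score.

```python
from types import SimpleNamespace


def rank_variants(variants, run_eda):
    """Run `run_eda` on each named dataset and sort by suitability score, best first."""
    scores = {name: run_eda(data).suitability_score for name, data in variants.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Stub in place of fount_eda.eda: returns an object with a suitability_score
# attribute, computed here as the percentage of non-missing values.
def fake_eda(rows):
    filled = sum(value is not None for value in rows)
    return SimpleNamespace(suitability_score=round(100 * filled / len(rows)))


variants = {"raw": [1, None, 3, None], "cleaned": [1, 2, 3, None]}
ranking = rank_variants(variants, fake_eda)
print(ranking)  # [('cleaned', 75), ('raw', 50)]
```

With the real library, the second argument would presumably be something like lambda d: fount_eda.eda(d, target="Units", datetime_col="date"), and the dictionary values would be DataFrames rather than lists.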