Samuel Garcia — Portfolio Roadmap

The Strategy

01 Finance/Fintech is the smart entry point: the 7-year domain edge is a real differentiator

02 Target roles blending finance + data: quant analyst, fintech data analyst, financial data engineer, risk automation

03 Sharpen DS/ML stack in parallel, building toward pure DS/ML roles over time

04 Keep Finance stack sharp: it's a moat, not a liability

05 Education projects prove domain depth in a format no pure-DS candidate can replicate

Contents

Tier 1 — Live Now: Macro-Alpha Engine · Fed-Watcher AI · Pocket Professor · Fraud Detection ML System · Algo Trading Gold Bot

Tier 2 — Build Next: Credit Risk Scoring API

Tier 2b — Education Projects: IFRS 9 Interactive Explorer · Fintech Unit Economics Dashboard

Tier 3 — Differentiation: P&L Automation Dashboard · Earnings Call NLP

Tier 4 — Long-Term / Pure DS: Anomaly Detection Engine · Forecasting SaaS MVP · LLM Finance Agent

Tier 1

Live now

Deployed, functional — focus on polish: real metrics, screenshots, clean READMEs, GitHub stars

Macro-Alpha Engine Live · AWS

End-to-end S&P 500 market direction forecasting system with ensemble ML and live dashboard

Flagship project
Highest visibility

What it does

Forecasts S&P 500 directional movement (up/down) 5 days ahead
Ensemble: XGBoost (macro fundamentals) + PyTorch LSTM (price momentum)
Unsupervised HMM regime detection (Bull / Transition / Bear)
40+ engineered features from FRED + Yahoo Finance APIs
Walk-forward time-series validation — no look-ahead bias
SHAP waterfall charts for every prediction (explainability)
Daily automated inference via GitHub Actions
50+ MLflow experiment runs tracked
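
The walk-forward validation listed above can be sketched as an expanding-window splitter. This is a minimal illustration, not the project's actual code: the function name `walk_forward_splits` and the parameters `min_train`, `test_size`, and `gap` are assumptions. The `gap` skips the rows whose 5-day-ahead label would reach into the test window, which is one common way to avoid look-ahead bias.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, min_train: int, test_size: int, gap: int = 5
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    `gap` rows are skipped between train and test so that labels built
    from a 5-day-ahead return never overlap the test period.
    """
    train_end = min_train
    while train_end + gap + test_size <= n_samples:
        test_start = train_end + gap
        train_idx = list(range(0, train_end))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx
        train_end += test_size  # roll forward by one test block


splits = list(walk_forward_splits(n_samples=100, min_train=50, test_size=10))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Each fold trains only on rows that precede the test block, so every reported metric comes from data the model had not seen at fit time.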

Polish tasks remaining

Add real AUC, Sharpe, precision numbers to README
5 polished screenshots in docs/images/
Confirm AWS EC2 public URL is stable and linked from CV
Add architecture diagram to README
Record a 60-second demo video for website
Add What-If scenario lab (user changes macro inputs, prediction updates)

Tech stack

Python · XGBoost · PyTorch · hmmlearn · SHAP · MLflow · Docker · AWS EC2 · Streamlit · GitHub Actions · FRED API · yfinance

Fed-Watcher AI Live · Streamlit

NLP pipeline on Federal Reserve communications — quantifies hawkish/dovish sentiment as time-series features

Strong NLP signal
Macro finance angle

What it does

Scrapes and parses FOMC statements, minutes, and speeches automatically
Applies LLM + sentiment scoring to classify tone (hawkish/dovish/neutral)
Converts text sentiment into numerical time-series features
Backtests correlation between Fed language shifts and rate moves
Live Streamlit dashboard with historical timeline
Dockerized and deployable to AWS

Polish tasks remaining

Add correlation chart: sentiment score vs actual Fed Funds rate
Add "next meeting" prediction module
Improve README with methodology section
Link live demo from CV and samvgarcia.com

Tech stack

Python · NLP · LLMs · Pandas · Plotly · Streamlit · Docker · AWS · BeautifulSoup

Pocket Professor Live · PWA

AI-powered DS/ML interview prep tool — adaptive Q&A, flashcards, business cases, installable as PWA

Shows product thinking
Claude API integration

What it does

Adaptive flashcard deck covering DS/ML concepts
Live Q&A mode: Claude grades answers and gives feedback in real-time
Business case scenarios (fintech-themed)
Progress tracking with localStorage
Installable as Progressive Web App on mobile/desktop

Polish tasks remaining

Add finance-specific interview questions module
Add difficulty progression (easy → hard)
Track weak areas and resurface them
Add mock interview timed mode

Tech stack

Claude API · Vanilla JS · PWA · LocalStorage · HTML/CSS

Fraud Detection ML System Live · AWS

Real-world fintech fraud classification system on IEEE-CIS dataset — cost-aware ML with analyst-facing dashboard

Critical fintech signal
Live in production

What it does

Classifies e-commerce transactions as fraud / legitimate (IEEE-CIS 2019 dataset, 500K+ rows)
Handles strong class imbalance (roughly 3.5% fraud rate in the IEEE-CIS training set) with SMOTE + class weights
Ensemble: XGBoost (supervised) + Isolation Forest (unsupervised anomaly detection) + rule layer
Cost-aware threshold optimisation: minimise total cost (false neg = €500 fraud loss, false pos = €10 review cost)
SHAP per-transaction explanations ("flagged because: amount 5× user average AND new device AND temp email domain")
Fraud Analyst Terminal dashboard: upload batch CSV, see flagged transactions, drill into each with SHAP
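
The cost-aware threshold step above can be illustrated with a small search over candidate cut-offs. This is a sketch under stated assumptions: the EUR 500 and EUR 10 costs come from the text, while the function names, the candidate grid, and the toy labels and scores are made up for illustration.

```python
from typing import Sequence, Tuple

COST_FALSE_NEGATIVE = 500.0  # EUR lost per missed fraud (figure from the text)
COST_FALSE_POSITIVE = 10.0   # EUR review cost per false alarm (figure from the text)


def total_cost(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Cost of flagging every transaction whose score is >= threshold."""
    cost = 0.0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if label == 1 and not flagged:
            cost += COST_FALSE_NEGATIVE
        elif label == 0 and flagged:
            cost += COST_FALSE_POSITIVE
    return cost


def best_threshold(
    y_true: Sequence[int], scores: Sequence[float], candidates: Sequence[float]
) -> Tuple[float, float]:
    """Return the (threshold, cost) pair with the lowest total cost."""
    return min(
        ((t, total_cost(y_true, scores, t)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Toy validation data, invented for illustration only.
y_true = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1]
scores = [0.02, 0.10, 0.30, 0.05, 0.35, 0.60, 0.80, 0.01, 0.20, 0.95]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

threshold, cost = best_threshold(y_true, scores, candidates)
print(f"best threshold={threshold:.2f}, total cost=EUR {cost:.0f}")
```

Because a missed fraud costs 50 times more than a review, the cheapest threshold usually sits well below the 0.5 default.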

Feature engineering highlights

Velocity features: transactions in last 1h / 3h / 24h per card
Behavioural anomalies: amount vs user's rolling average (ratio feature)
Device fingerprinting: new device for this user? new IP country?
Email risk: disposable / temp domain detection
Transaction graph: shared device across multiple cards
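
The velocity features above amount to counting a card's recent transactions inside several look-back windows. The sketch below assumes rows arrive as `(card_id, unix_timestamp)` tuples sorted by time; the function name, feature names, and sample rows are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Tuple

WINDOWS_SECONDS = {"1h": 3600, "3h": 3 * 3600, "24h": 24 * 3600}


def velocity_features(transactions: List[Tuple[str, int]]) -> List[Dict[str, int]]:
    """For each (card_id, unix_ts) row, count that card's previous
    transactions inside each look-back window.

    Rows must be sorted by timestamp. Only rows that appear earlier in the
    input are counted, so the feature never sees future transactions.
    """
    history: Dict[str, List[int]] = defaultdict(list)
    features = []
    for card_id, ts in transactions:
        past = history[card_id]  # already sorted because the input is sorted
        row = {}
        for name, width in WINDOWS_SECONDS.items():
            first_inside = bisect_left(past, ts - width)
            row[f"txn_count_{name}"] = len(past) - first_inside
        features.append(row)
        past.append(ts)
    return features


# Hypothetical sample rows: four transactions on card A, one on card B.
rows = [("A", 0), ("A", 1800), ("B", 2000), ("A", 4000), ("A", 90000)]
feats = velocity_features(rows)
print(feats[3])
```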

Tech stack

Python · XGBoost · Scikit-learn · Isolation Forest · SMOTE (imbalanced-learn) · SHAP · Streamlit · Docker · AWS · Pandas · Plotly

Algo Trading Gold Bot Live · AWS

Systematic gold trading strategy engine with walk-forward backtesting, performance attribution, and live dashboard

Quant / hedge fund target
Live in production

What it does

Pluggable strategy framework: implement alpha signals as Python classes
Built-in signals: momentum, mean-reversion, yield-curve spread, macro factor tilt
Walk-forward backtesting engine with proper train/test splits
Position sizing: fixed fractional, Kelly, vol-targeting
Transaction cost modelling (bid-ask spread, market impact)
Performance report: Sharpe, Sortino, Calmar, max drawdown, win rate, expectancy
Streamlit dashboard with equity curve, drawdown chart, monthly returns heatmap
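
Several of the performance-report figures above can be computed from a plain list of daily returns. The following is a rough sketch, not the bot's reporting code: it assumes a zero risk-free rate and 252 trading days per year, and it measures win rate and expectancy per day rather than per trade. Calmar is left out because it also needs an annualised return figure.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence

TRADING_DAYS = 252  # assumed annualisation factor for daily returns


def performance_report(daily_returns: Sequence[float]) -> Dict[str, float]:
    """Summary statistics for simple daily returns (risk-free rate assumed 0)."""
    avg = mean(daily_returns)
    sharpe = avg / stdev(daily_returns) * math.sqrt(TRADING_DAYS)

    # Sortino: penalise only downside deviation around a 0 target return.
    downside = math.sqrt(mean([min(r, 0.0) ** 2 for r in daily_returns]))
    sortino = avg / downside * math.sqrt(TRADING_DAYS) if downside > 0 else float("inf")

    # Max drawdown: worst peak-to-trough fall of the compounded equity curve.
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        max_dd = max(max_dd, 1.0 - equity / peak)

    wins = [r for r in daily_returns if r > 0]
    return {
        "sharpe": sharpe,
        "sortino": sortino,
        "max_drawdown": max_dd,
        "win_rate": len(wins) / len(daily_returns),
        "expectancy": avg,  # average return per period
    }


# Invented returns, for illustration only.
report = performance_report([0.01, -0.02, 0.015, 0.005, -0.01, 0.02])
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```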

Key differentiators

Regime-aware: strategies auto-switch based on HMM regime detection (reuse from Macro-Alpha)
Macro-overlay: positions size down when macro conditions deteriorate
Forward-testing: paper trading mode via paper broker API

Tech stack

Python · Backtrader · yfinance · Pandas · NumPy · Plotly · Streamlit

Tier 2

Build next — highest ROI

The Credit Risk Scoring API and the education tools are the next priority. Together they complete the portfolio for fintech roles.

Credit Risk Scoring API Build next

Production-grade credit scoring model wrapped as a REST API — the core engine behind every lending fintech

Strong banking signal
API/MLOps showcase

What it does

Predicts probability of default (PD) for loan applicants
Scorecard-style output (300–850 scale, explainable to non-technical stakeholders)
Logistic Regression baseline + XGBoost champion model with full comparison
FastAPI REST endpoint: POST /score → {pd_score, risk_band, top_factors}
Gini coefficient, KS statistic, PSI drift monitoring
Simple UI for manual testing (loan officer interface mockup)
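
Two pieces of the list above lend themselves to a short sketch: mapping a PD onto the 300-850 scale and computing PSI for drift monitoring. The scaling constants below (600 points at 50:1 odds, 20 points to double the odds) are hypothetical choices, not values from the project, and the function names are assumptions.

```python
import math
from typing import Sequence

# Hypothetical scaling: 600 points at 50:1 good/bad odds,
# and every 20 points doubles the odds ("points to double the odds").
BASE_SCORE, BASE_ODDS, PDO = 600.0, 50.0, 20.0
SCORE_MIN, SCORE_MAX = 300, 850


def pd_to_score(p_default: float) -> int:
    """Map a probability of default onto the 300-850 scale."""
    p_default = min(max(p_default, 1e-6), 1 - 1e-6)  # keep the odds finite
    odds_good = (1.0 - p_default) / p_default
    factor = PDO / math.log(2)
    raw = BASE_SCORE + factor * math.log(odds_good / BASE_ODDS)
    return int(round(min(max(raw, SCORE_MIN), SCORE_MAX)))


def psi(expected: Sequence[float], actual: Sequence[float]) -> float:
    """Population Stability Index between two binned distributions.

    Both inputs are bin proportions that each sum to 1.
    """
    eps = 1e-6  # avoids log(0) for empty bins
    return sum(
        (a - e) * math.log((a + eps) / (e + eps)) for e, a in zip(expected, actual)
    )


print(pd_to_score(0.02), pd_to_score(0.20))
print(round(psi([0.25, 0.25, 0.25, 0.25], [0.20, 0.25, 0.25, 0.30]), 4))
```

A lower PD gives a higher score, and a PSI near zero means the scored population looks like the one the model was developed on.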

Dataset options

Home Credit Default Risk (Kaggle) — large, well-documented
Lending Club (historical) — real US P2P loan data
German Credit (UCI) — small but clean, good for explainability focus

Tech stack

Python · XGBoost · Scikit-learn · FastAPI · Pydantic · Docker · AWS EC2 · MLflow · SHAP · Pandas · Statsmodels

Tier 2b

Education / interactive explainer projects

The unique differentiator — finance domain knowledge made visual. No pure-DS candidate can build these.

Why these matter: These projects prove two things at once: that you understand complex financial regulation deeply enough to teach it, and that you can build polished interactive tools. Fintech interviewers are likely to spend a few minutes playing with these. They're conversation-starters, not just portfolio checkboxes.

IFRS 9 Interactive Explorer Education · Web App

Visual, interactive deep-dive into IFRS 9 — classification, ECL staging, hedge accounting, and accounting flows

Most unique project in the entire portfolio

Background — what IFRS 9 is (and why it matters)

IFRS 9 (effective 1 January 2018) replaced IAS 39 as the accounting standard for financial instruments at banks, insurers, and large corporates that report under IFRS
Three pillars: Classification & Measurement, Impairment (ECL model), Hedge Accounting
Finance teams at IFRS-reporting banks work with IFRS 9 routinely, yet interactive visual explainers for it are rare
Relevant to: banking, lending fintech, insurance, asset management, regulatory tech

Module 1 — Classification & Measurement tree

Interactive decision tree: pick a financial asset type
Answer 3 questions: Business Model test → SPPI test → Designation
Routes to one of 4 categories: Amortised Cost, FVOCI (debt), FVOCI (equity), FVTPL
Each category shows: where P&L goes, where OCI goes, impairment required?
Real examples per category (bond held to maturity, equity investment, trading book derivative)
T-account animation showing journal entries on initial recognition
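
The decision tree in this module can be prototyped as a plain function before any UI is built. The sketch below is a simplified reading of the IFRS 9 classification logic for financial assets; the function signature and string labels are assumptions, and edge cases in the standard are not covered.

```python
from enum import Enum


class Category(Enum):
    AMORTISED_COST = "Amortised Cost"
    FVOCI_DEBT = "FVOCI (debt)"
    FVOCI_EQUITY = "FVOCI (equity)"
    FVTPL = "FVTPL"


def classify(
    instrument: str,                 # "debt", "equity" or "derivative"
    business_model: str = "other",   # "hold_to_collect", "collect_and_sell" or "other"
    passes_sppi: bool = False,       # cash flows are solely payments of principal and interest
    fair_value_option: bool = False, # debt designated at FVTPL on initial recognition
    oci_election: bool = False,      # equity: irrevocable election to use FVOCI
    held_for_trading: bool = False,
) -> Category:
    """Route a financial asset to one of the four measurement categories."""
    if instrument == "derivative":
        return Category.FVTPL
    if instrument == "equity":
        if oci_election and not held_for_trading:
            return Category.FVOCI_EQUITY
        return Category.FVTPL
    # Debt instruments: designation first, then the SPPI and business model tests.
    if fair_value_option or not passes_sppi:
        return Category.FVTPL
    if business_model == "hold_to_collect":
        return Category.AMORTISED_COST
    if business_model == "collect_and_sell":
        return Category.FVOCI_DEBT
    return Category.FVTPL


print(classify("debt", "hold_to_collect", passes_sppi=True).value)
print(classify("equity", oci_election=True).value)
```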

Module 2 — ECL Staging model

Visual loan lifecycle: starts at Stage 1 (performing)
User drags "credit deterioration" slider (0–100%)
Loan migrates Stage 1 → 2 → 3 with trigger explanation
Provision amount jumps: Stage 1 = 12-month ECL, Stage 2/3 = lifetime ECL
Formula shown live: ECL = PD × LGD × EAD × Discount Factor
Toggle: forward-looking macroeconomic adjustments (GDP, unemployment scenarios)
Chart showing provision coverage ratio by stage
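
The jump in provision between stages follows directly from the formula above. The sketch below assumes yearly marginal PDs, a constant EAD, and year-end discounting at the effective interest rate; all inputs are hypothetical. For a Stage 3 (credit-impaired) loan the caller would pass a PD at or near 1.

```python
from typing import Sequence


def expected_credit_loss(
    stage: int,
    marginal_pds: Sequence[float],  # probability of default in each future year
    lgd: float,                     # loss given default, between 0 and 1
    ead: float,                     # exposure at default
    eir: float,                     # effective interest rate used for discounting
) -> float:
    """ECL = sum over years of PD_t x LGD x EAD x discount factor_t.

    Stage 1 uses only the first 12 months; Stages 2 and 3 use the full
    remaining life of the loan.
    """
    if stage not in (1, 2, 3):
        raise ValueError("stage must be 1, 2 or 3")
    horizon = marginal_pds[:1] if stage == 1 else marginal_pds
    return sum(
        pd_t * lgd * ead / (1.0 + eir) ** year
        for year, pd_t in enumerate(horizon, start=1)
    )


# Hypothetical 3-year loan, used only to illustrate the jump in provision.
pds = [0.02, 0.03, 0.04]
stage1 = expected_credit_loss(1, pds, lgd=0.45, ead=100_000, eir=0.05)
stage2 = expected_credit_loss(2, pds, lgd=0.45, ead=100_000, eir=0.05)
print(f"Stage 1: {stage1:,.2f}  Stage 2: {stage2:,.2f}")
```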

Module 3 — Hedge Accounting

Toggle between Fair Value Hedge vs Cash Flow Hedge
Animated diagram showing what goes to P&L vs OCI for each type
Example: interest rate swap hedging a fixed-rate bond (fair value hedge)
Example: FX forward hedging future USD revenue (cash flow hedge)
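Effectiveness testing: IFRS 9 criteria (economic relationship, credit risk not dominating, hedge ratio), shown against the 80–125% corridor that applied under IAS 39
What happens when the hedge is ineffective or fails: ineffectiveness is recognised in P&L, and hedge accounting stops prospectively once the criteria are no longer met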

Module 4 — Product lifecycle flows

Pick a product: IRS, FX Forward, Bond, Loan, Equity
Animated timeline: Inception → MTM → Settlement
Journal entries at each stage (Dr/Cr T-accounts)
Shows Economic P&L vs Accounting P&L gap (your actual job!)
Explains why Front Office and Back Office numbers differ

Tech stack

React or Vanilla JS · D3.js (animations) · CSS animations · GitHub Pages / Vercel · No backend needed

Fintech Unit Economics Dashboard Education · Live Calculator

Live, interactive business model simulator for a fictional neobank / lending fintech — teaches the metrics every fintech interview tests

Interview prep tool
Shows business acumen

Core calculator — Tab 1: Unit economics

User inputs: CAC (€), Monthly churn (%), Avg loan size (€), Gross margin (%), Default rate (%)
Live outputs: LTV, LTV:CAC ratio, Payback period (months), Net Interest Margin
Traffic light: LTV:CAC <1 = red, 1–3 = amber, >3 = green
Chart: CAC payback curve showing cumulative revenue vs cost over time
Sensitivity: slider for churn rate — shows how dramatically LTV changes
Benchmark panel: typical neobank LTV:CAC ratios (Monzo, N26, Revolut context)
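
The core outputs of this tab reduce to a few lines of arithmetic. The sketch below uses the simple-churn LTV model (expected lifetime = 1 / monthly churn). It assumes a `monthly_revenue` per customer input, which the list above does not name; how loan size and default rate feed into revenue is left open. The traffic-light bands follow the text.

```python
from typing import Dict, Union


def unit_economics(
    cac: float, monthly_churn: float, monthly_revenue: float, gross_margin: float
) -> Dict[str, Union[float, str]]:
    """Simple-churn LTV model: expected customer lifetime = 1 / monthly churn."""
    monthly_margin = monthly_revenue * gross_margin
    ltv = monthly_margin / monthly_churn
    ratio = ltv / cac
    if ratio < 1:
        light = "red"
    elif ratio <= 3:
        light = "amber"
    else:
        light = "green"
    return {
        "ltv": ltv,
        "ltv_to_cac": ratio,
        "payback_months": cac / monthly_margin,
        "traffic_light": light,
    }


# Hypothetical inputs: EUR 60 CAC, 3% monthly churn, EUR 8 revenue, 60% margin.
result = unit_economics(cac=60.0, monthly_churn=0.03, monthly_revenue=8.0, gross_margin=0.6)
print(result)
```

Halving churn in this model doubles LTV, which is the sensitivity the churn slider is meant to show.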

Tab 2 — Cohort revenue analysis

Cohort waterfall: 100 customers acquired in Month 0
Shows surviving customers each month (applying churn rate)
Revenue per cohort per month = survivors × avg monthly revenue
Cumulative revenue line crossing CAC line = payback period visualised
Toggle: with / without upsell (cross-sell products after Month 3)
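
The cohort waterfall above can be simulated month by month to find where cumulative revenue crosses the acquisition cost. This is a minimal sketch with assumed names and made-up inputs; it applies churn at the end of each month and ignores the upsell toggle.

```python
from typing import List, Optional, Tuple


def cohort_payback(
    customers: int,
    monthly_churn: float,
    revenue_per_customer: float,
    cac: float,
    months: int,
) -> Tuple[List[float], Optional[int]]:
    """Return cumulative cohort revenue per month and the first month in
    which it covers the total acquisition cost (None if it never does)."""
    total_cac = customers * cac
    survivors = float(customers)
    cumulative = 0.0
    curve: List[float] = []
    payback_month: Optional[int] = None
    for month in range(1, months + 1):
        cumulative += survivors * revenue_per_customer
        curve.append(cumulative)
        if payback_month is None and cumulative >= total_cac:
            payback_month = month
        survivors *= 1.0 - monthly_churn  # churn applied at month end
    return curve, payback_month


# Hypothetical cohort: 100 customers, 5% churn, EUR 10 revenue, EUR 60 CAC.
curve, payback = cohort_payback(
    customers=100, monthly_churn=0.05, revenue_per_customer=10.0, cac=60.0, months=24
)
print(f"payback month: {payback}, revenue after 24 months: {curve[-1]:,.0f}")
```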

Tab 3 — Payment economics

Card transaction flow: Merchant → Acquirer → Card Network → Issuer
Shows interchange fee split (who gets what % on a €100 transaction)
Toggle: credit vs debit vs prepaid — different fee structures
Shows why fintechs fight over interchange revenue
BIN sponsorship model explained visually

Tab 4 — P&L waterfall

Neobank P&L from revenue to EBITDA, step by step
User adjusts each line: NIM, Fee income, CAC, Opex, Loan losses
Waterfall chart updates live
Shows path to profitability — when does the unit break even?
Compare: conservative vs growth vs optimistic scenario

Tech stack

React · Recharts or Chart.js · Tailwind CSS · Vercel / GitHub Pages · No backend (pure frontend)

Tier 3

Differentiation layer

Build after Tier 2 is complete. Each sharpens a specific moat angle.

P&L Automation Dashboard Tier 3

Python-native replacement for Knime/PowerQuery reconciliation workflows — the project that directly mirrors your current job

Uniquely yours
No one else has this

What it does

Ingests synthetic trade data (mimics Front Office system exports — CSV/Excel)
Runs automated break detection: Economic P&L vs Accounting P&L discrepancies
Categorises breaks by type: PV mismatch, accrual timing, FX translation, unbooked trades
Generates reconciliation report: portfolio-level summary + trade-level drill-down
Streamlit dashboard with: break aging, resolution rate KPIs, trend charts
Shows 50% month-end time reduction (same as your CV bullet — but now demonstrated)
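
The break-detection step above boils down to comparing two P&L figures per trade and keeping the ones that disagree. The sketch below assumes each system's export has already been reduced to a `{trade_id: pnl}` mapping; the tolerance, the names, and the sample figures are hypothetical. Splitting mismatches into PV, accrual, or FX breaks would need additional fields that are not modelled here.

```python
from typing import Dict, List

TOLERANCE = 0.01  # assumed materiality threshold, in reporting currency


def detect_breaks(economic: Dict[str, float], accounting: Dict[str, float]) -> List[dict]:
    """Compare P&L per trade ID and return one record per break."""
    breaks = []
    for trade_id in sorted(set(economic) | set(accounting)):
        eco, acc = economic.get(trade_id), accounting.get(trade_id)
        if eco is None or acc is None:
            kind = "unbooked trade"  # present in one system only
            diff = (eco or 0.0) - (acc or 0.0)
        else:
            diff = eco - acc
            if abs(diff) <= TOLERANCE:
                continue  # reconciled within tolerance
            kind = "P&L mismatch"
        breaks.append({"trade_id": trade_id, "type": kind, "difference": round(diff, 2)})
    return breaks


# Synthetic figures, for illustration only.
economic_pnl = {"T1": 100.0, "T2": 250.5, "T3": -40.0}
accounting_pnl = {"T1": 100.0, "T2": 245.5, "T4": 12.0}
breaks = detect_breaks(economic_pnl, accounting_pnl)
for b in breaks:
    print(b)
```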

Why this is special

Story only you can tell — your 7yr domain edge made into a portfolio artifact
Replaces GUI-based KNIME/PowerQuery workflows with a version-controlled, open-source Python pipeline
Relevant to every bank, asset manager, custodian in the world
Demonstrates both Python skill AND financial product knowledge in one project

Tech stack

Python · Pandas · SQLite · Streamlit · Plotly · OpenPyXL · GitHub Actions (scheduled run)

Earnings Call NLP Analyzer Tier 3

LLM-powered pipeline that processes earnings call transcripts and maps sentiment shifts to stock price reactions

Natural extension of Fed-Watcher

What it does

Ingests earnings call transcripts (SEC EDGAR or scrape from Motley Fool / Seeking Alpha)
Extracts: management tone, guidance language, uncertainty signals
LLM summariser: "3 bullish signals, 2 risks, guidance change: up/flat/down"
Correlates sentiment score with next-day, next-week price reaction
Backtests simple signal: "buy if management tone improved QoQ"
Dashboard: search any S&P 500 ticker, see sentiment history + price overlay

NLP techniques used

FinBERT for financial sentiment (pre-trained)
Claude/GPT API for qualitative summary generation
Named entity recognition: executives, products, geographies
Topic modelling (LDA or BERTopic) across quarters
Uncertainty quantification: hedging language detection

Tech stack

Python · FinBERT · HuggingFace Transformers · Claude API · Streamlit · SEC EDGAR API · yfinance

Tier 4

Long-term / pure DS/ML pivot

Build when targeting pure DS/ML roles. Don't start these until Tier 2 is complete and you have interview momentum.

Anomaly Detection Engine Tier 4

Unsupervised anomaly detection on financial time-series using Autoencoders and LSTM-AE — generalises to any streaming data

What it does

Trains an LSTM Autoencoder on "normal" market behaviour
Flags anomalies when reconstruction error exceeds dynamic threshold
Detects: flash crashes, circuit breakers, liquidity gaps, unusual volumes
Streaming-ready: processes data tick by tick with rolling window
Alerts dashboard with anomaly scores and event timeline
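
The dynamic-threshold idea above can be shown without the autoencoder itself. The sketch below takes a stream of reconstruction errors (assumed to come from the trained model) and flags a point when it exceeds the rolling mean plus `k` rolling standard deviations. The window size, `k`, and the sample errors are hypothetical, and this is only one of several ways to set a dynamic threshold.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, List


def flag_anomalies(errors: Iterable[float], window: int = 20, k: float = 3.0) -> List[bool]:
    """Flag a point when its reconstruction error exceeds
    rolling_mean + k * rolling_std of the previous `window` normal errors."""
    recent: Deque[float] = deque(maxlen=window)
    flags = []
    for err in errors:
        if len(recent) == window:
            threshold = mean(recent) + k * pstdev(recent)
            is_anomaly = err > threshold
        else:
            is_anomaly = False  # warm-up: not enough history yet
        flags.append(is_anomaly)
        if not is_anomaly:
            recent.append(err)  # keep anomalies out of the baseline
    return flags


# Invented error stream: 20 quiet points, one spike, then quiet again.
errors = [0.10, 0.12, 0.11, 0.09, 0.10] * 4 + [0.90, 0.11]
flags = flag_anomalies(errors)
print([i for i, flagged in enumerate(flags) if flagged])
```

Processing one value at a time with a bounded deque keeps memory constant, which fits the tick-by-tick streaming goal.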

Key concepts demonstrated

Unsupervised learning at scale
Sequence modelling with LSTMs
Dynamic thresholding (not fixed percentile)
Streaming ML (not just batch)
Signal detection under noise

Tech stack

PyTorch · LSTM Autoencoder · Kafka (optional streaming) · Streamlit · AWS

Forecasting SaaS MVP Tier 4

Multi-model time-series forecasting tool as a minimal SaaS — user uploads a CSV, gets Prophet / NeuralProphet / ARIMA forecasts with confidence intervals

What it does

User uploads any time-series CSV (sales, revenue, traffic, prices)
Runs 3 models automatically: Prophet, NeuralProphet, ARIMA
Returns: point forecast + 80%/95% confidence intervals
Model comparison table with MAPE and RMSE on a holdout period (AIC only where the model defines a likelihood, e.g. ARIMA)
Seasonality decomposition chart
Download forecast as CSV or PDF report
REST API endpoint for programmatic access
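
The model comparison table above needs error metrics that work for any forecaster. The sketch below computes RMSE and MAPE on a holdout period and ranks the candidates. The model names and numbers are placeholders, not results from Prophet, NeuralProphet, or ARIMA.

```python
import math
from typing import Dict, List, Sequence


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error; undefined if any actual value is zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def compare_models(
    actual: Sequence[float], forecasts: Dict[str, Sequence[float]]
) -> List[dict]:
    """Return one row per model, best (lowest RMSE) first."""
    rows = [
        {"model": name, "rmse": rmse(actual, pred), "mape": mape(actual, pred)}
        for name, pred in forecasts.items()
    ]
    return sorted(rows, key=lambda row: row["rmse"])


# Placeholder holdout values and forecasts, for illustration only.
holdout = [100.0, 110.0, 120.0]
table = compare_models(
    holdout,
    {"model_a": [98.0, 112.0, 119.0], "model_b": [90.0, 100.0, 130.0]},
)
for row in table:
    print(f"{row['model']}: RMSE={row['rmse']:.2f}  MAPE={row['mape']:.2f}%")
```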

Why this is Tier 4

Requires solid understanding of all major forecasting paradigms
Product thinking: real users, real workflow
Backend + frontend + ML — full stack project
Can be monetised (real SaaS potential)

Tech stack

Python · Prophet · NeuralProphet · Statsmodels (ARIMA) · FastAPI · React · AWS Lambda

LLM Finance Agent Tier 4

RAG-powered financial analyst assistant — answers questions over financial filings, earnings reports, and market data using tool-calling LLM

What it does

Ingests: 10-K/10-Q filings, earnings transcripts, news, price data
RAG pipeline: embed documents → retrieve relevant chunks → LLM answers with citations
Tool use: agent can call live price API, calculate ratios, run a DCF model
Analyst-style output: "Based on Q3 earnings, here are the 3 key risks..."
Compare companies: "Is AAPL more capital-efficient than MSFT this year?"
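
The retrieve-then-answer flow above can be prototyped without any vector store. In the sketch below a bag-of-words count stands in for a real embedding model, and the prompt is only assembled, not sent to an LLM. The function names, chunk sources, and chunk texts are all made up for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lower-cased word counts. A real pipeline would
    call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[Tuple[str, str]], top_k: int = 2) -> List[Tuple[str, str]]:
    """Return the top_k (source, text) chunks ranked by similarity to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c[1])), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, retrieved: List[Tuple[str, str]]) -> str:
    """Assemble a prompt that asks the LLM to cite the retrieved sources."""
    context = "\n".join(f"[{src}] {text}" for src, text in retrieved)
    return f"Answer using only the sources below and cite them.\n{context}\n\nQuestion: {question}"


# Made-up document chunks standing in for parsed filings.
chunks = [
    ("doc_a", "Revenue growth was driven by higher services sales"),
    ("doc_b", "The company repurchased shares during the quarter"),
    ("doc_c", "Supply chain risks may affect hardware margins"),
]
question = "What drove revenue growth?"
top = retrieve(question, chunks, top_k=1)
print(build_prompt(question, top))
```

Swapping `embed` for a model-based embedding and the list scan for a vector store keeps the same interface while scaling to full filings.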

Why this is Tier 4

Requires mastery of RAG, embeddings, vector stores, agent frameworks
Most ambitious project in the portfolio
Directly targets AI Engineer / LLM Ops roles

Tech stack

Claude API / OpenAI · LangChain or LlamaIndex · ChromaDB or Pinecone · FastAPI · SEC EDGAR · React · AWS
