The Strategy
01 Finance/Fintech is the smart entry point: 7yr domain edge is a real differentiator
02 Target roles blending finance + data: quant analyst, fintech data analyst, financial data engineer, risk automation
03 Sharpen DS/ML stack in parallel, building toward pure DS/ML roles over time
04 Keep Finance stack sharp: it's a moat, not a liability
05 Education projects prove domain depth in a format no pure-DS candidate can replicate
Contents
Tier 1 — Live Now: Macro-Alpha Engine · Fed-Watcher AI · Pocket Professor · Fraud Detection ML System · Algo Trading Gold Bot
Tier 2 — Build Next: Credit Risk Scoring API
Tier 2b — Education Projects: IFRS 9 Interactive Explorer · Fintech Unit Economics Dashboard
Tier 3 — Differentiation: P&L Automation Dashboard · Earnings Call NLP
Tier 4 — Long-Term / Pure DS: Anomaly Detection Engine · Forecasting SaaS MVP · LLM Finance Agent
Tier 1
Live now
Deployed, functional — focus on polish: real metrics, screenshots, clean READMEs, GitHub stars
Macro-Alpha Engine Live · AWS

End-to-end S&P 500 market direction forecasting system with ensemble ML and live dashboard

Flagship project
Highest visibility
What it does
  • Forecasts S&P 500 directional movement (up/down) 5 days ahead
  • Ensemble: XGBoost (macro fundamentals) + PyTorch LSTM (price momentum)
  • Unsupervised HMM regime detection (Bull / Transition / Bear)
  • 40+ engineered features from FRED + Yahoo Finance APIs
  • Walk-forward time-series validation — no look-ahead bias
  • SHAP waterfall charts for every prediction (explainability)
  • Daily automated inference via GitHub Actions
  • 50+ MLflow experiment runs tracked
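A minimal sketch of the walk-forward validation idea listed above, in plain Python. The function name, window sizes, and the `gap` parameter are illustrative assumptions, not the project's actual code. The gap is set to 5 here on the assumption that a 5-day-ahead label uses prices up to 5 days after the feature date, so the test window should start only after the last training label is fully observed.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, train_size: int, test_size: int, gap: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs that only ever move forward in time.

    Every test window starts after its training window ends (plus an
    optional gap), so no future observation leaks into training.
    """
    start = 0
    while start + train_size + gap + test_size <= n_samples:
        train_end = start + train_size
        test_start = train_end + gap
        train_idx = list(range(start, train_end))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx
        start += test_size  # slide forward by one test block


# Hypothetical sizes: 30 daily rows, 10-day training window, 3-day test window.
splits = list(walk_forward_splits(n_samples=30, train_size=10, test_size=3, gap=5))
for train_idx, test_idx in splits:
    print(f"train {train_idx[0]}-{train_idx[-1]}  test {test_idx[0]}-{test_idx[-1]}")
```

A rolling (fixed-length) training window is used here; an expanding window is an equally common choice and only changes how `train_idx` is built.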
Polish tasks remaining
  • Add real AUC, Sharpe, precision numbers to README
  • 5 polished screenshots in docs/images/
  • Confirm AWS EC2 public URL is stable and linked from CV
  • Add architecture diagram to README
  • Record a 60-second demo video for website
  • Add What-If scenario lab (user changes macro inputs, prediction updates)
Tech stack
Python · XGBoost · PyTorch · hmmlearn · SHAP · MLflow · Docker · AWS EC2 · Streamlit · GitHub Actions · FRED API · yfinance
Fed-Watcher AI Live · Streamlit

NLP pipeline on Federal Reserve communications — quantifies hawkish/dovish sentiment as time-series features

Strong NLP signal
Macro finance angle
What it does
  • Scrapes and parses FOMC statements, minutes, and speeches automatically
  • Applies LLM + sentiment scoring to classify tone (hawkish/dovish/neutral)
  • Converts text sentiment into numerical time-series features
  • Backtests correlation between Fed language shifts and rate moves
  • Live Streamlit dashboard with historical timeline
  • Dockerized and deployable to AWS
Polish tasks remaining
  • Add correlation chart: sentiment score vs actual Fed Funds rate
  • Add "next meeting" prediction module
  • Improve README with methodology section
  • Link live demo from CV and samvgarcia.com
Tech stack
Python · NLP · LLMs · Pandas · Plotly · Streamlit · Docker · AWS · BeautifulSoup
Pocket Professor Live · PWA

AI-powered DS/ML interview prep tool — adaptive Q&A, flashcards, business cases, installable as PWA

Shows product thinking
Claude API integration
What it does
  • Adaptive flashcard deck covering DS/ML concepts
  • Live Q&A mode: Claude grades answers and gives feedback in real-time
  • Business case scenarios (fintech-themed)
  • Progress tracking with localStorage
  • Installable as Progressive Web App on mobile/desktop
Polish tasks remaining
  • Add finance-specific interview questions module
  • Add difficulty progression (easy → hard)
  • Track weak areas and resurface them
  • Add mock interview timed mode
Tech stack
Claude API · Vanilla JS · PWA · LocalStorage · HTML/CSS
Fraud Detection ML System Live · AWS

Real-world fintech fraud classification system on IEEE-CIS dataset — cost-aware ML with analyst-facing dashboard

Critical fintech signal
Live in production
What it does
  • Classifies e-commerce transactions as fraud / legitimate (IEEE-CIS 2019 dataset, 500K+ rows)
  • Handles class imbalance (roughly 3.5% fraud rate in the IEEE-CIS training data) with SMOTE + class weights
  • Ensemble: XGBoost (supervised) + Isolation Forest (unsupervised anomaly detection) + rule layer
  • Cost-aware threshold optimisation: minimise total cost (false neg = €500 fraud loss, false pos = €10 review cost)
  • SHAP per-transaction explanations ("flagged because: amount 5× user average AND new device AND temp email domain")
  • Fraud Analyst Terminal dashboard: upload batch CSV, see flagged transactions, drill into each with SHAP
Feature engineering highlights
  • Velocity features: transactions in last 1h / 3h / 24h per card
  • Behavioural anomalies: amount vs user's rolling average (ratio feature)
  • Device fingerprinting: new device for this user? new IP country?
  • Email risk: disposable / temp domain detection
  • Transaction graph: shared device across multiple cards
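As a rough illustration of the velocity features above, the sketch below counts earlier transactions on the same card within 1h, 3h, and 24h windows. It uses only the standard library; the function name, the `(card_id, timestamp)` input shape, and the sample data are assumptions. The current transaction is deliberately excluded from its own count so the feature does not leak information.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOWS = {
    "1h": timedelta(hours=1),
    "3h": timedelta(hours=3),
    "24h": timedelta(hours=24),
}


def velocity_features(transactions):
    """transactions: iterable of (card_id, timestamp) sorted by timestamp.

    Returns one dict per transaction with the number of earlier
    transactions on the same card inside each look-back window.
    """
    history = defaultdict(deque)  # card_id -> timestamps still within 24h
    longest = max(WINDOWS.values())
    features = []
    for card_id, ts in transactions:
        past = history[card_id]
        while past and ts - past[0] > longest:
            past.popleft()  # drop timestamps older than the longest window
        features.append(
            {name: sum(1 for p in past if ts - p <= width) for name, width in WINDOWS.items()}
        )
        past.append(ts)
    return features


base = datetime(2024, 1, 1, 12, 0)
txns = [
    ("A", base),
    ("A", base + timedelta(minutes=30)),
    ("B", base + timedelta(minutes=45)),
    ("A", base + timedelta(hours=2)),
    ("A", base + timedelta(hours=26)),
]
feats = velocity_features(txns)
```

On a 500K+ row dataset the same logic would more likely be expressed with a grouped rolling window in Pandas; the loop form is shown only because it makes the look-back rule explicit.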
Tech stack
Python · XGBoost · Scikit-learn · Isolation Forest · SMOTE (imbalanced-learn) · SHAP · Streamlit · Docker · AWS · Pandas · Plotly
Algo Trading Gold Bot Live · AWS

Systematic gold trading strategy engine with walk-forward backtesting, performance attribution, and live dashboard

Quant / hedge fund target
Live in production
What it does
  • Pluggable strategy framework: implement alpha signals as Python classes
  • Built-in signals: momentum, mean-reversion, yield-curve spread, macro factor tilt
  • Walk-forward backtesting engine with proper train/test splits
  • Position sizing: fixed fractional, Kelly, vol-targeting
  • Transaction cost modelling (bid-ask spread, market impact)
  • Performance report: Sharpe, Sortino, Calmar, max drawdown, win rate, expectancy
  • Streamlit dashboard with equity curve, drawdown chart, monthly returns heatmap
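Several of the performance-report metrics above reduce to a few lines each. This is a sketch under stated assumptions: returns are simple per-period returns, 252 periods per year, and expectancy is defined as win probability times average win plus loss probability times average loss. The function names and sample returns are illustrative, not taken from the project.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, periods_per_year=252, risk_free=0.0):
    """Annualised Sharpe ratio from per-period simple returns."""
    excess = [r - risk_free / periods_per_year for r in returns]
    sd = stdev(excess)
    return 0.0 if sd == 0 else mean(excess) / sd * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough fall of the equity curve, as a negative fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def win_rate(returns):
    """Share of periods with a strictly positive return."""
    return sum(1 for r in returns if r > 0) / len(returns)


def expectancy(returns):
    """Average outcome per period: P(win) * avg win + P(loss) * avg loss."""
    wins = [r for r in returns if r > 0]
    losses = [r for r in returns if r <= 0]
    p_win = len(wins) / len(returns)
    avg_win = mean(wins) if wins else 0.0
    avg_loss = mean(losses) if losses else 0.0  # zero or negative
    return p_win * avg_win + (1.0 - p_win) * avg_loss


sample_returns = [0.10, -0.05, 0.02, -0.10, 0.04]  # made-up numbers
print(sharpe_ratio(sample_returns), max_drawdown(sample_returns))
```

Sortino and Calmar follow the same pattern: Sortino swaps the standard deviation for downside deviation, and Calmar divides annualised return by the absolute maximum drawdown.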
Key differentiators
  • Regime-aware: strategies auto-switch based on HMM regime detection (reuse from Macro-Alpha)
  • Macro-overlay: positions size down when macro conditions deteriorate
  • Forward-testing: paper trading mode via paper broker API
Tech stack
Python · Backtrader · yfinance · Pandas · NumPy · Plotly · Streamlit
Tier 2
Build next — highest ROI
Credit Risk Scoring API and the education tools are the next priority. Together they complete the portfolio for fintech targeting.
Credit Risk Scoring API Build next

Production-grade credit scoring model wrapped as a REST API — the core engine behind every lending fintech

Strong banking signal
API/MLOps showcase
What it does
  • Predicts probability of default (PD) for loan applicants
  • Scorecard-style output (300–850 scale, explainable to non-technical stakeholders)
  • Logistic Regression baseline + XGBoost champion model with full comparison
  • FastAPI REST endpoint: POST /score → {pd_score, risk_band, top_factors}
  • Gini coefficient, KS statistic, PSI drift monitoring
  • Simple UI for manual testing (loan officer interface mockup)
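Two pieces of the list above are easy to sketch without any framework: mapping a PD onto the 300 to 850 scale, and the PSI drift statistic. The scaling uses the common "points to double the odds" convention; the calibration constants (600 points at 50:1 good-to-bad odds, 20 points to double the odds) are hypothetical values chosen for illustration, as are the function names.

```python
import math

MIN_SCORE, MAX_SCORE = 300, 850
# Hypothetical calibration: 600 points at 50:1 odds, +20 points doubles the odds.
BASE_SCORE, BASE_ODDS, PDO = 600.0, 50.0, 20.0


def pd_to_score(pd_value: float) -> int:
    """Map a probability of default to a clamped scorecard-style score."""
    pd_value = min(max(pd_value, 1e-6), 1.0 - 1e-6)  # keep the log finite
    odds_good = (1.0 - pd_value) / pd_value
    factor = PDO / math.log(2)
    offset = BASE_SCORE - factor * math.log(BASE_ODDS)
    raw = offset + factor * math.log(odds_good)
    return int(round(min(max(raw, MIN_SCORE), MAX_SCORE)))


def psi(expected_counts, actual_counts, eps: float = 1e-6) -> float:
    """Population Stability Index between two binned score distributions."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)  # eps guards against empty bins
        a_pct = max(a / a_total, eps)
        value += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return value


print(pd_to_score(1 / 51), pd_to_score(1 / 101))
print(round(psi([50, 50], [40, 60]), 4))
```

In the API these would sit behind the `POST /score` handler: the model produces the PD, `pd_to_score` produces the number a loan officer sees, and `psi` runs on a schedule against the training-time score distribution.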
Dataset options
  • Home Credit Default Risk (Kaggle) — large, well-documented
  • Lending Club (historical) — real US P2P loan data
  • German Credit (UCI) — small but clean, good for explainability focus
Tech stack
Python · XGBoost · Scikit-learn · FastAPI · Pydantic · Docker · AWS EC2 · MLflow · SHAP · Pandas · Statsmodels
Tier 2b
Education / interactive explainer projects
The unique differentiator: finance domain knowledge made visual. Few pure-DS candidates have the domain depth to build these.

Why these matter: these projects prove two things at once. You understand complex financial regulation well enough to teach it, and you can build polished interactive tools. They give fintech interviewers something to try hands-on, which makes them conversation-starters rather than portfolio checkboxes.

IFRS 9 Interactive Explorer Education · Web App

Visual, interactive deep-dive into IFRS 9 — classification, ECL staging, hedge accounting, and accounting flows

Most distinctive project in the entire portfolio
Background — what IFRS 9 is (and why it matters)
  • IFRS 9 (effective 2018) replaced IAS 39 as the accounting standard for financial instruments, applying to banks, insurers, and large corporates that report under IFRS
  • Three pillars: Classification & Measurement, Impairment (ECL model), Hedge Accounting
  • Finance professionals at IFRS-reporting banks work with IFRS 9 routinely, yet visual explainers for it are rare
  • Relevant to: banking, lending fintech, insurance, asset management, regulatory tech
Module 1 — Classification & Measurement tree
  • Interactive decision tree: pick a financial asset type
  • Answer 3 questions: Business Model test → SPPI test → Designation
  • Routes to one of 4 categories: Amortised Cost, FVOCI (debt), FVOCI (equity), FVTPL
  • Each category shows: where P&L goes, where OCI goes, impairment required?
  • Real examples per category (bond held to maturity, equity investment, trading book derivative)
  • T-account animation showing journal entries on initial recognition
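The routing logic behind the decision tree above can be sketched as a single function. The web app itself is planned in JavaScript; Python is used here only to pin down the rules. The function signature and string labels are assumptions, and the sketch covers the main paths only (business model test, SPPI test, the FVTPL option for debt, and the FVOCI election for equity not held for trading).

```python
from enum import Enum


class Category(Enum):
    AMORTISED_COST = "Amortised Cost"
    FVOCI_DEBT = "FVOCI (debt)"
    FVOCI_EQUITY = "FVOCI (equity)"
    FVTPL = "FVTPL"


def classify(
    instrument: str,
    business_model: str = "other",
    passes_sppi: bool = False,
    fvtpl_option: bool = False,
    held_for_trading: bool = False,
    fvoci_election: bool = False,
) -> Category:
    """Route a financial asset to one of the four measurement categories."""
    if instrument == "derivative":
        return Category.FVTPL
    if instrument == "equity":
        # Equity fails SPPI by nature; FVOCI needs an election and is
        # not available for instruments held for trading.
        if fvoci_election and not held_for_trading:
            return Category.FVOCI_EQUITY
        return Category.FVTPL
    if instrument == "debt":
        if fvtpl_option or not passes_sppi:
            return Category.FVTPL
        if business_model == "hold_to_collect":
            return Category.AMORTISED_COST
        if business_model == "hold_to_collect_and_sell":
            return Category.FVOCI_DEBT
        return Category.FVTPL  # e.g. a trading business model
    raise ValueError(f"unknown instrument type: {instrument!r}")


print(classify("debt", "hold_to_collect", passes_sppi=True).value)
```

Keeping the rules in one pure function like this makes it straightforward to unit-test each branch before wiring up the animated tree.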
Module 2 — ECL Staging model
  • Visual loan lifecycle: starts at Stage 1 (performing)
  • User drags "credit deterioration" slider (0–100%)
  • Loan migrates Stage 1 → 2 → 3 with trigger explanation
  • Provision amount jumps: Stage 1 = 12-month ECL, Stage 2/3 = lifetime ECL
  • Formula shown live: ECL = PD × LGD × EAD × Discount Factor
  • Toggle: forward-looking macroeconomic adjustments (GDP, unemployment scenarios)
  • Chart showing provision coverage ratio by stage
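A worked version of the formula above, ECL = PD × LGD × EAD × Discount Factor, shows why the provision jumps between stages. Everything beyond the formula is a simplifying assumption for this sketch: annual periods, a constant EAD, defaults at year end, discounting at a flat effective interest rate, and Stage 3 treated as PD = 100% with no discounting. The function name and input numbers are illustrative.

```python
def expected_credit_loss(stage, marginal_pds, lgd, ead, eir):
    """ECL for one loan.

    stage        : 1, 2 or 3
    marginal_pds : probability of default in year 1, year 2, ... (lifetime)
    lgd          : loss given default, as a fraction
    ead          : exposure at default (held constant here)
    eir          : effective interest rate used for discounting
    """
    if stage == 3:
        return lgd * ead  # simplification: already credit-impaired, PD = 100%
    horizon = 1 if stage == 1 else len(marginal_pds)  # 12-month vs lifetime
    return sum(
        pd_t * lgd * ead / (1.0 + eir) ** t
        for t, pd_t in enumerate(marginal_pds[:horizon], start=1)
    )


pds = [0.02, 0.03, 0.04]  # made-up marginal PDs for a 3-year loan
for stage in (1, 2, 3):
    ecl = expected_credit_loss(stage, pds, lgd=0.45, ead=100_000, eir=0.05)
    print(f"Stage {stage}: ECL = {ecl:,.2f}")
```

The slider in the module would drive the stage and the PD curve; the forward-looking toggle would amount to a probability-weighted average of this calculation across macro scenarios.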
Module 3 — Hedge Accounting
  • Toggle between Fair Value Hedge vs Cash Flow Hedge
  • Animated diagram showing what goes to P&L vs OCI for each type
  • Example: interest rate swap hedging a fixed-rate bond (fair value hedge)
  • Example: FX forward hedging future USD revenue (cash flow hedge)
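  • Effectiveness testing: IFRS 9 dropped IAS 39's 80–125% bright-line corridor; show the three IFRS 9 criteria (economic relationship, credit risk not dominant, hedge ratio) alongside the old corridor for comparison
  • What happens when a hedge is not fully effective: the ineffective portion is recognised in P&L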
Module 4 — Product lifecycle flows
  • Pick a product: IRS, FX Forward, Bond, Loan, Equity
  • Animated timeline: Inception → MTM → Settlement
  • Journal entries at each stage (Dr/Cr T-accounts)
  • Shows Economic P&L vs Accounting P&L gap (your actual job!)
  • Explains why Front Office and Back Office numbers differ
Tech stack
React or Vanilla JS · D3.js (animations) · CSS animations · GitHub Pages / Vercel · No backend needed
Fintech Unit Economics Dashboard Education · Live Calculator

Live, interactive business model simulator for a fictional neobank / lending fintech — teaches the metrics every fintech interview tests

Interview prep tool
Shows business acumen
Core calculator — Tab 1: Unit economics
  • User inputs: CAC (€), Monthly churn (%), Avg loan size (€), Gross margin (%), Default rate (%)
  • Live outputs: LTV, LTV:CAC ratio, Payback period (months), Net Interest Margin
  • Traffic light: LTV:CAC <1 = red, 1–3 = amber, >3 = green
  • Chart: CAC payback curve showing cumulative revenue vs cost over time
  • Sensitivity: slider for churn rate — shows how dramatically LTV changes
  • Benchmark panel: typical neobank LTV:CAC ratios (Monzo, N26, Revolut context)
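The calculator's core arithmetic can be sketched as below. The dashboard itself is planned in React, so this Python version only fixes the formulas. It assumes the simplest constant-churn LTV model (margin per month divided by monthly churn) and introduces a `monthly_revenue` input that is not in the list above; the function names are also illustrative. The traffic-light bands follow the thresholds in the text, with exactly 3 treated as amber.

```python
def ltv(monthly_revenue: float, gross_margin: float, monthly_churn: float) -> float:
    """Lifetime value under constant churn: margin per month / churn rate."""
    return monthly_revenue * gross_margin / monthly_churn


def payback_months(cac: float, monthly_revenue: float, gross_margin: float) -> float:
    """Months of gross margin needed to recover acquisition cost (ignores churn)."""
    return cac / (monthly_revenue * gross_margin)


def traffic_light(ltv_to_cac: float) -> str:
    """Bands from the text: <1 red, 1 to 3 amber, >3 green."""
    if ltv_to_cac < 1:
        return "red"
    if ltv_to_cac <= 3:
        return "amber"
    return "green"


# Hypothetical neobank customer: EUR 20 revenue per month, 60% margin, 4% churn.
value = ltv(monthly_revenue=20, gross_margin=0.6, monthly_churn=0.04)
for cac in (60, 120, 400):
    print(cac, round(value / cac, 2), traffic_light(value / cac))

# Churn sensitivity: halving churn doubles LTV in this model.
print(ltv(20, 0.6, 0.02) / value)
```

The churn slider is persuasive precisely because LTV is inversely proportional to churn here, so small churn changes move the ratio across the colour bands.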
Tab 2 — Cohort revenue analysis
  • Cohort waterfall: 100 customers acquired in Month 0
  • Shows surviving customers each month (applying churn rate)
  • Revenue per cohort per month = survivors × avg monthly revenue
  • Cumulative revenue line crossing CAC line = payback period visualised
  • Toggle: with / without upsell (cross-sell products after Month 3)
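The cohort logic above can be sketched in a few lines: apply churn each month, multiply survivors by average revenue, and find the first month where cumulative revenue covers the cohort's total CAC. The sketch assumes Month 0 is the acquisition month with no revenue, and it omits the upsell toggle; the names and input values are illustrative.

```python
def cohort_table(n_customers, monthly_churn, revenue_per_customer, months):
    """One row per month: survivors, revenue, and cumulative revenue."""
    rows, cumulative = [], 0.0
    for month in range(months + 1):
        survivors = n_customers * (1.0 - monthly_churn) ** month
        # Assumption: Month 0 is acquisition only, revenue starts in Month 1.
        revenue = 0.0 if month == 0 else survivors * revenue_per_customer
        cumulative += revenue
        rows.append(
            {"month": month, "survivors": survivors, "revenue": revenue, "cumulative": cumulative}
        )
    return rows


def payback_month(rows, total_cac):
    """First month where cumulative revenue covers total CAC, else None."""
    for row in rows:
        if row["cumulative"] >= total_cac:
            return row["month"]
    return None


# 100 customers, 5% monthly churn, EUR 10 per surviving customer per month.
rows = cohort_table(n_customers=100, monthly_churn=0.05, revenue_per_customer=10, months=24)
print(payback_month(rows, total_cac=100 * 50))  # EUR 50 CAC per customer
```

Returning `None` when payback is never reached lets the chart show a "does not pay back within horizon" state instead of a misleading crossing point.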
Tab 3 — Payment economics
  • Card transaction flow: Merchant → Acquirer → Card Network → Issuer
  • Shows interchange fee split (who gets what % on a €100 transaction)
  • Toggle: credit vs debit vs prepaid — different fee structures
  • Shows why fintechs fight over interchange revenue
  • BIN sponsorship model explained visually
Tab 4 — P&L waterfall
  • Neobank P&L from revenue to EBITDA, step by step
  • User adjusts each line: NIM, Fee income, CAC, Opex, Loan losses
  • Waterfall chart updates live
  • Shows path to profitability — when does the unit break even?
  • Compare: conservative vs growth vs optimistic scenario
Tech stack
React · Recharts or Chart.js · Tailwind CSS · Vercel / GitHub Pages · No backend (pure frontend)
Tier 3
Differentiation layer
Build after Tier 2 is complete. Each sharpens a specific moat angle.
P&L Automation Dashboard Tier 3

Python-native replacement for KNIME / Power Query reconciliation workflows: the project that directly mirrors your current job

Uniquely yours
No one else has this
What it does
  • Ingests synthetic trade data (mimics Front Office system exports — CSV/Excel)
  • Runs automated break detection: Economic P&L vs Accounting P&L discrepancies
  • Categorises breaks by type: PV mismatch, accrual timing, FX translation, unbooked trades
  • Generates reconciliation report: portfolio-level summary + trade-level drill-down
  • Streamlit dashboard with: break aging, resolution rate KPIs, trend charts
  • Shows 50% month-end time reduction (same as your CV bullet — but now demonstrated)
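The break-detection step above is, at its core, a keyed comparison of two P&L views. This is a minimal sketch: the function name, the dict-of-trade-IDs input shape, the tolerance, and the two break labels are assumptions. Finer categories such as accrual timing or FX translation would need extra fields that are not modelled here.

```python
def detect_breaks(economic, accounting, tolerance=0.01):
    """Compare Economic vs Accounting P&L per trade and list the breaks.

    economic, accounting : dict mapping trade_id -> P&L amount
    tolerance            : absolute difference that is still treated as a match
    """
    breaks = []
    for trade_id in sorted(set(economic) | set(accounting)):
        eco, acc = economic.get(trade_id), accounting.get(trade_id)
        if eco is None or acc is None:
            missing = "economic" if eco is None else "accounting"
            breaks.append(
                {"trade_id": trade_id, "type": "unbooked", "missing_in": missing, "difference": None}
            )
        elif abs(eco - acc) > tolerance:
            breaks.append(
                {"trade_id": trade_id, "type": "amount", "missing_in": None, "difference": eco - acc}
            )
    return breaks


# Synthetic example data (made up).
economic_pnl = {"T1": 100.0, "T2": 250.5, "T3": -40.0}
accounting_pnl = {"T1": 100.0, "T2": 245.5, "T4": 10.0}
found = detect_breaks(economic_pnl, accounting_pnl)
for item in found:
    print(item)
```

With Pandas, the same comparison is an outer merge on trade ID, which scales better for month-end volumes; the dict version is shown to keep the rule itself visible.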
Why this is special
  • Story only you can tell — your 7yr domain edge made into a portfolio artifact
  • Replaces a GUI-based KNIME workflow with a scripted, version-controlled, open-source Python pipeline
  • Relevant to every bank, asset manager, custodian in the world
  • Demonstrates both Python skill AND financial product knowledge in one project
Tech stack
Python · Pandas · SQLite · Streamlit · Plotly · OpenPyXL · GitHub Actions (scheduled run)
Earnings Call NLP Analyzer Tier 3

LLM-powered pipeline that processes earnings call transcripts and maps sentiment shifts to stock price reactions

Natural extension of Fed-Watcher
What it does
  • Ingests earnings call transcripts (SEC EDGAR 8-K exhibits where a company files them, or scrape from Motley Fool / Seeking Alpha)
  • Extracts: management tone, guidance language, uncertainty signals
  • LLM summariser: "3 bullish signals, 2 risks, guidance change: up/flat/down"
  • Correlates sentiment score with next-day, next-week price reaction
  • Backtests simple signal: "buy if management tone improved QoQ"
  • Dashboard: search any S&P 500 ticker, see sentiment history + price overlay
NLP techniques used
  • FinBERT for financial sentiment (pre-trained)
  • Claude/GPT API for qualitative summary generation
  • Named entity recognition: executives, products, geographies
  • Topic modelling (LDA or BERTopic) across quarters
  • Uncertainty quantification: hedging language detection
Tech stack
Python · FinBERT · HuggingFace Transformers · Claude API · Streamlit · SEC EDGAR API · yfinance
Tier 4
Long-term / pure DS/ML pivot
Build when targeting pure DS/ML roles. Don't start these until Tier 2 is complete and you have interview momentum.
Anomaly Detection Engine Tier 4

Unsupervised anomaly detection on financial time-series using Autoencoders and LSTM-AE — generalises to any streaming data

What it does
  • Trains an LSTM Autoencoder on "normal" market behaviour
  • Flags anomalies when reconstruction error exceeds dynamic threshold
  • Detects: flash crashes, circuit breakers, liquidity gaps, unusual volumes
  • Streaming-ready: processes data tick by tick with rolling window
  • Alerts dashboard with anomaly scores and event timeline
Key concepts demonstrated
  • Unsupervised learning at scale
  • Sequence modelling with LSTMs
  • Dynamic thresholding (not fixed percentile)
  • Streaming ML (not just batch)
  • Signal detection under noise
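One simple way to realise the dynamic thresholding mentioned above is a rolling mean plus k standard deviations over recent reconstruction errors. This sketch assumes that rule; the project may use a different one. The LSTM Autoencoder is out of scope here, so the error series is made up, and the function name, window, and `k` are illustrative.

```python
from collections import deque
from statistics import mean, pstdev


def flag_anomalies(errors, window=20, k=3.0):
    """Flag each reconstruction error that exceeds a rolling threshold.

    The threshold is mean + k * std of the previous `window` errors, so it
    adapts as the "normal" error level drifts. Nothing is flagged until the
    window is full.
    """
    recent = deque(maxlen=window)
    flags = []
    for err in errors:
        if len(recent) == window:
            threshold = mean(recent) + k * pstdev(recent)
            flags.append(err > threshold)
        else:
            flags.append(False)
        recent.append(err)  # processed one tick at a time, streaming style
    return flags


# Made-up error series: steady noise, one spike, then back to normal.
errors = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95] * 5 + [5.0] + [1.0]
flags = flag_anomalies(errors)
print([i for i, is_anomaly in enumerate(flags) if is_anomaly])
```

A design point worth deciding early: this version lets flagged values enter the rolling window, which temporarily raises the threshold after a spike. Excluding flagged values keeps the threshold tighter but risks never adapting to a genuine regime change.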
Tech stack
PyTorch · LSTM Autoencoder · Kafka (optional streaming) · Streamlit · AWS
Forecasting SaaS MVP Tier 4

Multi-model time-series forecasting tool as a minimal SaaS — user uploads a CSV, gets Prophet / NeuralProphet / ARIMA forecasts with confidence intervals

What it does
  • User uploads any time-series CSV (sales, revenue, traffic, prices)
  • Runs 3 models automatically: Prophet, NeuralProphet, ARIMA
  • Returns: point forecast + 80%/95% confidence intervals
  • Model comparison table with MAPE and RMSE on a holdout window, plus AIC where the model exposes it (e.g. ARIMA)
  • Seasonality decomposition chart
  • Download forecast as CSV or PDF report
  • REST API endpoint for programmatic access
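The error metrics in the comparison table above are short enough to sketch directly. The forecasts below are hard-coded placeholders rather than outputs of Prophet, NeuralProphet, or ARIMA, and the function and model names are illustrative. MAPE is undefined when an actual value is zero, so this sketch assumes strictly non-zero actuals.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error, in the units of the series."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def compare_models(actual, forecasts):
    """Build a comparison table sorted by RMSE (best model first)."""
    table = [
        {"model": name, "MAPE": mape(actual, values), "RMSE": rmse(actual, values)}
        for name, values in forecasts.items()
    ]
    return sorted(table, key=lambda row: row["RMSE"])


holdout = [100, 200, 400]  # made-up holdout actuals
table = compare_models(
    holdout,
    {
        "model_a": [110, 190, 400],  # placeholder forecasts
        "model_b": [100, 200, 300],
    },
)
for row in table:
    print(row)
```

Computing both metrics on the same holdout window is what makes the three model families comparable, since likelihood-based criteria are not defined consistently across them.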
Why this is Tier 4
  • Requires solid understanding of all major forecasting paradigms
  • Product thinking: real users, real workflow
  • Backend + frontend + ML — full stack project
  • Can be monetised (real SaaS potential)
Tech stack
Python · Prophet · NeuralProphet · Statsmodels (ARIMA) · FastAPI · React · AWS Lambda
LLM Finance Agent Tier 4

RAG-powered financial analyst assistant — answers questions over financial filings, earnings reports, and market data using tool-calling LLM

What it does
  • Ingests: 10-K/10-Q filings, earnings transcripts, news, price data
  • RAG pipeline: embed documents → retrieve relevant chunks → LLM answers with citations
  • Tool use: agent can call live price API, calculate ratios, run a DCF model
  • Analyst-style output: "Based on Q3 earnings, here are the 3 key risks..."
  • Compare companies: "Is AAPL more capital-efficient than MSFT this year?"
Why this is Tier 4
  • Requires mastery of RAG, embeddings, vector stores, agent frameworks
  • Most ambitious project in the portfolio
  • Directly targets AI Engineer / LLM Ops roles
Tech stack
Claude API / OpenAI · LangChain or LlamaIndex · ChromaDB or Pinecone · FastAPI · SEC EDGAR · React · AWS