The Engineer's Codex

01

System Structure

Architecture Patterns

The vocabulary of how to arrange code, services, and responsibilities. These are the patterns interviewers ask about and that production systems live or die by.

Structural Architectures

Hexagonal Architecture

Ports & Adapters · Alistair Cockburn 2005

The application core is surrounded by ports (interfaces) and adapters (implementations). Business logic has zero dependency on frameworks, databases, or UI. One of the most widely used architectures for testable, maintainable services.

[ HTTP Adapter ]        [ Kafka Adapter ]
        ↓                       ↓
┌─────────────────────────────┐
│  PORT: PaymentInputPort     │
│  ┌───────────────────────┐  │
│  │  DOMAIN / USE CASES   │  │
│  └───────────────────────┘  │
│  PORT: PaymentRepoPort      │
└─────────────────────────────┘
        ↓                       ↓
[ JPA Adapter ]         [ InMemory Adapter ]

Primary Ports · Secondary Ports · Testable Core
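A minimal plain-Java sketch of the diagram above. The names `PaymentInputPort`, `PaymentRepoPort`, `CreatePaymentService`, and `InMemoryPaymentRepo` are hypothetical stand-ins for the diagram labels, and the ID scheme is a placeholder. The use case sees only the port interfaces. A test can therefore wire in the in-memory adapter, and production can wire in a JPA adapter instead.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Primary (driving) port: what the outside world can ask the core to do
interface PaymentInputPort {
    String createPayment(BigDecimal amount);
}

// Secondary (driven) port: what the core needs from the outside world
interface PaymentRepoPort {
    void save(String id, BigDecimal amount);
    Optional<BigDecimal> findAmount(String id);
}

// Domain / use case: depends only on ports, never on a framework
final class CreatePaymentService implements PaymentInputPort {
    private final PaymentRepoPort repo;
    private int sequence = 0;  // placeholder ID scheme, for the sketch only

    CreatePaymentService(PaymentRepoPort repo) { this.repo = repo; }

    @Override
    public String createPayment(BigDecimal amount) {
        if (amount.signum() <= 0) throw new IllegalArgumentException("Amount must be positive");
        String id = "PAY-" + (++sequence);
        repo.save(id, amount);
        return id;
    }
}

// Secondary adapter for tests; a JPA adapter would implement the same port
final class InMemoryPaymentRepo implements PaymentRepoPort {
    private final Map<String, BigDecimal> rows = new HashMap<>();

    @Override public void save(String id, BigDecimal amount) { rows.put(id, amount); }
    @Override public Optional<BigDecimal> findAmount(String id) { return Optional.ofNullable(rows.get(id)); }
}
```

An HTTP or Kafka adapter would translate its request into a `createPayment` call on the primary port. Changing the persistence technology never touches `CreatePaymentService`.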

Clean Architecture

Uncle Bob · Onion Variant

Concentric dependency rings, from innermost to outermost: Entities, Use Cases, Interface Adapters, Frameworks & Drivers. Dependency Rule: source code dependencies must point inward only. The domain never imports Spring, JPA, or anything external.

┌────────────────────────────────┐
│ Frameworks & Drivers (outer)   │
│  ┌──────────────────────────┐  │
│  │ Interface Adapters       │  │
│  │  ┌────────────────────┐  │  │
│  │  │ Use Cases          │  │  │
│  │  │  ┌──────────────┐  │  │  │
│  │  │  │   Entities   │  │  │  │
│  │  │  └──────────────┘  │  │  │
│  │  └────────────────────┘  │  │
│  └──────────────────────────┘  │
└────────────────────────────────┘

Dependency Rule · Framework-Free Core

CQRS

Command Query Responsibility Segregation

Separate the write model (commands that change state) from the read model (queries that return data). This enables independent scaling of reads and writes and optimized read models (denormalized projections). It also reduces the impedance mismatch between domain objects and query DTOs.

Command Side              Query Side
─────────────             ──────────────
CreatePayment ─┐          PaymentSummary
UpdateStatus  ─┤─→        PaymentDetail
RefundPayment ─┘          DashboardView
      │                         ↑
  Write Store ──(projection)──→ Read Store

Commands · Queries · Read Scaling
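A deliberately small sketch of the split. It assumes hypothetical `PaymentCommandHandler` and `PaymentSummaryProjection` classes, with in-memory maps standing in for the write store and the read store. The projection is updated synchronously here for clarity. In a real system it is usually fed by events and lags slightly behind the write side.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Query side: a denormalized read model shaped for one screen (totals per merchant)
final class PaymentSummaryProjection {
    private final Map<String, BigDecimal> totalsByMerchant = new HashMap<>();

    void onPaymentCreated(String merchant, BigDecimal amount) {
        totalsByMerchant.merge(merchant, amount, BigDecimal::add);
    }

    BigDecimal totalFor(String merchant) {
        return totalsByMerchant.getOrDefault(merchant, BigDecimal.ZERO);
    }
}

// Command side: validates and applies state changes to the write model
final class PaymentCommandHandler {
    private final Map<String, BigDecimal> writeStore = new HashMap<>();  // paymentId -> amount
    private final PaymentSummaryProjection projection;

    PaymentCommandHandler(PaymentSummaryProjection projection) { this.projection = projection; }

    void createPayment(String paymentId, String merchant, BigDecimal amount) {
        if (writeStore.putIfAbsent(paymentId, amount) != null)
            throw new IllegalStateException("Duplicate payment " + paymentId);
        // Synchronous here for clarity; usually an event consumed asynchronously
        projection.onPaymentCreated(merchant, amount);
    }
}
```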

Event Sourcing

Store Events, Not State · Greg Young

The database stores the sequence of events that led to the current state, not the current state itself. Current state is derived by replaying events. Benefits include a natural audit log, time-travel debugging, and the ability to rebuild projections by replaying events. Pairs naturally with CQRS. It is complex, so don't adopt it without a genuine need.

Event Store (append-only):
┌────────────────────────────────────┐
│ 1. PaymentCreated   {id, amount}   │
│ 2. PaymentValidated {riskScore}    │
│ 3. PaymentCharged   {txRef}        │
│ 4. PaymentSettled   {timestamp}    │
└────────────────────────────────────┘
              ↓ replay ↓
Current State = fold(events, initialState)

Append-Only · Replay · Audit Trail · Complexity Cost
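The `fold(events, initialState)` line in the diagram can be sketched in Java 21 with a sealed event hierarchy and a pattern-matching `switch`. The event fields and the string status are simplified assumptions. A real event store also adds versions, timestamps, and snapshots.

```java
import java.math.BigDecimal;
import java.util.List;

// Events from the diagram above (fields simplified for the sketch)
sealed interface PaymentEvent permits PaymentCreated, PaymentValidated, PaymentCharged, PaymentSettled {}
record PaymentCreated(String id, BigDecimal amount) implements PaymentEvent {}
record PaymentValidated(int riskScore) implements PaymentEvent {}
record PaymentCharged(String txRef) implements PaymentEvent {}
record PaymentSettled(long timestamp) implements PaymentEvent {}

// Current state is never stored; it is derived from the event log
record PaymentState(String id, BigDecimal amount, String status, String txRef) {
    static final PaymentState INITIAL = new PaymentState(null, BigDecimal.ZERO, "NONE", null);

    // Apply one event to produce the next state (a pure function, no side effects)
    PaymentState apply(PaymentEvent event) {
        return switch (event) {
            case PaymentCreated e   -> new PaymentState(e.id(), e.amount(), "CREATED", txRef);
            case PaymentValidated e -> new PaymentState(id, amount, "VALIDATED", txRef);
            case PaymentCharged e   -> new PaymentState(id, amount, "CHARGED", e.txRef());
            case PaymentSettled e   -> new PaymentState(id, amount, "SETTLED", txRef);
        };
    }

    // Current State = fold(events, initialState)
    static PaymentState replay(List<PaymentEvent> events) {
        PaymentState state = INITIAL;
        for (PaymentEvent e : events) state = state.apply(e);
        return state;
    }
}
```

Rebuilding a projection runs the same loop with a different `apply` function. That is why event replay makes new read models cheap to add.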

Saga Pattern

Distributed Transactions Without 2PC

A sequence of local transactions, each publishing events that trigger the next. If a step fails, compensating transactions undo completed steps. Two flavors: Choreography (services react to events) and Orchestration (a saga orchestrator directs steps). Fintech critical — payment spans multiple services.

ORCHESTRATION SAGA (explicit controller):
SagaOrchestrator
  → PaymentService.reserve()   ✓
  → RiskService.check()        ✓
  → LedgerService.debit()      ✗ FAILS
  ← compensate completed steps in reverse order:
  ← RiskService (usually nothing to undo: a check changes no state)
  ← PaymentService.release()   (undoes reserve)

CHOREOGRAPHY SAGA (event-driven):
PaymentCreated →→ RiskChecked →→ LedgerDebited
                                   ↓ if fails
                  LedgerFailedEvt →→ PaymentReleased

Compensating Txn · Eventual Consistency
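A minimal orchestration sketch. It assumes each step is a local transaction paired with its compensation; `SagaStep` and `SagaOrchestrator` are hypothetical names. When a step fails, the orchestrator undoes the completed steps in reverse order. Saga state is kept in memory here. A production orchestrator persists that state so it can resume after a crash and retry compensations that fail.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// One saga step: a local transaction plus the compensation that undoes it
record SagaStep(String name, Runnable action, Runnable compensation) {}

final class SagaOrchestrator {
    // Runs steps in order; on failure, compensates completed steps in reverse order.
    // Returns true only if every step committed.
    static boolean run(List<SagaStep> steps, List<String> log) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action().run();
                completed.push(step);  // most recent on top
                log.add("done:" + step.name());
            } catch (RuntimeException failure) {
                log.add("failed:" + step.name());
                while (!completed.isEmpty()) {
                    SagaStep done = completed.pop();
                    done.compensation().run();
                    log.add("compensated:" + done.name());
                }
                return false;
            }
        }
        return true;
    }
}
```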

Strangler Fig

Legacy Migration · Martin Fowler

Gradually replace a legacy monolith by wrapping it with new services. Route specific endpoints to new microservices while the monolith handles the rest. The new system "strangles" the old one over time until nothing routes to the monolith. Often the safest way to modernize or containerize legacy fintech apps.

Phase 1: All          → [Monolith]
Phase 2: /payments/*  → [New PaymentSvc]
         /users/*     → [Monolith]
Phase 3: /payments/*  → [New PaymentSvc]
         /accounts/*  → [New AccountSvc]
         /reports/*   → [Monolith] (shrinking)
Phase N: Monolith = empty → decommission

Incremental · Risk-Mitigated · Job Relevant
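The routing table behind the phases above can be sketched as a prefix map. The service names are hypothetical. In practice this table usually lives in API gateway or reverse proxy configuration rather than hand-written code. The logic is the same either way: migrated prefixes go to new services, and everything else falls through to the monolith.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Strangler facade: migrated path prefixes route to new services,
// everything else falls through to the monolith.
final class StranglerRouter {
    private final Map<String, String> migrated = new LinkedHashMap<>();  // prefix -> service
    private final String monolith;

    StranglerRouter(String monolith) { this.monolith = monolith; }

    // Each migration phase just adds a prefix here
    StranglerRouter migrate(String pathPrefix, String newService) {
        migrated.put(pathPrefix, newService);
        return this;
    }

    String route(String path) {
        for (var entry : migrated.entrySet()) {
            if (path.startsWith(entry.getKey())) return entry.getValue();
        }
        return monolith;
    }
}
```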

Sidecar Pattern

Service Mesh · Envoy · Istio

Deploy a helper container alongside the main service container in the same pod. The sidecar handles cross-cutting concerns: mTLS, observability, rate limiting, circuit breaking. Your app code stays simple; the mesh handles reliability. Envoy proxy is the canonical sidecar. Critical for zero-trust fintech networking.

┌─── Pod ─────────────────────────────┐
│ ┌──────────────┐    ┌─────────────┐ │
│ │ Your App     │    │ Envoy       │ │
│ │ (Spring Boot)│ ←→ │ Sidecar     │ │
│ │              │    │ mTLS        │ │
│ │              │    │ Metrics     │ │
│ │              │    │ Tracing     │ │
│ └──────────────┘    └─────────────┘ │
└─────────────────────────────────────┘

Cross-Cutting · mTLS · Kubernetes Native

BFF Pattern

Backend For Frontend

Create a dedicated backend API for each frontend type (mobile app, web SPA, partner API). Each BFF aggregates and transforms data specifically for its client, avoiding the "too much / too little" problem of a generic API. Common in fintech: different data needs for mobile banking vs web dashboard vs partner integrations.

[Mobile App]  → [Mobile BFF]  ─┐
[Web SPA]     → [Web BFF]     ─┤→ [Microservices]
[Partner API] → [Partner BFF] ─┘
Each BFF: tailored aggregation, auth, rate limits

Client-Specific · API Tailoring

💡 Architecture Interview Move: When asked "how would you design a payment system?", a strong answer layers these patterns:
- Hexagonal architecture for each service (testable domain).
- CQRS to separate payment write commands from reporting queries.
- Saga for multi-service payment flows.
- Strangler Fig if modernizing legacy.
- Sidecar for the service mesh.
Name-dropping one pattern shows reading; composing all five shows experience.

02

Eric Evans · Vaughn Vernon

Domain-Driven Design

DDD is not a framework — it's a language for having precise conversations about complex business domains. In fintech, the domain IS the competitive advantage.

Strategic Design — The Big Picture

Strategic Pattern

Bounded Context

A semantic boundary within which a domain model is consistent and unambiguous. "Account" means different things in Banking (balance, transactions) and in Identity (user credentials). Each bounded context owns its model: no cross-context JPA joins. In well-designed systems, a bounded context often maps to one microservice (or a small group of them).

Strategic Pattern

Context Map

A diagram showing how bounded contexts relate: Upstream/Downstream, Customer/Supplier, Partnership, Shared Kernel, Conformist, Anti-Corruption Layer (ACL). The ACL is critical when integrating legacy systems — translate their model into yours at the boundary, don't let their concepts pollute your domain.

Strategic Pattern

Ubiquitous Language

A shared vocabulary between developers and domain experts, reflected exactly in code. If the business says "authorization" and "capture" for card payments, your code has authorize() and capture(), not processStep1() and processStep2(). The language and the model evolve together.

Tactical Design — The Building Blocks

Building Block	Identity?	Mutable?	Definition	Java Implementation
Entity	✓ Yes	✓ Yes	An object defined by its identity, not its attributes. Two Payments with same amount but different IDs are different entities. Identity persists through state changes.	`@Entity class Payment { private PaymentId id; ... }` — identity via equals/hashCode on ID only
Value Object	✗ No	✗ No	Defined entirely by its attributes. Two `Money(100, USD)` objects are equal and interchangeable. Immutable. No identity. Replaces primitives: use `Money`, not `double`.	Java `record Money(BigDecimal amount, Currency currency) {}` — records are perfect VOs
Aggregate	✓ (root)	✓ (via root)	A cluster of entities and VOs with a single root entity. All access to the cluster goes through the Aggregate Root. Enforces invariants at the boundary. Transactional consistency unit.	`Order` aggregate root owns `OrderLine` entities — never access OrderLine directly from outside Order
Repository	—	—	Abstracts the persistence of aggregates. Returns aggregate roots only. The interface lives in the domain layer; the JPA implementation lives in the infrastructure layer. This is the Hexagonal port/adapter boundary.	`interface PaymentRepo { Payment findById(PaymentId id); void save(Payment payment); }`
Domain Service	—	—	Domain logic that doesn't naturally fit one entity or VO. Cross-aggregate coordination. Stateless. If you find yourself putting logic in a utility class, it's probably a Domain Service.	`class TransferService { void transfer(Account from, Account to, Money amount) }`
Domain Event	✓ (implicit)	✗ No	Something that happened in the domain that other parts of the system care about. Immutable facts. Drives sagas, projections, notifications. Named in past tense: `PaymentAuthorized`, not `AuthorizePayment`.	`record PaymentAuthorized(PaymentId id, Instant at, Money amount) implements DomainEvent {}`

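// DDD Aggregate in Java 21 — the full idiom
// Money, PaymentId, Merchant, PaymentStatus, RiskAssessment, DomainEvent, the events and the
// exceptions are domain types defined elsewhere; USD is a statically imported Currency.
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class Payment {  // Aggregate Root
    private final PaymentId   id;          // typed identity, not String
    private final Money       amount;      // Value Object, not BigDecimal
    private final Merchant    merchant;
    private       PaymentStatus status;
    private final List<DomainEvent> events = new ArrayList<>();

    // Private constructor: outside code must go through the create() factory
    private Payment(PaymentId id, Money amount, Merchant merchant) {
        this.id = id;
        this.amount = amount;
        this.merchant = merchant;
        this.status = PaymentStatus.PENDING;
    }

    // Invariant: amount must be positive (enforced in the factory)
    public static Payment create(Money amount, Merchant merchant) {
        if (!amount.isPositive()) throw new IllegalArgumentException("Amount must be positive");
        var p = new Payment(PaymentId.generate(), amount, merchant);
        p.events.add(new PaymentCreated(p.id, amount, merchant.id()));
        return p;
    }

    // Business methods express domain language, not CRUD
    public void authorize(RiskAssessment risk) {
        if (status != PaymentStatus.PENDING)
            throw new InvalidPaymentStateException(id, status, "authorize");
        if (risk.isHighRisk() && amount.exceeds(Money.of(10000, USD)))
            throw new RiskLimitExceededException(id, risk);

        this.status = PaymentStatus.AUTHORIZED;
        events.add(new PaymentAuthorized(id, Instant.now(), amount));
    }

    // Hands recorded events to the application layer (e.g. an outbox) and clears them
    public List<DomainEvent> pullEvents() {
        var pending = List.copyOf(events);
        events.clear();
        return pending;
    }
}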
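The aggregate above calls `Money.of`, `isPositive()`, and `exceeds()`. The sketch below shows one possible shape for that Value Object as a Java record; the method set is an assumption chosen to match those calls. The record normalizes the scale to the currency's minor units, so equal amounts produce equal records. It also refuses to compare or add amounts in different currencies.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Currency;
import java.util.Objects;

// Value Object: equality by value, immutable, validates itself on construction
record Money(BigDecimal amount, Currency currency) {
    Money {
        Objects.requireNonNull(amount, "amount");
        Objects.requireNonNull(currency, "currency");
        // Normalize to the currency's minor units so 100 and 100.00 are equal;
        // UNNECESSARY throws instead of silently rounding extra decimals away
        int digits = Math.max(0, currency.getDefaultFractionDigits());
        amount = amount.setScale(digits, RoundingMode.UNNECESSARY);
    }

    static Money of(long amount, Currency currency) {
        return new Money(BigDecimal.valueOf(amount), currency);
    }

    boolean isPositive() { return amount.signum() > 0; }

    boolean exceeds(Money other) {
        requireSameCurrency(other);
        return amount.compareTo(other.amount) > 0;
    }

    Money plus(Money other) {
        requireSameCurrency(other);
        return new Money(amount.add(other.amount), currency);
    }

    private void requireSameCurrency(Money other) {
        if (!currency.equals(other.currency))
            throw new IllegalArgumentException("Currency mismatch: " + currency + " vs " + other.currency);
    }
}
```

With this sketch, the aggregate's `USD` would be `Currency.getInstance("USD")`.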

💡 DDD in Fintech — The Killer Application: Fintech is one of the most DDD-amenable domains there is. "Authorization," "capture," "settlement," "chargeback," and "reconciliation" are precise, often legally defined terms that business stakeholders use exactly. If your code uses those same words, mismatches between the code and the business rules become visible to non-engineers. That's DDD's core promise.

03

Fallacies · Laws · Theorems

Distributed Systems

The 8 fallacies of distributed computing are all lies you've believed at some point. These are the theorems and patterns that let you build honest, reliable systems anyway.

The Fundamental Laws

CAP Theorem — Brewer 2000

A distributed system can guarantee at most two of:
Consistency — every read receives the most recent write
Availability — every request to a non-failing node receives a (non-error) response
Partition Tolerance — system continues despite network splits

Since network partitions are unavoidable in production, you're really choosing between CP (consistent, possibly unavailable during a partition) and AP (always available, possibly stale).

Postgres (primary with synchronous replicas) = CP. Cassandra (default) = AP. DynamoDB (default eventually consistent reads) = AP. ZooKeeper = CP.

PACELC — Abadi 2012

CAP only addresses partition scenarios. PACELC extends it:

If Partition: choose between Availability or Consistency
Else (no partition): choose between Latency or Consistency

The latency-consistency tradeoff is the everyday reality that CAP misses. Synchronous replication = consistent but slower. Async replication = faster but eventual.

DynamoDB: PA/EL (high availability, low latency, eventual)
Spanner: PC/EC (consistent during partitions, and trades latency for consistency otherwise)

Consistency Models (Weakest → Strongest)

Model	Guarantee	Examples	Latency Cost
Eventual	All replicas converge to the same value eventually, with no time bound	DNS, Cassandra (default)	Lowest
Monotonic Read	You never see older values than you've already read (within a session)	Cosmos DB (session consistency)	Low
Read Your Writes	After a write, you always see it in subsequent reads	Most sticky-session databases	Low-Medium
Causal	Operations that are causally related appear in causal order	MongoDB causal sessions, CockroachDB	Medium
Sequential	Operations appear in some total order consistent with program order	Kafka partition ordering	Medium-High
Linearizable	Every operation appears to take effect atomically at a single instant between its start and end, so real-time order is respected	etcd, ZooKeeper (writes), Google Spanner, CockroachDB strong reads	High
Strict Serializable	Transactions appear to execute serially in real-time order; the gold standard	Google Spanner (external consistency)	Highest

Key Distributed Patterns

Outbox Pattern

Transactional Outbox · At-Least-Once Delivery

Write to your database AND an outbox table in the same transaction. A separate process reads the outbox and publishes to Kafka/SQS. This guarantees an event is published (at least once) if and only if the DB write commits, which solves the dual-write problem. A relay can crash after publishing but before marking the row, so consumers must tolerate duplicates. Essential in fintech event-driven systems.

BEGIN TRANSACTION
  INSERT payments (...)                    ← domain write
  INSERT outbox (event, status=PENDING)    ← same txn
COMMIT
Poller reads outbox → publishes to Kafka → marks outbox row PUBLISHED
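An in-memory sketch of the flow, in which a hypothetical `OutboxStore` stands in for the database. Writes staged inside `inTransaction` become visible together or not at all, which is the property a real database transaction provides. The relay then publishes PENDING rows and marks them PUBLISHED. In production, the two writes would be two SQL inserts in one transaction, and the publisher would be a Kafka producer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// In-memory stand-in for a database, used only to show the shape of the pattern
final class OutboxStore {
    record OutboxRow(long id, String event, String status) {}

    // Writes staged during one "transaction"
    static final class Tx {
        final List<String> payments = new ArrayList<>();
        final List<String> events = new ArrayList<>();
    }

    final List<String> payments = new ArrayList<>();
    final List<OutboxRow> outbox = new ArrayList<>();
    private long nextId = 1;

    // BEGIN ... COMMIT: if work throws, nothing staged is applied (rollback)
    synchronized void inTransaction(Consumer<Tx> work) {
        Tx tx = new Tx();
        work.accept(tx);
        payments.addAll(tx.payments);
        for (String event : tx.events) outbox.add(new OutboxRow(nextId++, event, "PENDING"));
    }

    // Relay (poller): publish PENDING rows, then mark them PUBLISHED.
    // A crash between publish and mark re-sends the row on the next poll:
    // at-least-once delivery, so consumers must deduplicate.
    synchronized int relay(Consumer<String> publisher) {
        int sent = 0;
        for (int i = 0; i < outbox.size(); i++) {
            OutboxRow row = outbox.get(i);
            if (!row.status().equals("PENDING")) continue;
            publisher.accept(row.event());
            outbox.set(i, new OutboxRow(row.id(), row.event(), "PUBLISHED"));
            sent++;
        }
        return sent;
    }
}
```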

Idempotency Keys

Safe Retries in Payment Systems

Every mutating API request carries a client-generated idempotency key. The server stores (key → result). If the same key arrives again (for example, a retry after a timeout), the server returns the stored result without re-executing. Stripe's API supports idempotency keys on all POST requests. In fintech, idempotency is not optional: double charges are lawsuit-level bugs.

POST /payments
Idempotency-Key: a4f7-3b2c-9d01    ← client generates UUID
Server: key exists? → return cached result
Server: key new?    → execute → store result → return
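A minimal sketch of the server-side check, assuming an in-process map; `IdempotencyStore` is a hypothetical name. `ConcurrentHashMap.computeIfAbsent` applies the operation at most once per key, so concurrent retries with the same key share one result. If the operation throws, nothing is stored, and the client can retry.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Idempotency-key store: the first request with a key executes;
// retries with the same key get the stored result without re-executing.
final class IdempotencyStore<R> {
    private final ConcurrentHashMap<String, R> results = new ConcurrentHashMap<>();

    R execute(String idempotencyKey, Supplier<R> operation) {
        // Atomic per key: other callers with the same key wait, then reuse the result
        return results.computeIfAbsent(idempotencyKey, key -> operation.get());
    }
}
```

A production store would persist keys in the database or Redis with a TTL. It would also reject a reused key whose request body differs from the original.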

Circuit Breaker

Resilience4j · Cascading Failure Prevention

Three states: CLOSED (normal), OPEN (failing fast), HALF-OPEN (testing recovery). When the failure rate exceeds a threshold, the circuit opens: callers stop calling the failing service and get a fallback immediately. This prevents one slow service from taking down all its callers. Resilience4j is the de facto standard in Java.

CLOSED    → (failure rate > 50%) → OPEN
OPEN      → (wait 60s)           → HALF-OPEN
HALF-OPEN → (test call ok)       → CLOSED
HALF-OPEN → (test call fails)    → OPEN
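The transitions above can be sketched as a small hand-rolled state machine. This is not Resilience4j's API. `SimpleCircuitBreaker`, its count-based window, and the single half-open probe are simplifications for illustration, and the class is not thread-safe. The clock is injected so the OPEN wait can be tested without sleeping.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hand-rolled circuit breaker following the state diagram above (not thread-safe)
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;               // most recent calls used for the failure rate
    private final double failureRateThreshold;  // e.g. 0.5 for "> 50%"
    private final long openMillis;              // how long to fail fast before probing
    private final LongSupplier clockMillis;     // injected so tests control time
    private final Deque<Boolean> window = new ArrayDeque<>();
    private State state = State.CLOSED;
    private long openedAt;

    SimpleCircuitBreaker(int windowSize, double failureRateThreshold, Duration openFor, LongSupplier clockMillis) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.openMillis = openFor.toMillis();
        this.clockMillis = clockMillis;
    }

    State state() { return state; }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clockMillis.getAsLong() - openedAt < openMillis) return fallback.get();  // fail fast
            state = State.HALF_OPEN;  // wait is over: let one probe call through
        }
        try {
            T result = action.get();
            recordOutcome(true);
            return result;
        } catch (RuntimeException e) {
            recordOutcome(false);
            return fallback.get();
        }
    }

    private void recordOutcome(boolean success) {
        if (state == State.HALF_OPEN) {
            if (success) { state = State.CLOSED; window.clear(); } else { trip(); }
            return;
        }
        window.addLast(success);
        if (window.size() > windowSize) window.removeFirst();
        if (window.size() == windowSize) {
            long failures = window.stream().filter(ok -> !ok).count();
            if ((double) failures / windowSize > failureRateThreshold) trip();
        }
    }

    private void trip() {
        state = State.OPEN;
        openedAt = clockMillis.getAsLong();
        window.clear();
    }
}
```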

Two-Phase Commit

2PC · Why We Use Sagas Instead

A distributed atomic transaction across multiple services in two phases. Phase 1 (Prepare): all participants vote commit/abort. Phase 2 (Commit): the coordinator sends the final decision. It works, but it has three problems: the coordinator is a single point of failure, participants block while it is down, and it scales poorly. In modern microservices, Sagas with compensations are preferred. Know 2PC to explain why you're not using it.

Coordinator → all: PREPARE?
All → Coordinator: READY ✓ / ABORT ✗
Coordinator → all: COMMIT (if all READY)
Problem: coordinator crashes after PREPARE → participants that voted READY stay blocked, holding locks, until it recovers

Fallacies of Distributed Computing — L Peter Deutsch 1994 (the eighth added by James Gosling, 1997)

1. The network is reliable | 2. Latency is zero | 3. Bandwidth is infinite | 4. The network is secure
5. Topology doesn't change | 6. There is one administrator | 7. Transport cost is zero | 8. The network is homogeneous

Every junior engineer assumes at least three of these. Every production incident traces back to at least one. Design systems assuming the opposite of all eight.

04

The New Discipline

AI Engineering Patterns

Building systems with LLMs is a new engineering discipline with its own failure modes, patterns, and architecture primitives. This is the core of your new role.

Retrieval Augmented Generation (RAG)

NAIVE RAG PIPELINE:
User Query
   ↓ embed(query)
Query Vector →→→ Vector DB (pgvector, Pinecone, Qdrant)
   ↓ similarity search (cosine/dot)
Top-K Chunks (relevant document fragments)
   ↓ inject into prompt
LLM Prompt: "Context: {chunks} Question: {query} Answer:"
   ↓ generate
Grounded Response ← based on retrieved context (reduces, but does not eliminate, hallucination)

ADVANCED RAG:
Query Rewriting → HyDE → Hybrid Search (BM25 + vector) → Re-ranking → Filtered retrieval
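The retrieval and prompt-injection steps of the naive pipeline, sketched with brute-force cosine similarity. The `Chunk` record and `VectorSearch` class are hypothetical. Embeddings are assumed to come from an embedding model; the test uses tiny hand-written vectors. At scale, vector databases such as pgvector replace the full sort with approximate indexes like HNSW.

```java
import java.util.Comparator;
import java.util.List;

// A stored document fragment and its (precomputed) embedding
record Chunk(String text, double[] embedding) {}

final class VectorSearch {
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Rank every chunk by similarity to the query vector and keep the top K
    static List<Chunk> topK(List<Chunk> chunks, double[] query, int k) {
        return chunks.stream()
                .sorted(Comparator.comparingDouble((Chunk c) -> cosine(c.embedding(), query)).reversed())
                .limit(k)
                .toList();
    }

    // Inject the retrieved chunks into the prompt template from the diagram
    static String buildPrompt(List<Chunk> context, String question) {
        StringBuilder sb = new StringBuilder("Context:\n");
        for (Chunk c : context) sb.append("- ").append(c.text()).append('\n');
        return sb.append("Question: ").append(question).append("\nAnswer:").toString();
    }
}
```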

Agent Architectures

ReAct Pattern

Reason + Act · Yao et al. 2022

The LLM alternates between Thought (reasoning about what to do), Action (calling a tool), and Observation (processing the tool result), repeating until it can answer the original question. The foundation of many production agents.

Thought: I need to check the payment status
Action: get_payment_status("PAY-123")
Observation: {status: "pending", created: "2h ago"}
Thought: It's been pending 2h, I should escalate
Action: escalate_payment("PAY-123", "timeout")
Observation: {escalated: true, ticket: "ESC-456"}
Answer: Payment escalated, ticket ESC-456 created.
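The loop behind that trace can be sketched as follows. It assumes a hypothetical `Step` record for each model turn and a `model` function standing in for the LLM call; a real agent would parse the model's text or function-call output into this shape. The step limit guards against an agent that never decides to answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// One model turn: either a tool call (Thought + Action) or a final answer
record Step(String thought, String tool, String input, String answer) {
    static Step act(String thought, String tool, String input) { return new Step(thought, tool, input, null); }
    static Step finish(String answer) { return new Step(null, null, null, answer); }
}

final class ReActLoop {
    // model: maps the transcript so far to the next step (an LLM call in practice)
    static String run(Function<List<String>, Step> model,
                      Map<String, Function<String, String>> tools,
                      String question, int maxSteps) {
        List<String> transcript = new ArrayList<>(List.of("Question: " + question));
        for (int i = 0; i < maxSteps; i++) {
            Step step = model.apply(transcript);
            if (step.answer() != null) return step.answer();
            transcript.add("Thought: " + step.thought());
            transcript.add("Action: " + step.tool() + "(" + step.input() + ")");
            String observation = tools.getOrDefault(step.tool(), in -> "unknown tool").apply(step.input());
            transcript.add("Observation: " + observation);
        }
        return "Stopped: step limit reached";
    }
}
```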

Tool Use / Function Calling

LLM → Structured API Calls

Provide the LLM with a schema of available tools (function names, parameter types, descriptions). The model decides when to call which tool and with what arguments. Your code executes the call and returns the result. Your Spring Boot service becomes an LLM tool: the AI calls your APIs. AI-enhanced CI/CD pipelines can use the same mechanism to invoke build and deployment tooling.

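// Tool definition (LangChain4j)
import dev.langchain4j.agent.tool.P;
import dev.langchain4j.agent.tool.Tool;

class PaymentTools {
    private final PaymentService paymentService;  // the application's own service
    PaymentTools(PaymentService paymentService) { this.paymentService = paymentService; }
    @Tool("Get payment status by ID")
    PaymentStatus getPaymentStatus(@P("The payment ID") String paymentId) {
        return paymentService.findById(paymentId).status();
    }
}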

Chain-of-Thought Prompting

CoT · Intermediate Reasoning Steps

Instruct the model to show its reasoning before answering. "Think step by step" often markedly improves accuracy on multi-step problems. Zero-shot CoT: just add "Let's think step by step." Few-shot CoT: provide worked examples of reasoning chains. Valuable for complex code generation and security analysis.

Prompt template:
"You are a Java security expert. Analyze this code for OWASP Top 10 vulnerabilities. Think through each category step by step, then provide your findings."

Guardrails Pattern

Input/Output Validation for LLMs

In regulated fintech environments, LLM inputs and outputs must be validated. Input guardrails: check for PII, inject system context, rate limit. Output guardrails: validate JSON schema, check for harmful content, strip sensitive data before logging. Never log raw prompts containing customer financial data.

User Input → [PII Scrubber] → [Input Guard] → LLM
                                               ↓
Response ← [PII Re-inject] ← [Output Guard] ← Raw Output

Audit log:  sanitized_prompt, model, latency
NEVER log:  raw_prompt with customer PII
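A sketch of the `[PII Scrubber]` box, assuming regex-based masking of card numbers and email addresses; `PiiScrubber` is a hypothetical name. The patterns are deliberately simple. Production scrubbers typically add Luhn checks, more PII types, and a reversible token map for the re-inject step.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Input guardrail: mask card numbers (keeping the last four digits) and email
// addresses before the text reaches the model or an audit log.
final class PiiScrubber {
    // 13 to 19 digits, optionally separated by single spaces or dashes
    private static final Pattern CARD = Pattern.compile("\\b\\d(?:[ -]?\\d){12,18}\\b");
    private static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+");

    static String scrub(String text) {
        Matcher m = CARD.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String digits = m.group().replaceAll("[ -]", "");
            m.appendReplacement(out, "[CARD ****" + digits.substring(digits.length() - 4) + "]");
        }
        m.appendTail(out);
        return EMAIL.matcher(out.toString()).replaceAll("[EMAIL]");
    }
}
```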

Structured Output

JSON Mode · Constrained Generation

Force the LLM to output valid JSON conforming to a schema. Pydantic (Python) or Jackson annotations (Java) define the expected structure; the framework enforces it. Essential when LLM output drives code execution — freeform text cannot drive state machine transitions.

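// LangChain4j structured output
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.UserMessage;

record RiskAssessment(
    RiskLevel level,
    String reason,
    boolean requiresManualReview) {}

interface RiskAnalyst {  // LangChain4j generates the implementation
    @UserMessage("Assess the fraud risk of this transaction: {{it}}")
    RiskAssessment assessRisk(String transaction);
}

// chatModel: any configured LangChain4j chat model (assumed)
RiskAnalyst aiService = AiServices.create(RiskAnalyst.class, chatModel);
RiskAssessment assessment = aiService.assessRisk(transaction);  // returns typed record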

Prompt Caching

Cost & Latency Optimization

Long, stable system prompts (compliance rules, code context, documentation) can be cached by the model provider. Subsequent requests that share the same prefix reuse the cache, so those tokens are billed at a discount and processed faster. Anthropic, OpenAI, and Google Gemini all support prompt caching. In fintech, your compliance ruleset can be a cached system prompt.

Request 1: [Long System Prompt 8k tokens] + [Query]
           → Full processing: $0.024 + $0.001
Request 2: [cache_control: ephemeral] + [New Query]
           → Cache hit: $0.003 + $0.001
           (illustrative prices: 87.5% cheaper on the cached prefix, 84% cheaper overall, $0.004 vs $0.025)

AI Evaluation — The Missing Discipline

Eval Method	What It Measures	When to Use
Unit Evals	Does a single prompt produce expected output on known examples? Pass/fail per case.	Regression testing after prompt changes
LLM-as-Judge	Use a stronger model to grade outputs: faithfulness, relevance, coherence, groundedness	RAG pipeline quality, continuous monitoring
Human Eval	Domain experts rate a sample of outputs. Gold standard but expensive.	Pre-production sign-off, periodic audits
A/B Testing	Route % of traffic to new prompt/model, compare downstream metrics	Prompt optimization, model upgrades
Hallucination Detection	Check if claims in output are supported by retrieved context (for RAG)	Always, for RAG over compliance documents
Latency P99	99th percentile response time under load — LLM APIs have high variance	Production readiness, SLA definition

💡 The One AI Pattern Nobody Teaches Treat your AI pipelines exactly like distributed services: circuit break on API timeout (LLMs drop connections), retry with exponential backoff (provider rate limits), cache aggressively (same query twice = waste), stream responses (don't block on 3-second generations), and have a deterministic fallback when the LLM is degraded. Every production AI system eventually needs all five.
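Two of those five habits, retry with exponential backoff and a deterministic fallback, fit in a few lines. `Retry.withBackoff` is a hypothetical helper. The sleeper is injected so tests don't wait. Real code would also add jitter and retry only errors that are safe to retry (timeouts, HTTP 429 or 503).

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Retry with exponential backoff, then fall back to a deterministic answer
final class Retry {
    static <T> T withBackoff(Supplier<T> call, Supplier<T> fallback,
                             int maxAttempts, long baseDelayMillis, LongConsumer sleeper) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) break;
                sleeper.accept(baseDelayMillis << (attempt - 1));  // base, 2x, 4x, ...
            }
        }
        return fallback.get();  // e.g. a rules-based answer while the LLM is degraded
    }
}
```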

05

The Three Pillars + SRE

Observability & SRE

If you can't measure it, you can't improve it. In fintech, if you can't prove it in logs, it didn't happen legally. Observability is compliance infrastructure.

Pillar 1

Metrics

What: Numeric time-series data aggregated over windows. Tools: Micrometer (Spring Boot) → Prometheus → Grafana. Types: Counter (requests_total), Gauge (queue_depth), Histogram (request_duration_seconds, from which P99 is computed), Summary. Java: @Timed, @Counted annotations on service methods.

Pillar 2

Logs

What: Structured, timestamped event records. Tools: SLF4J + Logback → JSON format → ELK or CloudWatch Logs. Critical: structured JSON (not plain text), correlation IDs on every log line, PII masking before persistence. Logs are audit evidence in fintech — retention policy is a compliance requirement.

Pillar 3

Traces

What: The path of a single request through distributed services. Tools: OpenTelemetry (OTEL) → Jaeger or Zipkin or Tempo. Java: Spring Boot 3 + Micrometer Tracing auto-instruments. Each trace = spans with parent-child relationships. Mandatory for debugging microservice latency.

SRE Reliability Mathematics

SLI → SLO → SLA

SLI (Service Level Indicator): the actual measurement
e.g., % of payment API requests completing in <200ms

SLO (Service Level Objective): the internal target
e.g., 99.9% of requests complete in <200ms, measured monthly

SLA (Service Level Agreement): external contractual commitment
e.g., 99.5% availability or service credits apply

Always set the SLO tighter than the SLA. The gap between them is a safety margin: you exhaust your internal error budget before you breach the contract.

Error Budget

Error Budget = 1 - SLO
At 99.9% SLO: budget = 0.1% = 43.8 minutes/month downtime allowed
At 99.99% SLO: budget = 0.01% = 4.38 minutes/month

Error budget creates alignment: if you're within budget, deploy freely. If you've burned it, freeze deployments and focus on reliability. Engineering velocity is a function of reliability.

Availability	Downtime/Month (avg. 30.44 days)	Downtime/Year (365.25 days)
99%	7.31 hours	3.65 days
99.5%	3.65 hours	1.83 days
99.9%	43.8 min	8.77 hours
99.95%	21.9 min	4.38 hours
99.99%	4.38 min	52.6 min
99.999%	26.3 sec	5.26 min
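The error-budget arithmetic can be checked in code. This sketch assumes a 365.25-day year and an average month of one twelfth of that (about 30.44 days), which is the convention behind the 43.8 min/month figure. `ErrorBudget` is a hypothetical name.

```java
// Allowed full downtime for an availability target: Error Budget = 1 - SLO
final class ErrorBudget {
    static final double MINUTES_PER_YEAR = 365.25 * 24 * 60;       // 525,960
    static final double MINUTES_PER_MONTH = MINUTES_PER_YEAR / 12;  // 43,830

    static double monthlyMinutes(double slo) { return (1 - slo) * MINUTES_PER_MONTH; }
    static double yearlyMinutes(double slo)  { return (1 - slo) * MINUTES_PER_YEAR; }
}
```

For example, `ErrorBudget.monthlyMinutes(0.999)` comes out at about 43.8 minutes.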

Fintech Baseline: Payment processing APIs typically target 99.99% (about 4.4 min/month). Settlement and reporting can tolerate 99.9%. Real-time fraud scoring often targets 99.999%, because it sits in the path of every transaction.

Key Metrics Every Backend Engineer Must Know

Metric	Measures	Tool (Spring Boot)	Alert Threshold Example
http_server_requests_seconds (P99 via histogram_quantile on _bucket)	99th percentile API latency	Micrometer auto (percentile histograms enabled)	Alert if P99 > 500ms for 5min
http_server_requests_seconds_count{status=~"5.."}	Server error rate	Micrometer auto	Alert if error rate > 0.1%
jvm_memory_used_bytes	Heap/non-heap usage	Micrometer JVM metrics	Alert if heap > 85% for 10min
jvm_gc_pause_seconds	GC pause duration	Micrometer JVM metrics	Alert if GC pause > 200ms
hikaricp_connections_pending	DB connection pool saturation	Micrometer HikariCP	Alert if pending > 5 for 2min
kafka_consumergroup_lag	Message processing backlog	Kafka exporter	Alert if lag growing for 15min
resilience4j_circuitbreaker_state	Circuit breaker state	Resilience4j Micrometer metrics	Alert on OPEN state immediately

06

REST · gRPC · GraphQL · AsyncAPI

API Design Patterns

The contract between services is the most expensive thing to change. Get it right the first time. Or at least version it.

REST Richardson Maturity Model

Level	Name	What It Means	Example
Level 0	The Swamp of POX	Single URI, HTTP as transport only, no semantics	`POST /api {"action":"getPayment","id":"123"}`
Level 1	Resources	Multiple URIs representing resources	`POST /payments/123` (but all POST)
Level 2	HTTP Verbs	Use GET/POST/PUT/PATCH/DELETE semantically	`GET /payments/123` — this is "REST" in most orgs
Level 3	Hypermedia (HATEOAS)	Responses include links to valid next actions	Response includes `_links: {capture: {href:...}, refund: {href:...}}`

gRPC vs REST vs GraphQL — Decision Matrix

Dimension	REST/JSON	gRPC/Protobuf	GraphQL
Transport	HTTP/1.1 or 2	HTTP/2 only, bidirectional streaming	HTTP/1.1 or 2
Schema	OpenAPI (optional)	Protobuf IDL (required, strict)	SDL type system (required)
Performance	Baseline	Typically faster (compact binary Protobuf encoding, cheaper to serialize than JSON)	Similar to REST, query complexity risk
Browser Support	✓ Native	✗ Needs grpc-web proxy	✓ Native
Best For	External APIs, CRUD services, public APIs	Internal microservice communication, streaming, mobile	Complex, flexible client data requirements (dashboards)
Versioning	URL /v1/, /v2/ or headers	Protobuf field numbers (backward compatible)	Field deprecation, additive-only
Fintech	External partner APIs, webhooks	Internal payment processing pipelines	Finance dashboards, reporting queries

API Versioning Strategies

// URL versioning — simple, cache-friendly
GET /v1/payments/123
GET /v2/payments/123

// Header versioning — clean URLs
GET /payments/123
Accept: application/vnd.myapi.v2+json

// Consumer-Driven Contract Testing (Pact)
// The consumer defines the contract;
// the provider verifies it on every build
// (consumer and provider names are illustrative)
@Pact(consumer = "checkout-service", provider = "payment-service")
public RequestResponsePact createPact(PactDslWithProvider builder) {
    return builder
        .given("payment 123 exists")
        .uponReceiving("a request for payment 123")
        .path("/v1/payments/123")
        .method("GET")
        .willRespondWith()
        .status(200)
        .body("{\"id\":\"123\",\"amount\":\"100.00\",\"currency\":\"USD\"}")  // amount as a string
        .toPact();
}

🏦 Fintech API Non-Negotiables
1. Idempotency-Key header on all POST mutations
2. Correlation-ID header on all requests (for audit trails)
3. ISO 4217 currency codes (never "dollars")
4. ISO 8601 timestamps in UTC (never local time)
5. Decimal amounts as strings (never float — IEEE 754 is not your friend with money)
6. Error responses with machine-readable error codes (not just HTTP status)
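Rules 3 and 5 can be enforced at the API boundary with the JDK alone. In this sketch (`AmountParser` is a hypothetical name), `Currency.getInstance` rejects codes that are not ISO 4217. The parser also rejects amounts with more decimal places than the currency allows, instead of rounding silently.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Currency;

final class AmountParser {
    // Parse an API amount such as {"amount":"100.00","currency":"USD"}.
    // Currency.getInstance throws IllegalArgumentException for non-ISO codes ("dollars");
    // setScale(..., UNNECESSARY) throws ArithmeticException if rounding would be needed.
    static BigDecimal parse(String amount, String currencyCode) {
        Currency currency = Currency.getInstance(currencyCode);
        int digits = Math.max(0, currency.getDefaultFractionDigits());
        return new BigDecimal(amount).setScale(digits, RoundingMode.UNNECESSARY);
    }
}
```

The contrast with binary floating point is easy to see: `0.1 + 0.2 == 0.3` is false for `double`, while `new BigDecimal("0.1").add(new BigDecimal("0.2"))` is exactly `0.3`.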

REST API Design Rule: Nouns, not verbs. POST /payments, not POST /createPayment. State transitions go through sub-resources: POST /payments/123/captures to capture, POST /payments/123/refunds to refund. The HTTP verb encodes the action; the resource noun encodes the target.

07

Zero Trust · OAuth2 · STRIDE · PCI

Security Patterns

In fintech, security is not a feature. It is the product. Every engineering decision is also a security decision.

OAuth2 / OIDC — The Full Flow

AUTHORIZATION CODE FLOW WITH PKCE (the secure one for web/mobile):
1. User clicks Login
2. Client generates code_verifier (random), code_challenge = BASE64URL(SHA256(code_verifier))
3. Client redirects: GET /authorize?response_type=code&client_id=X&code_challenge=Y&code_challenge_method=S256
4. Auth Server: user authenticates → issues authorization_code
5. Auth Server redirects back: ?code=ABC
6. Client: POST /token {code=ABC, code_verifier=original}
7. Auth Server: validates the code and checks that BASE64URL(SHA256(code_verifier)) equals the stored code_challenge → returns access_token, refresh_token, id_token
8. Client: GET /api/payments with Authorization: Bearer {access_token}
9. Resource Server: validates JWT signature, expiry, scopes → serves request
PKCE prevents authorization code interception attacks and is mandatory for SPAs and mobile apps.
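Steps 2 and 7 hinge on one computation. Below is a minimal JDK-only sketch of the client side, with `Pkce` as a hypothetical name. The test checks it against the example verifier and challenge published in RFC 7636, Appendix B.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

// PKCE (RFC 7636) client side: a random code_verifier and its S256 code_challenge
final class Pkce {
    private static final Base64.Encoder B64URL = Base64.getUrlEncoder().withoutPadding();

    // 32 random bytes -> 43-character base64url verifier (RFC 7636 allows 43 to 128 chars)
    static String newVerifier() {
        byte[] bytes = new byte[32];
        new SecureRandom().nextBytes(bytes);
        return B64URL.encodeToString(bytes);
    }

    // code_challenge = BASE64URL(SHA-256(ASCII(code_verifier)))
    static String challenge(String verifier) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(verifier.getBytes(StandardCharsets.US_ASCII));
            return B64URL.encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```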

JWT Deep Dive

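// JWT = Header.Payload.Signature (base64url encoded, dot-separated)
// Spring Security 6 JWT Resource Server
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.HttpMethod;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.oauth2.server.resource.authentication.JwtAuthenticationConverter;
import org.springframework.security.web.SecurityFilterChain;

import static org.springframework.security.config.http.SessionCreationPolicy.STATELESS;

@Configuration
@EnableWebSecurity
class SecurityConfig {
    @Bean
    SecurityFilterChain chain(HttpSecurity http) throws Exception {
        return http
            .authorizeHttpRequests(auth -> auth
                .requestMatchers("/actuator/health").permitAll()
                .requestMatchers(HttpMethod.POST, "/payments")
                    .hasAuthority("SCOPE_payments:write")   // scope-based
                .requestMatchers(HttpMethod.GET, "/payments/**")
                    .hasAuthority("SCOPE_payments:read")
                .anyRequest().authenticated())
            .oauth2ResourceServer(o -> o
                .jwt(j -> j.jwtAuthenticationConverter(jwtConverter())))
            .sessionManagement(s -> s
                .sessionCreationPolicy(STATELESS))   // no session cookies
            .build();
    }

    // The default converter maps the token's "scope"/"scp" claim to SCOPE_* authorities,
    // which the hasAuthority("SCOPE_...") rules above rely on
    JwtAuthenticationConverter jwtConverter() {
        return new JwtAuthenticationConverter();
    }
}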
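To make the Header.Payload.Signature structure concrete, here is a JDK-only HS256 sign/verify sketch (`Hs256Jwt` is a hypothetical name). This is not how the Spring configuration above validates tokens. Resource servers usually verify asymmetric signatures (RS256/ES256) against the issuer's published keys, and they also check `exp`, `iss`, `aud`, and scopes, which this sketch omits.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// JWT shape: base64url(header) + "." + base64url(payload) + "." + base64url(signature),
// where the signature covers the ASCII string "header.payload".
final class Hs256Jwt {
    private static final Base64.Encoder B64URL = Base64.getUrlEncoder().withoutPadding();

    static String sign(String headerJson, String payloadJson, byte[] secret) {
        String signingInput = B64URL.encodeToString(headerJson.getBytes(StandardCharsets.UTF_8))
                + "." + B64URL.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        return signingInput + "." + B64URL.encodeToString(hmacSha256(signingInput, secret));
    }

    // Signature check only; real validators also check alg, exp, iss, aud and scopes
    static boolean verify(String token, byte[] secret) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) return false;
        byte[] expected = hmacSha256(parts[0] + "." + parts[1], secret);
        try {
            byte[] actual = Base64.getUrlDecoder().decode(parts[2]);
            return MessageDigest.isEqual(expected, actual);  // constant-time comparison
        } catch (IllegalArgumentException malformed) {
            return false;
        }
    }

    private static byte[] hmacSha256(String data, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.US_ASCII));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }
}
```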

STRIDE Threat Model

Threat	What It Is	Java/Spring Mitigation	Closest OWASP Top 10 (2021) Category
S — Spoofing	Claiming to be someone else (identity)	JWT validation, mTLS between services, Spring Security	Identification and Authentication Failures
T — Tampering	Modifying data in transit or at rest	TLS 1.3, HMAC request signing, field-level encryption for PII	Cryptographic Failures
R — Repudiation	Denying an action occurred	Immutable audit logs, digital signatures, event sourcing	Security Logging and Monitoring Failures
I — Information Disclosure	Exposing private data	PII masking in logs, response filtering, no stack traces in prod responses	Cryptographic Failures (formerly Sensitive Data Exposure)
D — Denial of Service	Making the system unavailable	Rate limiting (Resilience4j), circuit breakers, CDN, WAF	Security Misconfiguration
E — Elevation of Privilege	Gaining unauthorized permissions	Least privilege IAM roles, RBAC with Spring Security, scoped JWT	Broken Access Control

⚠ PCI-DSS Scope — The Fintech Engineer's Burden PCI-DSS applies to any system that stores, processes, or transmits cardholder data (CHD). Scope reduction is the key strategy: tokenize card numbers at the edge (Stripe, Braintree, Spreedly), process only tokens internally — your services never see raw PANs. This shrinks your PCI scope from "entire infrastructure" to "the tokenization boundary." Know this. It will come up in your first sprint.

Zero Trust Architecture

Zero Trust Model

Never Trust, Always Verify · BeyondCorp · NIST 800-207

Traditional perimeter security: "inside the network = trusted." Zero Trust: every request must be authenticated and authorized regardless of source. No implicit trust from network location. Applies to: service-to-service calls (mTLS), user access (MFA + device posture), data access (attribute-based authorization).

OLD MODEL:
[Firewall] → [Internal Network = TRUSTED]
   ↓
Any internal service can call any other service

ZERO TRUST:
Every call authenticated (mTLS certificates)
Every call authorized (JWT scopes checked)
Every call logged (audit trail)
Minimal blast radius: service A can only call service B if B explicitly permits A's service account

08

Kafka · CDC · Polyglot · Data Mesh

Data Architecture

Data is the liability that never goes away. How you store, stream, and govern it determines what you can and can't build later.

Kafka — The Fintech Backbone

KAFKA ARCHITECTURE:
Producer (Payment Service)
   ↓ publish to topic "payment.events"
┌──────────────────────────────────────────────────┐
│ Topic: payment.events (partitioned, replicated)  │
│ Partition 0: [P1][P2][P5][P8]...                 │
│ Partition 1: [P3][P4][P6]...                     │ ← ordered within partition
│ Partition 2: [P7][P9]...                         │
└──────────────────────────────────────────────────┘
   ↓ consume (each consumer group independently receives every message)
[Ledger Service (group=ledger)]  [Risk Service (group=risk)]  [Audit Service (group=audit)]

Key invariant: messages with the same key go to the same partition (as long as the partition count doesn't change) → ordering guarantee per entity
e.g., key=paymentId → all events for payment 123 land on the same partition, in order

Change Data Capture (CDC)

CDC with Debezium

Database → Kafka → Everything Else

Debezium reads the database's transaction log (the Postgres WAL via logical decoding, the MySQL binlog) and streams every INSERT/UPDATE/DELETE as a Kafka event. Your application writes only to Postgres; all downstream consumers (search index, analytics, cache invalidation, audit) receive the changes via Kafka. No dual-write. No polling. The outbox pattern's industrial-strength brother.

Postgres (payments table)
    ↓ WAL (Write-Ahead Log) reader
Debezium Connector
    ↓ streams to Kafka topic: "dbserver.public.payments"
    ↓
[Elasticsearch Consumer]  → search index updated
[Analytics Consumer]      → data warehouse updated
[Cache Consumer]          → Redis invalidated
[Notification Consumer]   → customer email sent
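A downstream consumer such as the Cache Consumer typically switches on the change event's operation type. Debezium's event envelope carries `before`, `after`, and `op` fields, with `op` values `c` (create), `u` (update), `d` (delete) and `r` (snapshot read). The sketch below models that envelope as a plain record instead of using a real Kafka or JSON library; the `ChangeEvent` record, the `id` column, and the in-memory map standing in for Redis are assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PaymentCacheInvalidator {
    // Simplified view of a Debezium change event: row state before/after, plus the op code.
    record ChangeEvent(String op, Map<String, String> before, Map<String, String> after) {}

    // In-memory stand-in for the Redis cache, keyed by payment id.
    final Map<String, String> cache = new ConcurrentHashMap<>();

    void onChange(ChangeEvent event) {
        switch (event.op()) {
            // Insert or snapshot read: nothing can be cached yet for a brand-new row.
            case "c", "r" -> { }
            // Update: drop the stale entry; the next read repopulates it.
            case "u" -> cache.remove(event.after().get("id"));
            // Delete: `after` is null, so the key comes from `before`.
            case "d" -> cache.remove(event.before().get("id"));
            // Other op codes (e.g. truncate) are rejected in this sketch.
            default -> throw new IllegalArgumentException("Unhandled op: " + event.op());
        }
    }

    public static void main(String[] args) {
        var invalidator = new PaymentCacheInvalidator();
        invalidator.cache.put("123", "status=AUTHORIZED");
        invalidator.onChange(new ChangeEvent("u",
            Map.of("id", "123", "status", "AUTHORIZED"),
            Map.of("id", "123", "status", "CAPTURED")));
        System.out.println(invalidator.cache.containsKey("123")); // false
    }
}
```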

Polyglot Persistence — The Right Database for Each Job

Data Store	Model	Best For in Fintech	Java Integration
PostgreSQL	Relational + ACID	Accounts, transactions, ledger entries — anything requiring ACID guarantees and complex queries. The default choice.	Spring Data JPA, JDBC, jOOQ
Redis	Key-Value / Cache	Session tokens, rate limiting counters, idempotency key cache, frequently-read balances. Sub-millisecond reads.	Spring Data Redis, Lettuce, Redisson
Kafka	Log / Event Stream	Event streaming, audit log, CDC, async communication between services. Not a primary database: retention is usually time- or size-bounded.	Spring Kafka, Kafka Streams, Schema Registry
Elasticsearch	Document / Search	Transaction search, compliance reporting, log aggregation. NOT for primary data: use it as a derived read model fed by CDC.	Spring Data Elasticsearch, Elasticsearch Java API Client (the High-Level REST Client is deprecated)
DynamoDB / Cassandra	Wide-Column / NoSQL	High-write-throughput time-series: fraud signals, click events, pricing feeds. When you know the access patterns ahead of time.	AWS SDK, Spring Data Cassandra
S3 / GCS	Object Storage	Audit log archives, compliance exports, ML training data, batch reports. Not queryable (use Athena/BigQuery on top).	AWS SDK v2, Spring Cloud AWS

💡 The Golden Rule of Fintech Data Double-entry bookkeeping is the 700-year-old constraint that still governs all financial systems. Every debit has a corresponding credit. The sum of all ledger entries must always equal zero. Any system that touches money must maintain this invariant — no exceptions, no "we'll reconcile later." If your service moves money, it must implement a ledger, not just a balance field.

09

Domain Knowledge · Payment Flows · Compliance

Fintech Specifics

The knowledge that separates engineers who happen to work at a fintech from engineers who understand the domain. This is your moat.

Card Payment Lifecycle — State Machine

INITIATED → validate → AUTHORIZED → capture → CAPTURED → settle → SETTLED
AUTHORIZED → void → VOIDED                 // pre-capture cancellation
SETTLED → refund → REFUNDING → complete → REFUNDED
SETTLED → customer disputes → CHARGEBACK → evidence submitted → CHARGEBACK_REVIEW
INITIATED → any failure → DECLINED         // risk, insufficient funds, expired card, etc.
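The lifecycle can be enforced in code with an enum and an explicit transition table, so an illegal move (say, voiding a payment that has already been captured) fails loudly instead of silently corrupting state. A minimal sketch; the class and method names are hypothetical, and the table encodes only the transitions listed above, so every state without an entry is treated as terminal.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class PaymentStateMachine {
    enum PaymentState {
        INITIATED, AUTHORIZED, CAPTURED, SETTLED, VOIDED,
        REFUNDING, REFUNDED, CHARGEBACK, CHARGEBACK_REVIEW, DECLINED
    }

    // Allowed transitions, taken from the lifecycle above; unlisted states are terminal.
    private static final Map<PaymentState, Set<PaymentState>> ALLOWED = Map.of(
        PaymentState.INITIATED,  EnumSet.of(PaymentState.AUTHORIZED, PaymentState.DECLINED),
        PaymentState.AUTHORIZED, EnumSet.of(PaymentState.CAPTURED, PaymentState.VOIDED),
        PaymentState.CAPTURED,   EnumSet.of(PaymentState.SETTLED),
        PaymentState.SETTLED,    EnumSet.of(PaymentState.REFUNDING, PaymentState.CHARGEBACK),
        PaymentState.REFUNDING,  EnumSet.of(PaymentState.REFUNDED),
        PaymentState.CHARGEBACK, EnumSet.of(PaymentState.CHARGEBACK_REVIEW)
    );

    static boolean canTransition(PaymentState from, PaymentState to) {
        return ALLOWED.getOrDefault(from, Set.of()).contains(to);
    }

    // Returns the new state, or throws if the move is not in the table.
    static PaymentState transition(PaymentState from, PaymentState to) {
        if (!canTransition(from, to)) {
            throw new IllegalStateException("Illegal transition " + from + " -> " + to);
        }
        return to;
    }

    public static void main(String[] args) {
        var state = transition(PaymentState.INITIATED, PaymentState.AUTHORIZED);
        state = transition(state, PaymentState.CAPTURED);
        System.out.println(state);                                                     // CAPTURED
        System.out.println(canTransition(PaymentState.CAPTURED, PaymentState.VOIDED)); // false: too late to void
    }
}
```

In a real service the same check would sit in the domain layer and run inside the transaction that persists the new state.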

🏦 Key Distinction: Authorization vs Capture Authorization = reserving funds on the cardholder's account (near-instant). Capture = the merchant's request to actually collect the reserved amount; settlement then moves the money, typically within one to a few business days. Hotels authorize on check-in and capture on checkout. E-commerce authorizes on order and captures on shipment. Your code must handle the gap between the two: the card can be canceled between auth and capture, and an authorization expires if it is not captured within the window the card network allows.
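One way to handle that gap is a guard at capture time. This is a sketch under stated assumptions: the `Authorization` record and `CaptureGuard` class are hypothetical, the 7-day window in `main` is only an example value, and the rule that capture may not exceed the authorized amount is a simplification, since real expiry windows and over-capture rules (tips, for instance) vary by card network and merchant category.

```java
import java.math.BigDecimal;
import java.time.Duration;
import java.time.Instant;

public class CaptureGuard {
    // Hypothetical authorization record: how much was reserved, and until when.
    record Authorization(String paymentId, BigDecimal authorizedAmount, Instant expiresAt) {}

    // Rejects captures that arrive after the hold lapsed or exceed the reserved amount.
    static BigDecimal capture(Authorization auth, BigDecimal amount, Instant now) {
        if (now.isAfter(auth.expiresAt())) {
            throw new IllegalStateException("Authorization expired; re-authorize " + auth.paymentId());
        }
        if (amount.signum() <= 0 || amount.compareTo(auth.authorizedAmount()) > 0) {
            throw new IllegalArgumentException("Capture must be > 0 and <= the authorized amount");
        }
        return amount; // partial capture is allowed in this sketch (an assumption)
    }

    public static void main(String[] args) {
        Instant authorizedAt = Instant.parse("2024-01-10T15:00:00Z");
        // Example window only; real windows depend on the network and merchant type.
        var auth = new Authorization("pay-123", new BigDecimal("250.00"),
                                     authorizedAt.plus(Duration.ofDays(7)));
        System.out.println(capture(auth, new BigDecimal("180.00"),
                                   authorizedAt.plus(Duration.ofDays(2)))); // 180.00
    }
}
```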

Double-Entry Bookkeeping in Code

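import java.math.BigDecimal;
import java.time.Instant;
import java.util.List;
import java.util.UUID;

enum EntryType { DEBIT, CREDIT }
record AccountId(String value) {}
// Minimal stand-in Money type so the snippet runs (e.g. in jshell); a production one also carries a currency
record Money(BigDecimal value) {
    static final Money ZERO = new Money(BigDecimal.ZERO);
    static Money of(long v) { return new Money(BigDecimal.valueOf(v)); }
    Money add(Money o)      { return new Money(value.add(o.value)); }
    Money negate()          { return new Money(value.negate()); }
}

// Every financial movement = two (or more) ledger entries that net to zero
record LedgerEntry(
    UUID      id,
    UUID      transactionId,
    AccountId accountId,
    Money     amount,       // always positive; the sign comes from type
    EntryType type,         // DEBIT counts as +, CREDIT as -
    Instant   postedAt
) {}

// Transfer $100 from Customer A to Customer B:
var txId = UUID.randomUUID();
var now  = Instant.now();
var entries = List.of(
    new LedgerEntry(UUID.randomUUID(), txId, new AccountId("customer-a"), Money.of(100), EntryType.DEBIT,  now),
    new LedgerEntry(UUID.randomUUID(), txId, new AccountId("customer-b"), Money.of(100), EntryType.CREDIT, now)
);

// INVARIANT: signed sum of all entries for a transaction = 0
// DEBIT +100 + CREDIT -100 = 0  ✓
// Checked explicitly: Java's assert statement is skipped unless the JVM runs with -ea
var net = entries.stream()
    .map(e -> e.type() == EntryType.DEBIT ? e.amount() : e.amount().negate())
    .reduce(Money.ZERO, Money::add);
if (net.value().signum() != 0) throw new IllegalStateException("Unbalanced transaction " + txId);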

Compliance Landscape Cheat Sheet

Standard/Regulation	Applies To	Key Requirement for Developers
PCI-DSS v4	Any system touching payment card data	Tokenize PANs, encrypt CHD at rest/transit, vulnerability scanning, penetration testing, strict access control, detailed audit logs
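GDPR / CCPA	Any system with EU/CA personal data	Right to erasure (balanced against legal retention duties; records kept for audit are often pseudonymized rather than deleted), data minimization, purpose limitation, breach notification to the regulator within 72 hours (GDPR), data subject request APIs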
SOC 2 Type II	SaaS fintech platforms (trust)	Evidenced security controls: access reviews, change management logs, incident response records, availability monitoring — your observability IS the evidence
AML / BSA	Banks, money transmitters	Transaction monitoring, suspicious activity reports (SARs), customer due diligence data retention, OFAC sanctions screening API
FFIEC	US banks and fintech partners	IT risk management, authentication standards, vendor risk management. Affects architecture decisions — multi-cloud, disaster recovery, BCP.
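As a small, concrete example of the PCI-DSS row, here is a display-side masking helper. It is a sketch, not tokenization: masking controls what appears on screens, receipts, and logs, while tokenization replaces the stored PAN. Showing only the last four digits stays within what PCI DSS permits to be displayed; the class name, the `*` mask character, and the 12 to 19 digit sanity range are assumptions.

```java
public class PanMasker {
    /** Masks all but the last four digits; spaces and dashes in the input are ignored. */
    static String mask(String pan) {
        String digits = pan.replaceAll("[\\s-]", "");
        // Assumed sanity range for PAN length; the input is never echoed in the error.
        if (!digits.matches("\\d{12,19}")) {
            throw new IllegalArgumentException("Value is not PAN-shaped");
        }
        int visible = 4;
        return "*".repeat(digits.length() - visible) + digits.substring(digits.length() - visible);
    }

    public static void main(String[] args) {
        System.out.println(mask("4111 1111 1111 1111")); // ************1111
    }
}
```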

10

Interview Preparation · Back of Envelope · Common Patterns

System Design Reference

The interview format where your 30 years shows most clearly. Know the numbers, know the patterns, know when to ask questions.

Back-of-Envelope Numbers Every Engineer Must Know

Operation	Latency
L1 cache reference	0.5 ns
L2 cache reference	7 ns
Main memory (RAM) reference	100 ns
SSD read (NVMe)	100 µs
HDD seek	10 ms
Same datacenter round trip	0.5 ms
US cross-country round trip	40 ms
US to Europe round trip	150 ms
Postgres query (indexed)	1–5 ms
Redis GET	0.1–0.5 ms
LLM API call (Claude Sonnet)	500 ms–3 s

Scale Number	Value
Requests/sec, single server	1,000–10,000 RPS
Postgres transactions/sec	1,000–10,000 TPS
Kafka throughput/partition	100 MB/s
S3 PUT throughput	3,500 req/s per prefix
Redis ops/sec	100,000+
Twitter: peak tweets/sec	~150,000
Visa: transactions/sec	~24,000
Bytes per day: 1B users × 10 actions × ~100 bytes	~1 TB/day
Size of 1 minute of 720p video	~60 MB (at ~8 Mbps; varies with codec and bitrate)
Characters per token (LLM, English text)	~4 chars

System Design Framework — The Interview Structure

Step	Duration	What to Do	Key Questions
1. Clarify Requirements	5 min	Ask before designing. Scope aggressively. Don't assume.	Read vs write heavy? DAU? Latency SLA? Consistency required? Global or single region?
2. Capacity Estimation	3 min	Back-of-envelope: storage, bandwidth, QPS. Show your math.	100M users × 10 payments/month = 1B payments/month; 1B ÷ ~2.6M seconds/month ≈ 400 TPS avg, ~4,000 TPS peak (assuming a 10× peak-to-average factor)
3. High-Level Design	10 min	Draw the boxes: API gateway, services, databases, caches, queues.	Show data flow end-to-end. Name the components. Don't skip the client.
4. Deep Dive	15 min	Pick the hardest parts (consistency, scale, fault tolerance). Show tradeoffs.	How do you handle duplicate payments? What happens when DB goes down? How do you scale reads?
5. Identify Bottlenecks	5 min	Proactively surface weak points. Show you know the limits of your design.	Single DB is a bottleneck → read replicas → sharding → when each applies
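Step 2's arithmetic is easy to fumble under pressure, so it helps to have the formulas straight. A minimal sketch; the 30-day month, the 10× peak-to-average factor, and the ~1 KB per payment row are assumptions you would state out loud in the interview, not measured values.

```java
public class CapacityEstimate {
    static final long SECONDS_PER_MONTH = 30L * 24 * 60 * 60; // 2,592,000 s, assuming a 30-day month

    static double averageTps(long users, long eventsPerUserPerMonth) {
        return (double) (users * eventsPerUserPerMonth) / SECONDS_PER_MONTH;
    }

    static double storageGbPerMonth(long records, long bytesPerRecord) {
        return records * bytesPerRecord / 1e9; // decimal GB
    }

    public static void main(String[] args) {
        long payments = 100_000_000L * 10;              // 1B payments/month
        double avg = averageTps(100_000_000L, 10);
        double peak = avg * 10;                         // assumed 10x peak-to-average factor
        double gb = storageGbPerMonth(payments, 1_000); // assumed ~1 KB per payment row
        System.out.printf("avg ~%.0f TPS, peak ~%.0f TPS, ~%.0f GB/month%n", avg, peak, gb);
        // avg ~386 TPS, peak ~3858 TPS, ~1000 GB/month
    }
}
```

Rounded, that is the "~400 TPS avg, ~4,000 TPS peak" from the table, plus roughly 1 TB of new payment rows per month under the stated row-size assumption.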

💡 Design a Payment System: The Complete Answer API Gateway (auth, rate limit, idempotency check) → Payment Service (Spring Boot, Hexagonal), which calls the Risk Service synchronously before authorization → Outbox write (Postgres, same transaction as the state change) → Kafka event → asynchronous downstream consumers: Ledger Service, Notification Service. State machine enforces valid transitions. Circuit breakers on all external calls (card networks). Idempotency keys prevent double-charge. Stripe-style tokenization keeps raw card data off your servers and shrinks your PCI scope. That's a complete answer.
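The "idempotency keys prevent double-charge" clause is worth being able to sketch on a whiteboard. The version below keeps keys in an in-memory `ConcurrentHashMap` purely to stay self-contained; in production the key store would be shared and durable (the Polyglot Persistence table lists Redis for idempotency keys, and a Postgres unique constraint is another common choice). `IdempotentPayments` and the `chargeCard` supplier are hypothetical stand-ins for the payment service and the card-network call.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class IdempotentPayments {
    // Idempotency key -> result of the first attempt (in-memory stand-in for a shared store)
    private final Map<String, CompletableFuture<String>> results = new ConcurrentHashMap<>();

    String pay(String idempotencyKey, Supplier<String> chargeCard) {
        var mine = new CompletableFuture<String>();
        var existing = results.putIfAbsent(idempotencyKey, mine); // atomically claim the key
        if (existing != null) {
            return existing.join(); // duplicate or retry: wait for and reuse the first result
        }
        try {
            String result = chargeCard.get(); // the external call runs outside any map lock
            mine.complete(result);
            return result;
        } catch (RuntimeException e) {
            // Assumes the failure definitely did not charge (e.g. a decline), so the key is
            // released for a later retry; ambiguous timeouts need reconciliation instead.
            results.remove(idempotencyKey, mine);
            mine.completeExceptionally(e);
            throw e;
        }
    }

    public static void main(String[] args) {
        var service = new IdempotentPayments();
        var charges = new AtomicInteger();
        Supplier<String> chargeCard = () -> "charge-" + charges.incrementAndGet(); // hypothetical gateway call
        System.out.println(service.pay("key-abc", chargeCard)); // charge-1
        System.out.println(service.pay("key-abc", chargeCard)); // charge-1 (client retry, no second charge)
        System.out.println(charges.get());                      // 1
    }
}
```

Storing a future rather than a finished result means a concurrent duplicate that arrives mid-charge waits for the first attempt instead of starting a second one.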

11

The Frameworks Behind the Frameworks

Mental Models

The patterns that don't belong to any language or framework but that senior engineers apply to every problem. These are the invisible curriculum.

Leaky Abstractions

Joel Spolsky's Law of Leaky Abstractions · "All non-trivial abstractions, to some degree, are leaky"

Every abstraction eventually exposes the details it was meant to hide. TCP hides network unreliability — until packet loss forces retransmit and you see the latency spike. JPA hides SQL — until N+1 queries destroy your performance. The better you understand what's underneath, the better you use what's on top. A greybeard's advantage: you've seen more leaks.
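The N+1 leak is easy to reproduce without any ORM. The sketch below counts simulated database round trips: loading N payments and then lazily fetching each one's customer costs 1 + N queries, while a single batched `IN`-style lookup (similar in spirit to Hibernate's batch fetching) costs 2; a JPQL `JOIN FETCH` would get it down to 1. `FakeDb` and its methods are hypothetical stand-ins, not a JPA API.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class NPlusOneDemo {
    record Payment(long id, long customerId) {}

    // Hypothetical data-access layer that counts round trips to the database.
    static class FakeDb {
        int queries = 0;

        List<Payment> findPayments(int n) {
            queries++;
            return LongStream.range(0, n).mapToObj(i -> new Payment(i, i % 5)).toList();
        }

        String findCustomer(long id) { queries++; return "customer-" + id; }

        Map<Long, String> findCustomers(List<Long> ids) { // one query for the whole batch
            queries++;
            return ids.stream().distinct().collect(Collectors.toMap(id -> id, id -> "customer-" + id));
        }
    }

    public static void main(String[] args) {
        var lazy = new FakeDb();
        for (Payment p : lazy.findPayments(100)) lazy.findCustomer(p.customerId()); // one query per row
        System.out.println("lazy: " + lazy.queries + " queries");       // lazy: 101 queries

        var batched = new FakeDb();
        var payments = batched.findPayments(100);
        batched.findCustomers(payments.stream().map(Payment::customerId).toList());
        System.out.println("batched: " + batched.queries + " queries"); // batched: 2 queries
    }
}
```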

Mechanical Sympathy

Martin Thompson · Hardware-Aware Software

Code that understands the hardware it runs on outperforms code that ignores it. CPU cache lines (64 bytes), memory access patterns (spatial/temporal locality), NUMA topology, false sharing on atomic variables. Your JVM profiling instincts should include: "is this cache-miss heavy?" before "is the algorithm wrong?"
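Spatial locality is the easiest of these effects to see for yourself. Java stores a `long[][]` as an array of row arrays, so walking row by row reads memory sequentially, while walking column by column jumps to a different row array on every step. Both loops below compute the same sum; on typical hardware the row-major loop tends to be noticeably faster for large matrices, but the ratio depends on the CPU, matrix size, and JIT warm-up, so treat this as a sketch and measure with a proper harness such as JMH rather than trusting its one-shot timings. `LocalityDemo` is a hypothetical name.

```java
public class LocalityDemo {
    static long sumRowMajor(long[][] m) {
        long sum = 0;
        for (int r = 0; r < m.length; r++)
            for (int c = 0; c < m[r].length; c++) sum += m[r][c]; // sequential within each row
        return sum;
    }

    // Assumes a rectangular, non-empty matrix.
    static long sumColumnMajor(long[][] m) {
        long sum = 0;
        for (int c = 0; c < m[0].length; c++)
            for (int r = 0; r < m.length; r++) sum += m[r][c];    // hops to a different row each step
        return sum;
    }

    public static void main(String[] args) {
        int n = 2_000;
        long[][] m = new long[n][n];
        for (int r = 0; r < n; r++) for (int c = 0; c < n; c++) m[r][c] = r + c;
        long t0 = System.nanoTime();
        long a = sumRowMajor(m);
        long t1 = System.nanoTime();
        long b = sumColumnMajor(m);
        long t2 = System.nanoTime();
        // Timings vary by machine; equal=true always.
        System.out.printf("row-major %d ms, column-major %d ms, equal=%b%n",
            (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a == b);
    }
}
```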

The Two Hard Problems

Cache Invalidation · Naming · Off-By-One

Caching makes reads fast but creates consistency problems. The hard question is never "should we cache?" but "when does this cache entry become invalid and how do we propagate that?" Cache-aside, write-through, write-behind, TTL, event-driven invalidation — each a different consistency tradeoff. Know them all before you cache anything in fintech.
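Cache-aside with explicit invalidation, probably the most common of these strategies, looks roughly like this. A sketch assuming in-memory maps for both the cache and the source of truth; `BalanceCache` and its methods are hypothetical. The write path commits to the source of truth first and then evicts rather than overwriting the cache, which narrows (but under concurrent reads does not fully close) the window for serving a stale balance.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class BalanceCache {
    final Map<String, BigDecimal> database = new ConcurrentHashMap<>(); // stand-in source of truth
    final Map<String, BigDecimal> cache = new ConcurrentHashMap<>();    // stand-in for Redis
    final AtomicInteger dbReads = new AtomicInteger();

    // Read path (cache-aside): try the cache, fall back to the database, then populate the cache.
    BigDecimal getBalance(String accountId) {
        BigDecimal cached = cache.get(accountId);
        if (cached != null) return cached;
        dbReads.incrementAndGet();
        BigDecimal fresh = database.get(accountId);
        if (fresh != null) cache.put(accountId, fresh);
        return fresh;
    }

    // Write path: called after the ledger posts; commit to the source of truth, then evict.
    void updateBalance(String accountId, BigDecimal newBalance) {
        database.put(accountId, newBalance);
        cache.remove(accountId);
    }

    public static void main(String[] args) {
        var svc = new BalanceCache();
        svc.updateBalance("acct-1", new BigDecimal("100.00"));
        svc.getBalance("acct-1");                              // miss: reads the database
        svc.getBalance("acct-1");                              // hit
        svc.updateBalance("acct-1", new BigDecimal("40.00"));  // evicts the cached 100.00
        System.out.println(svc.getBalance("acct-1") + " after " + svc.dbReads.get() + " DB reads");
        // 40.00 after 2 DB reads
    }
}
```

A TTL on each cache entry is a common backstop on top of this, bounding how long any entry that slipped through the race can survive.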

Conway's Law

Melvin Conway 1967 · Org Structure = System Structure

Organizations design systems that mirror their communication structures. If three teams build a compiler, you get a three-stage compiler. The Inverse Conway Maneuver: design the team structure you want, and the architecture will follow. Microservices work when team ownership matches service boundaries. When it doesn't, you get distributed monoliths.

Hyrum's Law

"All observable behaviors become contracts"

With enough users of an API, it doesn't matter what you specify in the contract — all observable behaviors will be depended on by someone. The order of items in a JSON array. The exact text of an error message. The response time. Someone depends on it. Fintech implication: API changes that seem harmless break clients in ways you never anticipated. Version everything.

Simple vs Easy

Rich Hickey · "Simple Made Easy" 2011

"Easy" means close to you — familiar, low ceremony. "Simple" means not complex — not entangled with other things. Spring Boot is easy (familiar, quick to start) but not necessarily simple (the magic intertwines many concerns). Plain JDBC is harder (more ceremony) but simpler (explicit). Choose simple over easy for things that matter long-term. Easy for throwaway scripts.

You Ain't Gonna Need It

YAGNI · XP Principle · Resist Premature Generalization

Don't build features or abstractions before they're needed. The cost of wrong abstraction is much higher than the cost of refactoring simple code. In AI-assisted development this is critical: AI generates generic, over-engineered code by default. Your job is to prune it back to what's actually required. Less code = fewer bugs = less to containerize = less to secure.

The Pit of Success

Rico Mariani · API Design Philosophy

Design your APIs, frameworks, and systems so that the easiest path leads to the correct, safe, well-performing outcome. The opposite is a "pit of failure" (where easy = wrong). Spring Security's lambda DSL is a pit of success: the most natural way to write it is the secure way. Your custom APIs should do the same: make the safe path the natural path.

The Senior Engineer's Decision Framework

When You Hear...	First Question to Ask	The Trap to Avoid
"We need microservices"	What coordination problem are you solving? What's the team ownership model?	Distributed monolith — services that need to deploy together and share a DB
"We need event sourcing"	Do you genuinely need audit history, time travel, or multiple projections? Or do you want an audit log?	Event sourcing for simple CRUD — the complexity cost is enormous
"We need to use AI for this"	What's the deterministic solution? Is it inadequate? Why?	Using LLMs where regex, a lookup table, or a trained classifier is 10x better
"We should cache this"	What's the invalidation strategy? What's the consistency SLA?	Stale cache in fintech = customer sees wrong balance = support tickets = regulatory risk
"This needs to be async"	What's the consumer? What's the failure model? Who owns the retry?	Fire-and-forget messaging where the consumer can silently fail
"Let's rewrite this legacy service"	What specific problem does a rewrite solve that incremental improvement doesn't?	Second-system syndrome — the rewrite accumulates all the complexity the original had, plus new ones
"The AI can handle that"	What are the failure modes? What's the fallback? How do we eval quality?	Deploying LLM-driven logic without evaluation framework = non-deterministic production bug

💡 The Final Insight — What Greybeards Actually Have Junior engineers learn patterns. Mid-level engineers apply them. Senior engineers know when NOT to apply them. After 30 years, you've seen every pattern implemented wrong, every abstraction leak, every "we'll fix it later" that never got fixed. That's not nostalgia — that's a production incident database in your head. The art is translating that database into precise, actionable insight at the right moment in a design discussion. That's what "senior" means.
