The final layer. Everything that surrounds the code: how systems are structured, how distributed failures behave, how AI applications are architected, how fintech operations actually work, and the mental models that separate engineers from senior engineers.
Bounded Context: A semantic boundary within which a domain model is consistent and unambiguous. "Account" means different things in Banking (balance, transactions) and Identity (user credentials). Each bounded context owns its model, so there are no cross-context JPA joins. In well-designed systems a bounded context usually maps to a single microservice.
Context Map: A diagram showing how bounded contexts relate: Upstream/Downstream, Customer/Supplier, Partnership, Shared Kernel, Conformist, Anti-Corruption Layer (ACL). The ACL is critical when integrating legacy systems: translate their model into yours at the boundary, and don't let their concepts pollute your domain.
Ubiquitous Language: A shared vocabulary between developers and domain experts, reflected exactly in code. If the business says "authorization" and "capture" for card payments, your code has authorize() and capture(), not processStep1() and processStep2(). The language and the model evolve together.
| Building Block | Identity? | Mutable? | Definition | Java Implementation |
|---|---|---|---|---|
| Entity | ✓ Yes | ✓ Yes | An object defined by its identity, not its attributes. Two Payments with same amount but different IDs are different entities. Identity persists through state changes. | @Entity class Payment { private PaymentId id; ... } — identity via equals/hashCode on ID only |
| Value Object | ✗ No | ✗ No | Defined entirely by its attributes. Two Money(100, USD) objects are equal and interchangeable. Immutable. No identity. Replaces primitives: use Money, not double. | record Money(BigDecimal amount, Currency currency) {} (Java records are perfect VOs) |
| Aggregate | ✓ (root) | ✓ (via root) | A cluster of entities and VOs with a single root entity. All access to the cluster goes through the Aggregate Root. Enforces invariants at the boundary. Transactional consistency unit. | Order aggregate root owns OrderLine entities — never access OrderLine directly from outside Order |
| Repository | — | — | Abstracts the persistence of aggregates. Returns aggregate roots only. The interface lives in the domain layer; the JPA implementation lives in the infrastructure layer. This is the Hexagonal port/adapter boundary. | interface PaymentRepo { Payment findById(PaymentId); void save(Payment); } |
| Domain Service | — | — | Domain logic that doesn't naturally fit one entity or VO. Cross-aggregate coordination. Stateless. If you find yourself putting logic in a utility class, it's probably a Domain Service. | class TransferService { void transfer(Account from, Account to, Money amount) } |
| Domain Event | ✓ (implicit) | ✗ No | Something that happened in the domain that other parts of the system care about. Immutable facts. Drives sagas, projections, notifications. Named in past tense: PaymentAuthorized, not AuthorizePayment. | record PaymentAuthorized(PaymentId id, Instant at, Money amount) implements DomainEvent {} |

    // DDD Aggregate in Java 21: the full idiom
    public class Payment { // Aggregate Root
        private final PaymentId id;   // typed identity, not String
        private final Money amount;   // Value Object, not BigDecimal
        private PaymentStatus status;
        private final List<DomainEvent> events = new ArrayList<>();

        // Business methods express domain language, not CRUD
        public void authorize(RiskAssessment risk) {
            if (status != PaymentStatus.PENDING)
                throw new InvalidPaymentStateException(id, status, "authorize");
            if (risk.isHighRisk() && amount.exceeds(Money.of(10000, USD)))
                throw new RiskLimitExceededException(id, risk);
            this.status = PaymentStatus.AUTHORIZED;
            events.add(new PaymentAuthorized(id, Instant.now(), amount));
        }

        // Invariant: amount must be positive, enforced in constructor/factory
        public static Payment create(Money amount, Merchant merchant) {
            if (!amount.isPositive())
                throw new IllegalArgumentException("Amount must be positive");
            var p = new Payment(PaymentId.generate(), amount, merchant);
            p.events.add(new PaymentCreated(p.id, amount, merchant.id()));
            return p;
        }
    }

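
The Payment aggregate calls `isPositive()` and `exceeds()` on its Money value object without showing them. The following is a minimal sketch of what such a value object could look like. The factory signature `Money.of(String, String)`, the scale normalization, and the currency check are assumptions made for illustration, not the original author's implementation.

```java
import java.math.BigDecimal;
import java.util.Currency;
import java.util.Objects;

// Hypothetical Money value object: immutable, compared by value, no identity.
record Money(BigDecimal amount, Currency currency) {

    Money {
        Objects.requireNonNull(amount, "amount");
        Objects.requireNonNull(currency, "currency");
        // Normalize the scale so that 100 and 100.00 are equal as values.
        amount = amount.stripTrailingZeros();
    }

    // Assumed convenience factory; parses a decimal string and an ISO 4217 code.
    static Money of(String amount, String currencyCode) {
        return new Money(new BigDecimal(amount), Currency.getInstance(currencyCode));
    }

    boolean isPositive() {
        return amount.signum() > 0;
    }

    boolean exceeds(Money other) {
        requireSameCurrency(other);
        return amount.compareTo(other.amount) > 0;
    }

    Money add(Money other) {
        requireSameCurrency(other);
        return new Money(amount.add(other.amount), currency);
    }

    private void requireSameCurrency(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException(
                "Currency mismatch: " + currency + " vs " + other.currency);
        }
    }
}
```

Because the record derives `equals` and `hashCode` from its components, two Money values with the same amount and currency are interchangeable, which is the defining property of a value object.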
| Model | Guarantee | Examples | Latency Cost |
|---|---|---|---|
| Eventual | All replicas converge to the same value eventually, with no time bound | DNS, Cassandra (default), S3 before December 2020 (now strongly consistent) | Lowest |
| Monotonic Read | You never see older values than you've already read (within session) | DynamoDB sessions, Cosmos DB | Low |
| Read Your Writes | After a write, you always see it in subsequent reads | Most sticky-session databases | Low-Medium |
| Causal | Operations that are causally related appear in causal order | MongoDB causal sessions, CockroachDB | Medium |
| Sequential | Operations appear in some total order consistent with program order | Kafka partition ordering | Medium-High |
| Linearizable | Every operation appears to take effect atomically at some instant — real-time ordering guaranteed | Zookeeper, etcd, Google Spanner, CockroachDB strong reads | High |
| Strict Serializable | Transactions appear to execute serially in real-time order; the gold standard | Google Spanner; CockroachDB comes close (serializable isolation with no stale reads) | Highest |
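
The session-level guarantees in the table (Monotonic Read, Read Your Writes) can be illustrated with a version token that the client carries between calls. This is a toy sketch under stated assumptions: `Replica` and `Session` are hypothetical in-memory classes, and a single increasing version number stands in for whatever token a real database would use.

```java
import java.util.List;

// A replica holds a value plus the version (write counter) it has applied.
final class Replica {
    long version;
    String value;

    void apply(long version, String value) {
        this.version = version;
        this.value = value;
    }
}

// Client session that remembers the highest version it has written or read.
final class Session {
    private long lastSeenVersion = 0;

    // Read Your Writes: after a write, the session will not accept older data.
    void recordWrite(long version) {
        lastSeenVersion = Math.max(lastSeenVersion, version);
    }

    // Monotonic Read: returns the value from the first replica that is at
    // least as fresh as anything this session has already seen.
    String read(List<Replica> replicas) {
        for (Replica r : replicas) {
            if (r.version >= lastSeenVersion) {
                lastSeenVersion = r.version;
                return r.value;
            }
        }
        throw new IllegalStateException("No replica is fresh enough for this session");
    }
}
```

The latency cost in the table shows up here as the skipped replicas: the stronger the guarantee, the fewer replicas are allowed to answer.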

    // Tool definition (LangChain4j)
    @Tool("Get payment status by ID")
    PaymentStatus getPaymentStatus(@P("The payment ID") String paymentId) {
        return paymentService.findById(paymentId).status();
    }

    // LangChain4j structured output
    record RiskAssessment(
        RiskLevel level,
        String reason,
        boolean requiresManualReview) {}

    RiskAssessment assessment = aiService.assessRisk(transaction); // returns typed record

| Eval Method | What It Measures | When to Use |
|---|---|---|
| Unit Evals | Does a single prompt produce expected output on known examples? Pass/fail per case. | Regression testing after prompt changes |
| LLM-as-Judge | Use a stronger model to grade outputs: faithfulness, relevance, coherence, groundedness | RAG pipeline quality, continuous monitoring |
| Human Eval | Domain experts rate a sample of outputs. Gold standard but expensive. | Pre-production sign-off, periodic audits |
| A/B Testing | Route % of traffic to new prompt/model, compare downstream metrics | Prompt optimization, model upgrades |
| Hallucination Detection | Check if claims in output are supported by retrieved context (for RAG) | Always, for RAG over compliance documents |
| Latency P99 | 99th percentile response time under load — LLM APIs have high variance | Production readiness, SLA definition |
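
The "Unit Evals" row can be sketched as a small pass/fail harness. This is an illustrative sketch only: `EvalCase`, `EvalReport`, and `UnitEvals` are hypothetical names, the `model` function is a stand-in for whatever client calls the LLM, and a case-insensitive substring match is the simplest possible grading rule.

```java
import java.util.List;
import java.util.function.Function;

// One regression case: a prompt and a substring the answer must contain.
record EvalCase(String prompt, String mustContain) {}

// Outcome of a run: how many cases passed out of the total.
record EvalReport(int passed, int total) {
    double passRate() {
        return total == 0 ? 0.0 : (double) passed / total;
    }
}

final class UnitEvals {
    // Runs every case through the model and counts pass/fail per case.
    static EvalReport run(Function<String, String> model, List<EvalCase> cases) {
        int passed = 0;
        for (EvalCase c : cases) {
            String output = model.apply(c.prompt());
            if (output != null
                    && output.toLowerCase().contains(c.mustContain().toLowerCase())) {
                passed++;
            }
        }
        return new EvalReport(passed, cases.size());
    }
}
```

Running the same case list before and after a prompt change and comparing `passRate()` gives the regression signal the table describes.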
What: Numeric time-series data aggregated over windows. Tools: Micrometer (Spring Boot) → Prometheus → Grafana. Types: Counter (requests_total), Gauge (queue_depth), Histogram (request_duration_seconds, from which percentiles such as P99 are derived), Summary. Java: @Timed, @Counted annotations on service methods.
What: Structured, timestamped event records. Tools: SLF4J + Logback → JSON format → ELK or CloudWatch Logs. Critical: structured JSON (not plain text), correlation IDs on every log line, PII masking before persistence. Logs are audit evidence in fintech — retention policy is a compliance requirement.
What: The path of a single request through distributed services. Tools: OpenTelemetry (OTEL) → Jaeger or Zipkin or Tempo. Java: Spring Boot 3 + Micrometer Tracing auto-instruments. Each trace = spans with parent-child relationships. Mandatory for debugging microservice latency.
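
PII masking before persistence can be illustrated with card numbers, the most common case in payment logs. The sketch below is an assumption-laden example: `PiiMasker` is a hypothetical helper, and treating any standalone run of 13 to 19 digits as a card number is a deliberately blunt rule that a real system would refine.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PiiMasker {
    // Matches standalone runs of 13 to 19 digits, the length range of card numbers (PANs).
    private static final Pattern PAN = Pattern.compile("\\b\\d{13,19}\\b");

    // Replaces every matched number with asterisks, keeping only the last four digits.
    static String maskPans(String message) {
        Matcher m = PAN.matcher(message);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String pan = m.group();
            String masked = "*".repeat(pan.length() - 4) + pan.substring(pan.length() - 4);
            m.appendReplacement(out, Matcher.quoteReplacement(masked));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Applying the mask in the logging layer, rather than at each call site, keeps unmasked values from reaching log storage when a developer forgets.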
| Availability | Downtime/Month | Downtime/Year |
|---|---|---|
| 99% | 7.3 hours | 3.65 days |
| 99.5% | 3.65 hours | 1.83 days |
| 99.9% | 43.8 min | 8.77 hours |
| 99.95% | 21.9 min | 4.38 hours |
| 99.99% | 4.38 min | 52.6 min |
| 99.999% | 26.3 sec | 5.26 min |
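
The downtime figures follow directly from the availability percentage. A minimal sketch of the arithmetic, assuming an average year of 365.25 days and a month of one twelfth of that (the convention that yields 43.8 minutes per month at 99.9%); `DowntimeBudget` is a hypothetical helper name:

```java
import java.time.Duration;

final class DowntimeBudget {
    // Average Gregorian year (365.25 days), expressed in seconds.
    private static final double SECONDS_PER_YEAR = 365.25 * 24 * 3600;

    // Allowed downtime per year for a given availability, e.g. 99.9.
    static Duration perYear(double availabilityPercent) {
        double fractionDown = 1.0 - availabilityPercent / 100.0;
        return Duration.ofMillis(Math.round(fractionDown * SECONDS_PER_YEAR * 1000));
    }

    // Allowed downtime per month, taking a month as one twelfth of a year.
    static Duration perMonth(double availabilityPercent) {
        return perYear(availabilityPercent).dividedBy(12);
    }
}
```

For example, `DowntimeBudget.perMonth(99.9)` is a little under 44 minutes, and each extra nine divides the budget by ten.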
| Metric | Measures | Tool (Spring Boot) | Alert Threshold Example |
|---|---|---|---|
| http_server_requests_seconds (P99 from histogram buckets) | 99th percentile API latency | Micrometer auto | Alert if P99 > 500ms for 5min |
| http_server_requests_seconds_count{status=~"5.."} | Server error rate | Micrometer auto | Alert if error rate > 0.1% |
| jvm_memory_used_bytes | Heap/non-heap usage | Micrometer JVM metrics | Alert if heap > 85% for 10min |
| jvm_gc_pause_seconds | GC pause duration | Micrometer JVM metrics | Alert if GC pause > 200ms |
| hikaricp_connections_pending | DB connection pool saturation | Micrometer HikariCP | Alert if pending > 5 for 2min |
| kafka_consumer_lag | Message processing backlog | Kafka exporter | Alert if lag growing for 15min |
| resilience4j_circuitbreaker_state | Circuit breaker state | Resilience4j metrics | Alert on OPEN state immediately |
| Level | Name | What It Means | Example |
|---|---|---|---|
| Level 0 | The Swamp of POX | Single URI, HTTP as transport only, no semantics | POST /api {"action":"getPayment","id":"123"} |
| Level 1 | Resources | Multiple URIs representing resources | POST /payments/123 (but all POST) |
| Level 2 | HTTP Verbs | Use GET/POST/PUT/PATCH/DELETE semantically | GET /payments/123 — this is "REST" in most orgs |
| Level 3 | Hypermedia (HATEOAS) | Responses include links to valid next actions | Response includes _links: {capture: {href:...}, refund: {href:...}} |
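
Level 3 means the server, not the client, decides which actions are currently valid. A minimal sketch of building the `_links` section from payment state follows. `PaymentState` and `PaymentLinks` are hypothetical names, and the rule that only authorized payments can be captured and only captured payments refunded is an assumption for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

enum PaymentState { PENDING, AUTHORIZED, CAPTURED, REFUNDED }

final class PaymentLinks {
    // Builds the _links section: only transitions valid in the current state are advertised.
    static Map<String, String> linksFor(String paymentId, PaymentState state) {
        String base = "/payments/" + paymentId;
        Map<String, String> links = new LinkedHashMap<>();
        links.put("self", base);
        switch (state) {
            case AUTHORIZED -> links.put("capture", base + "/captures");
            case CAPTURED -> links.put("refund", base + "/refunds");
            default -> { } // PENDING and REFUNDED expose no further actions in this sketch
        }
        return links;
    }
}
```

A client that follows these links never needs to hard-code the state machine; a missing `capture` link is the signal that capture is not allowed.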
| Dimension | REST/JSON | gRPC/Protobuf | GraphQL |
|---|---|---|---|
| Transport | HTTP/1.1 or 2 | HTTP/2 only, bidirectional streaming | HTTP/1.1 or 2 |
| Schema | OpenAPI (optional) | Protobuf IDL (required, strict) | SDL type system (required) |
| Performance | Baseline | Often several times faster (compact binary encoding, cheaper serialization, HTTP/2 multiplexing) | Similar to REST, query complexity risk |
| Browser Support | ✓ Native | ✗ Needs grpc-web proxy | ✓ Native |
| Best For | External APIs, CRUD services, public APIs | Internal microservice communication, streaming, mobile | Complex, flexible client data requirements (dashboards) |
| Versioning | URL /v1/, /v2/ or headers | Protobuf field numbers (backward compatible) | Field deprecation, additive-only |
| Fintech | External partner APIs, webhooks | Internal payment processing pipelines | Finance dashboards, reporting queries |

    // URL versioning: simple, cache-friendly
    GET /v1/payments/123
    GET /v2/payments/123

    // Header versioning: clean URLs
    GET /payments/123
    Accept: application/vnd.myapi.v2+json

    // Consumer-Driven Contract Testing (Pact)
    // The consumer defines the contract;
    // the provider verifies it on every build
    @Pact(consumer = "payment-service")
    public RequestResponsePact createPact(PactDslWithProvider builder) {
        return builder
            .given("payment 123 exists")
            .uponReceiving("a request for payment 123")
            .path("/v1/payments/123")
            .method("GET")
            .willRespondWith()
            .status(200)
            .body("{\"id\":\"123\",\"amount\":100.00}")
            .toPact();
    }

Name resources with nouns: POST /payments, not POST /createPayment. Model state transitions as sub-resources: POST /payments/123/captures to capture, POST /payments/123/refunds to refund. The HTTP verb encodes the action; the resource noun encodes the target.

    // JWT = Header.Payload.Signature (base64url encoded, dot-separated)
    // Spring Security 6 JWT Resource Server
    @Configuration
    @EnableWebSecurity
    class SecurityConfig {
        @Bean
        SecurityFilterChain chain(HttpSecurity http) throws Exception {
            return http
                .authorizeHttpRequests(auth -> auth
                    .requestMatchers("/actuator/health").permitAll()
                    .requestMatchers(HttpMethod.POST, "/payments")
                        .hasAuthority("SCOPE_payments:write") // scope-based
                    .requestMatchers(HttpMethod.GET, "/payments/**")
                        .hasAuthority("SCOPE_payments:read")
                    .anyRequest().authenticated())
                .oauth2ResourceServer(o -> o
                    .jwt(j -> j.jwtAuthenticationConverter(jwtConverter())))
                .sessionManagement(s -> s
                    .sessionCreationPolicy(STATELESS)) // no session cookies
                .build();
        }
    }

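
The three-part JWT structure can be seen by splitting a token and decoding its middle segment. The sketch below uses only the JDK; `JwtParts` is a hypothetical helper, and it deliberately does not verify the signature, so it is suitable for debugging and illustration only. In the Spring configuration, validation is the resource server's job.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class JwtParts {
    // Splits a compact JWT into its three segments and decodes the payload JSON.
    // This does NOT verify the signature, so the result must never be trusted
    // for authorization decisions.
    static String decodePayload(String jwt) {
        String[] parts = jwt.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected header.payload.signature");
        }
        byte[] json = Base64.getUrlDecoder().decode(parts[1]);
        return new String(json, StandardCharsets.UTF_8);
    }
}
```

The payload is only encoded, not encrypted, which is why claims such as scopes are readable by anyone holding the token and why sensitive data should not be placed in it.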
| Threat | What It Is | Java/Spring Mitigation | OWASP Category |
|---|---|---|---|
| S — Spoofing | Claiming to be someone else (identity) | JWT validation, mTLS between services, Spring Security | Broken Authentication |
| T — Tampering | Modifying data in transit or at rest | TLS 1.3, HMAC request signing, field-level encryption for PII | Cryptographic Failures |
| R — Repudiation | Denying an action occurred | Immutable audit logs, digital signatures, event sourcing | Insufficient Logging |
| I — Information Disclosure | Exposing private data | PII masking in logs, response filtering, no stack traces in prod responses | Sensitive Data Exposure |
| D — Denial of Service | Making the system unavailable | Rate limiting (Resilience4j), circuit breakers, CDN, WAF | Security Misconfiguration |
| E — Elevation of Privilege | Gaining unauthorized permissions | Least privilege IAM roles, RBAC with Spring Security, scoped JWT | Broken Access Control |
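
The Tampering row lists HMAC request signing as a mitigation. A minimal sketch using the JDK's `javax.crypto.Mac` follows; `RequestSigner` is a hypothetical helper, and signing only the raw body (without a timestamp or nonce for replay protection) is a simplification.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HexFormat;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class RequestSigner {
    // Computes a hex-encoded HMAC-SHA256 over the request body.
    static String sign(String body, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return HexFormat.of().formatHex(mac.doFinal(body.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable or key invalid", e);
        }
    }

    // Recomputes the signature and compares it with the one received.
    // MessageDigest.isEqual is used so the comparison time does not reveal
    // how many leading characters matched.
    static boolean verify(String body, String signatureHex, byte[] secret) {
        byte[] expected = sign(body, secret).getBytes(StandardCharsets.UTF_8);
        byte[] actual = signatureHex.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual);
    }
}
```

The sender attaches the signature as a header; the receiver calls `verify` with the shared secret and rejects the request on a mismatch.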
| Data Store | Model | Best For in Fintech | Java Integration |
|---|---|---|---|
| PostgreSQL | Relational + ACID | Accounts, transactions, ledger entries — anything requiring ACID guarantees and complex queries. The default choice. | Spring Data JPA, JDBC, jOOQ |
| Redis | Key-Value / Cache | Session tokens, rate limiting counters, idempotency key cache, frequently-read balances. Sub-millisecond reads. | Spring Data Redis, Lettuce, Redisson |
| Kafka | Log / Event Stream | Event streaming, audit log, CDC, async communication between services. Not a database — retention is finite. | Spring Kafka, Kafka Streams, Schema Registry |
| Elasticsearch | Document / Search | Transaction search, compliance reporting, log aggregation. NOT for primary data; use it as a derived read model fed by CDC. | Spring Data Elasticsearch, Elasticsearch Java API Client (the High-Level REST Client is deprecated) |
| DynamoDB / Cassandra | Wide-Column / NoSQL | High-write-throughput time-series: fraud signals, click events, pricing feeds. When you know the access patterns ahead of time. | AWS SDK, Spring Data Cassandra |
| S3 / GCS | Object Storage | Audit log archives, compliance exports, ML training data, batch reports. Not queryable (use Athena/BigQuery on top). | AWS SDK v2, Spring Cloud AWS |
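
The "idempotency key cache" use of Redis can be sketched in memory to show the behavior. This is an illustrative stand-in, not a Redis client: `IdempotencyCache` is a hypothetical class, a `ConcurrentHashMap` replaces the shared store, and there is no expiry, so it only demonstrates the at-most-once semantics within one JVM.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

final class IdempotencyCache<R> {
    private final ConcurrentMap<String, R> results = new ConcurrentHashMap<>();

    // Runs `operation` at most once per key; repeated calls with the same key
    // return the stored result instead of executing the operation again.
    R executeOnce(String idempotencyKey, Supplier<R> operation) {
        return results.computeIfAbsent(idempotencyKey, k -> operation.get());
    }
}
```

A client that retries a timed-out payment with the same key gets the original outcome back rather than a second charge.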

    // Every financial movement = two ledger entries that sum to zero
    record LedgerEntry(
        UUID id,
        UUID transactionId,
        AccountId accountId,
        Money amount,     // always positive; the sign comes from type
        EntryType type,   // DEBIT or CREDIT
        Instant postedAt
    ) {}

    // Transfer $100 from Customer A to Customer B:
    var entries = List.of(
        new LedgerEntry(uuid(), txId, customerA, Money.of(100), DEBIT, now()),
        new LedgerEntry(uuid(), txId, customerB, Money.of(100), CREDIT, now())
    );

    // INVARIANT: sum of all entries for a transaction = 0
    // DEBIT counts as +100, CREDIT as -100, so the net is 0
    assert entries.stream()
        .map(e -> e.type() == DEBIT ? e.amount() : e.amount().negate())
        .reduce(Money.ZERO, Money::add)
        .equals(Money.ZERO); // Must always pass (Java assertions only run with -ea)

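
The double-entry invariant can be made runnable with plain JDK types. The sketch below is a simplified stand-in for the ledger snippet: `Posting`, `Side`, and `Ledger` are hypothetical names, amounts are bare `BigDecimal` values in a single currency, and the balance check is an explicit method rather than an `assert`, so it runs regardless of JVM flags.

```java
import java.math.BigDecimal;
import java.util.List;

enum Side { DEBIT, CREDIT }

// Amount is always positive; the side decides its sign when balancing.
record Posting(String accountId, BigDecimal amount, Side side) {
    Posting {
        if (amount.signum() <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
    }

    BigDecimal signedAmount() {
        return side == Side.DEBIT ? amount : amount.negate();
    }
}

final class Ledger {
    // A transaction is balanced when its debits and credits net to zero.
    static boolean isBalanced(List<Posting> postings) {
        BigDecimal sum = postings.stream()
                .map(Posting::signedAmount)
                .reduce(BigDecimal.ZERO, BigDecimal::add);
        return sum.signum() == 0;
    }
}
```

Rejecting any transaction where `isBalanced` is false, before it is persisted, is what keeps money from being created or destroyed by a bug.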
| Standard/Regulation | Applies To | Key Requirement for Developers |
|---|---|---|
| PCI-DSS v4 | Any system touching payment card data | Tokenize PANs, encrypt CHD at rest/transit, vulnerability scanning, penetration testing, strict access control, detailed audit logs |
| GDPR / CCPA | Any system with EU or California personal data | Right to erasure (where audit rules require retention, pseudonymize rather than delete), data minimization, purpose limitation, breach notification within 72 hours (GDPR), data subject request APIs |
| SOC 2 Type II | SaaS fintech platforms (trust) | Evidenced security controls: access reviews, change management logs, incident response records, availability monitoring — your observability IS the evidence |
| AML / BSA | Banks, money transmitters | Transaction monitoring, suspicious activity reports (SARs), customer due diligence data retention, OFAC sanctions screening API |
| FFIEC | US banks and fintech partners | IT risk management, authentication standards, vendor risk management. Affects architecture decisions — multi-cloud, disaster recovery, BCP. |
| Operation | Latency |
|---|---|
| L1 cache reference | 0.5 ns |
| L2 cache reference | 7 ns |
| Main memory (RAM) reference | 100 ns |
| SSD read (NVMe) | 100 µs |
| HDD seek | 10 ms |
| Same datacenter round trip | 0.5 ms |
| US cross-country round trip | 40 ms |
| US to Europe round trip | 150 ms |
| Postgres query (indexed) | 1–5 ms |
| Redis GET | 0.1–0.5 ms |
| LLM API call (Claude Sonnet) | 500ms–3s |
| Scale Number | Value |
|---|---|
| Requests/sec, single server | 1,000–10,000 RPS |
| Postgres transactions/sec | 1,000–10,000 TPS |
| Kafka throughput/partition | 100 MB/s |
| S3 PUT throughput | 3,500 req/s per prefix |
| Redis ops/sec | 100,000+ |
| Twitter: peak tweets/sec | ~150,000 |
| Visa: transactions/sec | ~24,000 |
| Bytes per day: 1B users, 10 actions each, ~100 bytes per action | ~1 TB/day |
| Encoding 1 minute video (720p) | ~60 MB |
| Characters per token (LLM) | ~4 chars |
| Step | Duration | What to Do | Key Questions |
|---|---|---|---|
| 1. Clarify Requirements | 5 min | Ask before designing. Scope aggressively. Don't assume. | Read vs write heavy? DAU? Latency SLA? Consistency required? Global or single region? |
| 2. Capacity Estimation | 3 min | Back-of-envelope: storage, bandwidth, QPS. Show your math. | 100M users × 10 payments/month = 1B payments/month = ~400 TPS avg, ~4000 TPS peak |
| 3. High-Level Design | 10 min | Draw the boxes: API gateway, services, databases, caches, queues. | Show data flow end-to-end. Name the components. Don't skip the client. |
| 4. Deep Dive | 15 min | Pick the hardest parts (consistency, scale, fault tolerance). Show tradeoffs. | How do you handle duplicate payments? What happens when DB goes down? How do you scale reads? |
| 5. Identify Bottlenecks | 5 min | Proactively surface weak points. Show you know the limits of your design. | Single DB is a bottleneck → read replicas → sharding → when each applies |
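
The capacity estimate in step 2 is simple enough to check in code. A minimal sketch, assuming a 30-day month and a peak-to-average factor of 10 (the ratio implied by the ~400 and ~4000 TPS figures); `CapacityEstimate` is a hypothetical helper name:

```java
final class CapacityEstimate {
    // Seconds in a 30-day month: 2,592,000.
    private static final long SECONDS_PER_MONTH = 30L * 24 * 3600;

    // Average transactions per second for a monthly volume spread evenly.
    static double averageTps(long users, long paymentsPerUserPerMonth) {
        return (double) (users * paymentsPerUserPerMonth) / SECONDS_PER_MONTH;
    }

    // Peak load as a multiple of the average; the factor is an assumption to state out loud.
    static double peakTps(double averageTps, double peakFactor) {
        return averageTps * peakFactor;
    }
}
```

With 100M users and 10 payments per month each, `averageTps` comes out near 386, which rounds to the ~400 TPS quoted in the table.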
| When You Hear... | First Question to Ask | The Trap to Avoid |
|---|---|---|
| "We need microservices" | What coordination problem are you solving? What's the team ownership model? | Distributed monolith — services that need to deploy together and share a DB |
| "We need event sourcing" | Do you genuinely need audit history, time travel, or multiple projections? Or do you want an audit log? | Event sourcing for simple CRUD — the complexity cost is enormous |
| "We need to use AI for this" | What's the deterministic solution? Is it inadequate? Why? | Using LLMs where regex, a lookup table, or a trained classifier is 10x better |
| "We should cache this" | What's the invalidation strategy? What's the consistency SLA? | Stale cache in fintech = customer sees wrong balance = support tickets = regulatory risk |
| "This needs to be async" | What's the consumer? What's the failure model? Who owns the retry? | Fire-and-forget messaging where the consumer can silently fail |
| "Let's rewrite this legacy service" | What specific problem does a rewrite solve that incremental improvement doesn't? | Second-system syndrome: the rewrite accumulates all the complexity the original had, plus new complexity of its own |
| "The AI can handle that" | What are the failure modes? What's the fallback? How do we eval quality? | Deploying LLM-driven logic without an evaluation framework = non-deterministic production bug |