Mortgage Processing Platform Transformation

The Challenge

City National Bank faced significant challenges with their mortgage processing system:

Long approval times: 45-day average from application to approval
Sequential bottlenecks: Each step waited for the previous to complete
Manual interventions: Frequent human touchpoints slowed the process
Monolithic architecture: Difficult to scale and maintain
Limited visibility: Hard to identify where delays occurred

The bank needed a modern solution that could handle increasing volume while maintaining the highest standards of accuracy and regulatory compliance.

The Solution

I led the transformation of the mortgage processing platform, adopting a microservices architecture that fundamentally changed how applications flowed through the system.

System Architecture

The platform consists of specialized microservices orchestrated through AWS SQS/SNS for event-driven, parallel processing:

mortgage-services/
├── application-service    # Main orchestration & API Gateway
├── credit-service        # Credit bureau integration
├── validation-service    # Document validation
├── employment-service    # Employment verification
├── document-service      # Document processing & generation
└── notification-service  # Real-time updates

Event-Driven Processing Flow

The system uses an event-driven architecture to process applications asynchronously. When a client submits an application, the application service immediately returns an Application ID while the verification tasks run in parallel.
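
A minimal sketch of that submit-and-return behavior follows. It assumes a hypothetical injected `publish` function (standing in for an SNS publish to a topic fanned out to per-service SQS queues) and hypothetical event type names. The handler awaits only the publishes, never the verifications themselves:

```javascript
const { randomUUID } = require("crypto");

// Hypothetical event types, one per verification service.
const VERIFICATION_REQUESTS = [
  "CREDIT_CHECK_REQUESTED",
  "EMPLOYMENT_CHECK_REQUESTED",
  "DOCUMENT_VALIDATION_REQUESTED",
];

// `publish` is injected; in production it would wrap an SNS publish call.
async function submitApplication(application, publish) {
  const applicationId = randomUUID();

  // Fan out all verification requests concurrently rather than one after another.
  await Promise.all(
    VERIFICATION_REQUESTS.map((type) =>
      publish({ type, applicationId, application })
    )
  );

  // Respond right away; verification results arrive later as separate events.
  return { applicationId, status: "SUBMITTED" };
}
```

Because the response carries only the Application ID, status changes have to reach the client through a separate channel, which is the job the notification service's real-time updates fill.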

Application State Machine

Applications move through a state machine that runs the verification checks in parallel and applies decision logic once their results are in.
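
The decision step can be sketched as a pure function over the three parallel check results. The check names and the state names (`VERIFYING`, `APPROVED`, `REJECTED`, `MANUAL_REVIEW`) are hypothetical and chosen for illustration only:

```javascript
// Hypothetical check names; each maps to one verification service.
const CHECKS = ["credit", "employment", "documents"];

// results maps each check to "PASSED", "FAILED", "MANUAL_REVIEW_REQUIRED",
// or undefined while that check is still running.
function nextState(results) {
  const outcomes = CHECKS.map((check) => results[check]);

  // Parallel checks: stay in VERIFYING until every check has reported.
  if (outcomes.includes(undefined)) return "VERIFYING";

  // Escalation wins over rejection, so an unresolved check is always seen by a person.
  if (outcomes.includes("MANUAL_REVIEW_REQUIRED")) return "MANUAL_REVIEW";
  if (outcomes.includes("FAILED")) return "REJECTED";
  return "APPROVED";
}
```

Escalating before rejecting is a judgment call. A stricter policy could reject as soon as any check fails.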

Key Technical Improvements

1. Document Processing Service

Built a robust document validation service that processes multiple document types in parallel:

// document-service/src/index.js
const { S3, SQS, SNS } = require("aws-sdk");
const { v4: uuid } = require("uuid");
const PDFParser = require("pdf-parse");

class DocumentService {
  constructor() {
    this.s3 = new S3();
    this.sqs = new SQS();
    this.sns = new SNS();
  }

  async start() {
    while (true) {
      try {
        const messages = await this.sqs
          .receiveMessage({
            QueueUrl: process.env.DOCUMENT_QUEUE_URL,
            MaxNumberOfMessages: 5,
            WaitTimeSeconds: 20,
          })
          .promise();

        if (messages.Messages) {
          await Promise.all(
            messages.Messages.map((msg) => this.processMessage(msg))
          );
        }
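      } catch (error) {
        console.error("Error processing documents:", error);
        // Back off briefly so a persistent SQS failure doesn't spin the loop
        await new Promise((resolve) => setTimeout(resolve, 5000));
      }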
    }
  }

  async processMessage(message) {
    try {
      const { applicationId, documents } = JSON.parse(message.Body);
      const results = await Promise.all(
        documents.map((doc) => this.processDocument(applicationId, doc))
      );

      // Aggregate results
      const validationResult = {
        applicationId,
        status: results.every((r) => r.valid) ? "APPROVED" : "REJECTED",
        details: results,
      };

      // Store validation results
      await this.s3
        .putObject({
          Bucket: process.env.RESULTS_BUCKET,
          Key: `validations/${applicationId}.json`,
          Body: JSON.stringify(validationResult),
        })
        .promise();

      // Publish results to SNS
      await this.sns
        .publish({
          TopicArn: process.env.DOCUMENT_RESULTS_TOPIC,
          Message: JSON.stringify({
            type: "DOCUMENT_VALIDATION_COMPLETED",
            applicationId,
            result: validationResult,
          }),
        })
        .promise();

      // Delete processed message
      await this.sqs
        .deleteMessage({
          QueueUrl: process.env.DOCUMENT_QUEUE_URL,
          ReceiptHandle: message.ReceiptHandle,
        })
        .promise();
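    } catch (error) {
      // The message is not deleted, so SQS redelivers it after the visibility
      // timeout; a redrive policy (dead-letter queue) captures repeat failures.
      console.error("Error processing document message:", error);
    }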
  }

  async processDocument(applicationId, document) {
    const documentId = uuid();

    // Download document from S3
    const s3Object = await this.s3
      .getObject({
        Bucket: process.env.DOCUMENTS_BUCKET,
        Key: document.key,
      })
      .promise();

    // Parse PDF content
    const pdfData = await PDFParser(s3Object.Body);

    // Validate document based on type
    const validationResult = await this.validateDocument(
      document.type,
      pdfData
    );

    // Store processed document
    await this.s3
      .putObject({
        Bucket: process.env.PROCESSED_BUCKET,
        Key: `${applicationId}/${documentId}.json`,
        Body: JSON.stringify({
          documentId,
          originalKey: document.key,
          type: document.type,
          validation: validationResult,
          processedAt: new Date().toISOString(),
        }),
      })
      .promise();

    return {
      documentId,
      type: document.type,
      valid: validationResult.valid,
      errors: validationResult.errors,
    };
  }

  async validateDocument(type, pdfData) {
    const validators = {
      W2: this.validateW2.bind(this),
      PAYSTUB: this.validatePaystub.bind(this),
      BANK_STATEMENT: this.validateBankStatement.bind(this),
      TAX_RETURN: this.validateTaxReturn.bind(this),
    };

    const validator = validators[type];
    if (!validator) {
      return { valid: false, errors: ["Unknown document type"] };
    }

    return await validator(pdfData);
  }
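
  // validateW2, validatePaystub, validateBankStatement and validateTaxReturn
  // each take the parsed PDF data and return { valid, errors }; omitted for brevity.
}

module.exports = DocumentService;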
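
The four validators that `validateDocument` dispatches to are not shown. As a hypothetical illustration of their contract, each one takes the object returned by pdf-parse (whose `text` property holds the extracted text) and returns `{ valid, errors }`. A W-2 check might apply simple text heuristics like the ones below; these specific checks are assumptions, not the production rules:

```javascript
// Hypothetical W-2 heuristics over pdf-parse output; not the production rules.
function validateW2(pdfData) {
  const text = pdfData.text || "";
  const errors = [];

  if (!/Form\s+W-2/i.test(text)) errors.push("Missing 'Form W-2' header");
  // A four-digit tax year in the 1900s or 2000s
  if (!/\b(19|20)\d{2}\b/.test(text)) errors.push("No tax year found");
  // EIN format: two digits, a hyphen, seven digits
  if (!/\b\d{2}-\d{7}\b/.test(text)) {
    errors.push("No employer identification number (EIN) found");
  }

  return { valid: errors.length === 0, errors };
}
```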

2. Employment Verification with Retry Logic

Integrated with multiple employment verification providers (Workday, The Work Number, Equifax), using provider failover, per-call timeouts, and exponential backoff:

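// Provider clients (each exposing verify(application)) are injected;
// updateVerificationState() writes to DynamoDB and is omitted here.
class EmploymentVerificationService {
  constructor(providers) {
    this.providers = providers; // e.g. { workday, theworknumber, equifax }
  }

  async verifyEmployment(application) {
    // Failover order: each retry moves on to the next provider
    const providerOrder = ["workday", "theworknumber", "equifax"];
    const maxRetries = providerOrder.length; // 3 attempts in total
    let attempts = 0;

    while (attempts < maxRetries) {
      try {
        // First attempt uses the primary provider
        const result = await this.callProviderWithTimeout(
          providerOrder[attempts],
          application,
          45000 // 45-second timeout
        );

        // Update state in DynamoDB
        await this.updateVerificationState(application.id, "COMPLETED", result);

        return result;
      } catch (error) {
        attempts++;

        if (attempts >= maxRetries) {
          // Final failure, mark for manual review
          await this.updateVerificationState(
            application.id,
            "MANUAL_REVIEW_REQUIRED",
            { error: error.message }
          );
          throw error;
        }

        // Exponential backoff before the next provider: 2 s, then 4 s
        await this.delay(Math.pow(2, attempts) * 1000);
      }
    }
  }

  async callProviderWithTimeout(provider, application, timeout) {
    let timer;
    const timeoutPromise = new Promise((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`Timeout calling ${provider}`)),
        timeout
      );
    });

    try {
      return await Promise.race([
        this.providers[provider].verify(application),
        timeoutPromise,
      ]);
    } finally {
      // Clear the timer so it doesn't keep running after the race settles
      clearTimeout(timer);
    }
  }

  delay(ms) {
    return new Promise((resolve) => setTimeout(resolve, ms));
  }
}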

3. API Response Time Optimization

Implemented a tiered timeout strategy based on verification complexity:

Fast Path (2-5s): Cached results and pre-validated data
Normal Path (10-15s): Standard API integrations
Slow Path (30-45s): Complex verifications requiring multiple sources

Timeout thresholds:

Primary: 45 seconds
Retry: 60 seconds
Maximum: 2 minutes (then escalate to manual review)
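
The thresholds above can be combined into one per-attempt rule. The following is a minimal sketch, assuming the overall budget is measured from the first call. The function and constant names are hypothetical; only the 45 s, 60 s, and 2-minute values come from the text:

```javascript
// Tier values from the thresholds above; names are hypothetical.
const TIMEOUT_TIERS_MS = { primary: 45000, retry: 60000, maximum: 120000 };

// Returns the timeout (ms) for the given attempt, capped so the call cannot
// run past the overall budget, or null once the budget is spent
// (the caller then escalates to manual review).
function timeoutForAttempt(attempt, elapsedMs, tiers = TIMEOUT_TIERS_MS) {
  const remainingMs = tiers.maximum - elapsedMs;
  if (remainingMs <= 0) return null;
  const tierMs = attempt === 0 ? tiers.primary : tiers.retry;
  return Math.min(tierMs, remainingMs);
}
```

With a full 45 s primary call and a full 60 s retry, a third attempt gets only the 15 s left in the 2-minute budget.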

4. Redis Caching Layer

Reduced API response times by 50%:

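// High-traffic data caching (cache-aside, 1-hour TTL)
// `redis` is assumed to be an ioredis client; `creditBureau` is the bureau client
const crypto = require("crypto");

async function getCreditScore(ssn) {
  // Key on an HMAC of the SSN so raw SSNs never appear in Redis keys
  const ssnHash = crypto
    .createHmac("sha256", process.env.CACHE_KEY_SECRET)
    .update(ssn)
    .digest("hex");
  const cacheKey = `credit:${ssnHash}`;
  const cached = await redis.get(cacheKey);

  if (cached) {
    return JSON.parse(cached);
  }

  const score = await creditBureau.fetch(ssn);
  await redis.setex(cacheKey, 3600, JSON.stringify(score)); // 1-hour TTL

  return score;
}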

5. Jenkins CI/CD Pipeline

Automated testing and deployment:

// Simplified Jenkins declarative pipeline (Jenkinsfile)
pipeline {
    agent any
    stages {
        stage('Test') {
            parallel {
                stage('Unit Tests') {
                    steps {
                        sh 'npm run test:unit'
                    }
                }
                stage('Integration Tests') {
                    steps {
                        sh 'npm run test:integration'
                    }
                }
            }
        }
        stage('Deploy') {
            steps {
                sh 'kubectl apply -f k8s/'
                sh 'kubectl rollout status deployment/mortgage-api'
            }
        }
    }
}

6. AWS Cloud Infrastructure

Comprehensive AWS deployment with containerized microservices:

ECS (Elastic Container Service): Container orchestration for all microservices
AWS SQS: Message queues for asynchronous processing and service decoupling
AWS SNS: Pub/sub notifications for real-time status updates
API Gateway: Secure API endpoints with throttling and monitoring
DynamoDB: State management and application tracking
S3: Document storage with encryption at rest and in transit
Redis (ElastiCache): Distributed caching for sub-second response times
CloudWatch: Comprehensive monitoring, logging, and alerting
Auto Scaling: Automatic capacity management based on queue depth and CPU utilization

Critical System Features

Reliability & Resilience

Non-blocking operations: All external calls use async patterns
Circuit breakers: Automatic fallback when APIs exceed error thresholds
Retry logic: Exponential backoff with max 3 attempts
Graceful degradation: System continues processing even if individual services fail
Health checks: Automated monitoring ensures only healthy instances receive traffic
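
The source does not name a circuit-breaker library, so the following is a hand-rolled sketch of the standard closed/open/half-open pattern. The thresholds and the injectable `now` clock are illustrative assumptions:

```javascript
// Hypothetical thresholds; `now` is injectable so the clock can be controlled in tests.
class CircuitBreaker {
  constructor(fn, { failureThreshold = 5, resetTimeoutMs = 30000, now = Date.now } = {}) {
    this.fn = fn;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
    this.state = "CLOSED";
    this.failures = 0;
    this.openedAt = 0;
  }

  async call(...args) {
    if (this.state === "OPEN") {
      // Fail fast without touching the struggling API
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error("Circuit open: failing fast");
      }
      // Cool-down elapsed: let a trial request through
      this.state = "HALF_OPEN";
    }

    try {
      const result = await this.fn(...args);
      // Success closes the circuit and resets the failure count
      this.failures = 0;
      this.state = "CLOSED";
      return result;
    } catch (error) {
      this.failures++;
      // A failed trial, or too many consecutive failures, opens the circuit
      if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
        this.state = "OPEN";
        this.openedAt = this.now();
      }
      throw error;
    }
  }
}
```

The "automatic fallback" would live in the caller. For example, when `call` rejects with the circuit-open error, it routes the application to the next provider or to manual review.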

Compliance & Auditability

State tracking: Every application state change logged in DynamoDB
Audit trails: Complete history of all verification attempts and results
Encryption: Data encrypted at rest (S3, DynamoDB) and in transit (TLS 1.3)
Access controls: Role-based permissions with AWS IAM
Regulatory compliance: SOC 2, PCI DSS, and GLBA requirements met
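
One way to keep the state history append-only is a conditional write. The sketch below builds the parameters for a DynamoDB DocumentClient `put`. The key schema (`PK`/`SK`), the `AUDIT_TABLE` environment variable, and the function name are hypothetical. The `attribute_not_exists` condition makes the write fail rather than overwrite an existing entry:

```javascript
// Hypothetical key schema: PK = APP#<id>, SK = STATE#<ISO timestamp>.
// Two changes in the same millisecond would collide; a sequence number avoids that.
function buildStateChangeAudit(applicationId, fromState, toState, details, now = new Date()) {
  const timestamp = now.toISOString();
  return {
    TableName: process.env.AUDIT_TABLE,
    Item: {
      PK: `APP#${applicationId}`,
      SK: `STATE#${timestamp}`,
      applicationId,
      fromState,
      toState,
      details,
      recordedAt: timestamp,
    },
    // Reject the write if an item with this key already exists
    ConditionExpression: "attribute_not_exists(SK)",
  };
}
```

Querying on `PK` then returns an application's full state history in timestamp order, because the ISO strings in `SK` sort chronologically.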

Performance & Scalability

Parallel processing: Credit, employment, and document checks run simultaneously
Worker threads: CPU-intensive tasks offloaded to dedicated threads
Auto-scaling: Services scale based on queue depth (target: 100 messages per instance)
Caching strategy: Frequently accessed data cached with smart TTL policies
Connection pooling: Reusable connections to databases and external APIs
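
The queue-depth target reduces to simple arithmetic: desired instances = ceil(visible messages / 100), clamped to bounds. This mirrors the backlog-per-instance approach AWS documents for scaling on SQS. The function name and the min/max bounds below are hypothetical:

```javascript
// Target of 100 messages per instance is from the text; bounds are hypothetical.
function desiredInstanceCount(queueDepth, { targetPerInstance = 100, min = 2, max = 20 } = {}) {
  const needed = Math.ceil(queueDepth / targetPerInstance);
  // Keep a floor for availability and a ceiling for cost control
  return Math.min(max, Math.max(min, needed));
}
```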

Observability

Distributed tracing: Track requests across all microservices
Centralized logging: ELK stack aggregates logs from all services
Real-time metrics: Dashboard showing throughput, latency, error rates
Custom alerts: CloudWatch alarms for anomalies and SLA breaches
Application insights: Detailed analytics on approval times and bottlenecks

Results & Impact

Approval Time: 9 days, down 80% from 45 days through parallel processing
Application Throughput: +20%, with increased capacity to handle more applications
API Response Time: 50% faster through Redis caching and optimized queries
Deployment Time: 40% shorter with the automated Jenkins CI/CD pipeline
System Uptime: 99.9%, with high availability from AWS multi-AZ deployment
Automation: 95% automated, with manual validation steps eliminated

Approval Processing Time

Before: 45 days
After: 9 days
80% faster, saving customers 36 days

API Response Times

Before: 2-5 seconds
After: 0.5-2 seconds
50% improvement with Redis caching

Deployment Frequency

Before: Once per month
After: Multiple per day
Deployment time cut by 40% with automated CI/CD

Business Impact

Improved customer satisfaction with faster turnaround
Reduced operational costs through automation
Increased competitive advantage in the market
Recognized with the Excellence in Tech Innovation Award 2023

Implementation Journey

Phase 1: Discovery (2 months)

Architecture Design & Planning

Analyzed existing monolith, identified bottlenecks, designed microservices architecture, and created migration roadmap.

Documented current system architecture and pain points
Designed event-driven microservices architecture
Created detailed migration plan with risk assessment
Gained stakeholder and leadership buy-in

Phase 2: Foundation (3 months)

Infrastructure & Core Services

Set up AWS infrastructure, implemented CI/CD pipeline, and built core application service with message queuing.

Deployed ECS cluster with auto-scaling
Implemented SQS/SNS messaging infrastructure
Built Jenkins CI/CD pipeline with automated testing
Created Application Service as orchestrator

Phase 3: Verification Services (4 months)

Credit, Employment & Document Services

Developed and deployed specialized verification microservices with retry logic, caching, and error handling.

Integrated with 3 credit bureaus (Experian, Equifax, TransUnion)
Connected to employment verification providers
Built document processing with PDF parsing and validation
Implemented Redis caching for 50% performance gain

Phase 4: Migration (2 months)

Gradual Traffic Migration

Incrementally shifted production traffic from monolith to microservices using feature flags and canary deployments.

Started with 5% traffic canary deployment
Gradually increased to 50%, then 100%
Zero critical incidents during migration
Completed full cutover with rollback capability
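
The source does not say how traffic was split. One common way to implement a percentage rollout like this is to hash a stable key into 100 buckets. The sketch below assumes the application ID is that key and that `routeToMicroservices` is a hypothetical helper sitting behind the feature flag:

```javascript
const { createHash } = require("crypto");

// Hypothetical router: a given application always lands in the same bucket,
// so it is handled consistently by one side for the whole rollout.
function routeToMicroservices(applicationId, rolloutPercent) {
  const digest = createHash("sha256").update(applicationId).digest();
  const bucket = digest.readUInt32BE(0) % 100; // 0..99
  return bucket < rolloutPercent;
}
```

Because bucket < 5 implies bucket < 50, every application routed to the new services at 5% stays there as the rollout grows to 50% and then 100%. Setting the percentage back to 0 acts as the rollback switch.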

Phase 5: Optimization (2 months)

Performance Tuning & Monitoring

Fine-tuned system performance, implemented comprehensive monitoring, and optimized based on production metrics.

Deployed ELK stack for centralized logging
Implemented distributed tracing with X-Ray
Optimized database queries and caching strategies
Achieved 99.9% uptime SLA

Technical Leadership

As Lead Software Development Engineer, I:

Led a team of 6 engineers through the modernization effort
Collaborated with the compliance team to ensure regulatory requirements were met
Presented to senior leadership on technical strategy and progress
Mentored junior developers on microservices best practices
Established coding standards and review processes

Architecture Decisions

Why Microservices?

Chose microservices over a refactored monolith because:

Independent deployments: Update credit check without touching other services
Technology flexibility: Use Python for ML, Node.js for APIs
Fault isolation: Failure in one service doesn't bring down entire system
Team autonomy: Smaller teams can own and iterate on services

Why AWS?

Selected AWS for cloud infrastructure because:

Mature ecosystem: Wide range of managed services
Bank requirements: Strong compliance certifications (SOC 2, PCI DSS)
Cost optimization: Reserved instances and auto-scaling
Existing expertise: Team familiarity with AWS services

Why Jenkins?

Chose Jenkins for CI/CD despite newer options because:

Bank's existing infrastructure: Already used organization-wide
Plugin ecosystem: Extensions for every tool we needed
Pipeline as code: Version control for deployment logic
Easy integration: Connected to existing systems seamlessly

Lessons Learned

What Worked Well

Incremental migration: Moved one service at a time, reducing risk
Comprehensive testing: Automated tests caught issues before production
Monitoring first: Set up observability before migrating critical services
Team training: Invested in upskilling team on new architecture
Non-blocking operations: Async processing eliminated bottlenecks
Circuit breakers: Combined with timeout management, protected the system from cascading failures

Challenges Overcome

Data consistency: Implemented saga pattern for distributed transactions
Service discovery: Used Kubernetes service mesh for reliable communication
Debugging complexity: Built centralized logging with ELK stack
Cultural change: Helped team transition from monolith mindset
API timeout management: Implemented tiered timeout strategy with exponential backoff
Error recovery: Built comprehensive retry logic with state tracking for audit compliance
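
The source does not describe how its sagas were structured. The following is a minimal orchestration-style sketch of the pattern: a coordinator runs steps in order and, if one fails, runs the compensations of the already-completed steps in reverse. The step shape (`action`/`compensate`) and the function name are assumptions:

```javascript
// Orchestration-style saga sketch; each step pairs an action with a
// compensation that semantically undoes it.
async function runSaga(steps, context) {
  const completed = [];
  try {
    for (const step of steps) {
      await step.action(context);
      completed.push(step);
    }
    return { status: "COMPLETED" };
  } catch (error) {
    // Undo finished steps in reverse order; a failing compensation
    // propagates to the caller and needs manual follow-up.
    for (const step of completed.reverse()) {
      await step.compensate(context);
    }
    return { status: "COMPENSATED", failedWith: error.message };
  }
}
```

Compensations are business-level undos (for example, releasing a reservation) rather than database rollbacks. That is why each participating service has to expose one.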

Best Practices Implemented

Error Handling & Recovery

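// Service clients (verificationService, stateManager, notificationService,
// alertService) and logger are assumed to be initialized elsewhere.
const MAX_ATTEMPTS = 3;
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function verifyWithRecovery(application, attempt = 1) {
  try {
    // Primary verification attempt
    const result = await verificationService.verify(application);

    // Update state in DynamoDB
    await stateManager.update(application.id, "COMPLETED", result);

    // Send notification via SNS
    await notificationService.notify(application.id, result);
    return result;
  } catch (error) {
    // Log error with full context
    logger.error("Verification failed", {
      applicationId: application.id,
      attempt,
      error: error.message,
    });

    // Retry with exponential backoff (2 s, then 4 s)
    if (attempt < MAX_ATTEMPTS) {
      await delay(Math.pow(2, attempt) * 1000);
      return verifyWithRecovery(application, attempt + 1);
    }

    // Escalate to manual review
    await stateManager.update(application.id, "MANUAL_REVIEW", {
      error: error.message,
    });
    await alertService.escalate(application.id);
    return null;
  }
}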

Monitoring & Alerts

Track API response times across all verification providers
Monitor timeout rates and adjust thresholds dynamically
Watch retry attempt patterns to identify degraded services
Log all verification states for compliance and audit trails
Alert on-call engineers for failures requiring immediate attention
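
The text does not say how thresholds were adjusted. One hypothetical approach is to derive each provider's timeout from its recent latency distribution: take the nearest-rank p99, add headroom, and clamp to fixed bounds. The function name, the 1.5x headroom, and the 5 to 60 second bounds are assumptions:

```javascript
// Hypothetical helper: timeout = clamp(p99 latency * headroom, minMs, maxMs).
function adaptiveTimeoutMs(
  latenciesMs,
  { percentile = 99, headroom = 1.5, minMs = 5000, maxMs = 60000 } = {}
) {
  if (latenciesMs.length === 0) return maxMs; // no data yet: use the ceiling
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  // Nearest-rank percentile (integer arithmetic avoids float rounding)
  const rank = Math.ceil((percentile * sorted.length) / 100);
  const pValue = sorted[Math.max(0, rank - 1)];
  return Math.min(maxMs, Math.max(minMs, Math.round(pValue * headroom)));
}
```

Clamping keeps a single slow outlier window from pushing a provider past the retry tier, and keeps a fast window from producing timeouts too tight to absorb normal jitter.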

Testing Strategy

Unit tests: Each service has 90%+ code coverage
Integration tests: End-to-end verification flows
Load testing: Simulated 3x peak volume to validate auto-scaling
Failure scenario testing: Tested timeout handling, API failures, network issues
Recovery testing: Validated system recovery from partial failures

What I'd Do Next

Looking forward, I would enhance the platform with:

Real-time status tracking so customers can see exactly where their application is in the pipeline
ML-based fraud detection integrated into the approval workflow to catch suspicious patterns early
Analytics dashboard to identify bottlenecks and optimization opportunities through data visualization

This project earned me the Excellence in Tech Innovation Award 2023 and demonstrates my ability to lead large-scale architectural transformations that deliver measurable business value while maintaining the highest standards of reliability and compliance.
