Mortgage Processing Platform Transformation

Role: Lead Software Development Engineer, Technical Lead
Timeline: Jul 2021 - Oct 2024

Excellence in Tech Innovation Award winner. Transformed the mortgage approval process from 45 days to 9 days through a microservices architecture, increasing throughput by 20% and cutting deployment times by 40%.

  • 45 days → 9 days approval time
  • 20% throughput increase
  • 🏆 Excellence in Tech Innovation Award 2023

Tech Stack

Microservices · Node.js · Python · Jenkins · Redis · AWS · Docker · Kubernetes

The Challenge

City National Bank faced significant challenges with its mortgage processing system:

  • Long approval times: 45-day average from application to approval
  • Sequential bottlenecks: Each step waited for the previous to complete
  • Manual interventions: Frequent human touchpoints slowed the process
  • Monolithic architecture: Difficult to scale and maintain
  • Limited visibility: Hard to identify where delays occurred

The bank needed a modern solution that could handle increasing volume while maintaining the highest standards of accuracy and regulatory compliance.

The Solution

I led the transformation of the mortgage processing platform, adopting a microservices architecture that fundamentally changed how applications flowed through the system.

System Architecture

The platform consists of specialized microservices orchestrated through AWS SQS/SNS for event-driven, parallel processing:

mortgage-services/
├── application-service    # Main orchestration & API Gateway
├── credit-service         # Credit bureau integration
├── validation-service     # Document validation
├── employment-service     # Employment verification
├── document-service       # Document processing & generation
└── notification-service   # Real-time updates


Event-Driven Processing Flow

The system uses an event-driven architecture to process applications asynchronously. When a client submits an application, the service immediately returns an Application ID while verification tasks run in parallel.
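
The exact orchestration code isn't reproduced here; below is a minimal sketch of the fan-out step, assuming the application-service persists the application and publishes a single SNS event that subscribed SQS queues deliver to each verification service (topic and table names are illustrative):

// application-service: accept the application, then fan out via SNS
const AWS = require("aws-sdk");
const { v4: uuid } = require("uuid");

const dynamo = new AWS.DynamoDB.DocumentClient();
const sns = new AWS.SNS();

async function submitApplication(payload) {
  const applicationId = uuid();

  // Persist initial state before any verification starts
  await dynamo
    .put({
      TableName: process.env.APPLICATIONS_TABLE,
      Item: { applicationId, status: "SUBMITTED", payload },
    })
    .promise();

  // One event; the SQS queues subscribed to this topic deliver it to the
  // credit, employment, and document services in parallel
  await sns
    .publish({
      TopicArn: process.env.APPLICATION_SUBMITTED_TOPIC,
      Message: JSON.stringify({ type: "APPLICATION_SUBMITTED", applicationId }),
    })
    .promise();

  // The caller gets an ID immediately; results arrive asynchronously
  return { applicationId };
}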


Application State Machine

Applications flow through a state machine with parallel processing and decision logic.
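
The full state model isn't shown; a hypothetical transition map, using status names consistent with the code later in this write-up, gives the flavor of the decision logic:

// Hypothetical state model; names are illustrative
const TRANSITIONS = {
  SUBMITTED: ["VERIFYING"],
  VERIFYING: ["APPROVED", "REJECTED", "MANUAL_REVIEW_REQUIRED"],
  MANUAL_REVIEW_REQUIRED: ["APPROVED", "REJECTED"],
  APPROVED: [],
  REJECTED: [],
};

function assertTransition(from, to) {
  if (!(TRANSITIONS[from] || []).includes(to)) {
    throw new Error(`Illegal transition: ${from} -> ${to}`);
  }
}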

Key Technical Improvements

1. Document Processing Service

Built a robust document validation service that processes multiple document types in parallel:

// document-service/src/index.js
const { S3, SQS, SNS } = require("aws-sdk");
const { v4: uuid } = require("uuid");
const PDFParser = require("pdf-parse");

class DocumentService {
  constructor() {
    this.s3 = new S3();
    this.sqs = new SQS();
    this.sns = new SNS();
  }

  async start() {
    while (true) {
      try {
        const messages = await this.sqs
          .receiveMessage({
            QueueUrl: process.env.DOCUMENT_QUEUE_URL,
            MaxNumberOfMessages: 5,
            WaitTimeSeconds: 20,
          })
          .promise();

        if (messages.Messages) {
          await Promise.all(
            messages.Messages.map((msg) => this.processMessage(msg))
          );
        }
      } catch (error) {
        console.error("Error processing documents:", error);
      }
    }
  }

  async processMessage(message) {
    try {
      const { applicationId, documents } = JSON.parse(message.Body);
      const results = await Promise.all(
        documents.map((doc) => this.processDocument(applicationId, doc))
      );

      // Aggregate results
      const validationResult = {
        applicationId,
        status: results.every((r) => r.valid) ? "APPROVED" : "REJECTED",
        details: results,
      };

      // Store validation results
      await this.s3
        .putObject({
          Bucket: process.env.RESULTS_BUCKET,
          Key: `validations/${applicationId}.json`,
          Body: JSON.stringify(validationResult),
        })
        .promise();

      // Publish results to SNS
      await this.sns
        .publish({
          TopicArn: process.env.DOCUMENT_RESULTS_TOPIC,
          Message: JSON.stringify({
            type: "DOCUMENT_VALIDATION_COMPLETED",
            applicationId,
            result: validationResult,
          }),
        })
        .promise();

      // Delete processed message
      await this.sqs
        .deleteMessage({
          QueueUrl: process.env.DOCUMENT_QUEUE_URL,
          ReceiptHandle: message.ReceiptHandle,
        })
        .promise();
    } catch (error) {
      console.error("Error processing document message:", error);
    }
  }

  async processDocument(applicationId, document) {
    const documentId = uuid();

    // Download document from S3
    const s3Object = await this.s3
      .getObject({
        Bucket: process.env.DOCUMENTS_BUCKET,
        Key: document.key,
      })
      .promise();

    // Parse PDF content
    const pdfData = await PDFParser(s3Object.Body);

    // Validate document based on type
    const validationResult = await this.validateDocument(
      document.type,
      pdfData
    );

    // Store processed document
    await this.s3
      .putObject({
        Bucket: process.env.PROCESSED_BUCKET,
        Key: `${applicationId}/${documentId}.json`,
        Body: JSON.stringify({
          documentId,
          originalKey: document.key,
          type: document.type,
          validation: validationResult,
          processedAt: new Date().toISOString(),
        }),
      })
      .promise();

    return {
      documentId,
      type: document.type,
      valid: validationResult.valid,
      errors: validationResult.errors,
    };
  }

  async validateDocument(type, pdfData) {
    const validators = {
      W2: this.validateW2.bind(this),
      PAYSTUB: this.validatePaystub.bind(this),
      BANK_STATEMENT: this.validateBankStatement.bind(this),
      TAX_RETURN: this.validateTaxReturn.bind(this),
    };

    const validator = validators[type];
    if (!validator) {
      return { valid: false, errors: ["Unknown document type"] };
    }

    return await validator(pdfData);
  }
}
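
The type-specific validators (validateW2, validatePaystub, and so on) are elided above. A minimal hypothetical validateW2 method for this class, checking that the parsed text contains the fields underwriting needs, might look like:

// Hypothetical validator, as a DocumentService method; the real checks
// were stricter and type-specific
async validateW2(pdfData) {
  const text = pdfData.text || "";
  const errors = [];

  if (!/employer/i.test(text)) errors.push("Missing employer information");
  if (!/wages/i.test(text)) errors.push("Missing wage information");
  if (!/\b20\d{2}\b/.test(text)) errors.push("Missing tax year");

  return { valid: errors.length === 0, errors };
}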

2. Employment Verification with Retry Logic

Integrated with multiple employment verification providers (Workday, TheWorkNumber, Equifax) with sophisticated retry and timeout handling:

class EmploymentVerificationService {
  async verifyEmployment(application) {
    const providers = ["workday", "theworknumber", "equifax"];
    let attempts = 0;
    const maxRetries = 3;

    while (attempts < maxRetries) {
      try {
        // Fail over to the next provider on each retry attempt
        const result = await this.callProviderWithTimeout(
          providers[attempts],
          application,
          45000 // 45 second timeout
        );

        // Update state in DynamoDB
        await this.updateVerificationState(application.id, "COMPLETED", result);

        return result;
      } catch (error) {
        attempts++;

        if (attempts >= maxRetries) {
          // Final failure, mark for manual review
          await this.updateVerificationState(
            application.id,
            "MANUAL_REVIEW_REQUIRED",
            { error: error.message }
          );
          throw error;
        }

        // Exponential backoff
        await this.delay(Math.pow(2, attempts) * 1000);
      }
    }
  }

  async callProviderWithTimeout(provider, application, timeout) {
    // Race the provider call against a timer so a slow provider
    // can't stall the whole verification step
    return Promise.race([
      this.providers[provider].verify(application),
      new Promise((_, reject) =>
        setTimeout(() => reject(new Error("Timeout")), timeout)
      ),
    ]);
  }

  delay(ms) {
    return new Promise((resolve) => setTimeout(resolve, ms));
  }
}

3. API Response Time Optimization

Implemented a tiered timeout strategy based on verification complexity (tier selection is sketched after the thresholds below):

  • Fast Path (2-5s): Cached results and pre-validated data
  • Normal Path (10-15s): Standard API integrations
  • Slow Path (30-45s): Complex verifications requiring multiple sources

Timeout thresholds:

  • Primary: 45 seconds
  • Retry: 60 seconds
  • Maximum: 2 minutes (then escalate to manual review)
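
A sketch of how tier selection might map to those budgets (the classification logic here is illustrative, not the production rules):

// Pick a timeout budget based on how the verification was classified
const TIMEOUT_MS = {
  FAST: 5000, // cached / pre-validated data
  NORMAL: 15000, // standard API integrations
  SLOW: 45000, // multi-source verifications
};

function timeoutFor(verification) {
  if (verification.cached) return TIMEOUT_MS.FAST;
  if (verification.sources > 1) return TIMEOUT_MS.SLOW;
  return TIMEOUT_MS.NORMAL;
}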

4. Redis Caching Layer

Reduced API response times by 50%:

// High-traffic data caching
async function getCreditScore(ssn) {
  const cacheKey = `credit:${ssn}`;
  const cached = await redis.get(cacheKey);

  if (cached) {
    return JSON.parse(cached);
  }

  const score = await creditBureau.fetch(ssn);
  await redis.setex(cacheKey, 3600, JSON.stringify(score)); // 1 hour TTL

  return score;
}

5. Jenkins CI/CD Pipeline

Automated testing and deployment:

// Simplified Jenkins pipeline
pipeline {
    agent any
    stages {
        stage('Test') {
            parallel {
                stage('Unit Tests') {
                    steps {
                        sh 'npm run test:unit'
                    }
                }
                stage('Integration Tests') {
                    steps {
                        sh 'npm run test:integration'
                    }
                }
            }
        }
        stage('Deploy') {
            steps {
                sh 'kubectl apply -f k8s/'
                sh 'kubectl rollout status deployment/mortgage-api'
            }
        }
    }
}

6. AWS Cloud Infrastructure

Comprehensive AWS deployment with containerized microservices:

  • ECS (Elastic Container Service): Container orchestration for all microservices
  • AWS SQS: Message queues for asynchronous processing and service decoupling
  • AWS SNS: Pub/sub notifications for real-time status updates
  • API Gateway: Secure API endpoints with throttling and monitoring
  • DynamoDB: State management and application tracking
  • S3: Document storage with encryption at rest and in transit
  • Redis (ElastiCache): Distributed caching for sub-second response times
  • CloudWatch: Comprehensive monitoring, logging, and alerting
  • Auto Scaling: Automatic capacity management based on queue depth and CPU utilization

Critical System Features

Reliability & Resilience

  • Non-blocking operations: All external calls use async patterns
  • Circuit breakers: Automatic fallback when APIs exceed error thresholds (minimal sketch after this list)
  • Retry logic: Exponential backoff with max 3 attempts
  • Graceful degradation: System continues processing even if individual services fail
  • Health checks: Automated monitoring ensures only healthy instances receive traffic
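
A minimal circuit-breaker sketch in the spirit of the list above, assuming per-dependency failure counting; the threshold and cooldown values are illustrative:

// Opens after N consecutive failures; lets a probe call through after a cooldown
class CircuitBreaker {
  constructor(fn, { threshold = 5, cooldownMs = 30000 } = {}) {
    this.fn = fn;
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = null;
  }

  async call(...args) {
    if (this.openedAt && Date.now() - this.openedAt < this.cooldownMs) {
      throw new Error("Circuit open; using fallback");
    }
    try {
      const result = await this.fn(...args);
      this.failures = 0; // healthy again: close the circuit
      this.openedAt = null;
      return result;
    } catch (err) {
      if (++this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}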

Compliance & Auditability

  • State tracking: Every application state change logged in DynamoDB (write path sketched after this list)
  • Audit trails: Complete history of all verification attempts and results
  • Encryption: Data encrypted at rest (S3, DynamoDB) and in transit (TLS 1.3)
  • Access controls: Role-based permissions with AWS IAM
  • Regulatory compliance: SOC 2, PCI DSS, and GLBA standards met
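
A sketch of the state-change write path, assuming a DynamoDB table with an appended audit trail (table and attribute names are illustrative):

// Every status change is written with a timestamped audit entry
const AWS = require("aws-sdk");
const dynamo = new AWS.DynamoDB.DocumentClient();

async function updateApplicationState(applicationId, status, detail) {
  await dynamo
    .update({
      TableName: process.env.APPLICATIONS_TABLE,
      Key: { applicationId },
      // "status" is a DynamoDB reserved word, hence the name alias
      UpdateExpression:
        "SET #s = :s, auditTrail = list_append(if_not_exists(auditTrail, :empty), :entry)",
      ExpressionAttributeNames: { "#s": "status" },
      ExpressionAttributeValues: {
        ":s": status,
        ":empty": [],
        ":entry": [{ status, detail, at: new Date().toISOString() }],
      },
    })
    .promise();
}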

Performance & Scalability

  • Parallel processing: Credit, employment, and document checks run simultaneously
  • Worker threads: CPU-intensive tasks offloaded to dedicated threads
  • Auto-scaling: Services scale based on queue depth (target: 100 messages per instance; calculation sketched after this list)
  • Caching strategy: Frequently accessed data cached with smart TTL policies
  • Connection pooling: Reusable connections to databases and external APIs
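
A simplified version of the queue-depth scaling calculation; the 100-messages-per-instance target comes from the list above, while the min/max bounds are illustrative:

// Desired instance count from SQS backlog: ceil(depth / target per instance)
const TARGET_MESSAGES_PER_INSTANCE = 100;

function desiredInstanceCount(queueDepth, min = 2, max = 20) {
  const desired = Math.ceil(queueDepth / TARGET_MESSAGES_PER_INSTANCE);
  return Math.min(max, Math.max(min, desired));
}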

Observability

  • Distributed tracing: Track requests across all microservices (example setup after this list)
  • Centralized logging: ELK stack aggregates logs from all services
  • Real-time metrics: Dashboard showing throughput, latency, error rates
  • Custom alerts: CloudWatch alarms for anomalies and SLA breaches
  • Application insights: Detailed analytics on approval times and bottlenecks
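
With aws-xray-sdk, for example, instrumenting the AWS SDK is a one-line wrap per service (shown as an example of the approach, not the exact per-service setup):

// Wrap the AWS SDK so every S3/SQS/SNS/DynamoDB call emits a trace segment
const AWSXRay = require("aws-xray-sdk");
const AWS = AWSXRay.captureAWS(require("aws-sdk"));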

Results & Impact

  • ⚡ Approval time: 9 days, down 80% from 45 days through parallel processing
  • 📈 Application throughput: up 20%, increasing capacity to handle more applications
  • 🚀 API response time: 50% faster via Redis caching and optimized queries
  • ⚙️ Deployment time: 40% less with the automated Jenkins CI/CD pipeline
  • ✅ System uptime: 99.9%, with high availability across AWS multi-AZ
  • 🤖 Automation: 95% of manual validation steps eliminated

Approval Processing Time

  • Before: 45 days → After: 9 days (80% faster, saving customers 36 days)

API Response Times

  • Before: 2-5 seconds → After: 0.5-2 seconds (50% improvement with Redis caching)

Deployment Frequency

  • Before: once per month → After: multiple per day (with individual deployments 40% faster via automated CI/CD)

Business Impact

  • Improved customer satisfaction with faster turnaround
  • Reduced operational costs through automation
  • Increased competitive advantage in the market
  • Excellence in Tech Innovation Award 2023 recognition

Implementation Journey

Phase 1: Discovery (2 months)

Architecture Design & Planning

Analyzed the existing monolith, identified bottlenecks, designed the microservices architecture, and created a migration roadmap.

  ✓ Documented current system architecture and pain points
  ✓ Designed event-driven microservices architecture
  ✓ Created detailed migration plan with risk assessment
  ✓ Gained stakeholder and leadership buy-in

Phase 2: Foundation (3 months)

Infrastructure & Core Services

Set up AWS infrastructure, implemented the CI/CD pipeline, and built the core application service with message queuing.

  ✓ Deployed ECS cluster with auto-scaling
  ✓ Implemented SQS/SNS messaging infrastructure
  ✓ Built Jenkins CI/CD pipeline with automated testing
  ✓ Created Application Service as orchestrator

Phase 3: Verification Services (4 months)

Credit, Employment & Document Services

Developed and deployed specialized verification microservices with retry logic, caching, and error handling.

  ✓ Integrated with 3 credit bureaus (Experian, Equifax, TransUnion)
  ✓ Connected to employment verification providers
  ✓ Built document processing with PDF parsing and validation
  ✓ Implemented Redis caching for 50% performance gain

Phase 4: Migration (2 months)

Gradual Traffic Migration

Incrementally shifted production traffic from the monolith to microservices using feature flags and canary deployments (a simplified routing sketch follows the timeline).

  ✓ Started with 5% traffic canary deployment
  ✓ Gradually increased to 50%, then 100%
  ✓ Zero critical incidents during migration
  ✓ Completed full cutover with rollback capability

Phase 5: Optimization (2 months)

Performance Tuning & Monitoring

Fine-tuned system performance, implemented comprehensive monitoring, and optimized based on production metrics.

  ✓ Deployed ELK stack for centralized logging
  ✓ Implemented distributed tracing with X-Ray
  ✓ Optimized database queries and caching strategies
  ✓ Achieved 99.9% uptime SLA
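
A hypothetical sketch of the deterministic traffic split behind the Phase 4 canary; the feature-flag service itself is not shown, and hashing keeps a given application on one path for its whole lifecycle:

const crypto = require("crypto");

// Returns true if this application should take the new microservices path
function routeToMicroservices(applicationId, rolloutPercent) {
  // First hash byte gives a stable bucket in 0-255 per application
  const bucket = crypto.createHash("md5").update(applicationId).digest()[0];
  return (bucket / 256) * 100 < rolloutPercent;
}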

Technical Leadership

As Lead Software Development Engineer, I:

  • Led a team of 6 engineers through the modernization effort
  • Collaborated with the compliance team to ensure regulatory requirements were met
  • Presented to senior leadership on technical strategy and progress
  • Mentored junior developers on microservices best practices
  • Established coding standards and review processes

Architecture Decisions

Why Microservices?

Chose microservices over a refactored monolith because:

  1. Independent deployments: Update credit check without touching other services
  2. Technology flexibility: Use Python for ML, Node.js for APIs
  3. Fault isolation: Failure in one service doesn't bring down entire system
  4. Team autonomy: Smaller teams can own and iterate on services

Why AWS?

Selected AWS for cloud infrastructure because:

  1. Mature ecosystem: Wide range of managed services
  2. Bank requirements: Strong compliance certifications (SOC 2, PCI DSS)
  3. Cost optimization: Reserved instances and auto-scaling
  4. Existing expertise: Team familiarity with AWS services

Why Jenkins?

Chose Jenkins for CI/CD despite newer options because:

  1. Bank's existing infrastructure: Already used organization-wide
  2. Plugin ecosystem: Extensions for every tool we needed
  3. Pipeline as code: Version control for deployment logic
  4. Easy integration: Connected to existing systems seamlessly

Lessons Learned

What Worked Well

  • Incremental migration: Moved one service at a time, reducing risk
  • Comprehensive testing: Automated tests caught issues before production
  • Monitoring first: Set up observability before migrating critical services
  • Team training: Invested in upskilling team on new architecture
  • Non-blocking operations: Async processing eliminated bottlenecks
  • Circuit breakers: Protected system from cascading failures with timeout management

Challenges Overcome

  • Data consistency: Implemented saga pattern for distributed transactions, as sketched after this list
  • Service discovery: Used Kubernetes service mesh for reliable communication
  • Debugging complexity: Built centralized logging with ELK stack
  • Cultural change: Helped team transition from monolith mindset
  • API timeout management: Implemented tiered timeout strategy with exponential backoff
  • Error recovery: Built comprehensive retry logic with state tracking for audit compliance
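
A condensed sketch of the saga idea referenced above, assuming each step exposes a compensating action (step names and the rate-lock example are illustrative):

// Run steps in order; on failure, undo completed steps in reverse
async function runSaga(steps, context) {
  const completed = [];
  try {
    for (const step of steps) {
      await step.run(context);
      completed.push(step);
    }
  } catch (error) {
    for (const step of completed.reverse()) {
      await step.compensate(context); // e.g. release a locked rate
    }
    throw error;
  }
}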

Best Practices Implemented

Error Handling & Recovery

// Standard verification error-handling pattern; attempts, maxRetries,
// retry(), and delay() come from the enclosing retry loop (omitted here)
try {
  // Primary verification attempt
  const result = await verificationService.verify(application);

  // Update state in DynamoDB
  await stateManager.update(application.id, "COMPLETED", result);

  // Send notification via SNS
  await notificationService.notify(application.id, result);
} catch (error) {
  // Log error with full context
  logger.error("Verification failed", {
    applicationId: application.id,
    error: error.message,
  });

  // Implement retry with exponential backoff
  if (attempts < maxRetries) {
    await delay(Math.pow(2, attempts) * 1000);
    return retry();
  }

  // Escalate to manual review
  await stateManager.update(application.id, "MANUAL_REVIEW", error);
  await alertService.escalate(application.id);
}

Monitoring & Alerts

  • Track API response times across all verification providers
  • Monitor timeout rates and adjust thresholds dynamically
  • Watch retry attempt patterns to identify degraded services
  • Log all verification states for compliance and audit trails
  • Alert on-call engineers for failures requiring immediate attention

Testing Strategy

  • Unit tests: Each service has 90%+ code coverage
  • Integration tests: End-to-end verification flows
  • Load testing: Simulated 3x peak volume to validate auto-scaling
  • Failure scenario testing: Tested timeout handling, API failures, network issues
  • Recovery testing: Validated system recovery from partial failures

What I'd Do Next

Looking forward, I would enhance the platform with:

  1. Real-time status tracking so customers can see exactly where their application is in the pipeline
  2. ML-based fraud detection integrated into the approval workflow to catch suspicious patterns early
  3. Analytics dashboard to identify bottlenecks and optimization opportunities through data visualization

This project earned me the Excellence in Tech Innovation Award 2023 and demonstrates my ability to lead large-scale architectural transformations that deliver measurable business value while maintaining the highest standards of reliability and compliance.
