# @hsuite/health - Comprehensive System Health Monitoring

> 🏥 **Advanced health monitoring and system diagnostics library for NestJS applications with DAG network monitoring**

Enterprise-grade health monitoring solution providing real-time system resource tracking, service health checks, performance metrics collection, and specialized DAG network monitoring with event-driven updates and comprehensive diagnostics.

***

## 📚 Table of Contents

* [✨ Quick Start](#-quick-start)
* [🏗️ Architecture](#️-architecture)
* [🔧 API Reference](#-api-reference)
* [📖 Guides](#-guides)
* [🎯 Examples](#-examples)
* [🔗 Integration](#-integration)

***

## ✨ Quick Start

### Installation

```bash
npm install @hsuite/health
```

### Basic Setup

```typescript
import { Module } from '@nestjs/common';
import { HealthModule } from '@hsuite/health';
import { RedisClientOptions } from 'redis';

const redisOptions: RedisClientOptions = {
  socket: {
    host: 'localhost',
    port: 6379
  },
  password: 'your-redis-password',
  database: 0
};

@Module({
  imports: [
    HealthModule.forRoot(redisOptions)
  ]
})
export class AppModule {}
```

### Health Check Usage

```typescript
import { Injectable } from '@nestjs/common';
import { HealthService } from '@hsuite/health';

@Injectable()
export class MonitoringService {
  constructor(private healthService: HealthService) {}

  async checkSystemHealth() {
    const health = await this.healthService.check();
    console.log('System status:', health.status);
    return health;
  }
}
```

***

## 🏗️ Architecture

### Core Component Areas

#### 🏥 **System Health Monitoring**

* **Real-time Health Checks** - Comprehensive system health validation
* **Service Connectivity** - MongoDB, Redis, and microservice monitoring
* **Health Status Aggregation** - Multi-service health state management
* **Cached Responses** - Efficient health check performance optimization
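
The cached-responses behavior can be approximated with a tiny TTL memoizer. This is an illustrative sketch (the library's internal cache may differ); the injected `now` clock is only there to make the behavior easy to test:

```typescript
// Reuse a computed value for up to `ttlMs` before recomputing it.
// Works for async producers too, since T may itself be a Promise.
function withTtlCache<T>(
  fn: () => T,
  ttlMs: number,
  now: () => number = Date.now
): () => T {
  let cachedAt = -Infinity;
  let cached!: T;
  return () => {
    if (now() - cachedAt >= ttlMs) {
      cachedAt = now();
      cached = fn();
    }
    return cached;
  };
}
```

Wrapping a health check with a 1-second TTL, e.g. `withTtlCache(() => healthService.check(), 1000)`, collapses bursts of requests into a single underlying check.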

#### 📊 **Resource Metrics Collection**

* **CPU Monitoring** - Real-time CPU utilization and multi-core tracking
* **Memory Management** - Memory usage, availability, and percentage tracking
* **Disk Space Monitoring** - Storage utilization and free space alerts
* **Network Metrics** - Input/output traffic monitoring and analysis

#### 🌐 **DAG Network Monitoring**

* **Network Health Tracking** - Specialized DAG network status monitoring
* **Event-Driven Updates** - Real-time threshold monitoring with events
* **Network Threshold Management** - Online/offline status detection
* **Performance Optimization** - Efficient network status collection

#### ⚡ **Performance Features**

* **Response Caching** - 1-second caching for optimal performance
* **Error Handling** - Comprehensive exception management
* **Resource Optimization** - Efficient OS utility integration
* **Multi-core Support** - Advanced CPU usage calculations
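
Aggregate multi-core usage can be computed as busy time over total time between two snapshots of per-core counters, shaped like the `times` object from Node's `os.cpus()`. A sketch (illustrative; the library's calculation may differ):

```typescript
// Per-core time counters, matching os.cpus()[i].times (milliseconds)
interface CpuTimes { user: number; nice: number; sys: number; idle: number; irq: number; }

// Percentage of non-idle time across all cores between two snapshots
function cpuUsagePercent(prev: CpuTimes[], curr: CpuTimes[]): number {
  let busy = 0;
  let total = 0;
  const keys: (keyof CpuTimes)[] = ['user', 'nice', 'sys', 'idle', 'irq'];
  for (let i = 0; i < curr.length; i++) {
    for (const k of keys) {
      const delta = curr[i][k] - prev[i][k];
      total += delta;
      if (k !== 'idle') busy += delta;
    }
  }
  return total === 0 ? 0 : (busy / total) * 100;
}
```

Sampling `os.cpus()` twice, a short interval apart, and diffing the snapshots gives the instantaneous utilization rather than the average since boot.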

### Module Structure

```
src/
├── index.ts                           # Main entry point and exports
├── health.module.ts                   # Dynamic module with Redis configuration
├── health.controller.ts               # REST endpoints for health checks
├── health.service.ts                  # Core health monitoring service
├── interfaces/
│   └── infos.interface.ts             # Health metrics interfaces
├── models/
│   └── infos.model.ts                 # Health metrics model implementations
└── custom/
    └── dag.health.ts                  # DAG network health indicator
```

***

## 🔧 API Reference

### Core Health Endpoints

All health endpoints are publicly accessible via the `@Public()` decorator.

#### Health Check Endpoint

**`GET /health/check`**

* **Purpose**: Comprehensive system health validation
* **Caching**: 1-second response caching
* **Monitors**: Redis, MongoDB, disk space, memory, DAG network, microservices

#### System Information Endpoint

**`GET /health/infos`**

* **Purpose**: Detailed system metrics and resource utilization
* **Caching**: 1-second response caching
* **Data**: Platform, CPU, memory, disk, network metrics

### Health Check Response Schema

```typescript
interface HealthCheckResult {
  status: 'ok' | 'error';
  info: {
    [key: string]: {
      status: 'up' | 'down';
    };
  };
  error: {
    [key: string]: {
      status: 'up' | 'down';
      message?: string;
    };
  };
  details: {
    [key: string]: {
      status: 'up' | 'down';
      [key: string]: any;
    };
  };
}
```
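
The top-level `status` is an aggregate of the per-service entries. Assuming the usual Terminus semantics (overall `ok` only when every indicator is `up`), the rule can be sketched as:

```typescript
type ServiceStatus = { status: 'up' | 'down' };

// Overall health is 'ok' only if every monitored service reports 'up'
function aggregateStatus(details: Record<string, ServiceStatus>): 'ok' | 'error' {
  return Object.values(details).every(s => s.status === 'up') ? 'ok' : 'error';
}
```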

### System Information Schema

```typescript
interface IHealthInfos {
  platform: string;    // Operating system platform
  release: string;     // OS version
  machine: string;     // Hardware identifier
  arch: string;        // CPU architecture
  uptime: number;      // System uptime in seconds
  cpu: IHealthInfosCPU;
  memory: IHealthInfosMemory;
  drive: IHealthInfosDrive;
  network: IHealthInfosNetwork;
}
```

### Resource Metrics Tables

| CPU Metrics | Type   | Description                        |
| ----------- | ------ | ---------------------------------- |
| `usage`     | number | CPU utilization percentage (0-100) |
| `cpus`      | number | Number of CPU cores                |
| `speed`     | number | CPU clock frequency in MHz         |

| Memory Metrics      | Type   | Description             |
| ------------------- | ------ | ----------------------- |
| `totalMemMb`        | number | Total memory in MB      |
| `usedMemMb`         | number | Used memory in MB       |
| `freeMemMb`         | number | Free memory in MB       |
| `usedMemPercentage` | number | Memory usage percentage |
| `freeMemPercentage` | number | Free memory percentage  |

| Storage Metrics  | Type   | Description              |
| ---------------- | ------ | ------------------------ |
| `totalGb`        | string | Total storage in GB      |
| `usedGb`         | string | Used storage in GB       |
| `freeGb`         | string | Free storage in GB       |
| `usedPercentage` | string | Storage usage percentage |
| `freePercentage` | string | Free storage percentage  |
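
Note that the storage metrics are strings, so they must be parsed before any numeric comparison (the alerting example below uses `parseFloat` for exactly this). A small illustrative helper:

```typescript
// String-typed drive metrics, as in the storage table above
interface DriveMetrics {
  totalGb: string;
  usedGb: string;
  freeGb: string;
  usedPercentage: string;
  freePercentage: string;
}

// Parse the percentage and flag low free space against a threshold
function checkDiskSpace(drive: DriveMetrics, warnAtPct = 85): { usedPct: number; warning: boolean } {
  const usedPct = parseFloat(drive.usedPercentage);
  return { usedPct, warning: usedPct >= warnAtPct };
}
```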

***

## 📖 Guides

### **Health Monitoring Setup Guide**

Complete guide to setting up health monitoring for your application: health indicator configuration, system monitoring, resource tracking, service health checks, and real-time alerts and notifications.

### **DAG Network Monitoring Guide**

Learn how to implement and monitor DAG network health with event-driven updates: network connectivity, consensus monitoring, performance tracking, and automated diagnostics.

### **Performance Optimization Guide**

Best practices for optimizing health monitoring performance and resource usage: monitoring efficiency, resource utilization, performance tuning, and scalability improvements.

### **Alert and Threshold Management Guide**

Set up proactive monitoring with alerts and threshold-based notifications: alert systems, threshold configuration, notification management, escalation procedures, and automated incident response.

***

## 🎯 Examples

### Comprehensive Health Monitoring Service

```typescript
import { HealthService, DagHealthIndicator } from '@hsuite/health';
import { Injectable } from '@nestjs/common';

@Injectable()
export class SystemHealthMonitoringService {
  
  constructor(
    private healthService: HealthService,
    private dagHealth: DagHealthIndicator
  ) {}

  async performComprehensiveHealthCheck() {
    try {
      const healthResults = await this.healthService.check();
      const systemMetrics = await this.healthService.infos();
      const dagStatus = await this.dagHealth.isHealthy();

      const report = {
        timestamp: new Date().toISOString(),
        overallStatus: healthResults.status,
        systemHealth: {
          redis: healthResults.details.redis?.status || 'unknown',
          mongodb: healthResults.details.mongodb?.status || 'unknown',
          disk: healthResults.details.disk?.status || 'unknown',
          memory: healthResults.details.memory?.status || 'unknown'
        },
        dagNetwork: {
          status: dagStatus.dag?.status || 'unknown',
          network: dagStatus.dag?.network || 'disconnected'
        },
        resourceUtilization: {
          cpu: {
            usage: systemMetrics.cpu.usage,
            cores: systemMetrics.cpu.cpus,
            frequency: systemMetrics.cpu.speed
          },
          memory: {
            total: systemMetrics.memory.totalMemMb,
            used: systemMetrics.memory.usedMemMb,
            usagePercentage: systemMetrics.memory.usedMemPercentage,
            available: systemMetrics.memory.freeMemMb
          },
          storage: {
            total: systemMetrics.drive.totalGb,
            used: systemMetrics.drive.usedGb,
            usagePercentage: systemMetrics.drive.usedPercentage,
            available: systemMetrics.drive.freeGb
          },
          network: {
            bytesReceived: systemMetrics.network.inputBytes,
            bytesTransmitted: systemMetrics.network.outputBytes
          }
        },
        systemInfo: {
          platform: systemMetrics.platform,
          architecture: systemMetrics.arch,
          uptime: this.formatUptime(systemMetrics.uptime),
          release: systemMetrics.release
        }
      };

      // Check for critical conditions
      await this.evaluateSystemAlerts(report);

      return report;
    } catch (error) {
      throw new Error(`Health monitoring failed: ${error.message}`);
    }
  }

  async evaluateSystemAlerts(report: any) {
    const alerts = [];

    // CPU usage alerts
    if (report.resourceUtilization.cpu.usage > 90) {
      alerts.push({
        severity: 'critical',
        component: 'CPU',
        message: `High CPU utilization: ${report.resourceUtilization.cpu.usage}%`,
        threshold: 90,
        action: 'Scale resources or investigate high load processes'
      });
    } else if (report.resourceUtilization.cpu.usage > 75) {
      alerts.push({
        severity: 'warning',
        component: 'CPU',
        message: `Elevated CPU utilization: ${report.resourceUtilization.cpu.usage}%`,
        threshold: 75,
        action: 'Monitor CPU usage trends'
      });
    }

    // Memory usage alerts
    if (report.resourceUtilization.memory.usagePercentage > 95) {
      alerts.push({
        severity: 'critical',
        component: 'Memory',
        message: `Critical memory usage: ${report.resourceUtilization.memory.usagePercentage}%`,
        threshold: 95,
        action: 'Immediate memory cleanup or scaling required'
      });
    } else if (report.resourceUtilization.memory.usagePercentage > 85) {
      alerts.push({
        severity: 'warning',
        component: 'Memory',
        message: `High memory usage: ${report.resourceUtilization.memory.usagePercentage}%`,
        threshold: 85,
        action: 'Consider memory optimization or scaling'
      });
    }

    // Storage alerts
    const storageUsage = parseFloat(report.resourceUtilization.storage.usagePercentage);
    if (storageUsage > 95) {
      alerts.push({
        severity: 'critical',
        component: 'Storage',
        message: `Critical disk space: ${storageUsage}% used`,
        threshold: 95,
        action: 'Immediate cleanup or storage expansion required'
      });
    } else if (storageUsage > 85) {
      alerts.push({
        severity: 'warning',
        component: 'Storage',
        message: `Low disk space: ${storageUsage}% used`,
        threshold: 85,
        action: 'Plan for storage cleanup or expansion'
      });
    }

    // Service health alerts
    Object.entries(report.systemHealth).forEach(([service, status]) => {
      if (status !== 'up') {
        alerts.push({
          severity: 'critical',
          component: service.toUpperCase(),
          message: `Service ${service} is ${status}`,
          action: `Investigate ${service} connectivity and restart if necessary`
        });
      }
    });

    // DAG network alerts
    if (report.dagNetwork.status !== 'up') {
      alerts.push({
        severity: 'critical',
        component: 'DAG Network',
        message: `DAG network is ${report.dagNetwork.status}`,
        action: 'Check network connectivity and DAG node status'
      });
    }

    if (alerts.length > 0) {
      await this.processAlerts(alerts);
    }

    return alerts;
  }

  private async processAlerts(alerts: any[]) {
    // Process alerts - send notifications, log, etc.
    for (const alert of alerts) {
      console.warn(`[${alert.severity.toUpperCase()}] ${alert.component}: ${alert.message}`);
      console.warn(`Action: ${alert.action}`);
      
      // In production, implement:
      // - Send notifications (email, Slack, etc.)
      // - Log to monitoring systems
      // - Trigger automated responses
      // - Update dashboards
    }
  }

  private formatUptime(seconds: number): string {
    const days = Math.floor(seconds / 86400);
    const hours = Math.floor((seconds % 86400) / 3600);
    const minutes = Math.floor((seconds % 3600) / 60);
    return `${days}d ${hours}h ${minutes}m`;
  }

  async getHealthSummary() {
    try {
      const health = await this.healthService.check();
      const metrics = await this.healthService.infos();

      return {
        status: health.status,
        uptime: this.formatUptime(metrics.uptime),
        resources: {
          cpu: `${metrics.cpu.usage}% (${metrics.cpu.cpus} cores)`,
          memory: `${metrics.memory.usedMemPercentage}% used (${metrics.memory.freeMemMb}MB free)`,
          storage: `${metrics.drive.usedPercentage}% used (${metrics.drive.freeGb}GB free)`
        },
        services: Object.keys(health.details).reduce((acc, key) => {
          acc[key] = health.details[key].status;
          return acc;
        }, {}),
        lastChecked: new Date().toISOString()
      };
    } catch (error) {
      throw new Error(`Health summary generation failed: ${error.message}`);
    }
  }
}
```

### DAG Network Monitoring Service

```typescript
import { DagHealthIndicator } from '@hsuite/health';
import { Injectable, OnModuleInit } from '@nestjs/common';
import { EventEmitter2 } from '@nestjs/event-emitter';

@Injectable()
export class DAGNetworkMonitoringService implements OnModuleInit {
  
  private networkStatus: string = 'unknown';
  private lastStatusChange: Date = new Date();
  private statusHistory: Array<{ status: string; timestamp: Date }> = [];

  constructor(
    private dagHealth: DagHealthIndicator,
    private eventEmitter: EventEmitter2
  ) {}

  onModuleInit() {
    // Listen for DAG network events
    this.eventEmitter.on('smart_node.monitors.network_threshold_online', this.handleNetworkOnline.bind(this));
    this.eventEmitter.on('smart_node.monitors.network_threshold_offline', this.handleNetworkOffline.bind(this));

    // Start periodic monitoring
    this.startPeriodicMonitoring();
  }

  async checkDAGNetworkHealth() {
    try {
      const health = await this.dagHealth.isHealthy();
      
      const networkStatus = {
        status: health.dag?.status || 'unknown',
        network: health.dag?.network || 'disconnected',
        timestamp: new Date().toISOString(),
        isHealthy: health.dag?.status === 'up',
        details: health.dag
      };

      // Update internal status tracking
      if (networkStatus.status !== this.networkStatus) {
        this.handleStatusChange(networkStatus.status);
      }

      return networkStatus;
    } catch (error) {
      console.error('DAG network health check failed:', error);
      
      if (error.causes) {
        console.error('Health check causes:', error.causes);
      }

      return {
        status: 'error',
        network: 'error',
        timestamp: new Date().toISOString(),
        isHealthy: false,
        error: error.message,
        causes: error.causes
      };
    }
  }

  private handleNetworkOnline(event: any) {
    console.log('DAG Network Online Event:', event);
    // Capture the previous status before handleStatusChange updates it
    const previousStatus = this.networkStatus;
    this.handleStatusChange('up');

    // Emit custom application event
    this.eventEmitter.emit('dag.network.online', {
      timestamp: new Date(),
      previousStatus,
      event: event
    });
  }

  private handleNetworkOffline(event: any) {
    console.log('DAG Network Offline Event:', event);
    // Capture the previous status before handleStatusChange updates it
    const previousStatus = this.networkStatus;
    this.handleStatusChange('down');

    // Emit custom application event
    this.eventEmitter.emit('dag.network.offline', {
      timestamp: new Date(),
      previousStatus,
      event: event
    });
  }

  private handleStatusChange(newStatus: string) {
    const previousStatus = this.networkStatus;
    // Capture when the previous status began, so recovery alerts can
    // report how long the network was actually down
    const previousChangeAt = this.lastStatusChange;
    this.networkStatus = newStatus;
    this.lastStatusChange = new Date();

    // Add to status history
    this.statusHistory.push({
      status: newStatus,
      timestamp: this.lastStatusChange
    });

    // Keep only the last 100 status changes
    if (this.statusHistory.length > 100) {
      this.statusHistory = this.statusHistory.slice(-100);
    }

    console.log(`DAG Network status changed: ${previousStatus} -> ${newStatus}`);

    // Trigger alerts for status changes
    if (newStatus === 'down') {
      this.triggerNetworkDownAlert(previousStatus);
    } else if (newStatus === 'up' && previousStatus === 'down') {
      this.triggerNetworkRecoveryAlert(previousChangeAt);
    }
  }

  private triggerNetworkDownAlert(previousStatus: string) {
    const alert = {
      severity: 'critical',
      component: 'DAG Network',
      message: 'DAG network has gone offline',
      previousStatus,
      currentStatus: this.networkStatus,
      timestamp: this.lastStatusChange,
      action: 'Investigate network connectivity and DAG node status'
    };

    console.error('DAG Network Alert:', alert);
    
    // In production, implement:
    // - Send immediate alerts
    // - Log to monitoring systems
    // - Trigger recovery procedures
  }

  private triggerNetworkRecoveryAlert(wentDownAt: Date) {
    // Measure downtime from when the network went down, not from the
    // recovery timestamp that handleStatusChange just recorded
    const downtime = Date.now() - wentDownAt.getTime();

    const alert = {
      severity: 'info',
      component: 'DAG Network',
      message: 'DAG network has recovered',
      currentStatus: this.networkStatus,
      timestamp: this.lastStatusChange,
      downtime: `${Math.round(downtime / 1000)} seconds`,
      action: 'Verify network stability and check for missed transactions'
    };

    console.log('DAG Network Recovery:', alert);
  }

  private startPeriodicMonitoring() {
    // Check DAG network health every 30 seconds
    setInterval(async () => {
      try {
        await this.checkDAGNetworkHealth();
      } catch (error) {
        console.error('Periodic DAG health check failed:', error);
      }
    }, 30000);
  }

  getNetworkStatusHistory() {
    return {
      currentStatus: this.networkStatus,
      lastStatusChange: this.lastStatusChange,
      statusHistory: this.statusHistory,
      uptime: this.calculateUptime(),
      downtime: this.calculateDowntime()
    };
  }

  private calculateUptime(): { total: number; percentage: number } {
    const now = Date.now();
    const oneDayAgo = now - (24 * 60 * 60 * 1000);
    
    const recentHistory = this.statusHistory.filter(
      entry => entry.timestamp.getTime() > oneDayAgo
    );

    let upTime = 0;
    for (let i = 0; i < recentHistory.length - 1; i++) {
      const current = recentHistory[i];
      const next = recentHistory[i + 1];
      
      if (current.status === 'up') {
        upTime += next.timestamp.getTime() - current.timestamp.getTime();
      }
    }

    // Add current status time if up
    if (recentHistory.length > 0 && recentHistory[recentHistory.length - 1].status === 'up') {
      upTime += now - recentHistory[recentHistory.length - 1].timestamp.getTime();
    }

    const totalTime = 24 * 60 * 60 * 1000; // 24 hours in ms
    const percentage = (upTime / totalTime) * 100;

    return {
      total: Math.round(upTime / 1000), // seconds
      percentage: Math.round(percentage * 100) / 100
    };
  }

  private calculateDowntime(): { total: number; percentage: number } {
    const uptime = this.calculateUptime();
    const totalDaySeconds = 24 * 60 * 60;
    
    return {
      total: totalDaySeconds - uptime.total,
      percentage: Math.round((100 - uptime.percentage) * 100) / 100
    };
  }
}
```

### Advanced Redis Configuration Service

```typescript
import { HealthModule } from '@hsuite/health';
import { Injectable } from '@nestjs/common';
import { RedisClientOptions } from 'redis';

@Injectable()
export class HealthModuleConfigurationService {
  
  createProductionRedisConfig(): RedisClientOptions {
    return {
      socket: {
        host: process.env.REDIS_HOST || 'redis-cluster.production.com',
        port: parseInt(process.env.REDIS_PORT || '6379', 10),
        tls: process.env.REDIS_TLS === 'true',
        connectTimeout: 10000,
        keepAlive: 5000, // TCP keep-alive delay in milliseconds
        family: 4,       // prefer IPv4
        // Back off between reconnect attempts, capped at 3 seconds
        reconnectStrategy: (retries: number) => Math.min(retries * 100, 3000)
      },
      password: process.env.REDIS_PASSWORD,
      database: parseInt(process.env.REDIS_DATABASE || '0', 10),
      pingInterval: 30000 // keep the connection alive through idle proxies
    };
  }

  createDevelopmentRedisConfig(): RedisClientOptions {
    return {
      socket: {
        host: 'localhost',
        port: 6379,
        connectTimeout: 5000
      },
      database: 0
    };
  }

  createHighAvailabilityRedisConfig(): RedisClientOptions {
    // Note: node-redis `RedisClientOptions` has no built-in Sentinel
    // discovery. Point the client at a Sentinel-aware proxy (e.g. HAProxy)
    // in front of the master, or use a Sentinel-capable client such as
    // ioredis if you need direct Sentinel support.
    return {
      socket: {
        host: process.env.REDIS_HA_HOST || 'redis-ha.internal',
        port: parseInt(process.env.REDIS_HA_PORT || '6379', 10),
        connectTimeout: 5000,
        // Retry quickly at first, then back off to 2 seconds
        reconnectStrategy: (retries: number) => Math.min(retries * 50, 2000)
      },
      password: process.env.REDIS_PASSWORD
    };
  }

  async validateRedisConfiguration(config: RedisClientOptions): Promise<boolean> {
    try {
      const { createClient } = await import('redis');
      const client = createClient(config);
      await client.connect();

      // Test basic operations
      await client.ping();
      await client.set('health:test', 'validation', { EX: 10 });
      const result = await client.get('health:test');
      await client.del('health:test');

      await client.quit();

      return result === 'validation';
    } catch (error) {
      console.error('Redis configuration validation failed:', error);
      return false;
    }
  }

  createConfigurationBasedOnEnvironment(): RedisClientOptions {
    const environment = process.env.NODE_ENV || 'development';
    
    switch (environment) {
      case 'production':
        return this.createProductionRedisConfig();
      case 'staging':
        return this.createHighAvailabilityRedisConfig();
      default:
        return this.createDevelopmentRedisConfig();
    }
  }
}

// Usage in the app module
import { Module } from '@nestjs/common';
import { ConfigModule, ConfigService } from '@nestjs/config';

@Module({
  imports: [
    HealthModule.forRootAsync({
      imports: [ConfigModule],
      useFactory: async (
        configService: ConfigService,
        healthConfig: HealthModuleConfigurationService
      ) => {
        const redisConfig = healthConfig.createConfigurationBasedOnEnvironment();
        
        // Validate configuration before using
        const isValid = await healthConfig.validateRedisConfiguration(redisConfig);
        
        if (!isValid) {
          console.warn('Redis configuration validation failed, using fallback');
          return healthConfig.createDevelopmentRedisConfig();
        }
        
        return redisConfig;
      },
      inject: [ConfigService, HealthModuleConfigurationService]
    })
  ],
  providers: [HealthModuleConfigurationService]
})
export class AppModule {}
```

### Health Metrics Analytics Service

```typescript
import { HealthService } from '@hsuite/health';
import { Injectable } from '@nestjs/common';

@Injectable()
export class HealthMetricsAnalyticsService {
  
  private metricsHistory: Array<{ timestamp: Date; metrics: any }> = [];
  private readonly maxHistorySize = 1000;

  constructor(private healthService: HealthService) {
    // Start collecting metrics every minute
    this.startMetricsCollection();
  }

  async analyzeSystemPerformance(timeRange: { start: Date; end: Date }) {
    try {
      const relevantMetrics = this.metricsHistory.filter(
        entry => entry.timestamp >= timeRange.start && entry.timestamp <= timeRange.end
      );

      if (relevantMetrics.length === 0) {
        throw new Error('No metrics available for the specified time range');
      }

      const analysis = {
        timeRange,
        dataPoints: relevantMetrics.length,
        cpu: this.analyzeCPUMetrics(relevantMetrics),
        memory: this.analyzeMemoryMetrics(relevantMetrics),
        storage: this.analyzeStorageMetrics(relevantMetrics),
        network: this.analyzeNetworkMetrics(relevantMetrics),
        trends: this.calculateTrends(relevantMetrics),
        recommendations: []
      };

      // Generate recommendations
      analysis.recommendations = this.generateRecommendations(analysis);

      return analysis;
    } catch (error) {
      throw new Error(`Performance analysis failed: ${error.message}`);
    }
  }

  private startMetricsCollection() {
    setInterval(async () => {
      try {
        const metrics = await this.healthService.infos();
        
        this.metricsHistory.push({
          timestamp: new Date(),
          metrics: metrics
        });

        // Maintain history size
        if (this.metricsHistory.length > this.maxHistorySize) {
          this.metricsHistory = this.metricsHistory.slice(-this.maxHistorySize);
        }
      } catch (error) {
        console.error('Metrics collection failed:', error);
      }
    }, 60000); // Collect every minute
  }

  private analyzeCPUMetrics(metricsData: any[]) {
    const cpuValues = metricsData.map(entry => entry.metrics.cpu.usage);
    
    return {
      average: this.calculateAverage(cpuValues),
      min: Math.min(...cpuValues),
      max: Math.max(...cpuValues),
      median: this.calculateMedian(cpuValues),
      percentile95: this.calculatePercentile(cpuValues, 95),
      spikes: cpuValues.filter(value => value > 90).length,
      trend: this.calculateTrend(cpuValues)
    };
  }

  private analyzeMemoryMetrics(metricsData: any[]) {
    const memoryValues = metricsData.map(entry => entry.metrics.memory.usedMemPercentage);
    
    return {
      average: this.calculateAverage(memoryValues),
      min: Math.min(...memoryValues),
      max: Math.max(...memoryValues),
      median: this.calculateMedian(memoryValues),
      percentile95: this.calculatePercentile(memoryValues, 95),
      criticalEvents: memoryValues.filter(value => value > 95).length,
      trend: this.calculateTrend(memoryValues)
    };
  }

  private analyzeStorageMetrics(metricsData: any[]) {
    const storageValues = metricsData.map(entry => parseFloat(entry.metrics.drive.usedPercentage));
    
    return {
      average: this.calculateAverage(storageValues),
      min: Math.min(...storageValues),
      max: Math.max(...storageValues),
      median: this.calculateMedian(storageValues),
      growth: this.calculateStorageGrowth(metricsData),
      trend: this.calculateTrend(storageValues)
    };
  }

  private analyzeNetworkMetrics(metricsData: any[]) {
    const inputBytes = metricsData.map(entry => entry.metrics.network.inputBytes);
    const outputBytes = metricsData.map(entry => entry.metrics.network.outputBytes);
    
    return {
      input: {
        average: this.calculateAverage(inputBytes),
        peak: Math.max(...inputBytes),
        trend: this.calculateTrend(inputBytes)
      },
      output: {
        average: this.calculateAverage(outputBytes),
        peak: Math.max(...outputBytes),
        trend: this.calculateTrend(outputBytes)
      },
      totalTraffic: {
        average: this.calculateAverage(inputBytes.map((val, i) => val + outputBytes[i])),
        peak: Math.max(...inputBytes.map((val, i) => val + outputBytes[i]))
      }
    };
  }

  private calculateTrends(metricsData: any[]) {
    const timePoints = metricsData.map(entry => entry.timestamp.getTime());
    const startTime = Math.min(...timePoints);
    const endTime = Math.max(...timePoints);
    const duration = endTime - startTime;

    return {
      analysisWindowMinutes: Math.round(duration / (1000 * 60)),
      dataPointsCollected: metricsData.length,
      averageInterval: Math.round(duration / metricsData.length / 1000), // seconds
      systemStability: this.calculateSystemStability(metricsData)
    };
  }

  private generateRecommendations(analysis: any): string[] {
    const recommendations = [];

    // CPU recommendations
    if (analysis.cpu.average > 80) {
      recommendations.push('Consider CPU optimization or scaling - average usage is high');
    }
    if (analysis.cpu.spikes > analysis.dataPoints * 0.1) {
      recommendations.push('Investigate CPU spikes - frequent high usage detected');
    }

    // Memory recommendations
    if (analysis.memory.average > 85) {
      recommendations.push('Memory usage is consistently high - consider optimization');
    }
    if (analysis.memory.criticalEvents > 0) {
      recommendations.push('Critical memory events detected - immediate attention required');
    }

    // Storage recommendations
    if (analysis.storage.average > 85) {
      recommendations.push('Storage usage is high - plan for cleanup or expansion');
    }
    if (analysis.storage.trend > 0.1) {
      recommendations.push('Storage usage is growing rapidly - monitor closely');
    }

    // Network recommendations
    if (analysis.network.totalTraffic.peak > analysis.network.totalTraffic.average * 10) {
      recommendations.push('Network traffic spikes detected - investigate unusual activity');
    }

    return recommendations;
  }

  // Utility calculation methods
  private calculateAverage(values: number[]): number {
    return values.reduce((sum, val) => sum + val, 0) / values.length;
  }

  private calculateMedian(values: number[]): number {
    const sorted = [...values].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 0 
      ? (sorted[mid - 1] + sorted[mid]) / 2 
      : sorted[mid];
  }

  private calculatePercentile(values: number[], percentile: number): number {
    const sorted = [...values].sort((a, b) => a - b);
    const index = Math.ceil((percentile / 100) * sorted.length) - 1;
    return sorted[index];
  }

  private calculateTrend(values: number[]): number {
    if (values.length < 2) return 0;
    
    // Simple linear trend calculation
    const firstHalf = values.slice(0, Math.floor(values.length / 2));
    const secondHalf = values.slice(Math.floor(values.length / 2));
    
    const firstAvg = this.calculateAverage(firstHalf);
    const secondAvg = this.calculateAverage(secondHalf);
    
    if (firstAvg === 0) return 0; // avoid division by zero on an all-zero baseline
    return (secondAvg - firstAvg) / firstAvg;
  }

  private calculateStorageGrowth(metricsData: any[]): number {
    if (metricsData.length < 2) return 0;
    
    const first = parseFloat(metricsData[0].metrics.drive.usedGb);
    const last = parseFloat(metricsData[metricsData.length - 1].metrics.drive.usedGb);
    
    return last - first; // GB growth
  }

  private calculateSystemStability(metricsData: any[]): string {
    // Coefficient of variation (stddev / mean) for CPU and memory:
    // lower relative variation indicates a more stable system
    const cpuValues = metricsData.map(entry => entry.metrics.cpu.usage);
    const memoryValues = metricsData.map(entry => entry.metrics.memory.usedMemPercentage);
    
    const cpuAvg = this.calculateAverage(cpuValues);
    const memoryAvg = this.calculateAverage(memoryValues);
    
    const cpuCV = cpuAvg > 0 ? this.calculateStandardDeviation(cpuValues) / cpuAvg : 0;
    const memoryCV = memoryAvg > 0 ? this.calculateStandardDeviation(memoryValues) / memoryAvg : 0;
    
    const avgCV = (cpuCV + memoryCV) / 2;
    
    if (avgCV < 0.1) return 'Excellent';
    if (avgCV < 0.2) return 'Good';
    if (avgCV < 0.3) return 'Fair';
    return 'Poor';
  }

  private calculateStandardDeviation(values: number[]): number {
    const avg = this.calculateAverage(values);
    const squaredDiffs = values.map(val => Math.pow(val - avg, 2));
    const avgSquaredDiff = this.calculateAverage(squaredDiffs);
    return Math.sqrt(avgSquaredDiff);
  }

  getMetricsHistory() {
    return {
      totalDataPoints: this.metricsHistory.length,
      oldestEntry: this.metricsHistory.length > 0 ? this.metricsHistory[0].timestamp : null,
      newestEntry: this.metricsHistory.length > 0 ? this.metricsHistory[this.metricsHistory.length - 1].timestamp : null,
      memoryUsage: `${this.metricsHistory.length} entries (max: ${this.maxHistorySize})`
    };
  }
}
```

***

## 🔗 Integration

### Required Dependencies

```json
{
  "@nestjs/common": "^10.4.2",
  "@nestjs/core": "^10.4.2",
  "@hsuite/nestjs-swagger": "^1.0.3",
  "@compodoc/compodoc": "^1.1.23"
}
```

### Module Integration

```typescript
import { Module } from '@nestjs/common';
import { HealthModule, HealthService, DagHealthIndicator } from '@hsuite/health';

@Module({
  imports: [
    HealthModule.forRoot({
      socket: {
        host: process.env.REDIS_HOST || 'localhost',
        port: parseInt(process.env.REDIS_PORT || '6379')
      },
      password: process.env.REDIS_PASSWORD,
      database: parseInt(process.env.REDIS_DATABASE || '0')
    })
  ],
  providers: [
    // Custom monitoring services defined in the examples earlier in this guide
    SystemHealthMonitoringService,
    DAGNetworkMonitoringService,
    HealthModuleConfigurationService,
    HealthMetricsAnalyticsService
  ],
  exports: [
    HealthService,
    DagHealthIndicator,
    SystemHealthMonitoringService,
    DAGNetworkMonitoringService,
    HealthMetricsAnalyticsService
  ]
})
export class HealthMonitoringModule {}
```

### Documentation Generation

```bash
# Generate comprehensive documentation
npm run compodoc

# Generate documentation with coverage report
npm run compodoc:coverage
```

### Environment Configuration

```bash
# Redis Configuration
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=your-secure-password
REDIS_DATABASE=0
REDIS_TLS=false

# Health Monitoring Settings
HEALTH_CHECK_INTERVAL=30000
METRICS_COLLECTION_INTERVAL=60000
ALERT_THRESHOLD_CPU=90
ALERT_THRESHOLD_MEMORY=85
ALERT_THRESHOLD_STORAGE=90

# DAG Network Settings
DAG_NETWORK_MONITORING=true
DAG_STATUS_CACHE_TTL=1000
```
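
The monitoring settings above can be parsed into a typed configuration object at startup. A minimal sketch, assuming plain `process.env`-style input; the names `HealthMonitoringConfig` and `loadMonitoringConfig` are illustrative helpers, not exports of `@hsuite/health`:

```typescript
// Hypothetical helper: parse the environment variables above into a typed
// config, falling back to the documented defaults when a variable is unset.
interface HealthMonitoringConfig {
  healthCheckIntervalMs: number;
  metricsCollectionIntervalMs: number;
  alertThresholds: { cpu: number; memory: number; storage: number };
  dagNetworkMonitoring: boolean;
  dagStatusCacheTtlMs: number;
}

function loadMonitoringConfig(env: Record<string, string | undefined>): HealthMonitoringConfig {
  // Parse an integer env var, falling back when missing or malformed
  const int = (value: string | undefined, fallback: number): number => {
    const parsed = parseInt(value ?? '', 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };

  return {
    healthCheckIntervalMs: int(env.HEALTH_CHECK_INTERVAL, 30000),
    metricsCollectionIntervalMs: int(env.METRICS_COLLECTION_INTERVAL, 60000),
    alertThresholds: {
      cpu: int(env.ALERT_THRESHOLD_CPU, 90),
      memory: int(env.ALERT_THRESHOLD_MEMORY, 85),
      storage: int(env.ALERT_THRESHOLD_STORAGE, 90)
    },
    dagNetworkMonitoring: env.DAG_NETWORK_MONITORING !== 'false',
    dagStatusCacheTtlMs: int(env.DAG_STATUS_CACHE_TTL, 1000)
  };
}
```

In an application this would typically be called once with `process.env` (or wired through NestJS `ConfigService`) and injected wherever thresholds are needed.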

### Integration with HSuite Ecosystem

```typescript
// Complete integration with other HSuite modules
import { HealthModule, HealthService } from '@hsuite/health';
import { AuthModule } from '@hsuite/auth';
import { SmartNetworkModule } from '@hsuite/smart-network';

@Module({
  imports: [
    AuthModule,
    SmartNetworkModule,
    HealthModule.forRootAsync({
      imports: [ConfigModule],
      useFactory: async (configService: ConfigService) => ({
        socket: {
          host: configService.get<string>('REDIS_HOST', 'localhost'),
          // Env values arrive as strings, so parse numeric options explicitly
          port: parseInt(configService.get<string>('REDIS_PORT', '6379'), 10)
        },
        password: configService.get<string>('REDIS_PASSWORD'),
        database: parseInt(configService.get<string>('REDIS_DATABASE', '0'), 10)
      }),
      inject: [ConfigService]
    })
  ]
})
export class HealthEcosystemModule {}

@Injectable()
export class IntegratedHealthService {
  constructor(
    private healthService: HealthService,
    private authService: AuthService,
    private networkService: SmartNetworkService
  ) {}

  async getComprehensiveSystemStatus() {
    // 1. Get basic health status
    const health = await this.healthService.check();
    
    // 2. Get system metrics
    const metrics = await this.healthService.infos();
    
    // 3. Check authentication service health
    const authHealth = await this.authService.getHealthStatus();
    
    // 4. Check network service health
    const networkHealth = await this.networkService.getNetworkStatus();

    return {
      timestamp: new Date().toISOString(),
      overallStatus: health.status,
      components: {
        system: health.details,
        authentication: authHealth,
        network: networkHealth
      },
      metrics: {
        cpu: metrics.cpu,
        memory: metrics.memory,
        storage: metrics.drive,
        network: metrics.network,
        uptime: metrics.uptime
      },
      platform: {
        os: metrics.platform,
        architecture: metrics.arch,
        release: metrics.release
      }
    };
  }
}
```

***

## Performance Considerations

### 📊 **Caching Strategy**

* **1-Second Caching** - Health checks and metrics responses cached for optimal performance
* **Memory Management** - Efficient resource metrics collection with OS utilities
* **Event-Driven Updates** - DAG network status uses events to reduce polling overhead
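
The short-TTL response cache can be sketched as follows. This is an illustrative pattern, not the library's internal implementation; `TtlCache` is a hypothetical name, and the producer is shown synchronous for brevity (a Promise-returning variant follows the same shape):

```typescript
// Minimal time-based cache: serve the last value while it is fresh,
// recompute it only after the TTL window has elapsed.
class TtlCache<T> {
  private value: T | undefined;
  private expiresAt = 0;

  constructor(private readonly ttlMs: number) {}

  get(produce: () => T): T {
    const now = Date.now();
    if (this.value !== undefined && now < this.expiresAt) {
      return this.value; // fresh: skip the expensive health check
    }
    this.value = produce();
    this.expiresAt = now + this.ttlMs;
    return this.value;
  }
}
```

With a 1000 ms TTL, repeated health endpoint hits within the same second reuse one underlying check instead of probing Redis and MongoDB on every request.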

### 🔧 **Optimization Features**

* **Multi-core CPU Calculations** - Advanced CPU usage calculations for multi-core systems
* **Efficient Collection** - Optimized OS utility integration for resource monitoring
* **Connection Pooling** - Redis connection optimization for health checks
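
Aggregate CPU usage across cores can be derived from Node's `os.cpus()` tick counters. A minimal sketch of the idea (the helper names are illustrative, not library exports); usage is the non-idle share of the tick delta between two samples:

```typescript
import * as os from 'os';

type CpuSample = { idle: number; total: number };

// Sum the cumulative tick counters across every core
function sampleCpu(): CpuSample {
  return os.cpus().reduce(
    (acc, core) => {
      const t = core.times;
      const total = t.user + t.nice + t.sys + t.idle + t.irq;
      return { idle: acc.idle + t.idle, total: acc.total + total };
    },
    { idle: 0, total: 0 }
  );
}

// Usage between two samples: the fraction of elapsed ticks that were not idle
function cpuUsagePercent(before: CpuSample, after: CpuSample): number {
  const totalDelta = after.total - before.total;
  const idleDelta = after.idle - before.idle;
  if (totalDelta <= 0) return 0; // identical samples: nothing to measure
  return ((totalDelta - idleDelta) / totalDelta) * 100;
}
```

Sampling twice with a short delay and diffing (rather than reading a single snapshot) is what makes the figure reflect current load instead of the average since boot.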

### 🛡️ **Error Handling**

* **Comprehensive Exception Management** - Proper error types and detailed messages
* **Graceful Degradation** - Fallback mechanisms for failed health checks
* **Service Isolation** - Individual service failures don't affect overall monitoring
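
The isolation and graceful-degradation points above boil down to wrapping each individual probe so its failure is reported rather than propagated. A minimal sketch; `CheckResult` and `withFallback` are illustrative shapes, not exports of `@hsuite/health`:

```typescript
type CheckResult = { status: 'up' | 'down'; error?: string };

// Run one health probe; on any thrown error, degrade to a 'down' result
// instead of letting the exception fail the whole aggregated check.
async function withFallback(check: () => Promise<CheckResult>): Promise<CheckResult> {
  try {
    return await check();
  } catch (err) {
    return {
      status: 'down',
      error: err instanceof Error ? err.message : String(err)
    };
  }
}
```

Each dependency (Redis, MongoDB, DAG network) would be wrapped independently, e.g. `withFallback(() => pingRedis())` where `pingRedis` is a hypothetical probe, so one unreachable service surfaces as a single `down` component in the aggregated report.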

***

**🏥 Enterprise Health Monitoring**: Comprehensive system diagnostics with real-time resource tracking and specialized DAG network monitoring.

**📊 Advanced Analytics**: Performance analysis, trend calculation, and intelligent recommendations for system optimization.

**🌐 DAG Network Integration**: Event-driven network monitoring with threshold management and automated status updates.

***

<p align="center">Built with ❤️ by the HSuite Team<br>Copyright © 2025 HSuite. All rights reserved.</p>
