Module 7: Monitoring and Logging

Master SLT monitoring tools, log analysis, and proactive alerting to ensure reliable replication.

1. Monitoring Dashboard (LTRC)

Real-Time Dashboard

Transaction: LTRC

┌─────────────────────────────────────────────┐
│ SLT Replication Dashboard │
├─────────────────────────────────────────────┤
│ Overall Status: ● Active (Green) │
│ MT_IDs Active: 3 │
│ Tables Replicating: 125 │
│ Throughput: 2,345 records/sec │
│ Avg Latency: 1.2 seconds │
│ Errors (24h): 5 │
│ │
│ Top 5 Active Tables: │
│ VBAP ██████████░░ 65% (1,523 rec/s) │
│ BSEG ████████░░░░ 40% (940 rec/s) │
│ MSEG ██████░░░░░░ 30% (705 rec/s) │
│ EKPO ████░░░░░░░░ 20% (470 rec/s) │
│ MARC ██░░░░░░░░░░ 10% (235 rec/s) │
└─────────────────────────────────────────────┘

2. Key Metrics to Monitor

Health Indicators

Metric           Check               Green     Yellow      Red
--------------   -----------------   -------   ---------   -------
Latency          Avg response time   < 2s      2-10s       > 10s
Throughput       Records/sec         > 1000    500-1000    < 500
Error Rate       Errors/total        < 0.1%    0.1-1%      > 1%
Log Table Size   GB used             < 5       5-10        > 10
Job Failures     Failed jobs         0         1-5         > 5
CPU Usage        SLT server          < 70%     70-85%      > 85%
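The thresholds above can be encoded in a small helper that maps a measured value to a health color. This is a minimal sketch: the threshold pairs are taken from the table, while the function name and signature are illustrative, not part of SLT.

```python
# Map a metric reading to Green/Yellow/Red using the table's thresholds.
# For "higher is better" metrics (e.g. throughput), pass higher_is_better=True.

def health_status(value, green_limit, red_limit, higher_is_better=False):
    """Return 'Green', 'Yellow', or 'Red' for a single metric reading."""
    if higher_is_better:
        if value > green_limit:
            return "Green"
        if value < red_limit:
            return "Red"
        return "Yellow"
    if value < green_limit:
        return "Green"
    if value > red_limit:
        return "Red"
    return "Yellow"

# Examples using the table's thresholds:
print(health_status(1.2, 2, 10))                             # latency 1.2s  -> Green
print(health_status(940, 1000, 500, higher_is_better=True))  # 940 rec/s     -> Yellow
print(health_status(1.5, 0.1, 1.0))                          # 1.5% errors   -> Red
```

The same helper covers all six rows of the table; only the limits and the direction flag change per metric.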

3. Log Files and Analysis

SLT Log Locations

Application Logs:

Location: /usr/sap/SLT/D00/work/

Key Files:
├── dev_w0 - Work process 0 log
├── dev_disp - Dispatcher log
├── dev_rfc* - RFC connection logs
└── syslog - System messages

Replication Logs:
├── /DMIS/LOG_* - Logging tables (database)
└── SM37 jobs - Background job logs

Log Analysis Queries

-- Find slow replication tables
SELECT
    TABLE_NAME,
    AVG(REPLICATION_TIME_MS) AS AVG_TIME,
    MAX(REPLICATION_TIME_MS) AS MAX_TIME,
    COUNT(*) AS RECORD_COUNT
FROM /DMIS/REPLICATION_STATS
WHERE TIMESTAMP >= ADD_DAYS(CURRENT_DATE, -1)
GROUP BY TABLE_NAME
HAVING AVG(REPLICATION_TIME_MS) > 1000
ORDER BY AVG_TIME DESC;

-- Identify error patterns
SELECT
    ERROR_TYPE,
    TABLE_NAME,
    COUNT(*) AS ERROR_COUNT,
    MAX(TIMESTAMP) AS LAST_OCCURRENCE
FROM /DMIS/ERROR_LOG
WHERE TIMESTAMP >= ADD_DAYS(CURRENT_DATE, -7)
GROUP BY ERROR_TYPE, TABLE_NAME
ORDER BY ERROR_COUNT DESC;

4. Performance Monitoring

System Metrics (ST06)

Transaction: ST06 (Operating System Monitor)

CPU Usage:
├── Total: 68% (healthy)
├── User: 42%
├── System: 26%
└── Idle: 32%

Memory:
├── Total: 64 GB
├── Used: 48 GB (75%)
└── Free: 16 GB

Disk I/O:
├── Read: 450 MB/s
└── Write: 280 MB/s

Database Performance (DB02)

Transaction: DB02

Tablespace Usage:
├── SLTLOG: 46% (9.2 GB / 20 GB)
├── SLTDATA: 32% (16 GB / 50 GB)
└── SLTTEMP: 15% (3 GB / 20 GB)
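A quick way to turn the DB02 sizes into usage percentages and flag tablespaces approaching capacity — a minimal sketch using the figures shown above; the 80% review threshold is an assumption, not an SLT default.

```python
# Compute usage percentages from the DB02 tablespace sizes above and
# flag anything at or past an illustrative 80% review threshold.
tablespaces = {
    "SLTLOG":  (9.2, 20),   # (used GB, total GB)
    "SLTDATA": (16, 50),
    "SLTTEMP": (3, 20),
}

def usage_pct(used_gb, total_gb):
    """Percentage of the tablespace in use, rounded to whole percent."""
    return round(100 * used_gb / total_gb)

for name, (used, total) in tablespaces.items():
    pct = usage_pct(used, total)
    flag = "  <- review" if pct >= 80 else ""
    print(f"{name}: {pct}% ({used} GB / {total} GB){flag}")
```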

Top SQL Statements:
1. SELECT FROM /DMIS/LOG_VBAP (28% DB time)
2. INSERT INTO SLTREPL.VBAP (15% DB time)
3. DELETE FROM /DMIS/LOG_BSEG (12% DB time)

5. Alerting and Notifications

Configure Email Alerts

Transaction: LTRC → Settings → Notifications

Alert Conditions:
☑ Latency > 30 seconds
☑ Error rate > 1%
☑ Job failure
☑ Logging table > 10 GB
☑ MT_ID status changed to Error

Recipients:
- slt-admin@company.com
- dba-team@company.com

Frequency: Immediate (real-time)
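The checked conditions above can be expressed as a small rule table evaluated against a metrics snapshot — a sketch for illustration; the metric key names and the dict shape are assumptions, while the thresholds mirror the notification settings.

```python
# Evaluate the configured alert conditions against a metrics snapshot.
# Thresholds match the LTRC notification settings above; the metric key
# names are illustrative, not an SLT API.

ALERT_RULES = [
    ("latency_seconds", lambda v: v > 30,       "Latency > 30 seconds"),
    ("error_rate_pct",  lambda v: v > 1.0,      "Error rate > 1%"),
    ("failed_jobs",     lambda v: v > 0,        "Job failure"),
    ("log_table_gb",    lambda v: v > 10,       "Logging table > 10 GB"),
    ("mt_id_status",    lambda v: v == "Error", "MT_ID status changed to Error"),
]

def triggered_alerts(metrics):
    """Return the messages of all alert conditions the snapshot violates."""
    return [msg for key, cond, msg in ALERT_RULES
            if key in metrics and cond(metrics[key])]

snapshot = {"latency_seconds": 45, "error_rate_pct": 0.3,
            "failed_jobs": 0, "log_table_gb": 11, "mt_id_status": "Active"}
print(triggered_alerts(snapshot))
# -> ['Latency > 30 seconds', 'Logging table > 10 GB']
```

Each triggered message can then be passed to the mail or webhook sender of your choice.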

SMS/Integration Alerts

" Custom alert integration
FUNCTION Z_SLT_SEND_ALERT.
IMPORTING
iv_mt_id TYPE string
iv_severity TYPE string
iv_message TYPE string.

" Send to monitoring system (Nagios/Splunk/etc)
CALL FUNCTION 'HTTP_POST'
EXPORTING
uri = 'https://monitoring.company.com/api/alert'
data = |{ "mtid": "{iv_mt_id}", "severity": "{iv_severity}", "message": "{iv_message}" }|.
ENDFUNCTION.

6. Trend Analysis

Historical Performance

-- Weekly throughput trends
SELECT
    TO_VARCHAR(DATE_TRUNC('DAY', TIMESTAMP), 'YYYY-MM-DD') AS DAY,
    SUM(RECORD_COUNT) AS TOTAL_RECORDS,
    AVG(LATENCY_MS) AS AVG_LATENCY
FROM /DMIS/STATISTICS
WHERE TIMESTAMP >= ADD_DAYS(CURRENT_DATE, -30)
GROUP BY DATE_TRUNC('DAY', TIMESTAMP)
ORDER BY DAY;

-- Result visualization:
Day          Records      Latency
2026-01-01   2,345,678    1.2s
2026-01-02   2,456,789    1.3s
2026-01-03   2,123,456    1.1s
...

Capacity Planning

Growth Analysis (Last 90 days):
- Data volume: +15% per month
- Tables added: +5 per month
- Throughput required: +12% per month

Projected Capacity (6 months, compounding the monthly growth rates above):
- Data volume: 150 GB → ~347 GB (+15% per month)
- Throughput: 2,000 rec/s → ~3,950 rec/s (+12% per month)
- Action: Plan hardware upgrade Q2 2026
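Under the stated monthly growth rates, the projection is simple compound growth: value after n months = current × (1 + monthly_rate)^n. A quick sketch of the arithmetic:

```python
# Compound-growth projection for SLT capacity planning:
# value_after_n_months = current * (1 + monthly_rate) ** n

def project(current, monthly_rate, months):
    """Project a value forward under a fixed monthly growth rate."""
    return current * (1 + monthly_rate) ** months

print(round(project(150, 0.15, 6)))    # data volume in GB after 6 months
print(round(project(2000, 0.12, 6)))   # throughput in rec/s after 6 months
```

Note that compounding matters: +15% per month over 6 months more than doubles the data volume, so linear extrapolation would badly understate the hardware needed.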

7. Reporting

Daily Health Report

Automated Daily Report (Email)

Subject: SLT Daily Health Report - 2026-01-21

Summary:
✅ Overall Status: Healthy
✅ Replication Active: 125/125 tables
✅ Avg Latency: 1.2 seconds
⚠️ Warnings: 2
❌ Errors: 5

Details:
- Total records replicated: 5,234,567
- Peak throughput: 3,456 rec/s (at 14:30)
- Lowest throughput: 890 rec/s (at 03:15)

Warnings:
1. Table VBAP: Latency spike to 15s at 08:45
2. Logging table BSEG: Size 8.5 GB (approaching limit)

Errors:
1. Table MARC: 3 FK violations (resolved)
2. Table KNA1: 2 target locks (retried successfully)

Actions Required:
- Review VBAP performance during business hours
- Schedule BSEG logging table cleanup

[View Full Report]
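A report like the one above can be assembled from the day's collected metrics before mailing it — a minimal sketch; the function signature and field names are illustrative, not part of SLT.

```python
# Build the summary portion of the daily health report from collected
# metrics. Parameter names are illustrative assumptions.

def build_report(date, tables_active, tables_total, avg_latency_s,
                 warnings, errors):
    """Return the report header and summary block as a single string."""
    lines = [
        f"Subject: SLT Daily Health Report - {date}",
        "",
        "Summary:",
        f"Replication Active: {tables_active}/{tables_total} tables",
        f"Avg Latency: {avg_latency_s} seconds",
        f"Warnings: {warnings}",
        f"Errors: {errors}",
    ]
    return "\n".join(lines)

print(build_report("2026-01-21", 125, 125, 1.2, 2, 5))
```

The warning and error detail sections would be appended the same way, then the whole string handed to the mail job scheduled in SM36.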

8. Custom Monitoring Scripts

Shell Script Example

#!/bin/bash
# SLT Health Check Script

# Check replication status; table names containing '/' must be
# double-quoted in SQL (escaped here for the shell)
status=$(hdbsql -n hanaserver:30015 -u SLTREPL -p "$PASSWORD" \
  "SELECT COUNT(*) FROM \"/DMIS/LOG_MARA\" WHERE PROCESSED = ''")

if [ "$status" -gt 10000 ]; then
  echo "WARNING: $status pending records in MARA"
  # Send alert
  curl -X POST https://monitoring/api/alert \
    -d '{"service":"SLT","status":"warning","message":"High pending count"}'
fi

# Check job status (assumes a site-specific 'sm37' CLI wrapper that dumps
# background job statuses; SM37 itself is a SAP GUI transaction, not a
# shell command). grep -E enables the alternation, -c counts matches.
jobs=$(sm37 | grep DMIS | grep -cE "Cancelled|Error")
if [ "$jobs" -gt 0 ]; then
  echo "ERROR: $jobs failed replication jobs"
fi

Python Monitoring Client

import requests
from datetime import datetime
from hdbcli import dbapi  # SAP HANA Python client

class SLTMonitor:
    def __init__(self, hana_host, user, password):
        self.conn = dbapi.connect(address=hana_host, port=30015,
                                  user=user, password=password)

    def check_latency(self, table="MARA", threshold_seconds=10):
        # One logging table per call: SQL has no /DMIS/LOG_* wildcard.
        # SECONDS_BETWEEN is HANA's equivalent of TIMESTAMPDIFF.
        query = f"""
            SELECT '{table}' AS TABLE_NAME,
                   SECONDS_BETWEEN(MIN(TIMESTAMP), CURRENT_TIMESTAMP) AS LATENCY
            FROM "/DMIS/LOG_{table}"
            WHERE PROCESSED = ''
            HAVING SECONDS_BETWEEN(MIN(TIMESTAMP), CURRENT_TIMESTAMP) > ?
        """
        cursor = self.conn.cursor()
        cursor.execute(query, (threshold_seconds,))
        for table_name, latency in cursor.fetchall():
            self.send_alert(f"High latency on {table_name}: {latency}s")

    def send_alert(self, message):
        requests.post('https://monitoring/api/alert',
                      json={'message': message,
                            'timestamp': datetime.now().isoformat()})

9. Best Practices

Monitoring Checklist

Daily:

  • ✅ Check dashboard for red/yellow indicators
  • ✅ Review overnight job logs
  • ✅ Verify no replication stopped
  • ✅ Check error queue

Weekly:

  • ✅ Analyze performance trends
  • ✅ Review logging table growth
  • ✅ Check CPU/memory utilization
  • ✅ Test alert notifications

Monthly:

  • ✅ Capacity planning review
  • ✅ Performance tuning assessment
  • ✅ Update monitoring thresholds
  • ✅ Disaster recovery test

Summary

✅ Real-time dashboard monitoring (LTRC)
✅ Key health metrics and thresholds
✅ Log file locations and analysis
✅ Performance monitoring (ST06, DB02)
✅ Alert configuration and notifications
✅ Trend analysis and capacity planning
✅ Custom monitoring scripts
✅ Best practices for proactive monitoring

Next: Module 8 - Error Handling & Recovery