Module 18: Troubleshooting Guide
Comprehensive troubleshooting techniques for resolving common and complex SLT issues.
1. Diagnostic Framework
graph TD
A[Issue Reported] --> B{Symptom Category}
B -->|Performance| C[Check System Resources]
B -->|Error| D[Review Error Logs]
B -->|Data Quality| E[Validate Data]
B -->|Connectivity| F[Test Network]
C --> G[Analyze and Fix]
D --> G
E --> G
F --> G
G --> H{Resolved?}
H -->|No| I[Escalate to SAP]
H -->|Yes| J[Document Solution]
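The triage step in the diagram can be read as a lookup from symptom category to first check. A minimal Python sketch follows; the names `first_check` and `next_step` and the lowercase category keys are illustrative assumptions, not part of SLT.

```python
# First diagnostic action per symptom category, as drawn in the diagram above.
FIRST_CHECK = {
    "performance": "Check System Resources",
    "error": "Review Error Logs",
    "data quality": "Validate Data",
    "connectivity": "Test Network",
}


def first_check(category: str) -> str:
    """Return the first diagnostic step for a reported symptom category."""
    key = category.strip().lower()
    if key not in FIRST_CHECK:
        raise ValueError(f"unknown symptom category: {category!r}")
    return FIRST_CHECK[key]


def next_step(resolved: bool) -> str:
    """After 'Analyze and Fix': document on success, otherwise escalate."""
    return "Document Solution" if resolved else "Escalate to SAP"
```

Unknown categories raise an error instead of falling through silently, so a new symptom type forces an explicit decision about where it belongs.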
2. Common Issues and Solutions
Issue 1: Replication Not Starting
Symptoms:
Transaction: LTRC
Status: ⚠ Replication not active
Error: MT_ID cannot be started
Diagnosis:
-- Names containing "/" must be double-quoted in HANA SQL
-- Check MT_ID status
SELECT * FROM "/DMIS/MT_CONFIG"
WHERE MT_ID = 'MT_ID_PROD_01';
-- Check for active errors
SELECT * FROM "/DMIS/LOG_ERROR"
WHERE MT_ID = 'MT_ID_PROD_01'
ORDER BY TIMESTAMP DESC
LIMIT 10;
Common Causes and Solutions:
Cause 1: RFC Connection Failed
Check: Transaction SM59
Error: "Connection timed out"
Solution:
1. Verify network connectivity:
ping source-system.company.com
2. Test RFC connection:
SM59 → Select RFC → Connection Test
3. Check firewall rules:
- Port 3300-3399 (SAP Gateway)
- Port 3200-3299 (Dispatcher)
4. Verify RFC user credentials:
- User: RFC_USER
- Password: Check not expired
- Authorizations: S_RFC (all)
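The firewall check in step 3 can be scripted. SAP derives the dispatcher port as 3200 plus the instance number and the gateway port as 3300 plus the instance number, which is where the two ranges above come from. A hedged Python sketch; `sap_ports` and `port_open` are hypothetical helper names.

```python
import socket


def sap_ports(instance_no: int) -> dict:
    """Dispatcher and gateway ports for a two-digit SAP instance number."""
    if not 0 <= instance_no <= 99:
        raise ValueError("instance number must be between 00 and 99")
    return {"dispatcher": 3200 + instance_no, "gateway": 3300 + instance_no}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False
```

For instance 00, calling `port_open("source-system.company.com", sap_ports(0)["gateway"])` tests the same port 3300 that the firewall rule must allow. A `False` result does not distinguish a blocked port from a stopped gateway, so follow up with the SM59 connection test.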
Cause 2: Target HANA Not Reachable
Check: Test HANA connection
Error: "Unable to connect to database"
Solution:
1. Verify HANA is running:
HDB info
2. Test connection:
hdbsql -n hana-server:30015 -u SLT_USER -p password
3. Check HANA instance:
Transaction: DBACOCKPIT → Check System
4. Verify schema exists:
SELECT SCHEMA_NAME FROM SCHEMAS WHERE SCHEMA_NAME = 'SLTREPL';
Cause 3: Insufficient Resources
Check: Transaction SM50 (Process Overview)
Error: "No work processes available"
Solution:
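1. Match parallel jobs to the free work processes:
LTRC → Advanced Settings → Parallel Jobs
- Free work processes in SM50: raise the job count (e.g. 8 → 16)
- None free: lower the job count or add work processes (rdisp/wp_no_btc, rdisp/wp_no_dia)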
2. Check memory:
Transaction ST02 → SAP Memory
If used > 90%:
- Increase em/initial_size_MB parameter
- Restart SAP system
3. Check database space:
SELECT USAGE_TYPE, USED_SIZE * 100.0 / TOTAL_SIZE AS USED_PCT FROM M_DISKS;
If > 85% full:
- Add storage
- Archive old data
Issue 2: High Latency
Symptoms:
Real-time replication showing 5+ minute delay
Expected: < 1 minute
Diagnosis:
-- Measure current latency
SELECT
MT_ID,
TABLE_NAME,
SECONDS_BETWEEN(SOURCE_TIMESTAMP, TARGET_TIMESTAMP) as LATENCY_SEC
FROM "/DMIS/REPLICATION_MONITOR"
WHERE TIMESTAMP >= ADD_SECONDS(CURRENT_TIMESTAMP, -3600) -- last hour
ORDER BY LATENCY_SEC DESC;
-- Result:
MT_ID TABLE LATENCY_SEC
001 VBAP 312 ← Problem!
001 VBAK 8
001 KNA1 5
Root Cause Analysis:
Step 1: Identify bottleneck
├── Source read time: 15s ✓
├── Network transfer: 30s ✓
├── Transformation: 5s ✓
├── Target write time: 262s ✗ Problem here!
└── Total: 312s
Step 2: Investigate target write
Check HANA:
- CPU: 45% ✓
- Memory: 62% ✓
- Disk I/O: 95% ✗ Bottleneck!
Root Cause: Disk I/O saturation
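The breakdown in Step 1 amounts to finding the phase with the largest share of end-to-end latency. A small Python sketch using the timings from the example above; the function name `find_bottleneck` is an assumption for illustration.

```python
def find_bottleneck(phase_seconds: dict) -> tuple:
    """Return (phase, seconds, share_of_total) for the slowest phase."""
    total = sum(phase_seconds.values())
    if total <= 0:
        raise ValueError("total latency must be positive")
    phase = max(phase_seconds, key=phase_seconds.get)
    return phase, phase_seconds[phase], phase_seconds[phase] / total


# Timings from the VBAP example above (seconds).
vbap = {
    "source read": 15,
    "network transfer": 30,
    "transformation": 5,
    "target write": 262,
}

phase, seconds, share = find_bottleneck(vbap)
print(f"{phase}: {seconds}s of {sum(vbap.values())}s ({share:.0%})")
```

Here the target write accounts for roughly 84% of the 312 seconds, which is why Step 2 looks at the HANA side first.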
Solution:
-- 1. Check indexes on target table
SELECT INDEX_NAME, INDEX_TYPE
FROM INDEXES
WHERE SCHEMA_NAME = 'SLTREPL'
AND TABLE_NAME = 'VBAP';
-- Too many indexes slow down inserts
-- Drop unnecessary indexes:
DROP INDEX SLTREPL.VBAP_IDX_3;
DROP INDEX SLTREPL.VBAP_IDX_4;
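-- 2. Disable auto-merge during heavy load (re-enable it afterwards)
ALTER TABLE SLTREPL.VBAP DISABLE AUTOMERGE;
-- 3. Increase commit batch size (set in the SLT UI, not in SQL):
--    LTRC → Advanced → Commit Frequency: 5,000 → 20,000
-- 4. Add partitioning (the partitioning column must be part of the primary key)
ALTER TABLE SLTREPL.VBAP
PARTITION BY RANGE (ERDAT) (
PARTITION '20260101' <= VALUES < '20270101',
PARTITION OTHERS
);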
-- Result:
Before: 312s latency
After: 8s latency (97% improvement!)
Issue 3: Data Mismatch
Symptoms:
Source count: 1,000,000 records
Target count: 999,453 records
Missing: 547 records
Diagnosis:
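-- Identify missing records (VBAP is keyed by client, document, and item)
SELECT SOURCE.VBELN, SOURCE.POSNR
FROM SOURCE_SYSTEM.VBAP AS SOURCE
LEFT JOIN SLTREPL.VBAP AS TARGET
ON SOURCE.MANDT = TARGET.MANDT
AND SOURCE.VBELN = TARGET.VBELN
AND SOURCE.POSNR = TARGET.POSNR
WHERE TARGET.VBELN IS NULL;
-- Check error queue
SELECT * FROM "/DMIS/ERROR_QUEUE"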
WHERE TABLE_NAME = 'VBAP'
ORDER BY TIMESTAMP DESC;
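Where a cross-system join is not available, the same comparison can be done on exported keys. VBAP holds one row per sales document item, so the comparison should use the item-level key (VBELN, POSNR) and not the document number alone. A minimal Python sketch with made-up sample keys; `missing_keys` is a hypothetical helper.

```python
def missing_keys(source_keys, target_keys):
    """Return the keys present in the source but absent from the target, sorted."""
    return sorted(set(source_keys) - set(target_keys))


# Made-up sample data: (VBELN, POSNR) pairs exported from each side.
source = [("0000000001", "000010"), ("0000000001", "000020"), ("0000000002", "000010")]
target = [("0000000001", "000010"), ("0000000002", "000010")]

print(missing_keys(source, target))

# Comparing on VBELN alone hides the gap, because document 0000000001
# still has one item in the target.
print(missing_keys([k[0] for k in source], [k[0] for k in target]))
```

The first call reports the missing item, while the second returns an empty list even though a row is missing.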
Common Causes:
Cause 1: Foreign Key Violations
Error: "Foreign key constraint violated"
Missing parent record in KNA1
Solution:
1. Replicate parent tables first:
LTRC → Table Order → Move KNA1 before VBAP
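2. Or drop the foreign key temporarily and re-create it afterwards
(<fk_name> and <columns> are placeholders for your own constraint):
ALTER TABLE SLTREPL.VBAP
DROP CONSTRAINT <fk_name>;
[Replicate missing records]
ALTER TABLE SLTREPL.VBAP
ADD CONSTRAINT <fk_name> FOREIGN KEY (<columns>) REFERENCES SLTREPL.KNA1 (<columns>);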
Cause 2: Transformation Errors
Error: "Data type conversion failed"
Field NETWR: "12,345.67" → Decimal failed
Solution:
1. Review transformation:
LTRC → Transformation → VBAP → NETWR
2. Fix conversion:
ABAP Routine:
RESULT = condense( SOURCE ).
REPLACE ALL OCCURRENCES OF ',' IN RESULT WITH ''.
RESULT = CONV DECFLOAT34( RESULT ).
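For checking sample values outside the system, the same normalization (trim, drop thousands separators, convert to decimal) can be sketched in Python. This assumes a comma is always a thousands separator and a dot the decimal mark, as in the example value; `parse_amount` is an illustrative name.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(raw: str) -> Decimal:
    """Convert a string such as '12,345.67' to a Decimal."""
    cleaned = raw.strip().replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"not a decimal amount: {raw!r}") from exc


print(parse_amount("12,345.67"))
```

Values formatted with a decimal comma (for example "12.345,67") would be misread by this rule, so confirm the source format before applying it.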
Cause 3: Filtering Issues
Applied filter: WHERE VKORG = 'DE01'
Records where VKORG is NULL were excluded, because NULL never satisfies an equality comparison
Solution:
Update filter condition:
WHERE (VKORG = 'DE01' OR VKORG IS NULL)
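The behaviour behind this cause is SQL's three-valued logic: `VKORG = 'DE01'` evaluates to unknown for a NULL value, so the row is filtered out. A self-contained illustration using Python's built-in sqlite3 module with made-up rows; SQLite stands in for the real database here, which is an assumption of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (vbeln TEXT, vkorg TEXT)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [("0000000001", "DE01"), ("0000000002", None), ("0000000003", "US01")],
)


def count_rows(where_clause: str) -> int:
    """Count rows in the sample table matching a fixed, trusted WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM sample WHERE {where_clause}").fetchone()[0]


# The NULL row matches neither the filter nor its negation.
print(count_rows("vkorg = 'DE01'"))
print(count_rows("vkorg <> 'DE01'"))
print(count_rows("vkorg = 'DE01' OR vkorg IS NULL"))
```

Only the third condition returns the NULL row, which is the form the corrected filter uses.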
Issue 4: Memory Overflow
Symptoms:
Error: "Memory allocation failed"
Short dump: TSV_TNEW_PAGE_ALLOC_FAILED
System crash during large table replication
Diagnosis:
Transaction: ST22 (ABAP Dump Analysis)
Dump info:
├── Program: /DMIS/CL_REPLICATION
├── Error: TSV_TNEW_PAGE_ALLOC_FAILED
├── Time: During initial load of BSEG (120M records)
└── Memory: 16 GB allocated, 18 GB requested
Solution:
1. Reduce package size (split into smaller batches):
LTRC → Table Settings → BSEG
Package Size: 100,000 → 10,000
2. Increase memory parameters:
Transaction: RZ10
Parameters:
em/initial_size_MB = 8192
em/max_size_MB = 16384
ztta/roll_extension = 4000000000
Restart required: Yes
3. Schedule large tables during off-peak:
LTRC → Schedule → BSEG
Run time: 02:00 - 06:00 (night)
4. Consider parallel load:
Split BSEG by fiscal year:
- MT_ID_001: BSEG WHERE GJAHR = 2024
- MT_ID_002: BSEG WHERE GJAHR = 2025
- MT_ID_003: BSEG WHERE GJAHR = 2026
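The trade-off in step 1 is easy to quantify: smaller packages hold less data in memory at once but need more round trips. A rough Python sketch using the record count from the dump above; `package_count` is an illustrative name.

```python
import math


def package_count(total_records: int, package_size: int) -> int:
    """Number of packages needed to transfer total_records."""
    if package_size <= 0:
        raise ValueError("package_size must be positive")
    return math.ceil(total_records / package_size)


total = 120_000_000  # BSEG record count from the dump above

for size in (100_000, 10_000):
    print(f"package size {size:>7,}: {package_count(total, size):>6,} packages")
```

Going from 100,000 to 10,000 records per package raises the package count from 1,200 to 12,000, so expect the initial load to take longer even though each package needs less memory.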
Issue 5: Logging Table Overflow
Symptoms:
Error: "Logging table /DMIS/DT_LOG full"
Warning: "Log table size: 500 GB"
Replication slowing down or stopping
Diagnosis:
-- Check logging table size
SELECT
TABLE_NAME,
TABLE_SIZE / 1024 / 1024 / 1024 as SIZE_GB,
RECORD_COUNT
FROM M_TABLES
WHERE TABLE_NAME LIKE '/DMIS%LOG%'
ORDER BY TABLE_SIZE DESC;
-- Result:
TABLE SIZE_GB RECORD_COUNT
/DMIS/DT_LOG 487 1,245,678,901
/DMIS/LOG_ERROR 12 234,567
Solution:
-- 1. Clean up old logs (older than 30 days)
-- Caution: a single DELETE over this many rows is one very large transaction,
-- so delete in batches and stop when a batch removes nothing.
-- "$rowid$" is HANA's internal row ID; batch on a key column if you prefer.
DO BEGIN
DECLARE v_batch INT = 1;
WHILE v_batch > 0 DO
DELETE FROM "/DMIS/DT_LOG"
WHERE "$rowid$" IN (
SELECT TOP 100000 "$rowid$" FROM "/DMIS/DT_LOG"
WHERE TIMESTAMP < ADD_DAYS(CURRENT_DATE, -30));
v_batch = ::ROWCOUNT;
COMMIT;
END WHILE;
END;
-- 2. If history must be kept, archive before running step 1
CREATE COLUMN TABLE "/DMIS/DT_LOG_ARCHIVE"
AS (SELECT * FROM "/DMIS/DT_LOG" WHERE TIMESTAMP < ADD_MONTHS(CURRENT_DATE, -6));
DELETE FROM "/DMIS/DT_LOG" WHERE TIMESTAMP < ADD_MONTHS(CURRENT_DATE, -6);
-- 3. Schedule automatic cleanup
CREATE PROCEDURE CLEANUP_SLT_LOGS()
AS
BEGIN
DELETE FROM "/DMIS/DT_LOG" WHERE TIMESTAMP < ADD_DAYS(CURRENT_DATE, -30);
DELETE FROM "/DMIS/LOG_ERROR" WHERE TIMESTAMP < ADD_DAYS(CURRENT_DATE, -90);
END;
-- Schedule: Run daily at 02:00
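The batching idea can be tried safely outside HANA. The following Python sketch uses the built-in sqlite3 module as a stand-in database and a made-up `dt_log` table. It deletes in fixed-size batches, commits after each one, and stops as soon as a batch removes no rows, so it terminates even when fewer rows qualify than expected.

```python
import sqlite3


def delete_in_batches(conn, cutoff: int, batch_size: int = 1000) -> int:
    """Delete rows with ts < cutoff in batches; return the total deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM dt_log WHERE rowid IN ("
            "SELECT rowid FROM dt_log WHERE ts < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # keep each transaction small
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dt_log (ts INTEGER)")
conn.executemany("INSERT INTO dt_log VALUES (?)", [(i,) for i in range(5000)])
conn.commit()

deleted = delete_in_batches(conn, cutoff=3500, batch_size=1000)
remaining = conn.execute("SELECT COUNT(*) FROM dt_log").fetchone()[0]
print(deleted, remaining)
```

Ending the loop on an empty batch, and not on a fixed row target, avoids spinning forever once the old rows run out.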
3. Advanced Diagnostics
Performance Analysis
-- Comprehensive performance report
SELECT
MT_ID,
TABLE_NAME,
COUNT(*) as OPERATIONS,
AVG(DURATION_MS) as AVG_MS,
MAX(DURATION_MS) as MAX_MS,
SUM(CASE WHEN STATUS = 'ERROR' THEN 1 ELSE 0 END) as ERROR_COUNT,
SUM(RECORD_COUNT) as TOTAL_RECORDS
FROM "/DMIS/PERFORMANCE_LOG"
WHERE TIMESTAMP >= ADD_SECONDS(CURRENT_TIMESTAMP, -86400) -- last 24 hours
GROUP BY MT_ID, TABLE_NAME
HAVING AVG(DURATION_MS) > 1000
OR SUM(CASE WHEN STATUS = 'ERROR' THEN 1 ELSE 0 END) > 0
ORDER BY AVG_MS DESC;
-- Bottleneck identification
SELECT
PHASE, -- READ, TRANSFORM, WRITE
AVG(DURATION_MS) as AVG_MS,
SUM(DURATION_MS) as TOTAL_MS
FROM "/DMIS/PHASE_TIMING"
WHERE TIMESTAMP >= ADD_SECONDS(CURRENT_TIMESTAMP, -86400) -- last 24 hours
GROUP BY PHASE
ORDER BY TOTAL_MS DESC;
Network Diagnostics
#!/bin/bash
# network_diagnostics.sh
echo "=== Network Diagnostics for SLT ==="
# 1. Ping test
echo "1. Testing connectivity..."
ping -c 5 source-system.company.com
ping -c 5 hana-target.company.com
# 2. Bandwidth test
echo "2. Testing bandwidth..."
iperf3 -c hana-target.company.com -t 30
# Expected: > 1 Gbps for good performance (needs `iperf3 -s` running on the target)
# 3. Port connectivity
echo "3. Testing ports..."
nc -zv source-system.company.com 3300 # SAP Gateway
nc -zv hana-target.company.com 30015 # HANA SQL
# 4. DNS resolution
echo "4. Testing DNS..."
nslookup source-system.company.com
nslookup hana-target.company.com
# 5. Traceroute
echo "5. Network path..."
traceroute hana-target.company.com
# 6. Packet loss
echo "6. Packet loss check..."
ping -c 100 hana-target.company.com | grep 'packet loss'
# Expected: 0% packet loss
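On Linux, ping reports packet loss in its summary line, which can be extracted for automated checks. A small Python sketch that parses that line; the sample text is made up to imitate typical iputils output, and `packet_loss_percent` is an illustrative name.

```python
import re

LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def packet_loss_percent(ping_output: str) -> float:
    """Extract the packet loss percentage from ping's summary output."""
    match = LOSS_RE.search(ping_output)
    if match is None:
        raise ValueError("no packet loss summary found in ping output")
    return float(match.group(1))


# Made-up sample output for illustration.
sample = (
    "--- hana-target.company.com ping statistics ---\n"
    "100 packets transmitted, 98 received, 2% packet loss, time 99140ms\n"
    "rtt min/avg/max/mdev = 0.410/0.523/1.201/0.098 ms\n"
)
print(packet_loss_percent(sample))
```

Note that the loss figure is on the second-to-last line of the summary; the last line carries round-trip times only.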
System Resource Check
Transaction: ST06 (Operating System Monitor)
Check:
├── CPU: < 80% average
├── Memory: < 85% used
├── Disk I/O: < 80% utilization
├── Network: < 70% bandwidth
└── Swap: < 10% used
Critical Thresholds:
├── CPU > 90%: Add more cores or reduce parallel jobs
├── Memory > 90%: Increase RAM or optimize queries
├── Disk I/O > 90%: Add faster storage (SSD/NVMe)
├── Network > 85%: Upgrade bandwidth
└── Swap > 20%: Critical memory shortage
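The two lists above leave a band between the healthy limit and the critical threshold (for example CPU between 80% and 90%). The Python sketch below treats that band as a warning, which is an assumption of the sketch and not something the lists state; `classify` and the metric keys are illustrative names.

```python
# (healthy_below, critical_above) in percent, taken from the lists above.
THRESHOLDS = {
    "cpu": (80, 90),
    "memory": (85, 90),
    "disk_io": (80, 90),
    "network": (70, 85),
    "swap": (10, 20),
}


def classify(metric: str, value_pct: float) -> str:
    """Return 'ok', 'warning', or 'critical' for a utilization percentage."""
    healthy_below, critical_above = THRESHOLDS[metric]
    if value_pct > critical_above:
        return "critical"
    if value_pct < healthy_below:
        return "ok"
    return "warning"


# Readings from the high-latency example earlier in this module.
readings = {"cpu": 45, "memory": 62, "disk_io": 95}
print({name: classify(name, pct) for name, pct in readings.items()})
```

With those readings only disk I/O is flagged as critical, matching the root cause found in the latency example.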
4. Log Analysis
Interpreting SLT Logs
# Main SLT log location
cd /usr/sap/SLT/DVEBMGS00/work
# Key log files:
ls -lh dev_*
# dev_w0    Work process 0 log
# dev_disp  Dispatcher log
# dev_rd    Gateway (RFC) log
# dev_icm   ICM (Internet Communication Manager) log
# Search for errors
grep -i "error" dev_w* | tail -20
grep -i "exception" dev_w* | tail -20
grep -i "dump" dev_w* | tail -20
# Monitor real-time
tail -f dev_w0
Important Log Patterns
Pattern 1: RFC Timeout
"RFC_ERROR_SYSTEM_FAILURE: Timeout during RFC call"
→ Solution: Increase RFC timeout in SM59
Pattern 2: Database Lock
"DBE_EXECUTE_FAILED: ORA-00060: deadlock detected"
→ Solution: Review table locking, optimize queries
Pattern 3: Memory Issue
"TSV_TNEW_PAGE_ALLOC_FAILED: Memory allocation error"
→ Solution: Increase memory parameters
Pattern 4: Authorization Missing
"RFC_ERROR_SYSTEM_FAILURE: User has no authorization"
→ Solution: Check S_RFC authorization for RFC user
Pattern 5: Table Not Found
"TABLE_NOT_AVAILABLE: Table VBAP does not exist"
→ Solution: Verify table exists in source, check authorization
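The five patterns above can be turned into a first-pass log classifier. A minimal Python sketch; the regular expressions are simplified guesses at how each message appears and `suggest_action` is a hypothetical helper, so adjust both to the lines actually seen in your dev_w* files.

```python
import re

# (pattern, suggested action) pairs taken from the list above.
PATTERNS = [
    (re.compile(r"RFC_ERROR_SYSTEM_FAILURE.*Timeout", re.I), "Increase RFC timeout in SM59"),
    (re.compile(r"deadlock detected", re.I), "Review table locking, optimize queries"),
    (re.compile(r"TSV_TNEW_PAGE_ALLOC_FAILED"), "Increase memory parameters"),
    (re.compile(r"no authorization", re.I), "Check S_RFC authorization for RFC user"),
    (re.compile(r"TABLE_NOT_AVAILABLE"), "Verify table exists in source, check authorization"),
]


def suggest_action(log_line: str):
    """Return the suggested action for the first matching pattern, else None."""
    for pattern, action in PATTERNS:
        if pattern.search(log_line):
            return action
    return None
```

Order matters: the timeout pattern is checked before the authorization pattern because both messages share the RFC_ERROR_SYSTEM_FAILURE prefix.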
5. Debugging Tools
Transaction Codes for Troubleshooting
LTRC - Main SLT configuration and monitoring
SM21 - System log
SM50 - Work process overview
SM66 - Global work process overview
ST22 - ABAP dump analysis
ST02 - Tune: Buffers
ST03N - Workload analysis
ST04 - Database performance
ST05 - SQL trace
ST06 - Operating system monitor
SM59 - RFC destinations
DB02 - Database space
SM12 - Lock entries
SM13 - Update records
DBACOCKPIT - Database cockpit
SM37 - Background jobs
Enable SLT Trace
Transaction: LTRC → Advanced → Trace
Enable:
☑ Application Trace
☑ Database Trace
☑ RFC Trace
Trace Level: ● 3 (Detailed)
Start Time: Immediately
Duration: 30 minutes
Result: Detailed trace in /usr/sap/SLT/DVEBMGS00/work/trace/
SQL Trace
Transaction: ST05
1. Activate Trace:
- Trace: ☑ SQL Trace
- User: SLT_USER
- Duration: 5 minutes
2. Execute operation (e.g., start replication)
3. Deactivate Trace
4. Display Trace:
- Shows all SQL statements
- Execution times
- Number of records
5. Analyze:
- Find slow queries (> 1 second)
- Identify missing indexes
- Optimize WHERE clauses
6. Escalation to SAP Support
When to Escalate
Escalate when:
├── Issue cannot be resolved within 4 hours
├── Data corruption suspected
├── System-wide outage
├── SAP Note indicates known bug
└── Requires kernel patch or hot fix
Information to Collect
1. System Information:
- Transaction: SM51 → System → Status
- SAP Version, Kernel version
- Database version (HANA revision)
2. Error Details:
- Transaction: ST22 → Short dump
- Error message screenshot
- Timestamp of issue
3. Logs:
- dev_w* files (last 24 hours)
- /DMIS/LOG_ERROR table export
- System log (SM21)
4. Configuration:
- MT_ID configuration export
- Table list
- Transformation details
5. Performance Data:
- ST03N report (last 7 days)
- HANA full system information dump (AWR reports are Oracle-only)
6. Steps to Reproduce:
- Detailed description
- Can issue be reproduced?
Create Support Ticket:
https://launchpad.support.sap.com
Component: HAN-DP-LTR (SLT replication to SAP HANA)
Priority: Based on business impact
Attach all collected information
7. Preventive Measures
Health Check Checklist
- Daily: Check error queue (should be empty)
- Daily: Monitor replication lag (< 1 minute)
- Daily: Review system log (SM21) for warnings
- Weekly: Check disk space (> 20% free)
- Weekly: Review performance trends
- Weekly: Clean up old logs
- Monthly: Review and optimize slow tables
- Monthly: Check for SAP Notes
- Quarterly: Performance tuning session
- Quarterly: DR test
- Annually: Full system audit
Monitoring Alerts
-- Set up proactive alerts
CREATE PROCEDURE CHECK_SLT_HEALTH()
AS
BEGIN
DECLARE v_error_count INT;
DECLARE v_repl_lag INT;
DECLARE v_disk_pct INT;
-- Check error queue
SELECT COUNT(*) INTO v_error_count
FROM "/DMIS/ERROR_QUEUE"
WHERE TIMESTAMP >= ADD_SECONDS(CURRENT_TIMESTAMP, -3600); -- last hour
IF v_error_count > 10 THEN
-- Send alert: High error rate
CALL SEND_ALERT('SLT Error Queue', 'WARNING', v_error_count || ' errors in last hour');
END IF;
-- Check replication lag
SELECT MAX(SECONDS_BETWEEN(SOURCE_TIMESTAMP, TARGET_TIMESTAMP)) INTO v_repl_lag
FROM "/DMIS/REPLICATION_MONITOR";
IF v_repl_lag > 300 THEN
-- Send alert: High latency
CALL SEND_ALERT('SLT Latency', 'WARNING', 'Replication lag: ' || v_repl_lag || ' seconds');
END IF;
-- Check disk space
SELECT MAX(USED_SIZE * 100.0 / TOTAL_SIZE) INTO v_disk_pct
FROM M_DISKS
WHERE USAGE_TYPE = 'DATA';
IF v_disk_pct > 85 THEN
-- Send alert: Low disk space
CALL SEND_ALERT('Disk Space', 'CRITICAL', 'Disk ' || v_disk_pct || '% full');
END IF;
END;
-- Schedule: Run every 15 minutes
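The alert thresholds in the procedure can be checked in isolation before deploying it. The Python sketch below mirrors the three conditions (more than 10 errors, more than 300 seconds lag, more than 85% disk usage) and returns the alerts as tuples; it takes the measured values as plain arguments and does not query any system, and `check_slt_health` here is a stand-in for illustration only.

```python
def check_slt_health(error_count: int, repl_lag_sec: int, disk_pct: float) -> list:
    """Return (title, severity, message) tuples for every breached threshold."""
    alerts = []
    if error_count > 10:
        alerts.append(("SLT Error Queue", "WARNING", f"{error_count} errors in last hour"))
    if repl_lag_sec > 300:
        alerts.append(("SLT Latency", "WARNING", f"Replication lag: {repl_lag_sec} seconds"))
    if disk_pct > 85:
        alerts.append(("Disk Space", "CRITICAL", f"Disk {disk_pct:.0f}% full"))
    return alerts


for alert in check_slt_health(error_count=11, repl_lag_sec=312, disk_pct=90):
    print(alert)
```

Values exactly at a threshold (10 errors, 300 seconds, 85%) do not raise an alert, matching the strict `>` comparisons in the procedure.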
8. Knowledge Base
Quick Reference
| Symptom | Likely Cause | Quick Fix |
|---|---|---|
| Replication slow | Network latency | Check bandwidth, enable compression |
| High CPU | Too many parallel jobs | Reduce parallel jobs |
| High memory | Large package size | Reduce package size |
| Connection error | RFC timeout | Increase timeout, check firewall |
| Missing data | FK violations | Replicate parent tables first |
| Lock errors | Concurrent updates | Adjust commit frequency |
| Log table full | No cleanup job | Delete old logs, schedule cleanup |
9. Troubleshooting Flowchart
Issue Reported
↓
Can reproduce?
Yes → Collect logs and screenshots
No → Monitor for recurrence
↓
Check recent changes (last 7 days)
↓
Review error logs (/DMIS/LOG_ERROR)
↓
Check system resources (ST06)
↓
Found root cause?
Yes → Apply fix → Test → Document
No → Enable trace → Reproduce → Analyze
↓
Still not resolved?
Yes → Escalate to SAP Support
No → Close ticket, update KB
Summary
✅ Common issue diagnosis and resolution
✅ Advanced diagnostic techniques
✅ Log analysis and interpretation
✅ Debugging tools and transaction codes
✅ SAP support escalation procedures
✅ Preventive maintenance measures
✅ Proactive monitoring and alerts
✅ Quick reference guide
✅ Troubleshooting flowchart
Next: Module 19 - Best Practices and Design Patterns