Module 18: Troubleshooting Guide

Comprehensive troubleshooting techniques for resolving common and complex SLT issues.

1. Diagnostic Framework

graph TD
    A[Issue Reported] --> B{Symptom Category}
    B -->|Performance| C[Check System Resources]
    B -->|Error| D[Review Error Logs]
    B -->|Data Quality| E[Validate Data]
    B -->|Connectivity| F[Test Network]
    C --> G[Analyze and Fix]
    D --> G
    E --> G
    F --> G
    G --> H{Resolved?}
    H -->|No| I[Escalate to SAP]
    H -->|Yes| J[Document Solution]
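
The routing in the diagram can be sketched as a small lookup. This is an illustrative Python sketch, not an SLT feature: the names FIRST_CHECK, triage, and next_step are hypothetical, and the category and step labels are copied from the diagram.

```python
# Hypothetical mapping of the diagram's symptom categories to the first check.
FIRST_CHECK = {
    "performance": "Check System Resources",
    "error": "Review Error Logs",
    "data quality": "Validate Data",
    "connectivity": "Test Network",
}


def triage(category: str) -> str:
    """Return the first diagnostic step for a reported symptom category."""
    key = category.strip().lower()
    if key not in FIRST_CHECK:
        raise ValueError(f"unknown symptom category: {category!r}")
    return FIRST_CHECK[key]


def next_step(resolved: bool) -> str:
    """Mirror the final decision node of the diagram."""
    return "Document Solution" if resolved else "Escalate to SAP"


print(triage("Data Quality"))  # Validate Data
print(next_step(False))        # Escalate to SAP
```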

2. Common Issues and Solutions

Issue 1: Replication Not Starting

Symptoms:

Transaction: LTRC
Status: ⚠ Replication not active
Error: MT_ID cannot be started

Diagnosis:

-- Names containing "/" must be double-quoted in HANA SQL
-- Check MT_ID status
SELECT * FROM "/DMIS/MT_CONFIG"
WHERE MT_ID = 'MT_ID_PROD_01';

-- Check for active errors
SELECT * FROM "/DMIS/LOG_ERROR"
WHERE MT_ID = 'MT_ID_PROD_01'
ORDER BY TIMESTAMP DESC
LIMIT 10;

Common Causes and Solutions:

Cause 1: RFC Connection Failed

Check: Transaction SM59
Error: "Connection timed out"

Solution:
1. Verify network connectivity:
   ping source-system.company.com
   
2. Test RFC connection:
   SM59 → Select RFC → Connection Test
   
3. Check firewall rules:
   - Port 3300-3399 (SAP Gateway)
   - Port 3200-3299 (Dispatcher)
   
4. Verify RFC user credentials:
   - User: RFC_USER
   - Password: Check not expired
   - Authorizations: S_RFC (all)

Cause 2: Target HANA Not Reachable

Check: Test HANA connection
Error: "Unable to connect to database"

Solution:
1. Verify HANA is running:
   HDB info
   
2. Test connection:
   hdbsql -n hana-server:30015 -u SLT_USER -p password
   
3. Check HANA instance:
   Transaction: DBACOCKPIT → Check System
   
4. Verify schema exists:
   SELECT SCHEMA_NAME FROM SCHEMAS WHERE SCHEMA_NAME = 'SLTREPL';

Cause 3: Insufficient Resources

Check: Transaction SM50 (Process Overview)
Error: "No work processes available"

Solution:
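1. Align parallel jobs with the available work processes:
   LTRC → Advanced Settings → Parallel Jobs
   If SM50 shows no free processes, lower the value (e.g. 16 → 8)
   or add work processes first; raise it (e.g. 8 → 16) only when
   free processes are available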
   
2. Check memory:
   Transaction ST02 → SAP Memory
   If used > 90%:
   - Increase em/initial_size_MB parameter
   - Restart SAP system
   
3. Check database space:
   SELECT * FROM M_DISK_USAGE;
   If > 85% full:
   - Add storage
   - Archive old data

Issue 2: High Latency

Symptoms:

Real-time replication showing 5+ minute delay
Expected: < 1 minute

Diagnosis:

-- Measure current latency
SELECT 
  MT_ID,
  TABLE_NAME,
  SECONDS_BETWEEN(SOURCE_TIMESTAMP, TARGET_TIMESTAMP) as LATENCY_SEC
FROM "/DMIS/REPLICATION_MONITOR"
WHERE TIMESTAMP >= ADD_HOURS(CURRENT_TIMESTAMP, -1)
ORDER BY LATENCY_SEC DESC;

-- Result:
MT_ID TABLE LATENCY_SEC
001   VBAP  312         ← Problem!
001   VBAK  8
001   KNA1  5

Root Cause Analysis:

Step 1: Identify bottleneck
├── Source read time: 15s ✓
├── Network transfer: 30s ✓
├── Transformation: 5s ✓
├── Target write time: 262s ✗ Problem here!
└── Total: 312s

Step 2: Investigate target write
Check HANA:
- CPU: 45% ✓
- Memory: 62% ✓
- Disk I/O: 95% ✗ Bottleneck!

Root Cause: Disk I/O saturation
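
The bottleneck step above amounts to finding the phase with the largest share of total latency. A minimal Python sketch using the timings from this example follows; the function name find_bottleneck and the phase keys are hypothetical labels, not SLT fields.

```python
def find_bottleneck(phase_seconds: dict[str, float]) -> tuple[str, float]:
    """Return the slowest phase and its share of the total latency."""
    total = sum(phase_seconds.values())
    if total <= 0:
        raise ValueError("total latency must be positive")
    phase = max(phase_seconds, key=phase_seconds.get)
    return phase, phase_seconds[phase] / total


# Phase timings from the analysis above (seconds).
phases = {
    "source_read": 15,
    "network_transfer": 30,
    "transformation": 5,
    "target_write": 262,
}

phase, share = find_bottleneck(phases)
print(f"{phase}: {share:.0%} of {sum(phases.values())}s")
# target_write: 84% of 312s
```

A phase that dominates the total like this is where the resource check (CPU, memory, disk I/O) should start.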

Solution:

-- 1. Check indexes on target table
SELECT INDEX_NAME, INDEX_TYPE 
FROM INDEXES 
WHERE SCHEMA_NAME = 'SLTREPL' 
  AND TABLE_NAME = 'VBAP';

-- Too many indexes slow down inserts
-- Drop unnecessary indexes:
DROP INDEX SLTREPL.VBAP_IDX_3;
DROP INDEX SLTREPL.VBAP_IDX_4;

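-- 2. Disable auto-merge during heavy load (re-enable afterwards)
ALTER TABLE SLTREPL.VBAP DISABLE AUTOMERGE;

-- 3. Increase commit batch size (SLT setting, not SQL):
--    LTRC → Advanced → Commit Frequency: 5,000 → 20,000

-- 4. Add partitioning
--    (on a table with a primary key, HANA requires the
--    partitioning column to be part of that key)
ALTER TABLE SLTREPL.VBAP 
PARTITION BY RANGE (ERDAT) (
  PARTITION '20260101' <= VALUES < '20270101',
  PARTITION OTHERS
);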

-- Result:
Before: 312s latency
After: 8s latency (97% improvement!)

Issue 3: Data Mismatch

Symptoms:

Source count: 1,000,000 records
Target count: 999,453 records
Missing: 547 records

Diagnosis:

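-- Identify missing records
-- (join on the full VBAP key: client, document, item)
SELECT SOURCE.VBELN, SOURCE.POSNR
FROM SOURCE_SYSTEM.VBAP AS SOURCE
LEFT JOIN SLTREPL.VBAP AS TARGET
  ON  SOURCE.MANDT = TARGET.MANDT
  AND SOURCE.VBELN = TARGET.VBELN
  AND SOURCE.POSNR = TARGET.POSNR
WHERE TARGET.VBELN IS NULL;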

-- Check error queue
SELECT * FROM "/DMIS/ERROR_QUEUE"
WHERE TABLE_NAME = 'VBAP'
ORDER BY TIMESTAMP DESC;
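
VBAP rows are identified by document number and item number, so a missing-record check that compares document numbers only can hide a missing item. The sketch below reproduces this with Python's built-in sqlite3 module. The tables src_vbap and tgt_vbap and their rows are made-up stand-ins for the source and target tables, and SQLite is used only because it runs anywhere.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src_vbap (vbeln TEXT, posnr TEXT, PRIMARY KEY (vbeln, posnr));
    CREATE TABLE tgt_vbap (vbeln TEXT, posnr TEXT, PRIMARY KEY (vbeln, posnr));
    INSERT INTO src_vbap VALUES ('0001', '10'), ('0001', '20'), ('0002', '10');
    -- Item 20 of document 0001 never arrived in the target.
    INSERT INTO tgt_vbap VALUES ('0001', '10'), ('0002', '10');
""")

# Anti-join on the document number only: the gap goes unnoticed.
by_document = con.execute("""
    SELECT s.vbeln, s.posnr
    FROM src_vbap AS s
    LEFT JOIN tgt_vbap AS t ON s.vbeln = t.vbeln
    WHERE t.vbeln IS NULL
""").fetchall()

# Anti-join on document and item: the missing row is reported.
by_full_key = con.execute("""
    SELECT s.vbeln, s.posnr
    FROM src_vbap AS s
    LEFT JOIN tgt_vbap AS t
      ON s.vbeln = t.vbeln AND s.posnr = t.posnr
    WHERE t.vbeln IS NULL
""").fetchall()

print(by_document)  # []
print(by_full_key)  # [('0001', '20')]
```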

Common Causes:

Cause 1: Foreign Key Violations

Error: "Foreign key constraint violated"
Missing parent record in KNA1

Solution:
1. Replicate parent tables first:
   LTRC → Table Order → Move KNA1 before VBAP
   
2. Or disable FK checks temporarily:
   ALTER TABLE SLTREPL.VBAP 
   DISABLE TRIGGER ALL;
   
   [Replicate missing records]
   
   ALTER TABLE SLTREPL.VBAP 
   ENABLE TRIGGER ALL;

Cause 2: Transformation Errors

Error: "Data type conversion failed"
Field NETWR: "12,345.67" → Decimal failed

Solution:
1. Review transformation:
   LTRC → Transformation → VBAP → NETWR
   
2. Fix conversion:
   ABAP Routine:
   " Strip blanks and thousands separators, then convert
   DATA(lv_text) = condense( SOURCE ).
   REPLACE ALL OCCURRENCES OF ',' IN lv_text WITH ''.
   RESULT = CONV decfloat34( lv_text ).
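
The same normalization can be tried outside the system before changing the routine. This is a minimal Python sketch of the idea (strip whitespace, drop thousands separators, convert to a decimal); parse_amount is a hypothetical helper, and it assumes "," is always a thousands separator and "." the decimal point.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(raw: str) -> Decimal:
    """Convert an amount such as '12,345.67' to a Decimal."""
    cleaned = raw.strip().replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"not a decimal amount: {raw!r}") from exc


print(parse_amount("12,345.67"))  # 12345.67
```

Values formatted with "," as the decimal separator (for example "12.345,67") would need a different rule, so it is worth checking which format the source field actually delivers.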

Cause 3: Filtering Issues

Applied filter: WHERE VKORG = 'DE01'
But records with VKORG = NULL were excluded

Solution:
Update filter condition:
WHERE (VKORG = 'DE01' OR VKORG IS NULL)
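
The reason the original filter drops those rows is SQL's three-valued logic: comparing NULL with = yields unknown, not true. A short demonstration with Python's sqlite3 module follows; the table and its three rows are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE vbak (vbeln TEXT, vkorg TEXT)")
con.executemany(
    "INSERT INTO vbak VALUES (?, ?)",
    [("1", "DE01"), ("2", None), ("3", "US01")],
)

# Equality never matches NULL, so document 2 is filtered out.
strict = con.execute(
    "SELECT vbeln FROM vbak WHERE vkorg = 'DE01' ORDER BY vbeln"
).fetchall()

# The corrected condition keeps rows whose sales organization is NULL.
relaxed = con.execute(
    "SELECT vbeln FROM vbak "
    "WHERE (vkorg = 'DE01' OR vkorg IS NULL) ORDER BY vbeln"
).fetchall()

print(strict)   # [('1',)]
print(relaxed)  # [('1',), ('2',)]
```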

Issue 4: Memory Overflow

Symptoms:

Error: "Memory allocation failed"
Short dump: TSV_TNEW_PAGE_ALLOC_FAILED
System crash during large table replication

Diagnosis:

Transaction: ST22 (ABAP Dump Analysis)

Dump info:
├── Program: /DMIS/CL_REPLICATION
├── Error: TSV_TNEW_PAGE_ALLOC_FAILED
├── Time: During initial load of BSEG (120M records)
└── Memory: 16 GB allocated, 18 GB requested

Solution:

1. Reduce package size (split into smaller batches):
   LTRC → Table Settings → BSEG
   Package Size: 100,000 → 10,000
   
2. Increase memory parameters:
   Transaction: RZ10
   
   Parameters:
   em/initial_size_MB = 8192
   em/max_size_MB = 16384
   ztta/roll_extension = 4000000000
   
   Restart required: Yes
   
3. Schedule large tables during off-peak:
   LTRC → Schedule → BSEG
   Run time: 02:00 - 06:00 (night)
   
4. Consider parallel load:
   Split BSEG by fiscal year:
   - MT_ID_001: BSEG WHERE GJAHR = 2024
   - MT_ID_002: BSEG WHERE GJAHR = 2025
   - MT_ID_003: BSEG WHERE GJAHR = 2026
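
To see what the package size change in step 1 means for the load, the number of packages can be estimated with simple arithmetic. A minimal Python sketch, using the BSEG record count from the dump above; package_count is a hypothetical helper.

```python
def package_count(total_records: int, package_size: int) -> int:
    """Number of packages needed to transfer all records (ceiling division)."""
    if package_size <= 0:
        raise ValueError("package_size must be positive")
    return -(-total_records // package_size)


BSEG_RECORDS = 120_000_000

for size in (100_000, 10_000):
    print(f"package size {size:,}: {package_count(BSEG_RECORDS, size):,} packages")
```

With 100,000 records per package the load needs 1,200 packages; with 10,000 it needs 12,000. Each package holds a tenth of the data in memory at a time, at the cost of ten times as many round trips.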

Issue 5: Logging Table Overflow

Symptoms:

Error: "Logging table /DMIS/DT_LOG full"
Warning: "Log table size: 500 GB"
Replication slowing down or stopping

Diagnosis:

-- Check logging table size
SELECT 
  TABLE_NAME,
  TABLE_SIZE / 1024 / 1024 / 1024 as SIZE_GB,
  RECORD_COUNT
FROM M_TABLES
WHERE TABLE_NAME LIKE '/DMIS%LOG%'
ORDER BY TABLE_SIZE DESC;

-- Result:
TABLE           SIZE_GB  RECORD_COUNT
/DMIS/DT_LOG    487      1,245,678,901
/DMIS/LOG_ERROR 12       234,567

Solution:

-- 1. Clean up old logs (older than 30 days)
DELETE FROM "/DMIS/DT_LOG"
WHERE TIMESTAMP < ADD_DAYS(CURRENT_DATE, -30);

-- Caution: for a large delete, run in batches instead.
-- HANA's DELETE has no LIMIT clause, so each batch is picked via "$rowid$";
-- the loop ends when a pass deletes no rows.
DO BEGIN
  DECLARE v_rows INT = 1;
  WHILE :v_rows > 0 DO
    DELETE FROM "/DMIS/DT_LOG"
    WHERE "$rowid$" IN (
      SELECT "$rowid$" FROM "/DMIS/DT_LOG"
      WHERE TIMESTAMP < ADD_DAYS(CURRENT_DATE, -30)
      LIMIT 100000
    );
    
    v_rows = ::ROWCOUNT;
    COMMIT;
  END WHILE;
END;

-- 2. Archive to history table
CREATE COLUMN TABLE "/DMIS/DT_LOG_ARCHIVE" 
AS (SELECT * FROM "/DMIS/DT_LOG" WHERE TIMESTAMP < ADD_MONTHS(CURRENT_DATE, -6));

DELETE FROM "/DMIS/DT_LOG" WHERE TIMESTAMP < ADD_MONTHS(CURRENT_DATE, -6);

-- 3. Schedule automatic cleanup
CREATE PROCEDURE CLEANUP_SLT_LOGS()
AS
BEGIN
  DELETE FROM "/DMIS/DT_LOG" WHERE TIMESTAMP < ADD_DAYS(CURRENT_DATE, -30);
  DELETE FROM "/DMIS/LOG_ERROR" WHERE TIMESTAMP < ADD_DAYS(CURRENT_DATE, -90);
END;

-- Schedule: Run daily at 02:00
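
The batching pattern (delete a bounded chunk, commit, repeat until nothing is left) can be rehearsed on a throwaway database before it is pointed at a production log table. The sketch below uses Python's sqlite3 module; the dt_log table, its columns, and the dates are invented, and delete_in_batches is a hypothetical helper, not an SLT or HANA API.

```python
import sqlite3


def delete_in_batches(con: sqlite3.Connection, cutoff: str, batch_size: int) -> int:
    """Delete rows older than cutoff in chunks; return the total rows deleted."""
    total = 0
    while True:
        cur = con.execute(
            "DELETE FROM dt_log WHERE rowid IN ("
            "  SELECT rowid FROM dt_log WHERE ts < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        con.commit()  # keep each transaction small
        if cur.rowcount == 0:
            break  # nothing left to delete, so the loop terminates
        total += cur.rowcount
    return total


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dt_log (id INTEGER PRIMARY KEY, ts TEXT)")
rows = [(i, "2025-01-01" if i < 200 else "2026-01-01") for i in range(250)]
con.executemany("INSERT INTO dt_log VALUES (?, ?)", rows)

deleted = delete_in_batches(con, "2025-06-01", 64)
remaining = con.execute("SELECT COUNT(*) FROM dt_log").fetchone()[0]
print(deleted, remaining)  # 200 50
```

Ending the loop on "no rows deleted" rather than on a fixed row target avoids spinning forever when fewer rows match than expected.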

3. Advanced Diagnostics

Performance Analysis

-- Comprehensive performance report
SELECT 
  MT_ID,
  TABLE_NAME,
  COUNT(*) as OPERATIONS,
  AVG(DURATION_MS) as AVG_MS,
  MAX(DURATION_MS) as MAX_MS,
  SUM(CASE WHEN STATUS = 'ERROR' THEN 1 ELSE 0 END) as ERROR_COUNT,
  SUM(RECORD_COUNT) as TOTAL_RECORDS
FROM "/DMIS/PERFORMANCE_LOG"
WHERE TIMESTAMP >= ADD_HOURS(CURRENT_TIMESTAMP, -24)
GROUP BY MT_ID, TABLE_NAME
HAVING AVG(DURATION_MS) > 1000
    OR SUM(CASE WHEN STATUS = 'ERROR' THEN 1 ELSE 0 END) > 0
ORDER BY AVG_MS DESC;

-- Bottleneck identification
SELECT 
  PHASE,  -- READ, TRANSFORM, WRITE
  AVG(DURATION_MS) as AVG_MS,
  SUM(DURATION_MS) as TOTAL_MS
FROM "/DMIS/PHASE_TIMING"
WHERE TIMESTAMP >= ADD_HOURS(CURRENT_TIMESTAMP, -24)
GROUP BY PHASE
ORDER BY TOTAL_MS DESC;

Network Diagnostics

#!/bin/bash
# network_diagnostics.sh

echo "=== Network Diagnostics for SLT ==="

# 1. Ping test
echo "1. Testing connectivity..."
ping -c 5 source-system.company.com
ping -c 5 hana-target.company.com

# 2. Bandwidth test
echo "2. Testing bandwidth..."
iperf3 -c hana-target.company.com -t 30
# Expected: > 1 Gbps for good performance

# 3. Port connectivity
echo "3. Testing ports..."
nc -zv source-system.company.com 3300  # SAP Gateway
nc -zv hana-target.company.com 30015   # HANA SQL

# 4. DNS resolution
echo "4. Testing DNS..."
nslookup source-system.company.com
nslookup hana-target.company.com

# 5. Traceroute
echo "5. Network path..."
traceroute hana-target.company.com

# 6. Packet loss
echo "6. Packet loss check..."
ping -c 100 hana-target.company.com | grep "packet loss"
# Expected: 0% packet loss
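
When the packet loss check feeds a monitoring job, the percentage has to be pulled out of ping's summary text. On Linux (iputils) the loss figure sits on the "packets transmitted" line, while the last line carries the round-trip statistics. A minimal Python sketch follows; packet_loss_percent is a hypothetical helper and the sample text is illustrative, not a real measurement.

```python
import re


def packet_loss_percent(ping_output: str) -> float:
    """Extract the packet loss percentage from ping's summary text."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    if match is None:
        raise ValueError("no packet loss figure found in ping output")
    return float(match.group(1))


# Illustrative sample in the Linux iputils summary format.
sample = (
    "100 packets transmitted, 98 received, 2% packet loss, time 99123ms\n"
    "rtt min/avg/max/mdev = 0.412/0.655/1.204/0.101 ms\n"
)

loss = packet_loss_percent(sample)
print(loss)  # 2.0
```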

System Resource Check

Transaction: ST06 (Operating System Monitor)

Check:
├── CPU: < 80% average
├── Memory: < 85% used
├── Disk I/O: < 80% utilization
├── Network: < 70% bandwidth
└── Swap: < 10% used

Critical Thresholds:
├── CPU > 90%: Add more cores or reduce parallel jobs
├── Memory > 90%: Increase RAM or optimize queries
├── Disk I/O > 90%: Add faster storage (SSD/NVMe)
├── Network > 85%: Upgrade bandwidth
└── Swap > 20%: Critical memory shortage
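
The two threshold lists can be combined into a simple three-state check. This is a sketch under stated assumptions: the limits are copied from the lists above, the "warning" band between the healthy and critical limits is an interpretation rather than something ST06 reports, and the names LIMITS and classify are hypothetical.

```python
# metric: (healthy below, critical above), in percent, from the lists above
LIMITS = {
    "cpu": (80, 90),
    "memory": (85, 90),
    "disk_io": (80, 90),
    "network": (70, 85),
    "swap": (10, 20),
}


def classify(metric: str, used_pct: float) -> str:
    """Classify a utilization reading as 'ok', 'warning', or 'critical'."""
    healthy_below, critical_above = LIMITS[metric]
    if used_pct > critical_above:
        return "critical"
    if used_pct < healthy_below:
        return "ok"
    return "warning"


# Readings from the high-latency example in Issue 2.
readings = {"cpu": 45, "memory": 62, "disk_io": 95}
report = {metric: classify(metric, value) for metric, value in readings.items()}
print(report)  # {'cpu': 'ok', 'memory': 'ok', 'disk_io': 'critical'}
```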

4. Log Analysis

Interpreting SLT Logs

# Main SLT log location
cd /usr/sap/SLT/DVEBMGS00/work

# Key log files:
ls -lh dev_*

# dev_w0    Work process 0 log
# dev_disp  Dispatcher log
# dev_rd    Gateway reader log
# dev_icm   ICM (Internet Communication Manager) log

# Search for errors
grep -i "error" dev_w* | tail -20
grep -i "exception" dev_w* | tail -20
grep -i "dump" dev_w* | tail -20

# Monitor real-time
tail -f dev_w0

Important Log Patterns

Pattern 1: RFC Timeout
"RFC_ERROR_SYSTEM_FAILURE: Timeout during RFC call"
→ Solution: Increase RFC timeout in SM59

Pattern 2: Database Lock
"DBE_EXECUTE_FAILED: ORA-00060: deadlock detected"
→ Solution: Review table locking, optimize queries

Pattern 3: Memory Issue
"TSV_TNEW_PAGE_ALLOC_FAILED: Memory allocation error"
→ Solution: Increase memory parameters

Pattern 4: Authorization Missing
"RFC_ERROR_SYSTEM_FAILURE: User has no authorization"
→ Solution: Check S_RFC authorization for RFC user

Pattern 5: Table Not Found
"TABLE_NOT_AVAILABLE: Table VBAP does not exist"
→ Solution: Verify table exists in source, check authorization

5. Debugging Tools

Transaction Codes for Troubleshooting

LTRC      - Main SLT configuration and monitoring
SM21      - System log
SM50      - Work process overview
SM66      - Global work process overview
ST22      - ABAP dump analysis
ST02      - Tune: Buffers
ST03N     - Workload analysis
ST04      - Database performance
ST05      - SQL trace
ST06      - Operating system monitor
SM59      - RFC destinations
DB02      - Database space
SM12      - Lock entries
SM13      - Update records
DBACOCKPIT - Database cockpit
SM37      - Background jobs

Enable SLT Trace

Transaction: LTRC → Advanced → Trace

Enable:
☑ Application Trace
☑ Database Trace
☑ RFC Trace

Trace Level: ● 3 (Detailed)

Start Time: Immediately
Duration: 30 minutes

Result: Detailed trace in /usr/sap/SLT/DVEBMGS00/work/trace/

SQL Trace

Transaction: ST05

1. Activate Trace:
   - Trace: ☑ SQL Trace
   - User: SLT_USER
   - Duration: 5 minutes

2. Execute operation (e.g., start replication)

3. Deactivate Trace

4. Display Trace:
   - Shows all SQL statements
   - Execution times
   - Number of records

5. Analyze:
   - Find slow queries (> 1 second)
   - Identify missing indexes
   - Optimize WHERE clauses

6. Escalation to SAP Support

When to Escalate

Escalate when:
├── Issue cannot be resolved in 4 hours
├── Data corruption suspected
├── System-wide outage
├── SAP Note indicates known bug
└── Requires kernel patch or hot fix

Information to Collect

1. System Information:
   - Transaction: SM51 → System → Status
   - SAP Version, Kernel version
   - Database version (HANA revision)

2. Error Details:
   - Transaction: ST22 → Short dump
   - Error message screenshot
   - Timestamp of issue

3. Logs:
   - dev_w* files (last 24 hours)
   - /DMIS/LOG_ERROR table export
   - System log (SM21)

4. Configuration:
   - MT_ID configuration export
   - Table list
   - Transformation details

5. Performance Data:
   - ST03N report (last 7 days)
   - Database performance report (AWR for Oracle sources; full system info dump for HANA)

6. Steps to Reproduce:
   - Detailed description
   - Can issue be reproduced?

Create Support Ticket:
https://launchpad.support.sap.com
Component: HAN-DP-LTR or CA-LT-SLT
Priority: Based on business impact
Attach all collected information

7. Preventive Measures

Health Check Checklist

Monitoring Alerts

-- Set up proactive alerts
CREATE PROCEDURE CHECK_SLT_HEALTH()
AS
BEGIN
  DECLARE v_error_count INT;
  DECLARE v_repl_lag INT;
  DECLARE v_disk_pct INT;
  
  -- Check error queue
  SELECT COUNT(*) INTO v_error_count
  FROM "/DMIS/ERROR_QUEUE"
  WHERE TIMESTAMP >= ADD_HOURS(CURRENT_TIMESTAMP, -1);
  
  IF v_error_count > 10 THEN
    -- Send alert: High error rate
    CALL SEND_ALERT('SLT Error Queue', 'WARNING', v_error_count || ' errors in last hour');
  END IF;
  
  -- Check replication lag
  SELECT MAX(SECONDS_BETWEEN(SOURCE_TIMESTAMP, TARGET_TIMESTAMP)) INTO v_repl_lag
  FROM "/DMIS/REPLICATION_MONITOR";
  
  IF v_repl_lag > 300 THEN
    -- Send alert: High latency
    CALL SEND_ALERT('SLT Latency', 'WARNING', 'Replication lag: ' || v_repl_lag || ' seconds');
  END IF;
  
  -- Check disk space
  SELECT MAX(USED_SIZE * 100.0 / TOTAL_SIZE) INTO v_disk_pct
  FROM M_DISKS
  WHERE USAGE_TYPE = 'DATA';
  
  IF v_disk_pct > 85 THEN
    -- Send alert: Low disk space
    CALL SEND_ALERT('Disk Space', 'CRITICAL', 'Disk ' || v_disk_pct || '% full');
  END IF;
END;

-- Schedule: Run every 15 minutes
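
The alert rules in the procedure can be checked in isolation before they are scheduled. The Python sketch below mirrors the three thresholds (more than 10 errors per hour, lag above 300 seconds, disk above 85%); evaluate_health is a hypothetical function that returns the alerts instead of sending them.

```python
def evaluate_health(
    error_count: int, repl_lag_sec: int, disk_pct: float
) -> list[tuple[str, str, str]]:
    """Return (title, severity, message) for every threshold that is exceeded."""
    alerts = []
    if error_count > 10:
        alerts.append(
            ("SLT Error Queue", "WARNING", f"{error_count} errors in last hour")
        )
    if repl_lag_sec > 300:
        alerts.append(
            ("SLT Latency", "WARNING", f"Replication lag: {repl_lag_sec} seconds")
        )
    if disk_pct > 85:
        alerts.append(("Disk Space", "CRITICAL", f"Disk {disk_pct}% full"))
    return alerts


# Values resembling the high-latency case in Issue 2.
for alert in evaluate_health(error_count=3, repl_lag_sec=312, disk_pct=62):
    print(alert)
# ('SLT Latency', 'WARNING', 'Replication lag: 312 seconds')
```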

8. Knowledge Base

Quick Reference

Symptom	Likely Cause	Quick Fix
Replication slow	Network latency	Check bandwidth, enable compression
High CPU	Too many parallel jobs	Reduce parallel jobs
High memory	Large package size	Reduce package size
Connection error	RFC timeout	Increase timeout, check firewall
Missing data	FK violations	Replicate parent tables first
Lock errors	Concurrent updates	Adjust commit frequency
Log table full	No cleanup job	Delete old logs, schedule cleanup

9. Troubleshooting Flowchart

Issue Reported
    ↓
Can reproduce?
    Yes → Collect logs and screenshots
    No → Monitor for recurrence
    ↓
Check recent changes (last 7 days)
    ↓
Review error logs (/DMIS/LOG_ERROR)
    ↓
Check system resources (ST06)
    ↓
Found root cause?
    Yes → Apply fix → Test → Document
    No → Enable trace → Reproduce → Analyze
    ↓
Still not resolved?
    Yes → Escalate to SAP Support
    No → Close ticket, update KB

Summary

✅ Common issue diagnosis and resolution
✅ Advanced diagnostic techniques
✅ Log analysis and interpretation
✅ Debugging tools and transaction codes
✅ SAP support escalation procedures
✅ Preventive maintenance measures
✅ Proactive monitoring and alerts
✅ Quick reference guide
✅ Troubleshooting flowchart

Next: Module 19 - Best Practices and Design Patterns
