
Module 18: Troubleshooting Guide

Comprehensive troubleshooting techniques for resolving common and complex SLT issues.

1. Diagnostic Framework

graph TD
A[Issue Reported] --> B{Symptom Category}
B -->|Performance| C[Check System Resources]
B -->|Error| D[Review Error Logs]
B -->|Data Quality| E[Validate Data]
B -->|Connectivity| F[Test Network]
C --> G[Analyze and Fix]
D --> G
E --> G
F --> G
G --> H{Resolved?}
H -->|No| I[Escalate to SAP]
H -->|Yes| J[Document Solution]

2. Common Issues and Solutions

Issue 1: Replication Not Starting

Symptoms:

Transaction: LTRC
Status: ⚠ Replication not active
Error: MT_ID cannot be started

Diagnosis:

-- Check MT_ID status (table names containing "/" must be double-quoted in HANA SQL)
SELECT * FROM "/DMIS/MT_CONFIG"
WHERE MT_ID = 'MT_ID_PROD_01';

-- Check for active errors
SELECT * FROM "/DMIS/LOG_ERROR"
WHERE MT_ID = 'MT_ID_PROD_01'
ORDER BY TIMESTAMP DESC
LIMIT 10;

Common Causes and Solutions:

Cause 1: RFC Connection Failed

Check: Transaction SM59
Error: "Connection timed out"

Solution:
1. Verify network connectivity:
ping source-system.company.com

2. Test RFC connection:
SM59 → Select RFC → Connection Test

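3. Check firewall rules (NN = instance number of the source system):
- Port 3300-3399 (SAP Gateway, 33NN, used by RFC)
- Port 3200-3299 (Dispatcher, 32NN, used by SAP GUI)

4. Verify RFC user credentials:
- User: RFC_USER
- Password: not expired, and the user is not locked (check in SU01)
- Authorizations: S_RFC (all)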
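The gateway and dispatcher ports are derived from the two-digit instance number NN of the source system. A minimal bash sketch of the firewall check, assuming `nc` is installed on the SLT host. `sap_gateway_port`, `sap_dispatcher_port` and `check_gateway` are hypothetical helper names:

```bash
#!/bin/bash
# Hypothetical helpers: standard SAP ports for a two-digit instance number NN.
#   Gateway 33NN, dispatcher 32NN.
# ${1#0} strips one leading zero so that "08" is not parsed as an octal literal.
sap_gateway_port()    { echo $(( 3300 + ${1#0} )); }
sap_dispatcher_port() { echo $(( 3200 + ${1#0} )); }

# check_gateway HOST NN: test TCP reachability of the gateway port with nc.
check_gateway() {
  local host=$1 port
  port=$(sap_gateway_port "$2")
  if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
    echo "OK: $host:$port reachable"
  else
    echo "FAIL: $host:$port not reachable (check firewall rules)"
    return 1
  fi
}
```

For example, `check_gateway source-system.company.com 00` tests port 3300, the same port used in the network diagnostics script later in this module.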

Cause 2: Target HANA Not Reachable

Check: Test HANA connection
Error: "Unable to connect to database"

Solution:
1. Verify HANA is running (as <sid>adm on the HANA host):
HDB info

2. Test connection:
hdbsql -n hana-server:30015 -u SLT_USER -p password

3. Check HANA instance:
Transaction: DBACOCKPIT → Check System

4. Verify schema exists:
SELECT SCHEMA_NAME FROM SCHEMAS WHERE SCHEMA_NAME = 'SLTREPL';

Cause 3: Insufficient Resources

Check: Transaction SM50 (Process Overview)
Error: "No work processes available"

Solution:
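1. Match parallel jobs to free background work processes:
Each SLT job runs in its own background work process, so more jobs only help
if free background work processes exist (check in SM50).
- None free: add background work processes (profile parameter rdisp/wp_no_btc)
  or reduce the number of jobs
- Free processes available: LTRC → Advanced Settings → Parallel Jobs: 8 → 16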

2. Check memory:
Transaction ST02 → SAP Memory
If used > 90%:
- Increase em/initial_size_MB parameter
- Restart SAP system

3. Check database space:
SELECT HOST, USAGE_TYPE, ROUND(USED_SIZE * 100.0 / TOTAL_SIZE, 1) AS USED_PCT
FROM M_DISKS;
If > 85% full:
- Add storage
- Archive old data

Issue 2: High Latency

Symptoms:

Real-time replication showing 5+ minute delay
Expected: < 1 minute

Diagnosis:

-- Measure current latency over the last hour
SELECT
MT_ID,
TABLE_NAME,
SECONDS_BETWEEN(SOURCE_TIMESTAMP, TARGET_TIMESTAMP) AS LATENCY_SEC
FROM "/DMIS/REPLICATION_MONITOR"
WHERE TIMESTAMP >= ADD_SECONDS(CURRENT_TIMESTAMP, -3600)
ORDER BY LATENCY_SEC DESC;

-- Result:
MT_ID  TABLE_NAME  LATENCY_SEC
001    VBAP        312   ← Problem!
001    VBAK        8
001    KNA1        5

Root Cause Analysis:

Step 1: Identify bottleneck
├── Source read time: 15s ✓
├── Network transfer: 30s ✓
├── Transformation: 5s ✓
├── Target write time: 262s ✗ Problem here!
└── Total: 312s

Step 2: Investigate target write
Check HANA:
- CPU: 45% ✓
- Memory: 62% ✓
- Disk I/O: 95% ✗ Bottleneck!

Root Cause: Disk I/O saturation
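The Step 1 breakdown is simple arithmetic: sum the phase timings and find the phase with the largest share. A sketch that does this from `phase=seconds` pairs; `find_bottleneck` is a hypothetical helper, and the timings are whatever you measured (here, the example values above):

```bash
#!/bin/bash
# Hypothetical helper: find_bottleneck PHASE=SECONDS ...
# Prints the total latency, the slowest phase and its (truncated) percentage.
find_bottleneck() {
  local total=0 max=-1 max_phase="" pair phase secs
  for pair in "$@"; do
    phase=${pair%%=*}   # text before "="
    secs=${pair#*=}     # text after "="
    total=$(( total + secs ))
    if [ "$secs" -gt "$max" ]; then
      max=$secs
      max_phase=$phase
    fi
  done
  if [ "$total" -eq 0 ]; then
    echo "no timings" >&2
    return 1
  fi
  echo "total=${total}s bottleneck=${max_phase} share=$(( max * 100 / total ))%"
}
```

`find_bottleneck read=15 network=30 transform=5 write=262` prints `total=312s bottleneck=write share=83%`, which points at the target write, as in Step 1.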

Solution:

-- 1. Check indexes on target table
SELECT INDEX_NAME, INDEX_TYPE
FROM INDEXES
WHERE SCHEMA_NAME = 'SLTREPL'
AND TABLE_NAME = 'VBAP';

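-- Every secondary index is maintained on each insert, so surplus indexes slow down replication
-- Drop indexes that no reporting query needs (check their usage first):
DROP INDEX SLTREPL.VBAP_IDX_3;
DROP INDEX SLTREPL.VBAP_IDX_4;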

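-- 2. Disable auto-merge during heavy load
ALTER TABLE SLTREPL.VBAP DISABLE AUTOMERGE;
-- After the load, re-enable it and merge the delta store once:
ALTER TABLE SLTREPL.VBAP ENABLE AUTOMERGE;
MERGE DELTA OF SLTREPL.VBAP;

-- 3. Increase commit batch size
LTRC → Advanced → Commit Frequency: 5,000 → 20,000

-- 4. Add partitioning
-- HANA requires partitioning columns to be part of the primary key, so ERDAT
-- cannot be used for VBAP; hash-partition on the document number instead
-- (the partition count is an example value)
ALTER TABLE SLTREPL.VBAP
PARTITION BY HASH (VBELN) PARTITIONS 4;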

-- Result:
Before: 312s latency
After: 8s latency (97% improvement!)

Issue 3: Data Mismatch

Symptoms:

Source count: 1,000,000 records
Target count: 999,453 records
Missing: 547 records

Diagnosis:

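-- Identify missing records (VBAP key: MANDT, VBELN, POSNR;
-- assumes the source table is accessible from HANA as SOURCE_SYSTEM.VBAP)
SELECT S.MANDT, S.VBELN, S.POSNR
FROM SOURCE_SYSTEM.VBAP AS S
LEFT JOIN SLTREPL.VBAP AS T
ON S.MANDT = T.MANDT AND S.VBELN = T.VBELN AND S.POSNR = T.POSNR
WHERE T.VBELN IS NULL;

-- Check error queue
SELECT * FROM "/DMIS/ERROR_QUEUE"
WHERE TABLE_NAME = 'VBAP'
ORDER BY TIMESTAMP DESC;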
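If the source table cannot be reached from HANA, the same comparison can be run on key exports from both sides. A sketch, assuming each side is exported as one composite key per line (for example `MANDT|VBELN|POSNR`); `find_missing_keys` and the file names are hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: find_missing_keys SOURCE_KEYS_FILE TARGET_KEYS_FILE
# Prints keys that exist in the source export but not in the target export.
find_missing_keys() {
  local src_sorted tgt_sorted
  src_sorted=$(mktemp) && tgt_sorted=$(mktemp) || return 1
  # comm needs both inputs sorted with the same collation; LC_ALL=C gives
  # a stable byte-wise order.
  LC_ALL=C sort -u "$1" > "$src_sorted"
  LC_ALL=C sort -u "$2" > "$tgt_sorted"
  # -2 drops keys only in the target, -3 drops keys present in both,
  # leaving the keys that exist only in the source.
  LC_ALL=C comm -23 "$src_sorted" "$tgt_sorted"
  rm -f "$src_sorted" "$tgt_sorted"
}
```

`find_missing_keys vbap_source.txt vbap_target.txt | wc -l` should then match the "Missing" count from the symptoms.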

Common Causes:

Cause 1: Foreign Key Violations

Error: "Foreign key constraint violated"
Missing parent record in KNA1

Solution:
1. Replicate parent tables first:
LTRC → Table Order → Move KNA1 before VBAP

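2. Or drop the foreign key constraint temporarily:
-- FK_VBAP_KNA1 is a placeholder; look up the real name in SYS.REFERENTIAL_CONSTRAINTS
ALTER TABLE SLTREPL.VBAP
DROP CONSTRAINT FK_VBAP_KNA1;

[Replicate missing records, including the parent rows in KNA1]

-- Re-create it with the original column lists (<fk_columns>, <key_columns>)
ALTER TABLE SLTREPL.VBAP
ADD CONSTRAINT FK_VBAP_KNA1 FOREIGN KEY (<fk_columns>) REFERENCES SLTREPL.KNA1 (<key_columns>);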

Cause 2: Transformation Errors

Error: "Data type conversion failed"
Field NETWR: "12,345.67" → Decimal failed

Solution:
1. Review transformation:
LTRC → Transformation → VBAP → NETWR

2. Fix conversion:
ABAP Routine (assumes "," is only used as a thousands separator):
DATA(lv_amount) = replace( val = condense( SOURCE ) sub = `,` with = `` occ = 0 ).
RESULT = CONV decfloat34( lv_amount ).
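To test the clean-up rule outside SAP, for example against a sample of rejected values, the same normalization can be sketched in shell. It assumes "," is never the decimal separator (European-formatted values such as 12.345,67 would need the opposite rule); `normalize_amount` is a hypothetical helper:

```bash
#!/bin/bash
# Hypothetical helper: normalize_amount VALUE
# Removes all blanks and "," thousands separators, then accepts only an
# optional sign, digits and an optional ".fraction".
normalize_amount() {
  local value
  value=$(printf '%s' "$1" | tr -d ' ,')
  if printf '%s\n' "$value" | grep -Eq '^[+-]?[0-9]+(\.[0-9]+)?$'; then
    printf '%s\n' "$value"
  else
    echo "invalid amount: $1" >&2
    return 1
  fi
}
```

`normalize_amount "12,345.67"` prints `12345.67`; values it rejects are the ones the ABAP conversion would also fail on.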

Cause 3: Filtering Issues

Applied filter: WHERE VKORG = 'DE01'
Records with a NULL VKORG, which were also needed, were excluded,
because NULL = 'DE01' evaluates to UNKNOWN, not TRUE

Solution:
Include them explicitly in the filter condition:
WHERE (VKORG = 'DE01' OR VKORG IS NULL)

Issue 4: Memory Overflow

Symptoms:

Error: "Memory allocation failed"
Short dump: TSV_TNEW_PAGE_ALLOC_FAILED
Replication job terminates with a short dump during large table replication

Diagnosis:

Transaction: ST22 (ABAP Dump Analysis)

Dump info:
├── Program: /DMIS/CL_REPLICATION
├── Error: TSV_TNEW_PAGE_ALLOC_FAILED
├── Time: During initial load of BSEG (120M records)
└── Memory: 16 GB allocated, 18 GB requested

Solution:

1. Reduce package size (split into smaller batches):
LTRC → Table Settings → BSEG
Package Size: 100,000 → 10,000

2. Increase memory parameters:
Transaction: RZ10

Parameters:
em/initial_size_MB = 8192
em/max_size_MB = 16384
ztta/roll_extension = 4000000000

Restart required: Yes

3. Schedule large tables during off-peak:
LTRC → Schedule → BSEG
Run time: 02:00 - 06:00 (night)

4. Consider parallel load:
Split BSEG by fiscal year (the ranges together must cover every year needed in the target):
- MT_ID_001: BSEG WHERE GJAHR = 2024
- MT_ID_002: BSEG WHERE GJAHR = 2025
- MT_ID_003: BSEG WHERE GJAHR = 2026
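Reducing the package size trades less data per package for more packages (and round trips). For the BSEG example, the package counts work out as follows. `package_count` is a hypothetical helper that uses ceiling division so that a final partial package is counted:

```bash
#!/bin/bash
# Hypothetical helper: package_count RECORDS PACKAGE_SIZE
# Number of packages needed (ceiling division).
package_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Compare the old and new package size for 120M BSEG records.
for size in 100000 10000; do
  echo "$size records/package -> $(package_count 120000000 "$size") packages"
done
```

The loop prints 1,200 packages at the old size and 12,000 at the new one. Each package is ten times smaller, so each work process holds correspondingly less data at a time.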

Issue 5: Logging Table Overflow

Symptoms:

Error: "Logging table /DMIS/DT_LOG full"
Warning: "Log table size: 500 GB"
Replication slowing down or stopping

Diagnosis:

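-- Check logging table size
-- (TABLE_SIZE is the in-memory size; M_TABLE_PERSISTENCE_STATISTICS.DISK_SIZE gives the on-disk size)
SELECT
TABLE_NAME,
ROUND(TABLE_SIZE / 1024.0 / 1024 / 1024, 1) AS SIZE_GB,
RECORD_COUNT
FROM M_TABLES
WHERE TABLE_NAME LIKE '/DMIS%LOG%'
ORDER BY TABLE_SIZE DESC;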

-- Result:
TABLE_NAME        SIZE_GB  RECORD_COUNT
/DMIS/DT_LOG      487      1,245,678,901
/DMIS/LOG_ERROR   12       234,567

Solution:

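-- 1. Clean up old logs (older than 30 days)
DELETE FROM "/DMIS/DT_LOG"
WHERE TIMESTAMP < ADD_DAYS(CURRENT_DATE, -30);

-- Caution: at this volume a single DELETE runs as one huge transaction.
-- Run it in batches instead, one day per transaction, oldest first:
DO BEGIN
DECLARE v_cutoff DATE = ADD_DAYS(CURRENT_DATE, -30);
DECLARE v_day DATE;
SELECT TO_DATE(MIN(TIMESTAMP)) INTO v_day FROM "/DMIS/DT_LOG";
-- The loop always advances v_day, so it ends even when little data remains
WHILE :v_day IS NOT NULL AND :v_day < :v_cutoff DO
DELETE FROM "/DMIS/DT_LOG"
WHERE TIMESTAMP < ADD_DAYS(:v_day, 1);
COMMIT;
v_day = ADD_DAYS(:v_day, 1);
END WHILE;
END;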

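-- 2. Archive to a history table before deleting
-- (if combined with step 1, run this first; otherwise the old rows are already gone)
CREATE COLUMN TABLE "/DMIS/DT_LOG_ARCHIVE"
AS (SELECT * FROM "/DMIS/DT_LOG" WHERE TIMESTAMP < ADD_MONTHS(CURRENT_DATE, -6));
-- Later runs: INSERT INTO the existing archive table instead of CREATE

DELETE FROM "/DMIS/DT_LOG" WHERE TIMESTAMP < ADD_MONTHS(CURRENT_DATE, -6);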

-- 3. Schedule automatic cleanup
CREATE PROCEDURE CLEANUP_SLT_LOGS()
LANGUAGE SQLSCRIPT
AS
BEGIN
DELETE FROM "/DMIS/DT_LOG" WHERE TIMESTAMP < ADD_DAYS(CURRENT_DATE, -30);
DELETE FROM "/DMIS/LOG_ERROR" WHERE TIMESTAMP < ADD_DAYS(CURRENT_DATE, -90);
END;

-- Schedule: Run daily at 02:00 (for example as a HANA scheduler job)
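The day-by-day batching advised above can also be driven from outside the database, for example from a cron job. A sketch, assuming GNU `date`; `retention_cutoff` and `day_windows` are hypothetical helpers. Each printed pair could bound one `DELETE ... WHERE TIMESTAMP >= '<from>' AND TIMESTAMP < '<to>'` followed by a COMMIT:

```bash
#!/bin/bash
# Requires GNU date (-d with relative items). TZ=UTC avoids DST surprises.

# Hypothetical helper: retention_cutoff REF_DATE DAYS
# Prints the date DAYS before REF_DATE as YYYY-MM-DD.
retention_cutoff() {
  TZ=UTC date -d "$1 $2 days ago" +%F
}

# Hypothetical helper: day_windows START CUTOFF
# Prints "FROM TO" one-day windows covering [START, CUTOFF).
day_windows() {
  local day=$1 cutoff_num next
  cutoff_num=$(TZ=UTC date -d "$2" +%Y%m%d) || return 1
  while [ "$(TZ=UTC date -d "$day" +%Y%m%d)" -lt "$cutoff_num" ]; do
    next=$(TZ=UTC date -d "$day + 1 day" +%F)
    echo "$day $next"
    day=$next
  done
}
```

For example, `day_windows "$oldest" "$(retention_cutoff "$(date +%F)" 30)"` lists the windows from the oldest log date up to the 30-day retention limit.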

3. Advanced Diagnostics

Performance Analysis

-- Comprehensive performance report (last 24 hours)
SELECT
MT_ID,
TABLE_NAME,
COUNT(*) AS OPERATIONS,
AVG(DURATION_MS) AS AVG_MS,
MAX(DURATION_MS) AS MAX_MS,
SUM(CASE WHEN STATUS = 'ERROR' THEN 1 ELSE 0 END) AS ERROR_COUNT,
SUM(RECORD_COUNT) AS TOTAL_RECORDS
FROM "/DMIS/PERFORMANCE_LOG"
WHERE TIMESTAMP >= ADD_DAYS(CURRENT_TIMESTAMP, -1)
GROUP BY MT_ID, TABLE_NAME
-- Standard SQL does not allow SELECT aliases in HAVING, so repeat the aggregates
HAVING AVG(DURATION_MS) > 1000
OR SUM(CASE WHEN STATUS = 'ERROR' THEN 1 ELSE 0 END) > 0
ORDER BY AVG_MS DESC;

-- Bottleneck identification
SELECT
PHASE, -- READ, TRANSFORM, WRITE
AVG(DURATION_MS) as AVG_MS,
SUM(DURATION_MS) as TOTAL_MS
FROM "/DMIS/PHASE_TIMING"
WHERE TIMESTAMP >= ADD_DAYS(CURRENT_TIMESTAMP, -1)
GROUP BY PHASE
ORDER BY TOTAL_MS DESC;
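When performance log rows are exported to CSV (for offline analysis or to attach to a support ticket), the report's grouping and filter can be reproduced with awk. A sketch; the CSV column order and the `perf_report` helper name are assumptions:

```bash
#!/bin/bash
# Hypothetical helper: perf_report [CSV_FILE...]  (reads stdin if no file)
# Input rows: MT_ID,TABLE_NAME,DURATION_MS,STATUS,RECORD_COUNT (no header).
# Output: MT_ID TABLE_NAME OPERATIONS AVG_MS MAX_MS ERROR_COUNT TOTAL_RECORDS
# for groups with AVG_MS > 1000 or at least one error, slowest first.
# AVG_MS is truncated to a whole number.
perf_report() {
  awk -F',' '
    {
      key = $1 " " $2
      ops[key]++
      sum[key] += $3
      if (!(key in mx) || $3 + 0 > mx[key]) mx[key] = $3 + 0
      if ($4 == "ERROR") err[key]++
      recs[key] += $5
    }
    END {
      for (k in ops) {
        avg = int(sum[k] / ops[k])
        if (avg > 1000 || err[k] > 0)
          printf "%s %d %d %d %d %d\n", k, ops[k], avg, mx[k], err[k], recs[k]
      }
    }' "$@" | sort -k4,4nr
}
```

Sorting on field 4 (AVG_MS) numerically in reverse matches the `ORDER BY AVG_MS DESC` of the SQL report.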

Network Diagnostics

#!/bin/bash
# network_diagnostics.sh

echo "=== Network Diagnostics for SLT ==="

# 1. Ping test
echo "1. Testing connectivity..."
ping -c 5 source-system.company.com
ping -c 5 hana-target.company.com

# 2. Bandwidth test (requires an iperf3 server, started with "iperf3 -s", on the target)
echo "2. Testing bandwidth..."
iperf3 -c hana-target.company.com -t 30
# Expected: > 1 Gbps for good performance

# 3. Port connectivity
echo "3. Testing ports..."
nc -zv source-system.company.com 3300 # SAP Gateway
nc -zv hana-target.company.com 30015 # HANA SQL

# 4. DNS resolution
echo "4. Testing DNS..."
nslookup source-system.company.com
nslookup hana-target.company.com

# 5. Traceroute
echo "5. Network path..."
traceroute hana-target.company.com

# 6. Packet loss
echo "6. Packet loss check..."
ping -c 100 hana-target.company.com | grep 'packet loss'
# Expected: 0% packet loss
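To turn the packet-loss check into a number that a script can compare, the percentage can be extracted from ping's summary line. A sketch; `packet_loss_pct` is a hypothetical helper, and the pattern assumes the summary wording of Linux iputils ping and BSD/macOS ping:

```bash
#!/bin/bash
# Hypothetical helper: packet_loss_pct  (reads ping output on stdin)
# Prints the number in front of "% packet loss", e.g. "0" (Linux) or
# "0.0" (BSD/macOS), and nothing if no summary line is present.
packet_loss_pct() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}
```

Usage: `loss=$(ping -c 100 hana-target.company.com | packet_loss_pct)`; anything other than `0` (or `0.0`) deserves a closer look at the network path.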

System Resource Check

Transaction: ST06 (Operating System Monitor)

Check:
├── CPU: < 80% average
├── Memory: < 85% used
├── Disk I/O: < 80% utilization
├── Network: < 70% bandwidth
└── Swap: < 10% used

Critical Thresholds:
├── CPU > 90%: Add more cores or reduce parallel jobs
├── Memory > 90%: Increase RAM or optimize queries
├── Disk I/O > 90%: Add faster storage (SSD/NVMe)
├── Network > 85%: Upgrade bandwidth
└── Swap > 20%: Critical memory shortage
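The normal limits and critical thresholds above can be applied mechanically. A sketch, with hypothetical helpers `classify` and `check_resources`; values are whole-number percentages read from ST06 or OS tools. Anything between the normal limit and the critical threshold is reported as WARNING:

```bash
#!/bin/bash
# Hypothetical helper: classify VALUE NORMAL_BELOW CRITICAL_ABOVE
classify() {
  if [ "$1" -lt "$2" ]; then
    echo OK
  elif [ "$1" -gt "$3" ]; then
    echo CRITICAL
  else
    echo WARNING
  fi
}

# Hypothetical helper: check_resources CPU MEMORY DISK_IO NETWORK SWAP
# Limits taken from the checklist above (normal / critical).
check_resources() {
  echo "CPU $(classify "$1" 80 90)"
  echo "MEMORY $(classify "$2" 85 90)"
  echo "DISK_IO $(classify "$3" 80 90)"
  echo "NETWORK $(classify "$4" 70 85)"
  echo "SWAP $(classify "$5" 10 20)"
}
```

With the HANA readings from Issue 2 (CPU 45%, memory 62%, disk I/O 95%) plus arbitrary example values for network and swap, `check_resources 45 62 95 30 5` flags only `DISK_IO CRITICAL`.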

4. Log Analysis

Interpreting SLT Logs

# Main SLT log location
cd /usr/sap/SLT/DVEBMGS00/work

# Key log files:
ls -lh dev_*

# dev_w0     Work process 0 trace (dev_w1, dev_w2, ... for the others)
# dev_disp   Dispatcher trace
# dev_rd     Gateway reader trace (RFC connections)
# dev_icm    ICM (Internet Communication Manager) trace

# Search for errors
grep -i "error" dev_w* | tail -20
grep -i "exception" dev_w* | tail -20
grep -i "dump" dev_w* | tail -20

# Monitor real-time
tail -f dev_w0

Important Log Patterns

Pattern 1: RFC Timeout
"RFC_ERROR_SYSTEM_FAILURE: Timeout during RFC call"
→ Solution: Increase RFC timeout in SM59

Pattern 2: Database Lock
"DBE_EXECUTE_FAILED: ORA-00060: deadlock detected"
→ Solution: Review table locking, optimize queries

Pattern 3: Memory Issue
"TSV_TNEW_PAGE_ALLOC_FAILED: Memory allocation error"
→ Solution: Increase memory parameters

Pattern 4: Authorization Missing
"RFC_ERROR_SYSTEM_FAILURE: User has no authorization"
→ Solution: Check S_RFC authorization for RFC user

Pattern 5: Table Not Found
"TABLE_NOT_AVAILABLE: Table VBAP does not exist"
→ Solution: Verify table exists in source, check authorization
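The pattern list above can be applied to the work process traces automatically. A sketch; `suggest_fix` and `scan_logs` are hypothetical helpers, and matching is case-sensitive on the message texts shown above. Patterns 1 and 4 share the RFC_ERROR_SYSTEM_FAILURE prefix, so they are told apart by the message text:

```bash
#!/bin/bash
# Hypothetical helper: suggest_fix LINE
# Prints the suggested action for a known pattern, fails otherwise.
suggest_fix() {
  case "$1" in
    *"Timeout during RFC call"*)  echo "Increase RFC timeout in SM59" ;;
    *"deadlock detected"*)        echo "Review table locking, optimize queries" ;;
    *TSV_TNEW_PAGE_ALLOC_FAILED*) echo "Increase memory parameters" ;;
    *"has no authorization"*)     echo "Check S_RFC authorization for RFC user" ;;
    *TABLE_NOT_AVAILABLE*)        echo "Verify table exists in source, check authorization" ;;
    *)                            return 1 ;;
  esac
}

# Hypothetical helper: scan_logs FILE...
# Prints "FILE: suggestion" for every line that matches a known pattern.
scan_logs() {
  local f line fix
  for f in "$@"; do
    while IFS= read -r line; do
      if fix=$(suggest_fix "$line"); then
        printf '%s: %s\n' "$f" "$fix"
      fi
    done < "$f"
  done
}
```

In the work directory, `scan_logs dev_w* | sort | uniq -c` gives a quick count of which known problems dominate.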

5. Debugging Tools

Transaction Codes for Troubleshooting

LTRC - Main SLT configuration and monitoring
SM21 - System log
SM50 - Work process overview
SM66 - Global work process overview
ST22 - ABAP dump analysis
ST02 - Tune: Buffers
ST03N - Workload analysis
ST04 - Database performance
ST05 - SQL trace
ST06 - Operating system monitor
SM59 - RFC destinations
DB02 - Database space
SM12 - Lock entries
SM13 - Update records
DBACOCKPIT - Database cockpit
SM37 - Background jobs

Enable SLT Trace

Transaction: LTRC → Advanced → Trace

Enable:
☑ Application Trace
☑ Database Trace
☑ RFC Trace

Trace Level: ● 3 (Detailed)

Start Time: Immediately
Duration: 30 minutes

Result: Detailed trace in /usr/sap/SLT/DVEBMGS00/work/trace/

SQL Trace

Transaction: ST05

1. Activate Trace:
- Trace: ☑ SQL Trace
- User: SLT_USER
- Duration: 5 minutes

2. Execute operation (e.g., start replication)

3. Deactivate Trace

4. Display Trace:
- Shows all SQL statements
- Execution times
- Number of records

5. Analyze:
- Find slow queries (> 1 second)
- Identify missing indexes
- Optimize WHERE clauses

6. Escalation to SAP Support

When to Escalate

Escalate when:
├── Issue cannot be resolved in 4 hours
├── Data corruption suspected
├── System-wide outage
├── SAP Note indicates known bug
└── Requires kernel patch or hot fix

Information to Collect

1. System Information:
- Transaction: SM51 → System → Status
- SAP Version, Kernel version
- Database version (HANA revision)

2. Error Details:
- Transaction: ST22 → Short dump
- Error message screenshot
- Timestamp of issue

3. Logs:
- dev_w* files (last 24 hours)
- /DMIS/LOG_ERROR table export
- System log (SM21)

4. Configuration:
- MT_ID configuration export
- Table list
- Transformation details

5. Performance Data:
- ST03N report (last 7 days)
- AWR report if the source database is Oracle; for HANA, a full system information dump

6. Steps to Reproduce:
- Detailed description
- Can issue be reproduced?

Create Support Ticket:
https://launchpad.support.sap.com
Component: CA-LT-SLT
Priority: Based on business impact
Attach all collected information
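The log part of this collection can be scripted on the SLT host. A sketch, assuming GNU or BSD `find` and `tar` (for `-mmin` and `-T`); `collect_support_bundle` is a hypothetical helper. The /DMIS/LOG_ERROR export and the SM21 log still have to be pulled from the SAP system separately:

```bash
#!/bin/bash
# Hypothetical helper: collect_support_bundle WORK_DIR OUTPUT_TARBALL
# Packs the dev_w* traces modified in the last 24 hours (1440 minutes).
collect_support_bundle() {
  local work_dir=$1 out=$2 list
  list=$(mktemp) || return 1
  ( cd "$work_dir" && find . -maxdepth 1 -name 'dev_w*' -mmin -1440 ) > "$list"
  if [ ! -s "$list" ]; then
    echo "no recent dev_w* files in $work_dir" >&2
    rm -f "$list"
    return 1
  fi
  # -C makes the names read from the list (-T) relative to WORK_DIR.
  tar -czf "$out" -C "$work_dir" -T "$list"
  rm -f "$list"
}
```

For example: `collect_support_bundle /usr/sap/SLT/DVEBMGS00/work /tmp/slt_traces.tgz`.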

7. Preventive Measures

Health Check Checklist

  • Daily: Check error queue (should be empty)
  • Daily: Monitor replication lag (< 1 minute)
  • Daily: Review system log (SM21) for warnings
  • Weekly: Check disk space (> 20% free)
  • Weekly: Review performance trends
  • Weekly: Clean up old logs
  • Monthly: Review and optimize slow tables
  • Monthly: Check for SAP Notes
  • Quarterly: Performance tuning session
  • Quarterly: Disaster recovery (DR) test
  • Annually: Full system audit

Monitoring Alerts

-- Set up proactive alerts
-- SEND_ALERT is a placeholder for your own notification procedure
CREATE PROCEDURE CHECK_SLT_HEALTH()
LANGUAGE SQLSCRIPT
AS
BEGIN
DECLARE v_error_count INT;
DECLARE v_repl_lag INT;
DECLARE v_disk_pct INT;

-- Check error queue
SELECT COUNT(*) INTO v_error_count
FROM "/DMIS/ERROR_QUEUE"
WHERE TIMESTAMP >= ADD_SECONDS(CURRENT_TIMESTAMP, -3600);

IF :v_error_count > 10 THEN
-- Send alert: High error rate
CALL SEND_ALERT('SLT Error Queue', 'WARNING', TO_NVARCHAR(:v_error_count) || ' errors in last hour');
END IF;

-- Check replication lag over the last 15 minutes
SELECT MAX(SECONDS_BETWEEN(SOURCE_TIMESTAMP, TARGET_TIMESTAMP)) INTO v_repl_lag
FROM "/DMIS/REPLICATION_MONITOR"
WHERE TIMESTAMP >= ADD_SECONDS(CURRENT_TIMESTAMP, -900);

IF :v_repl_lag > 300 THEN
-- Send alert: High latency
CALL SEND_ALERT('SLT Latency', 'WARNING', 'Replication lag: ' || TO_NVARCHAR(:v_repl_lag) || ' seconds');
END IF;

-- Check disk space (M_DISKS has both USED_SIZE and TOTAL_SIZE; take the fullest data volume)
SELECT MAX(USED_SIZE * 100.0 / TOTAL_SIZE) INTO v_disk_pct
FROM M_DISKS
WHERE USAGE_TYPE = 'DATA';

IF :v_disk_pct > 85 THEN
-- Send alert: Low disk space
CALL SEND_ALERT('Disk Space', 'CRITICAL', 'Disk ' || TO_NVARCHAR(:v_disk_pct) || '% full');
END IF;
END;

-- Schedule: Run every 15 minutes
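The same alert rules can run outside the database, for example when the three metrics are fetched with hdbsql from a cron job every 15 minutes (`*/15 * * * *`). A sketch that mirrors only the threshold logic of CHECK_SLT_HEALTH; `slt_health_alerts` is a hypothetical helper and how the inputs are obtained is left open:

```bash
#!/bin/bash
# Hypothetical helper: slt_health_alerts ERRORS_LAST_HOUR MAX_LAG_SECONDS DISK_USED_PCT
# Prints one line per breached threshold (>10 errors, >300 s lag, >85% disk)
# and returns 1 if any alert was raised, 0 otherwise.
slt_health_alerts() {
  local errors=$1 lag=$2 disk=$3 rc=0
  if [ "$errors" -gt 10 ]; then
    echo "WARNING SLT Error Queue: $errors errors in last hour"
    rc=1
  fi
  if [ "$lag" -gt 300 ]; then
    echo "WARNING SLT Latency: Replication lag: $lag seconds"
    rc=1
  fi
  if [ "$disk" -gt 85 ]; then
    echo "CRITICAL Disk Space: Disk $disk% full"
    rc=1
  fi
  return $rc
}
```

The non-zero return code lets the calling cron script decide whether to send a notification.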

8. Knowledge Base

Quick Reference

Symptom | Likely Cause | Quick Fix
Replication slow | Network latency | Check bandwidth, enable compression
High CPU | Too many parallel jobs | Reduce parallel jobs
High memory | Large package size | Reduce package size
Connection error | RFC timeout | Increase timeout, check firewall
Missing data | FK violations | Replicate parent tables first
Lock errors | Concurrent updates | Adjust commit frequency
Log table full | No cleanup job | Delete old logs, schedule cleanup

9. Troubleshooting Flowchart

Issue Reported
↓
Can reproduce?
Yes → Collect logs and screenshots
No → Monitor for recurrence
↓
Check recent changes (last 7 days)
↓
Review error logs (/DMIS/LOG_ERROR)
↓
Check system resources (ST06)
↓
Found root cause?
Yes → Apply fix → Test → Document
No → Enable trace → Reproduce → Analyze
↓
Still not resolved?
Yes → Escalate to SAP Support
No → Close ticket, update KB

Summary

✅ Common issue diagnosis and resolution
✅ Advanced diagnostic techniques
✅ Log analysis and interpretation
✅ Debugging tools and transaction codes
✅ SAP support escalation procedures
✅ Preventive maintenance measures
✅ Proactive monitoring and alerts
✅ Quick reference guide
✅ Troubleshooting flowchart

Next: Module 19 - Best Practices and Design Patterns