External Data Integration in BW/4HANA: SDI and SDA Masterclass

Modern data warehouses must integrate data from diverse sources — legacy databases, cloud platforms, big data systems, and external APIs. SAP HANA provides two powerful approaches: Smart Data Integration (SDI) and Smart Data Access (SDA).

This guide will help you understand when to use each approach, how to implement them, and best practices for production-grade external data integration in BW/4HANA.

Understanding the Fundamentals

What is Smart Data Access (SDA)?

SDA enables virtual access to external data sources without physically replicating data into HANA. It's like querying remote databases as if they were local HANA tables.

Key Characteristics

  • Real-time data access (no replication lag)
  • No data duplication
  • Federated query execution
  • Supports 40+ data sources
  • Use cases: Real-time dashboards, data validation
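
Once the remote source and virtual table exist (setup shown in the implementation section below), a virtual table queries like any local HANA table. A minimal sketch, assuming a virtual table named VIRTUAL_SALES:

-- Federated query: executed against the remote source at query time
SELECT COUNT(*) FROM "VIRTUAL_SALES";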

What is Smart Data Integration (SDI)?

SDI is a comprehensive ETL framework for loading, transforming, and synchronizing data from external sources into HANA. It supports batch loads, real-time replication, and change data capture (CDC).

Key Characteristics

  • Data replication into HANA
  • Batch and real-time modes
  • Built-in transformations
  • Change data capture (CDC)
  • Use cases: Data warehousing, historical analysis

SDA vs SDI: Quick Comparison

| Aspect | SDA (Smart Data Access) | SDI (Smart Data Integration) |
| --- | --- | --- |
| Data Location | Stays in source system | Copied to HANA |
| Latency | Real-time (no replication lag) | Batch/near real-time |
| Performance | Depends on source | HANA-optimized |
| Storage | No HANA storage used | Uses HANA storage |
| Transformations | Limited (view-level) | Full ETL capabilities |
| Historical Data | Not persisted | Persisted in HANA |
| CDC Support | No | Yes |
| Complexity | Low (simple setup) | High (full ETL design) |
| Cost | No HANA storage cost | HANA storage cost |

Smart Data Access (SDA) Implementation

Step 1: Create Remote Source

-- Create remote source connection to Oracle database
-- (connection properties go in the XML; credentials are supplied
--  once, via the CREDENTIAL clause)
CREATE REMOTE SOURCE "ORACLE_ERP"
  ADAPTER "OracleAdapter"
  CONFIGURATION '<?xml version="1.0" encoding="UTF-8"?>
    <ConnectionProperties name="ORACLE_ERP">
      <PropertyEntry name="host">oracle-prod.company.com</PropertyEntry>
      <PropertyEntry name="port">1521</PropertyEntry>
      <PropertyEntry name="service">ORCL</PropertyEntry>
    </ConnectionProperties>'
  WITH CREDENTIAL TYPE 'PASSWORD'
  USING 'user=HANA_USER;password=****';

Step 2: Create Virtual Table

-- Create virtual table pointing to the Oracle table; the remote identifier
-- is <source>.<database>.<schema>.<object>, with "<NULL>" for sources that
-- have no database level (such as Oracle)
CREATE VIRTUAL TABLE "VIRTUAL_SALES"
  AT "ORACLE_ERP"."<NULL>"."<oracle_schema>"."SALES_DATA";

-- Grant access
GRANT SELECT ON "VIRTUAL_SALES" TO _SYS_REPO;

Step 3: Create Calculation View on Virtual Table

-- Calculation View: CV_SALES_ORACLE
Projection Node: VIRTUAL_SALES
  - SALES_ORDER_ID
  - CUSTOMER_ID
  - ORDER_DATE
  - AMOUNT

Aggregation Node:
  - GROUP BY: CUSTOMER_ID, ORDER_DATE
  - SUM(AMOUNT) as TOTAL_SALES

Calculated Columns:
  - TAX = AMOUNT * 0.18
  - GROSS_AMOUNT = AMOUNT + TAX
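
Graphical Calculation Views are modeled in the Web IDE rather than written by hand, but the logic above can be sketched as a plain SQL view. The column names and the 18% tax rate come from the outline; the view name is an assumption:

-- Illustrative SQL equivalent of CV_SALES_ORACLE
CREATE VIEW "CV_SALES_ORACLE_SQL" AS
SELECT
  CUSTOMER_ID,
  ORDER_DATE,
  SUM(AMOUNT)        AS TOTAL_SALES,
  SUM(AMOUNT) * 0.18 AS TAX,
  SUM(AMOUNT) * 1.18 AS GROSS_AMOUNT
FROM "VIRTUAL_SALES"
GROUP BY CUSTOMER_ID, ORDER_DATE;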

Step 4: Use in BW/4HANA (Open ODS View)

Open ODS View: OOV_SALES_ORACLE
  Type: Virtual
  Source: Calculation View CV_SALES_ORACLE
  
  Fields:
    - 0CUSTOMER (mapped to CUSTOMER_ID)
    - 0CALDAY (mapped to ORDER_DATE)
    - 0AMOUNT (mapped to TOTAL_SALES)
  
  Usage:
    - Can be used in Composite Providers
    - No data loaded into BW
    - Real-time queries to Oracle

Supported Data Sources (SDA)

| Category | Sources |
| --- | --- |
| Relational Databases | Oracle, MS SQL Server, DB2, MySQL, PostgreSQL, Sybase, Teradata |
| SAP Systems | SAP HANA, SAP IQ, MaxDB |
| Big Data | Hadoop (Hive), Spark SQL, Google BigQuery |
| Cloud Databases | Amazon Redshift, Azure SQL, Snowflake |
| NoSQL | MongoDB (limited) |

Smart Data Integration (SDI) Implementation

Architecture Overview

SAP HANA System
  ├── SDI Data Provisioning Agent (DP Agent)
  │   ├── Installed on separate server
  │   ├── Connects to source systems
  │   └── Manages data flow
  │
  ├── Flowgraphs (Data Flows)
  │   ├── Source → Transformations → Target
  │   └── Scheduled or real-time
  │
  └── Target Tables in HANA
      └── Replicated/Transformed Data

Step 1: Install Data Provisioning Agent

# On Windows/Linux server
1. Download DP Agent from SAP Software Center
2. Extract and run installer
   ./hdbsetup --batch --configfile=dpagent_config.cfg

3. Register agent with HANA
   hdbinst --register_dpagent 
     --host=<hana_host> 
     --port=<hana_port>
     --agent_host=<agent_host>
     --agent_port=5050

4. Verify connection
   SELECT * FROM M_AGENTS;

Step 2: Create Remote Source (SDI)

-- Create remote source for MySQL database via the DP Agent
-- (credentials are supplied via the CREDENTIAL clause, not the XML)
CREATE REMOTE SOURCE "MYSQL_CRM"
  ADAPTER "MySQLAdapter"
  AT LOCATION AGENT "DPAgent_001"
  CONFIGURATION '<?xml version="1.0" encoding="UTF-8"?>
    <ConnectionProperties name="MYSQL_CRM">
      <PropertyEntry name="host">mysql-prod.company.com</PropertyEntry>
      <PropertyEntry name="port">3306</PropertyEntry>
      <PropertyEntry name="database">crm_db</PropertyEntry>
    </ConnectionProperties>'
  WITH CREDENTIAL TYPE 'PASSWORD'
  USING 'user=hana_etl;password=****';

Step 3: Create Replication Task (Batch Load)

-- Create replication task (replication tasks are normally modeled as
-- .hdbreptask artifacts in the Web IDE; this DDL is a schematic sketch
-- of the settings involved)
CREATE REPLICATION TASK "RT_CUSTOMER_LOAD"
  SOURCE "MYSQL_CRM"."crm_db"."CUSTOMERS"
  TARGET TABLE "HANA_SCHEMA"."CUSTOMERS_REPLICA"
  WITH
    PARAMETERS (
      'execution_mode' = 'BATCH',
      'batch_size' = '10000',
      'truncate_target' = 'TRUE'
    )
  SCHEDULE 'CRON:0 2 * * *'; -- Daily at 2 AM

-- Start the task
ALTER REPLICATION TASK "RT_CUSTOMER_LOAD" START;

Step 4: Real-Time Replication with CDC

-- Timestamp-based CDC: add a change-tracking column on the source table
-- (log-based CDC via the database transaction log is preferable where the
--  adapter supports it)
-- On the source MySQL database:
ALTER TABLE CUSTOMERS ADD COLUMN _cdc_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;

-- In HANA:
CREATE REPLICATION TASK "RT_CUSTOMER_REALTIME"
  SOURCE "MYSQL_CRM"."crm_db"."CUSTOMERS"
  TARGET TABLE "HANA_SCHEMA"."CUSTOMERS_REALTIME"
  WITH
    PARAMETERS (
      'execution_mode' = 'REALTIME',
      'cdc_mode' = 'TIMESTAMP_BASED',
      'cdc_column' = '_cdc_timestamp',
      'poll_interval' = '10' -- seconds
    );

-- Monitor real-time replication via remote subscriptions
SELECT
  SUBSCRIPTION_NAME,
  REMOTE_SOURCE_NAME,
  STATE
FROM M_REMOTE_SUBSCRIPTIONS;

Step 5: Flowgraph for Complex ETL

-- Flowgraph: Transform and Load Customer Data
Flowgraph Name: FG_CUSTOMER_ETL

Node 1: Source (MySQL CUSTOMERS table)
  SELECT 
    customer_id,
    first_name,
    last_name,
    email,
    created_date
  FROM CUSTOMERS
  WHERE status = 'ACTIVE'

Node 2: Transformation
  - Calculated Column: full_name = first_name || ' ' || last_name
  - Filter: created_date >= ADD_YEARS(CURRENT_DATE, -5)
  - Lookup: Map region based on email domain

Node 3: Target (HANA table)
  UPSERT INTO HANA_SCHEMA.DIM_CUSTOMER
  KEY (customer_id)

Scheduling: Daily at 3 AM
Error Handling: Log to error table, continue on row errors
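
Assuming the MySQL table is also exposed as a virtual table VT_CUSTOMERS (an assumed name), the flowgraph's filter, concatenation, and upsert correspond roughly to this SQL; the region lookup is omitted:

-- Sketch of FG_CUSTOMER_ETL's transform and load step
UPSERT "HANA_SCHEMA"."DIM_CUSTOMER"
SELECT
  customer_id,
  first_name || ' ' || last_name AS full_name,
  email,
  created_date
FROM "VT_CUSTOMERS"
WHERE status = 'ACTIVE'
  AND created_date >= ADD_YEARS(CURRENT_DATE, -5)
WITH PRIMARY KEY;  -- upsert keyed on the target's primary key (customer_id)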

Real-World Use Cases

Use Case 1: Real-Time Sales Dashboard (SDA)

Scenario: Retail chain needs live sales from 200 stores (SQL Server databases)

Solution Architecture:
  
1. Create 200 remote sources (one per store)
   
2. Create virtual tables for each store's SALES table
   
3. Create union Calculation View:
   CV_ALL_STORES_SALES
     UNION ALL
       Store_001.SALES
       Store_002.SALES
       ...
       Store_200.SALES
   
4. Create Open ODS View in BW/4HANA
   OOV_REALTIME_SALES → CV_ALL_STORES_SALES
   
5. Build Composite Provider for reporting
   CP_SALES_REALTIME
     - Combines OOV_REALTIME_SALES with master data ADSOs
   
Result:
  ✅ Real-time dashboard (< 5 second latency)
  ✅ No data replication overhead
  ✅ Minimal HANA storage usage
  ⚠️ Query performance depends on store network
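
The union view in step 3 can be sketched in SQL as follows; the per-store virtual table names are assumptions, and with 200 branches the view would normally be generated by script:

-- Two of the 200 UNION ALL branches
CREATE VIEW "CV_ALL_STORES_SALES_SQL" AS
SELECT 'STORE_001' AS STORE_ID, SALES_ORDER_ID, CUSTOMER_ID, ORDER_DATE, AMOUNT
  FROM "VT_SALES_STORE_001"
UNION ALL
SELECT 'STORE_002' AS STORE_ID, SALES_ORDER_ID, CUSTOMER_ID, ORDER_DATE, AMOUNT
  FROM "VT_SALES_STORE_002";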

Use Case 2: Historical Data Warehouse (SDI)

Scenario: Manufacturing company consolidating 10 years of production data from Oracle, SAP ECC, and MongoDB

Solution Architecture:
  
1. Install DP Agent on dedicated ETL server
   
2. Create remote sources:
   - ORACLE_PRODUCTION
   - SAP_ECC
   - MONGODB_IOT
   
3. Create replication tasks:
   a) Oracle: Batch load (daily full refresh)
      RT_PRODUCTION_DATA → HANA_SCHEMA.PROD_DATA
      
   b) SAP ECC: Real-time CDC (log-based)
      RT_ECC_ORDERS → HANA_SCHEMA.ORDERS
      
   c) MongoDB: Flowgraph with transformation
      FG_IOT_SENSORS → HANA_SCHEMA.SENSOR_DATA
      (Flatten JSON, calculate aggregates)
   
4. Load into BW/4HANA ADSOs:
   - ADSO_PRODUCTION
   - ADSO_ORDERS
   - ADSO_IOT
   
5. Create Composite Provider for unified view
   
Result:
  ✅ Consolidated 10-year history in HANA
  ✅ Fast queries (HANA-optimized)
  ✅ Change tracking for audit trails
  ✅ Complex transformations applied

Use Case 3: Hybrid Approach (SDA + SDI)

Scenario: Financial services company needs both real-time market data and historical performance

Architecture:
  
Real-Time Layer (SDA):
  - Virtual access to live market data feed (PostgreSQL)
  - Open ODS View: OOV_MARKET_LIVE
  - Updated every 5 seconds
  
Historical Layer (SDI):
  - Daily batch load from same PostgreSQL
  - ADSO_MARKET_HISTORY (10 years retained)
  - Loaded at market close (6 PM daily)
  
Unified View (Composite Provider):
  CP_MARKET_ANALYSIS
    UNION
      OOV_MARKET_LIVE (today's data)
      ADSO_MARKET_HISTORY (historical data)
  
Query Logic:
  - If date = CURRENT_DATE → Query OOV (real-time)
  - If date < CURRENT_DATE → Query ADSO (historical)
  
Result:
  ✅ Real-time for intraday analysis
  ✅ Historical for trend analysis
  ✅ Optimized storage (no duplication)
  ✅ Fast queries (HANA for historical)
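
Schematically, the date split behind CP_MARKET_ANALYSIS corresponds to a union with complementary date filters; column names are assumptions, and in practice the split is modeled in the Composite Provider rather than hand-written:

-- Today's rows from the virtual layer, older rows from the persisted layer
SELECT TRADE_DATE, INSTRUMENT, PRICE
  FROM "OOV_MARKET_LIVE"         -- SDA: real-time
 WHERE TRADE_DATE = CURRENT_DATE
UNION ALL
SELECT TRADE_DATE, INSTRUMENT, PRICE
  FROM "ADSO_MARKET_HISTORY"     -- SDI: historical
 WHERE TRADE_DATE < CURRENT_DATE;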

Performance Optimization

SDA Performance Best Practices

| Practice | Description | Impact |
| --- | --- | --- |
| Push-down filters | Ensure WHERE clauses execute on the source | High |
| Limit columns | Select only needed fields | Medium |
| Use indexes | Create indexes on source tables | High |
| Network optimization | Use a fast, dedicated network | Critical |
| Caching | Enable result cache in HANA | Medium |
| Avoid joins | Don't join virtual and HANA tables directly | High |
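
To verify push-down, explain a filtered query on a virtual table and check that the filter runs remotely; the exact remote operator name varies by HANA revision:

-- Explain a filtered federated query
EXPLAIN PLAN FOR
  SELECT CUSTOMER_ID, AMOUNT
  FROM "VIRTUAL_SALES"
  WHERE ORDER_DATE >= '2024-01-01';

-- Review the generated plan; a remote scan operator indicates push-down
SELECT OPERATOR_NAME, OPERATOR_DETAILS
FROM EXPLAIN_PLAN_TABLE;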

SDI Performance Best Practices

  • Batch size tuning: Optimize based on network and target system load (5,000-50,000 rows)
  • Parallel loads: Use multiple DP agents for large volumes
  • CDC over full refresh: Reduces data transfer and load time
  • Compression: Enable network compression for remote agents
  • Partitioning: Partition target tables by date/key for faster loads (see the sketch after this list)
  • Monitoring: Use M_REMOTE_SUBSCRIPTION_STATISTICS for bottleneck analysis
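
A minimal sketch of the partitioning point, assuming an ORDERS_REPLICA target table; note that the range-partitioning column must be part of the primary key:

-- Yearly range partitions keep loads and partition pruning efficient
CREATE COLUMN TABLE "HANA_SCHEMA"."ORDERS_REPLICA" (
  ORDER_ID   INTEGER,
  ORDER_DATE DATE,
  AMOUNT     DECIMAL(15,2),
  PRIMARY KEY (ORDER_ID, ORDER_DATE)
)
PARTITION BY RANGE (ORDER_DATE) (
  PARTITION '2023-01-01' <= VALUES < '2024-01-01',
  PARTITION '2024-01-01' <= VALUES < '2025-01-01',
  PARTITION OTHERS
);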

Security Considerations

SDA Security

1. Credential Management
   - Use encrypted passwords in remote source definitions
   - Store credentials in secure credential store
   - Rotate passwords regularly

2. Network Security
   - Use VPN/private network for remote connections
   - Encrypt connections (SSL/TLS)
   - Whitelist HANA IPs on source firewalls

3. Access Control
   - Grant minimal privileges on source databases
   - Use dedicated service accounts (not personal accounts)
   - Audit virtual table access in HANA

4. Data Masking
   - Mask sensitive fields in Calculation Views
   - Use HANA Dynamic Data Masking for PII
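
As a sketch of the last point, HANA dynamic data masking can be declared per column; the table, column, and mask expression here are assumptions:

-- Users without the UNMASKED privilege see only the masked value
ALTER TABLE "HANA_SCHEMA"."DIM_CUSTOMER"
  ADD MASK ("EMAIL" USING '****@****');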

SDI Security

1. Agent Security
   - Install DP Agent in DMZ or secure network zone
   - Use firewall rules to restrict agent access
   - Secure agent-to-HANA communication (SSL)

2. Data in Transit
   - Enable encryption for all replication tasks
   - Use secure protocols (HTTPS, SFTP, etc.)
   - Compress and encrypt flowgraph data

3. Audit Trails
   - Log all replication activities
   - Review replication exceptions and the HANA audit log (AUDIT_LOG)
   - Alert on failed replications

Troubleshooting Common Issues

SDA Issues

| Issue | Cause | Solution |
| --- | --- | --- |
| Slow queries | No push-down to source | Check explain plan, add indexes |
| Connection timeout | Network/firewall | Verify connectivity, adjust timeout |
| Wrong data types | Type mapping issues | Explicit casting in Calculation View |
| Authentication failed | Credential expiry | Update remote source credentials |
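
For the type-mapping issue, an explicit cast in the Calculation View (or in a SQL view over the virtual table) is the usual fix; the column names below are assumptions:

-- Normalize remote types explicitly instead of relying on default mapping
SELECT
  CAST(ORDER_DATE AS DATE)      AS ORDER_DATE,
  CAST(AMOUNT AS DECIMAL(15,2)) AS AMOUNT
FROM "VIRTUAL_SALES";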

SDI Issues

| Issue | Cause | Solution |
| --- | --- | --- |
| Replication lag | High source load | Increase batch size, add agents |
| Agent disconnected | Network/resource issues | Restart agent, check logs |
| Data inconsistency | CDC configuration | Verify CDC column, reset baseline |
| Out of memory | Large batch size | Reduce batch size, increase agent RAM |

Decision Framework

When to Use SDA

  • ✅ Need real-time data (< 1 minute latency)
  • ✅ Low data volume queries
  • ✅ Want to avoid data duplication
  • ✅ Stable, fast network to source
  • ✅ Limited HANA storage available
  • ❌ Avoid for: Complex transformations, high-volume queries, slow sources

When to Use SDI

  • ✅ Need historical data retention
  • ✅ Complex ETL transformations required
  • ✅ High query volumes expected
  • ✅ Source performance concerns
  • ✅ Change data capture needed
  • ❌ Avoid for: Simple real-time lookups, limited HANA storage

Monitoring and Maintenance

SDA Monitoring

-- List virtual tables and their remote sources
SELECT
  TABLE_NAME,
  REMOTE_SOURCE_NAME,
  REMOTE_OWNER_NAME,
  REMOTE_OBJECT_NAME
FROM VIRTUAL_TABLES;

-- Monitor query performance for statements on virtual tables
SELECT
  STATEMENT_STRING,
  AVG_EXECUTION_TIME,
  EXECUTION_COUNT
FROM M_SQL_PLAN_CACHE
WHERE STATEMENT_STRING LIKE '%VIRTUAL_%'
ORDER BY AVG_EXECUTION_TIME DESC;

SDI Monitoring

-- Check real-time subscription status
SELECT
  SUBSCRIPTION_NAME,
  REMOTE_SOURCE_NAME,
  STATE
FROM M_REMOTE_SUBSCRIPTIONS;

-- Monitor agent health
SELECT
  AGENT_NAME,
  AGENT_HOST,
  AGENT_STATUS,
  LAST_CONNECT_TIME
FROM M_AGENTS;

-- Replication throughput per subscription
SELECT
  SUBSCRIPTION_NAME,
  RECEIVED_MESSAGE_COUNT,
  APPLIED_MESSAGE_COUNT
FROM M_REMOTE_SUBSCRIPTION_STATISTICS;

Conclusion

SDA and SDI are complementary technologies that, when used correctly, enable comprehensive external data integration in BW/4HANA. Choose based on your specific requirements:

  • SDA for real-time, on-demand access without duplication
  • SDI for robust, performant data warehousing with transformations
  • Hybrid for the best of both worlds

The right integration strategy isn't about choosing one over the other — it's about using each where it excels.

About the Author: Yogesh Pandey is a passionate developer and consultant specializing in SAP technologies and full-stack development.