External Data Integration in BW/4HANA: SDI and SDA Masterclass
Modern data warehouses must integrate data from diverse sources — legacy databases, cloud platforms, big data systems, and external APIs. SAP HANA provides two powerful approaches: Smart Data Integration (SDI) and Smart Data Access (SDA).
This guide will help you understand when to use each approach, how to implement them, and best practices for production-grade external data integration in BW/4HANA.
Understanding the Fundamentals
What is Smart Data Access (SDA)?
SDA enables virtual access to external data sources without physically replicating data into HANA. It's like querying remote databases as if they were local HANA tables.
Key Characteristics
- Real-time data access (no replication lag; query latency depends on the source)
- No data duplication
- Federated query execution
- Supports 40+ data sources
- Use cases: Real-time dashboards, data validation
What is Smart Data Integration (SDI)?
SDI is a comprehensive ETL framework for loading, transforming, and synchronizing data from external sources into HANA. It supports batch loads, real-time replication, and change data capture (CDC).
Key Characteristics
- Data replication into HANA
- Batch and real-time modes
- Built-in transformations
- Change data capture (CDC)
- Use cases: Data warehousing, historical analysis
SDA vs SDI: Quick Comparison
| Aspect | SDA (Smart Data Access) | SDI (Smart Data Integration) |
|---|---|---|
| Data Location | Stays in source system | Copied to HANA |
| Latency | Real-time (reads live source data) | Batch/near real-time |
| Performance | Depends on source | HANA-optimized |
| Storage | No HANA storage used | Uses HANA storage |
| Transformations | Limited (view-level) | Full ETL capabilities |
| Historical Data | Not persisted | Persisted in HANA |
| CDC Support | No | Yes |
| Complexity | Low (simple setup) | High (full ETL design) |
| Cost | No HANA storage cost | HANA storage cost |
Smart Data Access (SDA) Implementation
Step 1: Create Remote Source
-- Create remote source connection to Oracle database
CREATE REMOTE SOURCE "ORACLE_ERP"
ADAPTER "OracleAdapter"
CONFIGURATION '<?xml version="1.0" encoding="UTF-8"?>
<ConnectionProperties name="ORACLE_ERP">
<PropertyEntry name="host">oracle-prod.company.com</PropertyEntry>
<PropertyEntry name="port">1521</PropertyEntry>
<PropertyEntry name="service">ORCL</PropertyEntry>
<PropertyEntry name="user">HANA_USER</PropertyEntry>
<PropertyEntry name="password" encrypt="true">****</PropertyEntry>
</ConnectionProperties>'
WITH CREDENTIAL TYPE 'PASSWORD'
USING 'user=HANA_USER;password=****';
Step 2: Create Virtual Table
-- Create virtual table pointing to the Oracle table
-- (the location path is remote_source.database.schema.object;
--  <NULL> is the placeholder when the source has no database level)
CREATE VIRTUAL TABLE "VIRTUAL_SALES"
AT "ORACLE_ERP"."<NULL>"."<oracle_schema>"."SALES_DATA";
-- Grant access
GRANT SELECT ON "VIRTUAL_SALES" TO _SYS_REPO;
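Once created, the virtual table behaves like any local table in SQL. A minimal smoke test, assuming the Oracle table exposes the columns used in the next step (CUSTOMER_ID, ORDER_DATE, AMOUNT); the WHERE clause should be pushed down to Oracle:
-- Query the virtual table; HANA federates this statement to Oracle
-- (column names are assumptions based on the Calculation View below)
SELECT CUSTOMER_ID, ORDER_DATE, AMOUNT
FROM "VIRTUAL_SALES"
WHERE ORDER_DATE >= '2024-01-01'
LIMIT 100;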
Step 3: Create Calculation View on Virtual Table
-- Calculation View: CV_SALES_ORACLE
Projection Node: VIRTUAL_SALES
  - SALES_ORDER_ID
  - CUSTOMER_ID
  - ORDER_DATE
  - AMOUNT
Aggregation Node:
  - GROUP BY: CUSTOMER_ID, ORDER_DATE
  - SUM(AMOUNT) as TOTAL_SALES
Calculated Columns:
  - TAX = AMOUNT * 0.18
  - GROSS_AMOUNT = AMOUNT + TAX
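Calculation Views are modeled graphically, but the same logic can be sketched as plain SQL, which is handy for validating the modeled view's output. A rough equivalent, assuming the column names above:
-- SQL sketch of CV_SALES_ORACLE (for validation, not a replacement for the modeled view)
SELECT
  CUSTOMER_ID,
  ORDER_DATE,
  SUM(AMOUNT)        AS TOTAL_SALES,
  SUM(AMOUNT) * 0.18 AS TAX,
  SUM(AMOUNT) * 1.18 AS GROSS_AMOUNT
FROM "VIRTUAL_SALES"
GROUP BY CUSTOMER_ID, ORDER_DATE;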
Step 4: Use in BW/4HANA (Open ODS View)
Open ODS View: OOV_SALES_ORACLE
Type: Virtual
Source: Calculation View CV_SALES_ORACLE
Fields:
- 0CUSTOMER (mapped to CUSTOMER_ID)
- 0CALDAY (mapped to ORDER_DATE)
- 0AMOUNT (mapped to TOTAL_SALES)
Usage:
- Can be used in Composite Providers
- No data loaded into BW
- Real-time queries to Oracle
Supported Data Sources (SDA)
| Category | Sources |
|---|---|
| Relational Databases | Oracle, MS SQL Server, DB2, MySQL, PostgreSQL, Sybase, Teradata |
| SAP Systems | SAP HANA, SAP IQ, MaxDB |
| Big Data | Hadoop (Hive), Spark SQL, Google BigQuery |
| Cloud Databases | Amazon Redshift, Azure SQL, Snowflake |
| NoSQL | MongoDB (limited) |
Smart Data Integration (SDI) Implementation
Architecture Overview
SAP HANA System
├── SDI Data Provisioning Agent (DP Agent)
│ ├── Installed on separate server
│ ├── Connects to source systems
│ └── Manages data flow
│
├── Flowgraphs (Data Flows)
│ ├── Source → Transformations → Target
│ └── Scheduled or real-time
│
└── Target Tables in HANA
    └── Replicated/Transformed Data
Step 1: Install Data Provisioning Agent
# On Windows/Linux server
1. Download DP Agent from SAP Software Center
2. Extract and run installer
./hdbsetup --batch --configfile=dpagent_config.cfg
3. Register agent with HANA
hdbinst --register_dpagent
--host=<hana_host>
--port=<hana_port>
--agent_host=<agent_host>
--agent_port=5050
4. Verify connection
SELECT * FROM M_AGENTS;
Step 2: Create Remote Source (SDI)
-- Create remote source for MySQL database
CREATE REMOTE SOURCE "MYSQL_CRM"
ADAPTER "MySQLAdapter"
AT LOCATION AGENT "DPAgent_001"
CONFIGURATION '<?xml version="1.0" encoding="UTF-8"?>
<ConnectionProperties name="MYSQL_CRM">
<PropertyEntry name="host">mysql-prod.company.com</PropertyEntry>
<PropertyEntry name="port">3306</PropertyEntry>
<PropertyEntry name="database">crm_db</PropertyEntry>
<PropertyEntry name="user">hana_etl</PropertyEntry>
<PropertyEntry name="password" encrypt="true">****</PropertyEntry>
</ConnectionProperties>'
WITH CREDENTIAL TYPE 'PASSWORD'
USING 'user=hana_etl;password=****';
Step 3: Create Replication Task (Batch Load)
-- Create replication task
CREATE REPLICATION TASK "RT_CUSTOMER_LOAD"
SOURCE "MYSQL_CRM"."crm_db"."CUSTOMERS"
TARGET TABLE "HANA_SCHEMA"."CUSTOMERS_REPLICA"
WITH
PARAMETERS (
'execution_mode' = 'BATCH',
'batch_size' = '10000',
'truncate_target' = 'TRUE'
)
SCHEDULE 'CRON:0 2 * * *'; -- Daily at 2 AM
-- Start the task
ALTER REPLICATION TASK "RT_CUSTOMER_LOAD" START;
Step 4: Real-Time Replication with CDC
-- Enable CDC on source table (MySQL example)
-- On source database:
ALTER TABLE CUSTOMERS ADD COLUMN _cdc_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;
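-- Optional (assumption: large, frequently polled table): index the CDC column
-- so the timestamp-based polling query on the MySQL side stays fast
CREATE INDEX idx_customers_cdc ON CUSTOMERS (_cdc_timestamp);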
-- In HANA:
CREATE REPLICATION TASK "RT_CUSTOMER_REALTIME"
SOURCE "MYSQL_CRM"."crm_db"."CUSTOMERS"
TARGET TABLE "HANA_SCHEMA"."CUSTOMERS_REALTIME"
WITH
PARAMETERS (
'execution_mode' = 'REALTIME',
'cdc_mode' = 'TIMESTAMP_BASED',
'cdc_column' = '_cdc_timestamp',
'poll_interval' = '10' -- seconds
);
-- Monitor replication
SELECT
TASK_NAME,
SOURCE_LOCATION,
TARGET_TABLE_NAME,
LAST_EXECUTION_TIME,
ROWS_REPLICATED,
STATUS
FROM M_REPLICATION_TASKS;
Step 5: Flowgraph for Complex ETL
-- Flowgraph: Transform and Load Customer Data
Flowgraph Name: FG_CUSTOMER_ETL
Node 1: Source (MySQL CUSTOMERS table)
SELECT
customer_id,
first_name,
last_name,
email,
created_date
FROM CUSTOMERS
WHERE status = 'ACTIVE'
Node 2: Transformation
- Calculated Column: full_name = first_name || ' ' || last_name
- Filter: created_date >= ADD_YEARS(CURRENT_DATE, -5)
- Lookup: Map region based on email domain
Node 3: Target (HANA table)
UPSERT INTO HANA_SCHEMA.DIM_CUSTOMER
KEY (customer_id)
Scheduling: Daily at 3 AM
Error Handling: Log to error table, continue on row errors
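The UPSERT target in Node 3 can be sketched in plain SQL. A minimal equivalent of Nodes 1-3, assuming customer_id is the primary key of DIM_CUSTOMER (HANA's UPSERT with a subquery matches on the primary key); the source table name and the omitted region lookup are simplifications:
-- SQL sketch of FG_CUSTOMER_ETL's load step (region lookup omitted for brevity)
UPSERT HANA_SCHEMA.DIM_CUSTOMER (customer_id, full_name, email, created_date)
SELECT
  customer_id,
  first_name || ' ' || last_name AS full_name,  -- Node 2 calculated column
  email,
  created_date
FROM "MYSQL_CRM_CUSTOMERS"  -- hypothetical virtual table or replica of the source
WHERE status = 'ACTIVE'
  AND created_date >= ADD_YEARS(CURRENT_DATE, -5);
Real-World Use Cases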
Use Case 1: Real-Time Sales Dashboard (SDA)
Scenario: Retail chain needs live sales from 200 stores (SQL Server databases)
Solution Architecture:
1. Create 200 remote sources (one per store)
2. Create virtual tables for each store's SALES table
3. Create union Calculation View (see the SQL sketch after this list):
CV_ALL_STORES_SALES
UNION ALL
Store_001.SALES
Store_002.SALES
...
Store_200.SALES
4. Create Open ODS View in BW/4HANA
OOV_REALTIME_SALES → CV_ALL_STORES_SALES
5. Build Composite Provider for reporting
CP_SALES_REALTIME
- Combines OOV_REALTIME_SALES with master data ADSOs
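A SQL sketch of the union view from step 3, abbreviated to three stores (store and column names are placeholders); tagging each row with its source store keeps the origin traceable:
-- Union of per-store virtual tables (sketch; repeat the branch per store)
SELECT 'STORE_001' AS STORE_ID, SALES_ORDER_ID, ORDER_DATE, AMOUNT FROM "VT_STORE_001_SALES"
UNION ALL
SELECT 'STORE_002', SALES_ORDER_ID, ORDER_DATE, AMOUNT FROM "VT_STORE_002_SALES"
UNION ALL
SELECT 'STORE_200', SALES_ORDER_ID, ORDER_DATE, AMOUNT FROM "VT_STORE_200_SALES";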
Result:
✅ Real-time dashboard (< 5 second latency)
✅ No data replication overhead
✅ Minimal HANA storage usage
⚠️ Query performance depends on store network
Use Case 2: Historical Data Warehouse (SDI)
Scenario: Manufacturing company consolidating 10 years of production data from Oracle, SAP ECC, and MongoDB
Solution Architecture:
1. Install DP Agent on dedicated ETL server
2. Create remote sources:
- ORACLE_PRODUCTION
- SAP_ECC
- MONGODB_IOT
3. Create replication tasks:
a) Oracle: Batch load (daily full refresh)
RT_PRODUCTION_DATA → HANA_SCHEMA.PROD_DATA
b) SAP ECC: Real-time CDC (log-based)
RT_ECC_ORDERS → HANA_SCHEMA.ORDERS
c) MongoDB: Flowgraph with transformation
FG_IOT_SENSORS → HANA_SCHEMA.SENSOR_DATA
(Flatten JSON, calculate aggregates; see the sketch after this list)
4. Load into BW/4HANA ADSOs:
- ADSO_PRODUCTION
- ADSO_ORDERS
- ADSO_IOT
5. Create Composite Provider for unified view
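For step 3c, the JSON flattening done inside the flowgraph can be approximated in SQL. A sketch assuming a HANA release that supports JSON_TABLE; the payload structure and paths are illustrative assumptions:
-- Flatten an IoT JSON payload into columns (document and paths are assumptions)
SELECT JT.SENSOR_ID, JT.TEMPERATURE, JT.READING_TIME
FROM JSON_TABLE(
  '{"sensor_id": "A-17", "temperature": 21.5, "reading_time": "2024-05-01T12:00:00"}',
  '$' COLUMNS (
    SENSOR_ID    NVARCHAR(20) PATH '$.sensor_id',
    TEMPERATURE  DOUBLE       PATH '$.temperature',
    READING_TIME NVARCHAR(30) PATH '$.reading_time'
  )
) AS JT;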
Result:
✅ Consolidated 10-year history in HANA
✅ Fast queries (HANA-optimized)
✅ Change tracking for audit trails
✅ Complex transformations applied
Use Case 3: Hybrid Approach (SDA + SDI)
Scenario: Financial services company needs both real-time market data and historical performance
Architecture:
Real-Time Layer (SDA):
- Virtual access to live market data feed (PostgreSQL)
- Open ODS View: OOV_MARKET_LIVE
- Updated every 5 seconds
Historical Layer (SDI):
- Daily batch load from same PostgreSQL
- ADSO_MARKET_HISTORY (10 years retained)
- Loaded at market close (6 PM daily)
Unified View (Composite Provider):
CP_MARKET_ANALYSIS
UNION
OOV_MARKET_LIVE (today's data)
ADSO_MARKET_HISTORY (historical data)
Query Logic:
- If date = CURRENT_DATE → Query OOV (real-time)
- If date < CURRENT_DATE → Query ADSO (historical)
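The routing logic amounts to a union with complementary date filters. A sketch assuming both providers expose the same columns (TRADE_DATE, SYMBOL, PRICE are placeholders) and can be queried directly:
-- Sketch of the Composite Provider's union semantics
SELECT TRADE_DATE, SYMBOL, PRICE
FROM "OOV_MARKET_LIVE"            -- real-time via SDA
WHERE TRADE_DATE = CURRENT_DATE
UNION ALL
SELECT TRADE_DATE, SYMBOL, PRICE
FROM "ADSO_MARKET_HISTORY"        -- persisted via SDI
WHERE TRADE_DATE < CURRENT_DATE;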
Result:
✅ Real-time for intraday analysis
✅ Historical for trend analysis
✅ Optimized storage (no duplication)
✅ Fast queries (HANA for historical)
Performance Optimization
SDA Performance Best Practices
| Practice | Description | Impact |
|---|---|---|
| Push-down filters | Ensure WHERE clauses execute on source | High |
| Limit columns | Select only needed fields | Medium |
| Use indexes | Create indexes on source tables | High |
| Network optimization | Use fast, dedicated network | Critical |
| Caching | Enable result cache in HANA | Medium |
| Avoid joins | Don't join virtual + HANA tables | High |
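To verify push-down (first row of the table above), inspect the plan: remote operators show which part of the statement executes on the source. A sketch using a virtual table from earlier:
-- Generate and read the plan for a federated query
EXPLAIN PLAN SET STATEMENT_NAME = 'sda_check' FOR
SELECT CUSTOMER_ID, AMOUNT FROM "VIRTUAL_SALES" WHERE ORDER_DATE >= '2024-01-01';

SELECT OPERATOR_NAME, OPERATOR_DETAILS
FROM EXPLAIN_PLAN_TABLE
WHERE STATEMENT_NAME = 'sda_check';
-- Look for remote scan operators whose details include the WHERE condition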
SDI Performance Best Practices
- Batch size tuning: Optimize based on network and target system load (5000-50000 rows)
- Parallel loads: Use multiple DP agents for large volumes
- CDC over full refresh: Reduces data transfer and load time
- Compression: Enable network compression for remote agents
- Partitioning: Partition target tables by date/key for faster loads (see the sketch after this list)
- Monitoring: Use M_REPLICATION_STATISTICS for bottleneck analysis
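A sketch of the partitioning recommendation, assuming a date-keyed replication target (table and column names are placeholders; the partitioning column must be part of the primary key):
-- Range-partition the target by year so loads and pruning stay fast
CREATE COLUMN TABLE HANA_SCHEMA.ORDERS_REPLICA (
  ORDER_ID   BIGINT        NOT NULL,
  ORDER_DATE DATE          NOT NULL,
  AMOUNT     DECIMAL(15,2),
  PRIMARY KEY (ORDER_ID, ORDER_DATE)
)
PARTITION BY RANGE (ORDER_DATE) (
  PARTITION '2023-01-01' <= VALUES < '2024-01-01',
  PARTITION '2024-01-01' <= VALUES < '2025-01-01',
  PARTITION OTHERS
);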
Security Considerations
SDA Security
1. Credential Management
   - Use encrypted passwords in remote source definitions
   - Store credentials in secure credential store
   - Rotate passwords regularly
2. Network Security
   - Use VPN/private network for remote connections
   - Encrypt connections (SSL/TLS)
   - Whitelist HANA IPs on source firewalls
3. Access Control
   - Grant minimal privileges on source databases
   - Use dedicated service accounts (not personal accounts)
   - Audit virtual table access in HANA
4. Data Masking
   - Mask sensitive fields in Calculation Views
   - Use HANA Dynamic Data Masking for PII
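A simple masking pattern for item 4 that avoids exposing raw PII through virtual tables; this sketch uses plain SQL string functions rather than the dedicated masking feature, and the table and column names are assumptions:
-- Expose only a masked form of the email address to reporting users
CREATE VIEW V_CUSTOMER_MASKED AS
SELECT
  CUSTOMER_ID,
  LEFT(EMAIL, 2) || '***@' || SUBSTR_AFTER(EMAIL, '@') AS EMAIL_MASKED
FROM "VIRTUAL_CUSTOMERS";  -- hypothetical virtual table with an EMAIL column

GRANT SELECT ON V_CUSTOMER_MASKED TO REPORTING_ROLE;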
SDI Security
1. Agent Security
   - Install DP Agent in DMZ or secure network zone
   - Use firewall rules to restrict agent access
   - Secure agent-to-HANA communication (SSL)
2. Data in Transit
   - Enable encryption for all replication tasks
   - Use secure protocols (HTTPS, SFTP, etc.)
   - Compress and encrypt flowgraph data
3. Audit Trails
   - Log all replication activities
   - Monitor M_REPLICATION_AUDIT_LOG
   - Alert on failed replications
Troubleshooting Common Issues
SDA Issues
| Issue | Cause | Solution |
|---|---|---|
| Slow queries | No push-down to source | Check explain plan, add indexes |
| Connection timeout | Network/firewall | Verify connectivity, adjust timeout |
| Wrong data types | Type mapping issues | Explicit casting in Calculation View |
| Authentication failed | Credential expiry | Update remote source credentials |
SDI Issues
| Issue | Cause | Solution |
|---|---|---|
| Replication lag | High source load | Increase batch size, add agents |
| Agent disconnected | Network/resource issues | Restart agent, check logs |
| Data inconsistency | CDC configuration | Verify CDC column, reset baseline |
| Out of memory | Large batch size | Reduce batch size, increase agent RAM |
Decision Framework
When to Use SDA
- ✅ Need real-time data (< 1 minute latency)
- ✅ Low data volume queries
- ✅ Want to avoid data duplication
- ✅ Stable, fast network to source
- ✅ Limited HANA storage available
- ❌ Avoid for: Complex transformations, high-volume queries, slow sources
When to Use SDI
- ✅ Need historical data retention
- ✅ Complex ETL transformations required
- ✅ High query volumes expected
- ✅ Source performance concerns
- ✅ Change data capture needed
- ❌ Avoid for: Simple real-time lookups, limited HANA storage
Monitoring and Maintenance
SDA Monitoring
-- Check virtual table statistics
SELECT
  TABLE_NAME,
  REMOTE_SOURCE_NAME,
  RECORD_COUNT,
  LAST_ACCESS_TIME
FROM M_TABLES
WHERE IS_VIRTUAL_TABLE = 'TRUE';
-- Monitor query performance
SELECT
  STATEMENT_STRING,
  EXECUTION_TIME,
  REMOTE_SOURCE_NAME
FROM M_SQL_PLAN_CACHE
WHERE STATEMENT_STRING LIKE '%VIRTUAL_%'
ORDER BY EXECUTION_TIME DESC;
SDI Monitoring
-- Check replication status
SELECT
  TASK_NAME,
  STATUS,
  LAST_EXECUTION_TIME,
  ROWS_REPLICATED,
  ERROR_MESSAGE
FROM M_REPLICATION_TASKS;
-- Monitor agent health
SELECT
  AGENT_NAME,
  HOST,
  STATUS,
  LAST_HEARTBEAT_TIME
FROM M_AGENTS;
-- Replication performance
SELECT
  TASK_NAME,
  AVG(DURATION_MS) AS AVG_DURATION,
  SUM(ROWS_AFFECTED) AS TOTAL_ROWS
FROM M_REPLICATION_STATISTICS
GROUP BY TASK_NAME;
Conclusion
SDA and SDI are complementary technologies that, when used correctly, enable comprehensive external data integration in BW/4HANA. Choose based on your specific requirements:
- SDA for real-time, on-demand access without duplication
- SDI for robust, performant data warehousing with transformations
- Hybrid for the best of both worlds
The right integration strategy isn't about choosing one over the other — it's about using each where it excels.
