External Data Integration in BW/4HANA: SDI and SDA Masterclass
Modern data warehouses must integrate data from diverse sources — legacy databases, cloud platforms, big data systems, and external APIs. SAP HANA provides two powerful approaches: Smart Data Integration (SDI) and Smart Data Access (SDA).
This guide will help you understand when to use each approach, how to implement them, and best practices for production-grade external data integration in BW/4HANA.
Understanding the Fundamentals
What is Smart Data Access (SDA)?
SDA enables virtual access to external data sources without physically replicating data into HANA. It's like querying remote databases as if they were local HANA tables.
Key Characteristics
- Real-time data access (no replication lag; query latency depends on the source)
- No data duplication
- Federated query execution
- Supports 40+ data sources
- Use cases: Real-time dashboards, data validation
What is Smart Data Integration (SDI)?
SDI is a comprehensive ETL framework for loading, transforming, and synchronizing data from external sources into HANA. It supports batch loads, real-time replication, and change data capture (CDC).
Key Characteristics
- Data replication into HANA
- Batch and real-time modes
- Built-in transformations
- Change data capture (CDC)
- Use cases: Data warehousing, historical analysis
SDA vs SDI: Quick Comparison
| Aspect | SDA (Smart Data Access) | SDI (Smart Data Integration) |
|---|---|---|
| Data Location | Stays in source system | Copied to HANA |
| Latency | Real-time (reads live source data) | Batch/near real-time |
| Performance | Depends on source | HANA-optimized |
| Storage | No HANA storage used | Uses HANA storage |
| Transformations | Limited (view-level) | Full ETL capabilities |
| Historical Data | Not persisted | Persisted in HANA |
| CDC Support | No | Yes |
| Complexity | Low (simple setup) | High (full ETL design) |
| Cost | No HANA storage cost | HANA storage cost |
Smart Data Access (SDA) Implementation
Step 1: Create Remote Source
-- Create remote source connection to Oracle database
CREATE REMOTE SOURCE "ORACLE_ERP"
ADAPTER "OracleAdapter"
CONFIGURATION '<?xml version="1.0" encoding="UTF-8"?>
<ConnectionProperties name="ORACLE_ERP">
<PropertyEntry name="host">oracle-prod.company.com</PropertyEntry>
<PropertyEntry name="port">1521</PropertyEntry>
<PropertyEntry name="service">ORCL</PropertyEntry>
<PropertyEntry name="user">HANA_USER</PropertyEntry>
<PropertyEntry name="password" encrypt="true">****</PropertyEntry>
</ConnectionProperties>'
WITH CREDENTIAL TYPE 'PASSWORD'
USING 'user=HANA_USER;password=****';
Step 2: Create Virtual Table
-- Create virtual table pointing to the Oracle table
-- (the location path is remote_source.database.schema.object;
--  <NULL> is the placeholder when the source has no database level)
CREATE VIRTUAL TABLE "VIRTUAL_SALES"
AT "ORACLE_ERP"."<NULL>"."<oracle_schema>"."SALES_DATA";
-- Grant access
GRANT SELECT ON "VIRTUAL_SALES" TO _SYS_REPO;
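Once created, the virtual table behaves like any local table in SQL. A minimal smoke test, assuming the Oracle table exposes the columns used in the next step (CUSTOMER_ID, ORDER_DATE, AMOUNT); the WHERE clause should be pushed down to Oracle:
-- Query the virtual table; HANA federates this statement to Oracle
-- (column names are assumptions based on the Calculation View below)
SELECT CUSTOMER_ID, ORDER_DATE, AMOUNT
FROM "VIRTUAL_SALES"
WHERE ORDER_DATE >= '2024-01-01'
LIMIT 100;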
Step 3: Create Calculation View on Virtual Table
-- Calculation View: CV_SALES_ORACLE
Projection Node: VIRTUAL_SALES
  - SALES_ORDER_ID
  - CUSTOMER_ID
  - ORDER_DATE
  - AMOUNT
Aggregation Node:
  - GROUP BY: CUSTOMER_ID, ORDER_DATE
  - SUM(AMOUNT) as TOTAL_SALES
Calculated Columns:
  - TAX = AMOUNT * 0.18
  - GROSS_AMOUNT = AMOUNT + TAX
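Calculation Views are modeled graphically, but the same logic can be sketched as plain SQL, which is handy for validating the modeled view's output. A rough equivalent, assuming the column names above:
-- SQL sketch of CV_SALES_ORACLE (for validation, not a replacement for the modeled view)
SELECT
  CUSTOMER_ID,
  ORDER_DATE,
  SUM(AMOUNT)        AS TOTAL_SALES,
  SUM(AMOUNT) * 0.18 AS TAX,
  SUM(AMOUNT) * 1.18 AS GROSS_AMOUNT
FROM "VIRTUAL_SALES"
GROUP BY CUSTOMER_ID, ORDER_DATE;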
Step 4: Use in BW/4HANA (Open ODS View)
Open ODS View: OOV_SALES_ORACLE
Type: Virtual
Source: Calculation View CV_SALES_ORACLE
Fields:
- 0CUSTOMER (mapped to CUSTOMER_ID)
- 0CALDAY (mapped to ORDER_DATE)
- 0AMOUNT (mapped to TOTAL_SALES)
Usage:
- Can be used in Composite Providers
- No data loaded into BW
- Real-time queries to Oracle
Supported Data Sources (SDA)
| Category | Sources |
|---|---|
| Relational Databases | Oracle, MS SQL Server, DB2, MySQL, PostgreSQL, Sybase, Teradata |
| SAP Systems | SAP HANA, SAP IQ, MaxDB |
| Big Data | Hadoop (Hive), Spark SQL, Google BigQuery |
| Cloud Databases | Amazon Redshift, Azure SQL, Snowflake |
| NoSQL | MongoDB (limited) |
Smart Data Integration (SDI) Implementation
Architecture Overview
SAP HANA System
├── SDI Data Provisioning Agent (DP Agent)
│ ├── Installed on separate server
│ ├── Connects to source systems
│ └── Manages data flow
│
├── Flowgraphs (Data Flows)
│ ├── Source → Transformations → Target
│ └── Scheduled or real-time
│
└── Target Tables in HANA
    └── Replicated/Transformed Data
Step 1: Install Data Provisioning Agent
# On Windows/Linux server
1. Download DP Agent from SAP Software Center
2. Extract and run installer
./hdbsetup --batch --configfile=dpagent_config.cfg
3. Register agent with HANA
hdbinst --register_dpagent
--host=<hana_host>
--port=<hana_port>
--agent_host=<agent_host>
--agent_port=5050
4. Verify connection
SELECT * FROM M_AGENTS;
Step 2: Create Remote Source (SDI)
-- Create remote source for MySQL database
CREATE REMOTE SOURCE "MYSQL_CRM"
ADAPTER "MySQLAdapter"
AT LOCATION AGENT "DPAgent_001"
CONFIGURATION '<?xml version="1.0" encoding="UTF-8"?>
<ConnectionProperties name="MYSQL_CRM">
<PropertyEntry name="host">mysql-prod.company.com</PropertyEntry>
<PropertyEntry name="port">3306</PropertyEntry>
<PropertyEntry name="database">crm_db</PropertyEntry>
<PropertyEntry name="user">hana_etl</PropertyEntry>
<PropertyEntry name="password" encrypt="true">****</PropertyEntry>
</ConnectionProperties>'
WITH CREDENTIAL TYPE 'PASSWORD'
USING 'user=hana_etl;password=****';
Step 3: Create Replication Task (Batch Load)
-- Create replication task
CREATE REPLICATION TASK "RT_CUSTOMER_LOAD"
SOURCE "MYSQL_CRM"."crm_db"."CUSTOMERS"
TARGET TABLE "HANA_SCHEMA"."CUSTOMERS_REPLICA"
WITH
PARAMETERS (
'execution_mode' = 'BATCH',
'batch_size' = '10000',
'truncate_target' = 'TRUE'
)
SCHEDULE 'CRON:0 2 * * *'; -- Daily at 2 AM
-- Start the task
ALTER REPLICATION TASK "RT_CUSTOMER_LOAD" START;
Step 4: Real-Time Replication with CDC
-- Enable CDC on source table (MySQL example)
-- On source database:
ALTER TABLE CUSTOMERS ADD COLUMN _cdc_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;
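-- Optional (assumption: large, frequently polled table): index the CDC column
-- so the timestamp-based polling query on the MySQL side stays fast
CREATE INDEX idx_customers_cdc ON CUSTOMERS (_cdc_timestamp);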
-- In HANA:
CREATE REPLICATION TASK "RT_CUSTOMER_REALTIME"
SOURCE "MYSQL_CRM"."crm_db"."CUSTOMERS"
TARGET TABLE "HANA_SCHEMA"."CUSTOMERS_REALTIME"
WITH
PARAMETERS (
'execution_mode' = 'REALTIME',
'cdc_mode' = 'TIMESTAMP_BASED',
'cdc_column' = '_cdc_timestamp',
'poll_interval' = '10' -- seconds
);
-- Monitor replication
SELECT
TASK_NAME,
SOURCE_LOCATION,
TARGET_TABLE_NAME,
LAST_EXECUTION_TIME,
ROWS_REPLICATED,
STATUS
FROM M_REPLICATION_TASKS;
Step 5: Flowgraph for Complex ETL
-- Flowgraph: Transform and Load Customer Data
Flowgraph Name: FG_CUSTOMER_ETL
Node 1: Source (MySQL CUSTOMERS table)
SELECT
customer_id,
first_name,
last_name,
email,
created_date
FROM CUSTOMERS
WHERE status = 'ACTIVE'
Node 2: Transformation
- Calculated Column: full_name = first_name || ' ' || last_name
- Filter: created_date >= ADD_YEARS(CURRENT_DATE, -5)
- Lookup: Map region based on email domain
Node 3: Target (HANA table)
UPSERT INTO HANA_SCHEMA.DIM_CUSTOMER
KEY (customer_id)
Scheduling: Daily at 3 AM
Error Handling: Log to error table, continue on row errors
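The UPSERT target in Node 3 can be sketched in plain SQL. A minimal equivalent of Nodes 1-3, assuming customer_id is the primary key of DIM_CUSTOMER (HANA's UPSERT with a subquery matches on the primary key); the source table name and the omitted region lookup are simplifications:
-- SQL sketch of FG_CUSTOMER_ETL's load step (region lookup omitted for brevity)
UPSERT HANA_SCHEMA.DIM_CUSTOMER (customer_id, full_name, email, created_date)
SELECT
  customer_id,
  first_name || ' ' || last_name AS full_name,  -- Node 2 calculated column
  email,
  created_date
FROM "MYSQL_CRM_CUSTOMERS"  -- hypothetical virtual table or replica of the source
WHERE status = 'ACTIVE'
  AND created_date >= ADD_YEARS(CURRENT_DATE, -5);
Real-World Use Cases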
Use Case 1: Real-Time Sales Dashboard (SDA)
Scenario: Retail chain needs live sales from 200 stores (SQL Server databases)
Solution Architecture:
1. Create 200 remote sources (one per store)
2. Create virtual tables for each store's SALES table
3. Create union Calculation View (see the SQL sketch after this list):
CV_ALL_STORES_SALES
UNION ALL
Store_001.SALES
Store_002.SALES
...
Store_200.SALES
4. Create Open ODS View in BW/4HANA
OOV_REALTIME_SALES → CV_ALL_STORES_SALES
5. Build Composite Provider for reporting
CP_SALES_REALTIME
- Combines OOV_REALTIME_SALES with master data ADSOs
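A SQL sketch of the union view from step 3, abbreviated to three stores (store and column names are placeholders); tagging each row with its source store keeps the origin traceable:
-- Union of per-store virtual tables (sketch; repeat the branch per store)
SELECT 'STORE_001' AS STORE_ID, SALES_ORDER_ID, ORDER_DATE, AMOUNT FROM "VT_STORE_001_SALES"
UNION ALL
SELECT 'STORE_002', SALES_ORDER_ID, ORDER_DATE, AMOUNT FROM "VT_STORE_002_SALES"
UNION ALL
SELECT 'STORE_200', SALES_ORDER_ID, ORDER_DATE, AMOUNT FROM "VT_STORE_200_SALES";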
Result:
✅ Real-time dashboard (< 5 second latency)
✅ No data replication overhead
✅ Minimal HANA storage usage
⚠️ Query performance depends on store network
Use Case 2: Historical Data Warehouse (SDI)
Scenario: Manufacturing company consolidating 10 years of production data from Oracle, SAP ECC, and MongoDB
Solution Architecture:
1. Install DP Agent on dedicated ETL server
2. Create remote sources:
- ORACLE_PRODUCTION
- SAP_ECC
- MONGODB_IOT
3. Create replication tasks:
a) Oracle: Batch load (daily full refresh)
RT_PRODUCTION_DATA → HANA_SCHEMA.PROD_DATA
b) SAP ECC: Real-time CDC (log-based)
RT_ECC_ORDERS → HANA_SCHEMA.ORDERS
c) MongoDB: Flowgraph with transformation
FG_IOT_SENSORS → HANA_SCHEMA.SENSOR_DATA
(Flatten JSON, calculate aggregates; see the sketch after this list)
4. Load into BW/4HANA ADSOs:
- ADSO_PRODUCTION
- ADSO_ORDERS
- ADSO_IOT
5. Create Composite Provider for unified view
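For step 3c, the JSON flattening done inside the flowgraph can be approximated in SQL. A sketch assuming a HANA release that supports JSON_TABLE; the payload structure and paths are illustrative assumptions:
-- Flatten an IoT JSON payload into columns (document and paths are assumptions)
SELECT JT.SENSOR_ID, JT.TEMPERATURE, JT.READING_TIME
FROM JSON_TABLE(
  '{"sensor_id": "A-17", "temperature": 21.5, "reading_time": "2024-05-01T12:00:00"}',
  '$' COLUMNS (
    SENSOR_ID    NVARCHAR(20) PATH '$.sensor_id',
    TEMPERATURE  DOUBLE       PATH '$.temperature',
    READING_TIME NVARCHAR(30) PATH '$.reading_time'
  )
) AS JT;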
Result:
✅ Consolidated 10-year history in HANA
✅ Fast queries (HANA-optimized)
✅ Change tracking for audit trails
✅ Complex transformations applied
Use Case 3: Hybrid Approach (SDA + SDI)
Scenario: Financial services company needs both real-time market data and historical performance
Architecture:
Real-Time Layer (SDA):
- Virtual access to live market data feed (PostgreSQL)
- Open ODS View: OOV_MARKET_LIVE
- Updated every 5 seconds
Historical Layer (SDI):
- Daily batch load from same PostgreSQL
- ADSO_MARKET_HISTORY (10 years retained)
- Loaded at market close (6 PM daily)
Unified View (Composite Provider):
CP_MARKET_ANALYSIS
UNION
OOV_MARKET_LIVE (today's data)
ADSO_MARKET_HISTORY (historical data)
Query Logic:
- If date = CURRENT_DATE → Query OOV (real-time)
- If date < CURRENT_DATE → Query ADSO (historical)
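The routing logic amounts to a union with complementary date filters. A sketch assuming both providers expose the same columns (TRADE_DATE, SYMBOL, PRICE are placeholders) and can be queried directly:
-- Sketch of the Composite Provider's union semantics
SELECT TRADE_DATE, SYMBOL, PRICE
FROM "OOV_MARKET_LIVE"            -- real-time via SDA
WHERE TRADE_DATE = CURRENT_DATE
UNION ALL
SELECT TRADE_DATE, SYMBOL, PRICE
FROM "ADSO_MARKET_HISTORY"        -- persisted via SDI
WHERE TRADE_DATE < CURRENT_DATE;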
Result:
✅ Real-time for intraday analysis
✅ Historical for trend analysis
✅ Optimized storage (no duplication)
✅ Fast queries (HANA for historical)
Performance Optimization
SDA Performance Best Practices
| Practice | Description | Impact |
|---|---|---|
| Push-down filters | Ensure WHERE clauses execute on source | High |
| Limit columns | Select only needed fields | Medium |
| Use indexes | Create indexes on source tables | High |
| Network optimization | Use fast, dedicated network | Critical |
| Caching | Enable result cache in HANA | Medium |
| Avoid joins | Don't join virtual + HANA tables | High |
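To verify push-down (first row of the table above), inspect the plan: remote operators show which part of the statement executes on the source. A sketch using a virtual table from earlier:
-- Generate and read the plan for a federated query
EXPLAIN PLAN SET STATEMENT_NAME = 'sda_check' FOR
SELECT CUSTOMER_ID, AMOUNT FROM "VIRTUAL_SALES" WHERE ORDER_DATE >= '2024-01-01';

SELECT OPERATOR_NAME, OPERATOR_DETAILS
FROM EXPLAIN_PLAN_TABLE
WHERE STATEMENT_NAME = 'sda_check';
-- Look for remote scan operators whose details include the WHERE condition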
SDI Performance Best Practices
- Batch size tuning: Optimize based on network and target system load (5000-50000 rows)
- Parallel loads: Use multiple DP agents for large volumes
- CDC over full refresh: Reduces data transfer and load time
- Compression: Enable network compression for remote agents
- Partitioning: Partition target tables by date/key for faster loads (see the sketch after this list)
- Monitoring: Use M_REPLICATION_STATISTICS for bottleneck analysis
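A sketch of the partitioning recommendation, assuming a date-keyed replication target (table and column names are placeholders; the partitioning column must be part of the primary key):
-- Range-partition the target by year so loads and pruning stay fast
CREATE COLUMN TABLE HANA_SCHEMA.ORDERS_REPLICA (
  ORDER_ID   BIGINT        NOT NULL,
  ORDER_DATE DATE          NOT NULL,
  AMOUNT     DECIMAL(15,2),
  PRIMARY KEY (ORDER_ID, ORDER_DATE)
)
PARTITION BY RANGE (ORDER_DATE) (
  PARTITION '2023-01-01' <= VALUES < '2024-01-01',
  PARTITION '2024-01-01' <= VALUES < '2025-01-01',
  PARTITION OTHERS
);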
Security Considerations
SDA Security
1. Credential Management
   - Use encrypted passwords in remote source definitions
   - Store credentials in secure credential store
   - Rotate passwords regularly
2. Network Security
   - Use VPN/private network for remote connections
   - Encrypt connections (SSL/TLS)
   - Whitelist HANA IPs on source firewalls
3. Access Control
   - Grant minimal privileges on source databases
   - Use dedicated service accounts (not personal accounts)
   - Audit virtual table access in HANA
4. Data Masking
   - Mask sensitive fields in Calculation Views
   - Use HANA Dynamic Data Masking for PII
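A simple masking pattern for item 4 that avoids exposing raw PII through virtual tables; this sketch uses plain SQL string functions rather than the dedicated masking feature, and the table and column names are assumptions:
-- Expose only a masked form of the email address to reporting users
CREATE VIEW V_CUSTOMER_MASKED AS
SELECT
  CUSTOMER_ID,
  LEFT(EMAIL, 2) || '***@' || SUBSTR_AFTER(EMAIL, '@') AS EMAIL_MASKED
FROM "VIRTUAL_CUSTOMERS";  -- hypothetical virtual table with an EMAIL column

GRANT SELECT ON V_CUSTOMER_MASKED TO REPORTING_ROLE;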
SDI Security
1. Agent Security
   - Install DP Agent in DMZ or secure network zone
   - Use firewall rules to restrict agent access
   - Secure agent-to-HANA communication (SSL)
2. Data in Transit
   - Enable encryption for all replication tasks
   - Use secure protocols (HTTPS, SFTP, etc.)
   - Compress and encrypt flowgraph data
3. Audit Trails
   - Log all replication activities
   - Monitor M_REPLICATION_AUDIT_LOG
   - Alert on failed replications
Troubleshooting Common Issues
SDA Issues
| Issue | Cause | Solution |
|---|---|---|
| Slow queries | No push-down to source | Check explain plan, add indexes |
| Connection timeout | Network/firewall | Verify connectivity, adjust timeout |
| Wrong data types | Type mapping issues | Explicit casting in Calculation View |
| Authentication failed | Credential expiry | Update remote source credentials |
SDI Issues
| Issue | Cause | Solution |
|---|---|---|
| Replication lag | High source load | Increase batch size, add agents |
| Agent disconnected | Network/resource issues | Restart agent, check logs |
| Data inconsistency | CDC configuration | Verify CDC column, reset baseline |
| Out of memory | Large batch size | Reduce batch size, increase agent RAM |
Decision Framework
When to Use SDA
- ✅ Need real-time data (< 1 minute latency)
- ✅ Low data volume queries
- ✅ Want to avoid data duplication
- ✅ Stable, fast network to source
- ✅ Limited HANA storage available
- ❌ Avoid for: Complex transformations, high-volume queries, slow sources
When to Use SDI
- ✅ Need historical data retention
- ✅ Complex ETL transformations required
- ✅ High query volumes expected
- ✅ Source performance concerns
- ✅ Change data capture needed
- ❌ Avoid for: Simple real-time lookups, limited HANA storage
Monitoring and Maintenance
SDA Monitoring
-- Check virtual table statistics
SELECT
  TABLE_NAME,
  REMOTE_SOURCE_NAME,
  RECORD_COUNT,
  LAST_ACCESS_TIME
FROM M_TABLES
WHERE IS_VIRTUAL_TABLE = 'TRUE';
-- Monitor query performance
SELECT
  STATEMENT_STRING,
  EXECUTION_TIME,
  REMOTE_SOURCE_NAME
FROM M_SQL_PLAN_CACHE
WHERE STATEMENT_STRING LIKE '%VIRTUAL_%'
ORDER BY EXECUTION_TIME DESC;
SDI Monitoring
-- Check replication status
SELECT
  TASK_NAME,
  STATUS,
  LAST_EXECUTION_TIME,
  ROWS_REPLICATED,
  ERROR_MESSAGE
FROM M_REPLICATION_TASKS;
-- Monitor agent health
SELECT
  AGENT_NAME,
  HOST,
  STATUS,
  LAST_HEARTBEAT_TIME
FROM M_AGENTS;
-- Replication performance
SELECT
  TASK_NAME,
  AVG(DURATION_MS) AS AVG_DURATION,
  SUM(ROWS_AFFECTED) AS TOTAL_ROWS
FROM M_REPLICATION_STATISTICS
GROUP BY TASK_NAME;
Conclusion
SDA and SDI are complementary technologies that, when used correctly, enable comprehensive external data integration in BW/4HANA. Choose based on your specific requirements:
- SDA for real-time, on-demand access without duplication
- SDI for robust, performant data warehousing with transformations
- Hybrid for the best of both worlds
The right integration strategy isn't about choosing one over the other — it's about using each where it excels.
