Azure Data Factory LIVE Online Training

This impeccable Azure Data Factory Training course is carefully designed for aspiring ETL Developers and Architects. This Azure Data Factory Training includes basic to advanced ETL Concepts, Data Warehouse (DWH) and Data Mashups / Data Flow concepts using SQL Server, Azure SaaS Components. This Azure Data Factory Training course also includes Azure SQL Database Migrations, Azure Storage, Azure Data Warehouse (ADW), Incremental Loads, Power Query, Azure Data Lake required for Big Data Analytics and Warehouse design with ONE Real-time Project.

Azure Data Factory Training Plans

  PLAN A PLAN B PLAN C
  1. Azure Data Factory &
Synapse Analytics
1. SQL Server TSQL
2. Azure Data Factory
1. SQL Server TSQL
2. Azure Data Engineer
Total Duration 3 Weeks 6 Weeks 10 Weeks
ADF : Azure Data Factory
ADF : Data Imports, ETL
ADF : Data Flows, Wrangling
ADF : Transformations, ETL
Synapse: Configuration, Loads
Synapse: ETL with ADF, DWH
Synapse: MPP, cDWH, DIUs
TSQL: Database Basics, T-SQL
TSQL : Constraints, Joins, Queries
TSQL: Views, Group By, Self Joins
TSQL: DB Objects, Queries
TSQL: Transactions, Lock Hints
Storage: ADLS Gen 2, BLOB
Storage: Az Tables, Shares, ACL
Azure Stream Analytics & Jobs
IoT Hubs and Event Hubs, ETL
ADB : Azure Data Bricks
ADB : Architecture, Data Loads
ADB : Run Spark Jobs, Pools
ADB : Workspace, Delta Tables
DP 203 Exams Guidance
Total Course Fee
( Payable in Installments)*
INR 15000
USD 200*
INR 19000
USD 300*
INR 39000
USD 400*

Trainer: Mr. Sai Phanindra T

SQL Server & T-SQL Schedules
S No Time (IST, Mon - Fri) Start Date  
1 6 AM - 7 AM Apr 2nd Register
2 8 AM - 9 AM Mar 27th Register
3 9 AM - 10 AM Mar 19th Register
4 6 PM - 7 PM Feb 27th Register
5 8 PM - 9 PM Mar 12th Register
6 9 PM - 10 PM Mar 5th Register
Azure Data Factory Training Schedules
S No Time (IST, Mon - Fri) Start Date  
1 7 AM - 8 AM Feb 27th Register
2 7 PM - 8 PM Mar 5th Register

If above schedule does not work, opt for Azure Synapse Training Videos

Azure Data Factory Training Highlights :

✔ Azure Data Factory
✔ Azure Databricks
✔ Azure Synapse
✔ Azure Cosmos DB
✔ ADF Resources, Monitor
✔ Power Query in ADF
✔ Azure Storage Explorer
✔ Azure Data Explorer
✔ On-Premise Migrations
✔ Big Data Storage
✔ Performance Tuning
✔ Security Management
✔ Prepping, Ingestions
✔ Spark Clusters, Python, Scala

All Session Are Completely Practical & Real Time

 

Azure Data factory Training Course Contents:

Ch 1: DATABASE INTRODUCTION

  • Databases Introduction & Purpose
  • Database Types : OLTP, DWH, OLAP
  • Microsoft SQL Server Advantages, Use
  • SQL Server Components and Usage
  • Microsoft SQL Server - Career Options
  • Developer, DBA, Data Engineer
  • Data Analyst, Data Scientist Careers
  • SQL : Purpose, Real-time Usage Options
  • SQL Versus Microsoft T-SQL [MSSQL]
  • Course Plan, Real-time Project, Resume
  • 24 x 7 Online Lab for Remote DB Access
  • Versions and Editions of SQL Server
  • SQL Server Pre-requisites : S/W, H/W
  • System Configuration Checker Tool

Ch 5: SQL Basics - 3, TSQL INTRO

  • Database Objects : Tables and Schemas
  • Schemas : Group Tables in Database
  • Schemas : Security Management Object
  • Creating Schemas & Batch Concept
  • Using Schemas for Table Creation
  • Data Storage in Tables with Schemas
  • Data Retrieval & Usage with Schemas
  • Table Migrations across Schemas
  • Import and Export Wizard in SSMS
  • Data Imports with Excel File Data
  • Performing Bulk Operations in SSMS
  • Temporary Tables : Real-time Use
  • Local and Global Temporary Tables
  • # and ## Prefix, Scope of Usage

Ch 9: Functions, Procedures Basics

  • Functions with SQL Server, TSQL
  • Scalar, Inline, Table Functions
  • Variables: Declare, Real-time Use
  • Creating, Executing Functions
  • Functions for Computations
  • Functions for Parameterized Joins
  • Procedures: Usage in Real-time
  • Using Parameters in SQL Server
  • Parameterized Joins in TSQL
  • Compilation with Stored Procedures
  • sp_help, sp_helptext, sp_helpindex
  • sp_helpdb, sp_rename, sp_recompile
  • System Views For Metadata Audits
  • DBID, DBName, ObjectID, ObjectName

Ch 2: SQL SERVER INSTALLATION

  • SQL Server & SSMS Installation Plan
  • SQL Server Pre-requisites : S/W, H/W
  • SQL Server 2022 & 2019 Installation
  • Database Engine Feature, OLTP
  • Instances : Types and Properties
  • Default Instance, Named Instances
  • Service and Service Account Use
  • Authentication Modes and Logins
  • Windows Logins and SQL Logins
  • SQL Server Management Studio
  • Server Connections with SSMS Tool
  • Local and Remote Connections
  • System Databases: Master and Model
  • MSDB, TempDB, Resource Databases

Ch 6: Constraints, Index Basics

  • Constraints and Keys - Data Integrity
  • NULL, NOT NULL Property on Tables
  • UNIQUE KEY Constraints: Importance
  • PRIMARY KEY Constraint: Importance
  • FOREIGN KEY Constraint: Importance
  • REFERENCES, CHECK & DEFAULT
  • Candidate Keys and Identity Property
  • Database Diagrams and ER Models
  • Relationships Verification and Links
  • Indexes : Basic Types and Creation
  • Index Sorting and Search Advantages
  • Clustered and NonClustered Indexes
  • Primary Key and Unique Key Indexes
  • Need for Indexes - working with Keys

Ch 10: TRIGGERS & TRANSACTIONS

  • Triggers - Purpose, Real-world Usage
  • FOR/AFTER Triggers - Real time Use
  • INSTEAD OF Triggers - Real time Use
  • INSERTED, DELETED Memory Tables
  • Using Triggers for Data Replication
  • Enable Triggers and Disable Triggers
  • Database Level, Server Level Triggers
  • Transactions : Types, ACID Properties
  • Transaction Types and AutoCommit
  • EXPLICIT & IMPLICIT Transactions
  • COMMIT and ROLLBACK Statements
  • Batch Concept and Go Statement
  • Open Transactions in Real-time
  • Using Conditional Commits, Rollbacks

Ch 3: SSMS Tool, SQL BASICS - 1

  • Creating Databases: Files [MDF, LDF]
  • Creating Tables in User Interface
  • Data Insertion & Report in User Interface
  • SQL : Purpose and Real-time Usage
  • SQL Versus T-SQL : Basic Differences
  • DDL, DML, SELECT, DCL and TCL
  • Creating SSMS Sessions : SPID
  • Create, Connect Databases using SQL
  • Creating Tables with INT, CHAR
  • Data Storage, Inserts - Basic Level
  • Table Data Verifications with Select
  • SELECT Statement for Table Retrieval
  • Identify Databases and Tables
  • Identify Sessions and Session ID

Ch 7: Joins Basics, TSQL Queries

  • JOINS - Table Comparisons Queries
  • INNER JOINS For Matching Data
  • OUTER JOINS For (non) Match Data
  • Join Queries with "ON" Conditions
  • Left Outer Joins - Example Queries
  • Right Outer Joins - Example Queries
  • FULL Outer Joins: Realtime Scenarios
  • CROSS JOIN and CROSS APPLY
  • One-way, Two way Data Comparisons
  • Using Table Aliases & Column Aliases
  • Optimizing Join Queries with Indexes
  • Choosing Correct Comparison Columns
  • Joining Unrelated Tables in TSQL
  • Self References, Self Joins in TSQL

Ch 11:  Normal Forms, Cursors

  • First Normal Form and Atomicity
  • Third Normal Form and MVD Property
  • Boycee-Codd Normal Form : BNCF
  • Fourth Normal Form : Advantages
  • Self Reference Keys and 4 NF Usage
  • 1:1, 1:M, M:1, M:M Relationship Types
  • Linked Servers Configurations, RPC
  • Linked Servers, Remote Joins in TSQL
  • 2 Part, 3 Part, 4 Part Naming Styles
  • Remote Joins Queries and Aliases
  • Cursors - Basics, Data Operations
  • Cursors - Life Cycle & Declaration
  • Cursors Types, FETCH Operations
  • Cursors - Deallocate, Real-world Use

Ch 4: SQL BASICS - 2

  • Creating Tables: VARCHAR, FLOAT
  • Single Row Inserts, Multi Row Inserts
  • Rules for Data Insertion Statements
  • SELECT with WHERE Conditions
  • AND and OR Operators Usage
  • IN Operator and NOT IN Operator
  • Between, Not Between Operators
  • LIKE and NOT LIKE Operators
  • ORDER BY, TOP & OFFSET
  • Basic Sub Queries with SELECT
  • UPDATE Statement & Conditions
  • DELETE & TRUNCATE Statements
  • ALTER, ADD COLUMN Statements
  • DROP Statements: Table, Database

Ch 8: Group By in TSQL, Views Basics

  • GROUP BY: Importance, Realtime Use
  • GROUP BY Queries and Aggregations
  • Group By Queries with Having Clause
  • Group By Queries with Where Clause
  • Using WHERE and HAVING in T-SQL
  • Group By with Joins in TSQL
  • Query Execution Order & Aliases
  • Joins with Sub Queries, Formatting
  • Database Objects: Overview & Usage
  • Views: Types, Usage in Real-time
  • Creating, Executing & Verifying Views
  • Storing Queries in Database Views
  • Excel Analytics - Joins & Views
  • Excel Office Data Connection Reports

Ch 12: TSQL Queries, SQL Analytics

  • IIF() Function with SELECT Query
  • WHEN..THEN..ELSE
  • WHEN MATCHED, NOT MATCHED
  • Incremental Loads, Upsert Statement
  • Stored Procedures: Merge Statement
  • UNION and UNION ALL Operator
  • Window Functions: Rank, Dense Rank
  • Row_Number, PartitionBy in TSQL
  • Duplicate Row Identification, Deletion
  • Grouping, Cube, Rollup, Lag, Lead
  • Data Types: Numerical, Date, Time
  • Data Types: Characters, Real, Float
  • Date & Time Functions, DateAdd
  • String Functions, Concat, SubString
Case Study 1: Database Design with Tables,
Constraints, Keys & Relations
Case Study 2: Joins with Group By,
Sub Queries, Views, Excel Analytics

Module 1

Module 2

Module 3

Chapter 1: Cloud Basics, Azure SQL

  • Cloud Introduction and Azure Basics
  • Azure Implementation: IaaS, PaaS, SaaS
  • Azure Data Engineer: Job Roles
  • Azure Storage Components
  • Azure ETL & Streaming Components
  • Need for Azure Data Factory (ADF)
  • Need for Azure Synapse Analytics
  • Azure Resources and Resource Types
  • Azure Account, Subscription (Free)
  • Azure SQL Server [Logical Server]
  • Firewall Rules and Azure Services
  • Azure SQL Database Deployment
  • Azure SQL Pool Deployment
  • Compute: DTU Versus DWU
  • Test Connections from SSMS

Chapter 5: Incremental Loads with ADF

  • Incremental Loads with Files (BLOB)
  • Pipeline Executions and Schedules
  • Regular Schedules and Tumbling Window
  • Execution Retry and Delay Options
  • Binary Copy, Last Modified Date in Blob
  • Automated Loops and Trigger Schedules
  • Incremental Loads Verification Tests
  • Incompatible Rows Skips, Fault Tolerance
  • Database Tables : Incremental Loads
  • Copy Method : UPSERT, Business Keys
  • ETL Staging Advantages & Performance
  • ADF Pipelines: Execution Settings
  • ADF Logging Options, Consistency Check
  • Compression Option, DOP and DOCP
  • ADF Pipeline Triggers and Monitoring

Chapter 9: Synapse Analytics with Spark

  • Synapse Pipelines: Performance Advantages
  • Pivot Transformation For Normalization
  • Generating Pivot Column, Aggregations
  • Pivot Transformation and Pivot Settings
  • Pivot Key Selection, Value and Nulls
  • Pivoted Columns and Column Pattern
  • Column Prefix, Help Graphic & Metadata
  • Denormalized Data and Aggregations
  • Apache Spark Pool in Azure Synapse
  • Spark Cluster Nodes: Vcores, Memory
  • Notebooks : Purpose, Usage Options
  • Python Notebooks For Remote Access
  • Creating Databases in Apache Spark Pool
  • Data Loads from Dedicated SQL Pools
  • PySpark Code for Data Operations, Writes

Chapter 2: Synapse SQL Pools (DWH)

  • Dedicated SQL Pools in Azure
  • Enterprise Data Warehouse with Synapse
  • Massively Parallel Processing (MPP)
  • Control Nodes and Compute Nodes
  • DMS: Data Movement Service
  • Start/Resume/Pause & Scaling
  • SQL Pool Config @ TSQL Scripts
  • Start/Resume/Pause, Scaling Options
  • Table Creations @ TSQL Scripts
  • Table Partitions: Left & Right
  • Distributions: Round Robin, Hash
  • Distributions: Replicate and Usage
  • Auto Indexing & Column Store
  • Planning for Big Data Loads
  • Need for ADF: Azure Data Factory

Chapter 6: ADF Data Flow - 1

  • Data Flow Task, Data Flow Activity
  • Transformations with Data Flow
  • Spark Cluster For Debugging
  • Cluster Node Configurations
  • Spark Cluster Types & Sizing
  • Transaction Optimized - Capacity
  • Memory Optimized - Capacity
  • Data Cleansing with ADF
  • Data Orchestration with Data Flow
  • SELECT Transformation & Options
  • Conditional Split Transformation
  • UNION, SELECT Transformation
  • Spark Cluster For Pipeline Executions
  • Pipeline Monitoring & Run IDs
  • Adding Data Flow into Pipelines

Chapter 10: Synapse Security & Parameters

  • Azure Active Directory (AAD) Users, Groups
  • IAM: Identity & Access Management
  • Synapse Workspace Security with RBAC
  • ADF Security with RBAC: Owner, Contributor
  • Azure Synapse SQL Pool Security: Logins
  • Creating SQL Logins & Users : master
  • SQL Users in Azure SQL DB and SQL Pool
  • Grant, Control, Revoke: Security Roles
  • Parameters - Creation and Use in Pipelines
  • Dynamic Connections with Credentials
  • User Name and Password Connectivity
  • Dynamic Dataset Configurations
  • Pipeline Expressions with Parameters
  • Resource Classes and Usage with SQL Pool

Chapter 3: Azure Data Factory Concepts

  • Azure Data Factory (ADF) Concepts
  • Hybrid Data Integration at Scale
  • ADF Pipelines : Architecture
  • Integration Runtime (IR) & Use
  • Linked Services and Datasets
  • Pipeline Design: Activities
  • Copy Data Tool, Data Flow
  • Pipeline Triggers and Schedules
  • ADF Pipeline with Copy Data Tool
  • Azure SQL DB to Synapse Data Loads
  • Working with Multi Tables Data Loads
  • Creating Linked Services, Datasets
  • Basic Data Loads : Publish, Trigger
  • Copy Method : Bulk Insert
  • DIU : Data Integration Units

Chapter 7: ADF Data Flow - 2

  • ADF Pipelines For ETL Operations
  • Data Flow Tasks and Activities in Synapse
  • JOIN & EXISTS Transformations
  • Aggregate & Group By Transformations
  • Window Functions & Rank in Data Flow
  • Rank / DenseRank / Row Number
  • Derived Column Transformation
  • Lookup, Surrogate Key, Parse
  • Type Convert, Cast Transformations
  • Reusing Data Flow Tasks in Synapse
  • Pipeline Validations & Executions
  • Inline Datasets, Schema Drift
  • Data Deduplication with ADF
  • DFT Optimization Techniques
  • Data Flow Task - Staging, Logging

Chapter 11:  Change Data Capture (CDC)

  • Change Data Capture (CDC) Data Loads
  • Incremental Loads with CDC Types
  • SQL Server CDC : ETL Load Dates
  • Run Mode Options and CDC Types
  • Output Pipeline Expression, Data Window
  • Azure SQL DB Destinations, Watermarks
  • JSON Parameters, Pipeline Scheduling
  • Pipeline Validation, Trigger, Monitoring
  • Synapse SQL Pool : Data Loads (DWH)
  • ETL Optimization Techniques
  • SQL Pool (Synapse) Optimizations
  • Pipeline Optimization Techniques

Chapter 4: OnPremise Data Loads

  • Copy Data Tool For ETL Operations
  • On-Premise Data Sources with Azure
  • Self Hosted Integration Runtime (IR)
  • Access Keys, Remote Linked Services
  • Synapse SQL Pool (DW) with On-Premise
  • Staged Data Copy and Performance
  • Pipeline Executions and Monitoring
  • Pipeline RunIDs and Audits / Tracing
  • Creating Azure Storage Account
  • Storage Container, BLOB File Uploads
  • DIU Allocations and Concurrency
  • Pipeline Trigger, Author and Monitor
  • Staging with Storage Account, Container
  • Polybase For Azure Synapse, Advantages
  • Pipeline Execution: DIU & DOCP

Chapter 8: Azure Synapse Analytics

  • Azure Synapse Analytics Resource
  • Azure Synapse Analytics Workspace
  • Managed Resource Group, SQL Account
  • Synapse Workspace & Synapse Studio
  • Operations with Synapse Workspace
  • ADLS Gen 2 Storage Account, Container
  • Synapse Studio: Scripts & Pipelines
  • Dedicated SQL Pools : Creation, Use
  • Synapse Tables, Data Loads with TSQL
  • COPY INTO Statements with T-SQL
  • Row Terminator and Compressions
  • T-SQL Queries and Aggregations
  • Aggregation Data Loads in Synapse
  • Creating Synapse Pipelines with TSQL
  • Stored Procedure Activity & Triggers

Chapter 12:  Pipeline Monitoring, Security

  • Azure Monitor Resource and Usage
  • Pipeline Monitoring Techniques
  • ADF: Pipeline Monitoring and Alerts
  • Synapse: Pipeline Monitoring and Alerts
  • Synapse: Storage Monitoring and Alerts
  • Conditions, Signal Rules and Metrics
  • Email Notifications with Azure
  • Serverless Pool in Azure Synapse
  • Connections, Usage with Serverless Pool
  • Using Azure OpenDatasets in Synapse
  • OPENROWSET and BULK Data Loads
  • Azure Storage Account : Data Analysis
  • Working with Parquet Files in Synapse
  • Python Notebooks (Pyspark) in Synapse

Mod 1: Azure Data Factory & Synapse Analytics

Mod 2: Azure Data Lake Storage & Stream Analytics

Mod 3: Azure Databricks & Spark, Python

Chapter 1: Cloud Basics, Azure SQL

  • Cloud Introduction and Azure Basics
  • Azure Implementation: IaaS, PaaS, SaaS
  • Azure Data Engineer: Job Roles
  • Azure Storage Components
  • Azure ETL & Streaming Components
  • Need for Azure Data Factory (ADF)
  • Need for Azure Synapse Analytics
  • Azure Resources and Resource Types
  • Azure Account, Subscription (Free)
  • Azure SQL Server [Logical Server]
  • Firewall Rules and Azure Services
  • Azure SQL Database Deployment
  • Azure SQL Pool Deployment
  • Compute: DTU Versus DWU
  • Test Connections from SSMS

Chapter 1: Azure Fundamentals - Storage

  • Azure Fundamentals: Storage Components
  • Azure Storage Resources & Usage
  • Resource Groups & Subscriptions
  • Azure Storage : Files, Tables and ETL
  • Azure Storage Account & Use
  • Data Lake Storage Account (ADLS)
  • Advanced Options: HNS Property
  • Resource Location, Resource Group
  • Azure Portal: Deployment Verifications
  • Azure Portal: Deployment Verifications
  • Storage Account : Basic Properties
  • Overview Page: Status, HNS State
  • Azure Storage : Access Options
  • Azure Storage Explorer Tool
  • Explorer Tool : Configuration
  • Azure Subscription : Filter Options

Chapter 1: Azure Intro, Azure Databricks

  • Azure Cloud : SaaS, PaaS, PaaS & IaaS
  • Azure Cloud : Storage, ETL Resources
  • Azure Databricks : Compute Resources
  • Need for Azure Databricks (ADB)
  • Azure Databricks : Purpose & Config
  • Azure Databricks Service Creation
  • Azure Databricks Components
  • Azure Databricks Workspace, Usage
  • Spark Cluster Configurations, Capacity
  • Driver Nodes, Worker Nodes in Spark
  • Cluster Types : Personal, Unrestricted
  • CPU, Memory & IO Resources
  • Virtual Machines (VM) for Clusters
  • Databricks : Runtime & DBFS Storage
  • DBFS : Files, Tables with Spark DB

Chapter 2: Synapse SQL Pools (DWH)

  • Dedicated SQL Pools in Azure
  • Enterprise Data Warehouse with Synapse
  • Massively Parallel Processing (MPP)
  • Control Nodes and Compute Nodes
  • DMS: Data Movement Service
  • Start/Resume/Pause & Scaling
  • SQL Pool Config @ TSQL Scripts
  • Start/Resume/Pause, Scaling Options
  • Table Creations @ TSQL Scripts
  • Table Partitions: Left & Right
  • Distributions: Round Robin, Hash
  • Distributions: Replicate and Usage
  • Auto Indexing & Column Store
  • Planning for Big Data Loads
  • Need for ADF: Azure Data Factory

Chapter 2:  Azure Storage Operations

  • BLOB: Binary Large Objects
  • Storage Browser and Service Pages
  • Storage Browser: Container Creation
  • Storage Browser: Folder, File Uploads
  • Service Page: Container Creation
  • Service Page: Folder, File Uploads
  • Container, Folder, File Properties
  • Limitations with Storage Portal
  • Azure Data Explorer Tool : Usage
  • Container: Creation, Properties
  • File Uploads, Edits and Access URLs
  • Azure Storage Explorer Tool Usage
  • Azure Account Options in Explorer
  • Directory Creation, File Operations
  • Limitations with Explorer Tool

Chapter 2: SparkDatabase, SQL Notebooks

  • DBFS : File Uploads from ON-Premise
  • Creating Spark Tables; Spark DB
  • Data Explorer: HIVE Metastore
  • Data Explorer: Spark Database, Tables
  • Notebooks: SQL, Python and Scala
  • Creating SQL Notebooks in Databricks
  • Creating User Defined Spark Databases
  • Connecting / Using Spark Databases
  • Spark SQL : Big Data Loads
  • Spark SQL : Database & Table List
  • Spark SQL : Data Aggregations, Jobs
  • Spark SQL : Data Analytics, Reports
  • Spark SQL Limitations : Python, Scala
  • Notebooks : Export, Import, Clone
  • Notebooks : Storage & Versions

Chapter 3: Azure Data Factory Concepts

  • Azure Data Factory (ADF) Concepts
  • Hybrid Data Integration at Scale
  • ADF Pipelines : Architecture
  • Integration Runtime (IR) & Use
  • Linked Services and Datasets
  • Pipeline Design: Activities
  • Copy Data Tool, Data Flow
  • Pipeline Triggers and Schedules
  • ADF Pipeline with Copy Data Tool
  • Azure SQL DB to Synapse Data Loads
  • Working with Multi Tables Data Loads
  • Creating Linked Services, Datasets
  • Basic Data Loads : Publish, Trigger
  • Copy Method : Bulk Insert
  • DIU : Data Integration Units

Chapter 3:  Azure Storage Security, ACLs

  • Azure Data Lake Storage Security Options
  • Shared Access Keys: Primary, Secondary
  • SAS Key Generation: Container, Tables, Files
  • SAS Key Permissions, Validation Options
  • Access Keys: Account Level Permissions
  • Azure Active Directory: Users, Groups
  • Azure AD Security: RBAC with IAM, ACLs
  • Owner Role, Contributor and Reader Role
  • Azure Data Lake Storage Security Options
  • ACL : Access Control Lists & Security
  • Azure BLOB Storage Containers & ACLs
  • Folder Level and File Level Security
  • ACL Permissions: Read, Write & Execute
  • Access Policy: Creation and Realtime Use
  • Permissions: rwacdl; Azure Principals, CORS

Chapter 3: Python Intro, Spark SQL

  • Python : Introduction, Real-time Use
  • Python : ETL and DWH Operations
  • Python : Data Engineering Operations
  • Python : Data Frames & Pandas
  • Python : Spark SQL Integrations
  • Python : Spark Databases, Tables
  • Python : Data Inserts, Retrievals
  • Python : DataFrames, In-Memory
  • PySpark : Introduction, Options
  • PySpark : Relation with Python
  • PySpark : Real-time Usage
  • PySpark : Table Clone & Imports
  • PySpark : Renames, Column
  • PySpark : Metadata Functions
  • PySpark : Pandas & Jupyter

Chapter 4: OnPremise Data Loads

  • Copy Data Tool For ETL Operations
  • On-Premise Data Sources with Azure
  • Self Hosted Integration Runtime (IR)
  • Access Keys, Remote Linked Services
  • Synapse SQL Pool (DW) with On-Premise
  • Staged Data Copy and Performance
  • Pipeline Executions and Monitoring
  • Pipeline RunIDs and Audits / Tracing
  • Creating Azure Storage Account
  • Storage Container, BLOB File Uploads
  • DIU Allocations and Concurrency
  • Pipeline Trigger, Author and Monitor
  • Staging with Storage Account, Container
  • Polybase For Azure Synapse, Advantages
  • Pipeline Execution: DIU & DOCP

Chapter 4:  SQL Database Migrations

  • SQL Server (On-Premise) to Azure Migration
  • Using SSMS Tool, SQL Database Installation
  • Source Database Scripts & Validations
  • BACPAC File Generation From SSMS Tool
  • Table Selection & Advanced Options
  • Azure Data Lake Storage and SSMS Access
  • Azure Storage Container, BACPAC Files
  • IAM and Account Key Authentication
  • Azure SQL Server Creation From Portal
  • Azure SQL Database Deployment
  • DTU : Data Transaction Units, Pricing
  • Azure Firewall Configuration, Security
  • Azure SQL Database Imports (bacpac)
  • Azure SQL Server with ADLS Containers
  • Azure SQL DB Migrations, Verification

Chapter 4: PySpark Notebooks, DBFS

  • Spark Notebooks with Python
  • Reading DBFS Data into Spark
  • Creating Dataframes for ETL
  • Spark Functions and Usage
  • Temporary Views & Dataframes
  • Spark Temp Views: Aggregations
  • Spark Data Loads with Temp Views
  • Data Explorer: HIVE & Spark DB
  • Spark Executors and Spark Jobs
  • Read() Function and Usage
  • createTempTable
  • wirte.format()
  • Parquet Tables with Spark DB
  • Reading Spark Tables, Data
  • Analytics: X, Y Axis, Group By

Chapter 5: Incremental Loads with ADF

  • Incremental Loads with Files (BLOB)
  • Pipeline Executions and Schedules
  • Regular Schedules and Tumbling Window
  • Execution Retry and Delay Options
  • Binary Copy, Last Modified Date in Blob
  • Automated Loops and Trigger Schedules
  • Incremental Loads Verification Tests
  • Incompatible Rows Skips, Fault Tolerance
  • Database Tables : Incremental Loads
  • Copy Method : UPSERT, Business Keys
  • ETL Staging Advantages & Performance
  • ADF Pipelines: Execution Settings
  • ADF Logging Options, Consistency Check
  • Compression Option, DOP and DOCP
  • ADF Pipeline Triggers and Monitoring

Chapter 5:  Azure Tables, Key Vaults

  • Azure Tables - Real-time Usage
  • Schema-less Design and Access Options
  • Structured and Relational Data Storage
  • Tables, Entities and Properties Concepts
  • Azure Tables: Creation and Data Inserts
  • Azure Tables in Portal - GUI, Data Types
  • Azure Tables: Data Imports in Explorer
  • Data Edits, Queries & Delete Operations
  • Azure Key Vaults, ADLS [Data Lake] Security
  • Azure Passwords, Keys and Certificates
  • Azure Key Vaults - Name and Vault URI
  • Inbuilt Managed Key and Azure Key Vault
  • Key Vaults Types: Standard & Premium
  • Secret Page, Key Backups, Key Restores
  • Add Keys to Azure Vaults. Key Type, Size

Chapter 5: PySpark with ADLS, Widgets

  • Azure Storage Account : Creation
  • HNS Property : ADLS, Containers
  • File Path URL Configuration: WASBS
  • Access Key and SAS Key Generation
  • Databricks : Data Import Scripts
  • Config Options with ADLS, Spark
  • PySpark Variables & Data Loads
  • Data Explorer: HIVE & Spark DB
  • Widgets : Notebook Parameters
  • widget module : Text, Combo
  • Dropdown, Multi Select Parameters
  • dbutils help(), get() & remove()
  • Dataframes, Spark SQL @ Variables
  • Python Data Frames, Spark SQL
  • Dynamic Spark SQL & F Strings

Chapter 6: ADF Data Flow - 1

  • Data Flow Task, Data Flow Activity
  • Transformations with Data Flow
  • Spark Cluster For Debugging
  • Cluster Node Configurations
  • Spark Cluster Types & Sizing
  • Transaction Optimized - Capacity
  • Memory Optimized - Capacity
  • Data Cleansing with ADF
  • Data Orchestration with Data Flow
  • SELECT Transformation & Options
  • Conditional Split Transformation
  • UNION, SELECT Transformation
  • Spark Cluster For Pipeline Executions
  • Pipeline Monitoring & Run IDs
  • Adding Data Flow into Pipelines

Chapter 6:  Replication & Optimizations

  • Azure Storage: Replications, DR Options
  • LRS: Locally Redundant Storage
  • GRS: Globally Redundant Storage
  • ZRS: Zone Redundant Storage
  • Replication Options and Advantages
  • Replication Verification, Modifications
  • Storage Endpoints, Failover Partner
  • Azure Monitor, Metrics & Activity Logs
  • Monitoring Azure Storage Namespaces
  • Add KQL Metrics; Account, Blob and File
  • Total Ingress and Egress Metrics: Charts
  • Average Latency, Transaction Count
  • Request Breakdowns, Signal Logic
  • Azure Alerts & Conditions, Notifications
  • Signal Logic Conditions and Emails

Chapter 6: PySpark Jobs, Architecture

  • Driver Nodes, Worker Nodes, DBUs
  • RDD : Resilent Data Distribution
  • DAG : Directed Acyclic Graph
  • Hadoop HDES and Spot Instance
  • Cluster Manager, Master Node
  • RDDS, Worker, Excecutor & Slave
  • Hadoop HDES & Databricks Runtime
  • Databricks Optimization Techniques
  • Spot Instance, Photon Acceleration
  • All Purpose Cluster, Job Cluster
  • Databricks Jobs: Creation & Tasks
  • Jobs with Parameters, Executions
  • Task Dependency & Notifications
  • Continuous Schedule, Manual Schedule
  • Active Jobs, Recently Run Jobs, Monitor

Chapter 7: ADF Data Flow - 2

  • ADF Pipelines For ETL Operations
  • Data Flow Tasks and Activities in Synapse
  • JOIN & EXISTS Transformations
  • Aggregate & Group By Transformations
  • Window Functions & Rank in Data Flow
  • Rank / DenseRank / Row Number
  • Derived Column Transformation
  • Lookup, Surrogate Key, Parse
  • Type Convert, Cast Transformations
  • Reusing Data Flow Tasks in Synapse
  • Pipeline Validations & Executions
  • Inline Datasets, Schema Drift
  • Data Deduplication with ADF
  • DFT Optimization Techniques
  • Data Flow Task - Staging, Logging

Chapter 7: Azure Stream Analytics, IoT

  • Azure Stream Analytics Real-time Usage
  • Real-time Data Processing, Events
  • Ingest, Deliver and Analysis Operations
  • Azure Stream Analytics Jobs Concept
  • Understanding Input & Output Options
  • SAQL Queries for Stream Analytics Jobs
  • IoT: Internet Of Things, Real-time Data
  • Need for IoT Hubs and Event Hubs
  • Conditional Split Transformation
  • Creating IoT Device for Data Inputs
  • Creating Azure Stream Analytics Job
  • Stream Analytics for Historical Data
  • Azure SQL Database for ASA Jobs
  • SAQL: Query Formatting, Validation
  • Historical Data Upload, ASA Jobs
  • Stream Analytics Job Monitoring

Chapter 7: Scala with Spark, Azure SQL DB

  • Azure Databricks Security Operations
  • Azure Active Directory (Azure AD)
  • AD Users and RBAC with IAM
  • Owner, Contributor & Reader Roles
  • Workspace Admin Permissions
  • Notebook Permissions & Share
  • Workflow Security, HTTP Path
  • User Tokens & ServerName
  • Scala : Differences with PySpark
  • Scala : Variables Declaration, Usage
  • SparkSQL with Scala Notebooks
  • Temp Views with Scala Notebooks
  • Aggregations with Scala Notebooks
  • Visual Data Analytics with Scala
  • PySpark to Scala Conversions

Chapter 8: Azure Synapse Analytics

  • Azure Synapse Analytics Resource
  • Azure Synapse Analytics Workspace
  • Managed Resource Group, SQL Account
  • Synapse Workspace & Synapse Studio
  • Operations with Synapse Workspace
  • ADLS Gen 2 Storage Account, Container
  • Synapse Studio: Scripts & Pipelines
  • Dedicated SQL Pools : Creation, Use
  • Synapse Tables, Data Loads with TSQL
  • COPY INTO Statements with T-SQL
  • Row Terminator and Compressions
  • T-SQL Queries and Aggregations
  • Aggregation Data Loads in Synapse
  • Creating Synapse Pipelines with TSQL
  • Stored Procedure Activity & Triggers

Chapter 8: Azure Event Hubs

  • Azure Stream Analytics For API Data
  • IoT Hubs & IoT Devices, Connection Strings
  • Rasberry APP Connections with IoT Hub
  • Azure Storage Account and Container
  • Creating Azure Stream Analytics Job
  • Configuring Input Aliases with IoT Hub
  • Output Aliases with ADLS Gen 2
  • SAQL Query, Job Executions; Monitoring
  • Azure Event Hubs and Event Instances
  • Event Hub Namespaces, Partition Counts
  • Access Policies, Permissions & Defaults
  • RootManageSharedAccessKey & Options
  • Connection Strings & Event Service Bus
  • Telco App Installation, Executions. LIVE Data
  • On-Premise App Integration with ASA Jobs

Chapter 8: Scala with Spark, Azure SQL DB

  • Data Imports with Azure SQL DB
  • Using Scala for Big Data Loads
  • Spark SQL Queries @ Temp Views
  • Variables, display(), read()
  • Scala Transformations, display()
  • JSON, AVRO and DBFS Mounts
  • azure.sas.container @ ADLS
  • write.jdbc() & JVM
  • JDBC Connection, DataframeWriter
  • JDBC Properties, Port Settings
  • Data Extraction, SQLContext
  • Spark Context and Spark Session
  • SQLServerDriver with Scala
  • Pandas Data Frame Vs Scala
  • Compare Python with Scala

Chapter 9: Synapse Analytics with Spark

  • Synapse Pipelines: Performance Advantages
  • Pivot Transformation For Normalization
  • Generating Pivot Column, Aggregations
  • Pivot Transformation and Pivot Settings
  • Pivot Key Selection, Value and Nulls
  • Pivoted Columns and Column Pattern
  • Column Prefix, Help Graphic & Metadata
  • Denormalized Data and Aggregations
  • Apache Spark Pool in Azure Synapse
  • Spark Cluster Nodes: Vcores, Memory
  • Notebooks : Purpose, Usage Options
  • Python Notebooks For Remote Access
  • Creating Databases in Apache Spark Pool
  • Data Loads from Dedicated SQL Pools
  • PySpark Code for Data Operations, Writes

Chapter 9: Storage Architecture, Queues

  • Azure Storage Account : Architecture
  • Block Blob, Append Blob, Page Blob
  • Cold & Hot Access Tiers with ADLS
  • Archive Mode : Real-time Use
  • Access Policy : Creation, Usage
  • Legal Hold & Time Bound Access
  • Pricing : Security, Encryption
  • Pricing : HNS and Replication
  • Azure File Share Service (Files)
  • Mounting Files From On-Premise
  • SMB File Share: Hot, Optimized
  • Azure Queue Service & Messages
  • Message Queues: Operations
  • Storage Explorer Tool with Shares
  • Azure Storage Services: ETL Needs

Chapter 9: DeltaLake Incr Loads, DWH

  • Azure DeltaLake Implementation
  • ACID Properties, Upsert Advantages
  • Delta Engine Optimizations & Uses
  • Pipeline Creation: JSON Files in DBFS
  • Delta Tables Creation, Data Loads
  • Spark Cluster Settings: Auto Optimize
  • Auto Compact, Delta Table Optimize
  • JSON Files, Delta Streaming Location
  • Joins and Merge with Delta Tables
  • Incremental Loads, Delta Tables
  • Create & Use DWH with Databricks
  • Spark Database Versus DWH
  • Big Data Integrations with Spark
  • Databricks with Data Factory (ADF)
  • End to End Implementations

Chapter 10: Synapse Security & Parameters

  • Azure Active Directory (AAD) Users, Groups
  • IAM: Identity & Access Management
  • Synapse Workspace Security with RBAC
  • ADF Security with RBAC: Owner, Contributor
  • Azure Synapse SQL Pool Security: Logins
  • Creating SQL Logins & Users : master
  • SQL Users in Azure SQL DB and SQL Pool
  • Grant, Control, Revoke: Security Roles
  • Parameters - Creation and Use in Pipelines
  • Dynamic Connections with Credentials
  • User Name and Password Connectivity
  • Dynamic Dataset Configurations
  • Pipeline Expressions with Parameters
  • Resource Classes and Usage with SQL Pool

Real-Time Project

  • Online Retail Database Data Source
  • Azure Migrations and ETL Concepts
  • Azure SQL Pool (Synapse DWH) Tables
  • Apache Spark Pool : Databases, Tables
  • Azure Data Lake Storage (ADLS Gen 2)
  • Azure Stream Analytics Jobs with IoT
  • Azure Data Bricks and DBFS, Notebooks
  • Lookup Activity in ADF & Broacasts
  • Parameters with Dynamic Datasets
  • Watermark Columns & Procedures
  • Copy Data Activity & Parameters
  • Data Delta with JSON Values
  • Pipeline Parameters & ETL Concepts
  • Dataset Parameters, Table Names

 

  • Using ADLS with Spark Databases
  • Aggregations with Big Data Loads
  • Parameterized ETL Sources
  • Parameterized Spark Tables
  • Widgets and Workflows
  • Python Notebooks to Scala
  • Azure SQL DB Connections
  • Project Requirement
  • Project Solution
  • Project FAQs
  • Concept wise FAQs
  • Resume Guidance
  • Mock Interviews
  • DP 203 Certification Guidance

Chapter 11:  Change Data Capture (CDC)

  • Change Data Capture (CDC) Data Loads
  • Incremental Loads with CDC Types
  • SQL Server CDC : ETL Load Dates
  • Run Mode Options and CDC Types
  • Output Pipeline Expression, Data Window
  • Azure SQL DB Destinations, Watermarks
  • JSON Parameters, Pipeline Scheduling
  • Pipeline Validation, Trigger, Monitoring
  • Synapse SQL Pool : Data Loads (DWH)
  • ETL Optimization Techniques
  • SQL Pool (Synapse) Optimizations
  • Pipeline Optimization Techniques

Chapter 12:  Pipeline Monitoring, Security

  • Azure Monitor Resource and Usage
  • Pipeline Monitoring Techniques
  • ADF: Pipeline Monitoring and Alerts
  • Synapse: Pipeline Monitoring and Alerts
  • Synapse: Storage Monitoring and Alerts
  • Conditions, Signal Rules and Metrics
  • Email Notifications with Azure
  • Serverless Pool in Azure Synapse
  • Connections, Usage with Serverless Pool
  • Using Azure OpenDatasets in Synapse
  • OPENROWSET and BULK Data Loads
  • Azure Storage Account : Data Analysis
  • Working with Parquet Files in Synapse
  • Python Notebooks (Pyspark) in Synapse
 

Job-Oriented Real-time Training @ SQL School Training Institute

SQL Server T-SQL, Azure SQL, Azure DBA, Azure BI, Azure Data Engineer, Power BI Training

For latest schedules Click Here
 

Trending Jobs

 

Certification Trainings

Other Courses

 
For latest schedules Click Here