About
Results-driven Data Engineer with 5+ years of hands-on experience and 6+ years of overall professional experience. Expert in PySpark, SQL, and Azure Data Factory (ADF), with a focus on building cost-efficient ETL/ELT pipelines within the Databricks environment.
Core Expertise
- Performance Engineering: Reducing data skew and accelerating slow filters through Z-Ordering, Liquid Clustering, and Data Skipping.
- Spark Mastery: Optimizing query execution via Adaptive Query Execution (AQE), Broadcast Joins, and Partition Tuning.
- Scalable Architecture: Designing Medallion Architectures and end-to-end orchestration for reliable, high-velocity data pipelines.
A self-motivated problem solver dedicated to squeezing every bit of performance out of Spark to deliver timely, actionable insights.
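Z-Ordering, Liquid Clustering, and Data Skipping work toward the same goal. Delta Lake records per-file min/max column statistics, and a filter can skip every file whose value range cannot match the predicate. Clustering the data narrows those ranges, so fewer files survive pruning. The plain-Python sketch below models only that pruning step. `FileStats` and `files_to_scan` are hypothetical names, not a Delta Lake or Databricks API, and real data skipping also handles multiple columns, nulls, and other details.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileStats:
    """Hypothetical stand-in for per-file min/max statistics on one column."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files: List[FileStats], lo: int, hi: int) -> List[str]:
    """Keep only files whose [min, max] range overlaps the filter lo <= col <= hi."""
    return [f.path for f in files if f.max_value >= lo and f.min_value <= hi]


# Clustered layout: similar values are co-located, so each file covers a narrow range.
clustered = [
    FileStats("part-0", 1, 100),
    FileStats("part-1", 101, 200),
    FileStats("part-2", 201, 300),
]

# Unclustered layout: every file spans almost the whole value range.
unclustered = [
    FileStats("part-0", 1, 298),
    FileStats("part-1", 3, 300),
    FileStats("part-2", 2, 299),
]

print(files_to_scan(clustered, 150, 160))    # ['part-1']
print(files_to_scan(unclustered, 150, 160))  # ['part-0', 'part-1', 'part-2']
```

The same filter reads one file instead of three once the data is clustered on the filtered column, which is the effect Z-Ordering and Liquid Clustering aim for.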
Skills
Python
PySpark
SQL
Databricks
Azure Data Factory
Azure Data Lake
Delta Lake
Experience
Tredence Analytics
Senior Data Engineer
28 Mar 2024 - Present
- Led the end-to-end migration of Enterprise Master and Transactional Data from Teradata to Microsoft Azure, ensuring data integrity and alignment with Cloud Architecture best practices for high availability and performance.
- Engineered automated Azure Data Factory (ADF) ingestion and transformation pipelines for complex CPG data, using Delta Lake and Medallion Architecture (Bronze, Silver, Gold) to provide curated, high-performance datasets.
- Developed a Python-based Automation Tool to programmatically convert Teradata DDL into Databricks-compatible notebooks, reducing manual development time by 60% and enforcing cross-team standardization.
- Built a modular framework of reusable Python functions for common operations (e.g., Surrogate Key generation, Business Key hashing), significantly increasing code consistency and development efficiency.
- Optimized Delta table layouts in Databricks with Z-Ordering and Liquid Clustering to improve data skipping and accelerate query performance.
- Implemented Dynamic SCD Type 1, Type 2, and Fact Processing logic within Databricks, enhancing Data Accuracy and reducing operational overhead by 50%.
- Developed and optimized PySpark notebooks in Databricks, implementing advanced Data Cleansing, Schema Validation, and Performance Tuning to promote data from the Bronze to the Silver layer for analytics-ready reporting.
- Spearheaded the data modeling phase by translating complex functional requirements into comprehensive Source-to-Target Mapping (STTM) documents for enterprise-scale datasets.
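The surrogate-key, business-key-hashing, and SCD Type 2 items above describe a common dimension-loading pattern. A minimal plain-Python sketch of that pattern follows, assuming SHA-256 over trimmed, upper-cased key columns joined with `||`, integer surrogate keys, and a 9999-12-31 open-ended `valid_to` date. Every name in it (`business_key_hash`, `DimRow`, `scd2_merge`, `HIGH_DATE`) is hypothetical. On Databricks the same logic would more typically be written as a Delta Lake `MERGE`; this version only illustrates the row-versioning rules.

```python
import hashlib
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List

# Assumed convention for the "valid to" date of the current row version.
HIGH_DATE = date(9999, 12, 31)


def business_key_hash(*parts: str) -> str:
    """Hash normalized business-key columns into one stable SHA-256 hex key."""
    joined = "||".join(p.strip().upper() for p in parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class DimRow:
    surrogate_key: int
    bk_hash: str
    attrs: tuple          # attributes tracked as Type 2 (history-preserving)
    valid_from: date
    valid_to: date
    is_current: bool


def scd2_merge(dim: List[DimRow], incoming: Dict[str, tuple], load_date: date) -> List[DimRow]:
    """Expire current rows whose attributes changed and append new current versions."""
    result = list(dim)
    current = {r.bk_hash: i for i, r in enumerate(result) if r.is_current}
    next_sk = max((r.surrogate_key for r in result), default=0) + 1
    for bk_hash, attrs in incoming.items():
        idx = current.get(bk_hash)
        if idx is not None and result[idx].attrs == attrs:
            continue  # unchanged: keep the existing current version
        if idx is not None:
            # changed: close out the old version
            result[idx] = replace(result[idx], valid_to=load_date, is_current=False)
        # new key or changed attributes: insert a new current version
        result.append(DimRow(next_sk, bk_hash, attrs, load_date, HIGH_DATE, True))
        next_sk += 1
    return result


# Usage: one customer changes tier between two loads.
key = business_key_hash("CUST-001", "US")
day1 = scd2_merge([], {key: ("Gold",)}, date(2024, 1, 1))
day2 = scd2_merge(day1, {key: ("Platinum",)}, date(2024, 2, 1))
for row in day2:
    print(row.surrogate_key, row.attrs, row.valid_from, row.valid_to, row.is_current)
# 1 ('Gold',) 2024-01-01 2024-02-01 False
# 2 ('Platinum',) 2024-02-01 9999-12-31 True
```

Closing the old row with `valid_to` equal to the load date is one common convention; some teams use the day before instead.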
R1RCM
Data Engineer
10 Nov 2023 - 27 Mar 2024
- Collected and standardized patient data from healthcare facilities, categorizing accounts as bad debt, zero balance, or payment due.
- Managed the end-to-end recovery process by analyzing and securing payments from insurance companies and patients.
- Orchestrated and monitored complex workflows in Airflow to ensure pipeline reliability.
- Optimized Snowflake data warehousing for high-performance storage, retrieval, and advanced analytics.
- Developed robust data pipelines using Python and SQL.
- Leveraged PySpark for scalable big data processing and complex data analysis.
Infosys Limited
Data Engineer
17 Feb 2022 - 08 Nov 2023
- Led data-refinement projects that delivered precise sales insights through descriptive statistical analysis (mean, median, mode) and rigorous data cleaning.
- Designed and implemented efficient ETL processes to extract, transform, and load data into Azure data warehouses.
- Used SQL and Spark to query and manipulate data within the data warehouse environment.
- Played a key role in developing and managing end-to-end data pipelines in Azure Data Factory, ensuring smooth data flow from source to destination.
- Applied industry best practices for data warehouse design, including dimensional modeling and schema optimization, to enhance storage and retrieval efficiency.
Takhil Technologies
Data Engineer
01 Feb 2021 - 15 Feb 2022
- Developed ETL workflows using PySpark and SQL to process structured data and load it into data lake/storage systems for reporting and analysis.
- Assisted in monitoring and troubleshooting data pipelines, ensuring data quality and timely data delivery for business use cases.
Pratham Education Foundation
Trainer
01 Dec 2019 - 12 Nov 2020
- Worked as an Electrical Trainer, delivering Assistant Electrician (NSQF Level 3) training to underprivileged students.
Contact
Get in touch at surajkr1000000@gmail.com
Call me at +91-8210891342