Hi, I'm Suraj Kumar Soni.

Data Engineer | ETL/ELT • Big Data • Cloud Data Platforms.

About Me

Results-driven Data Engineer with 5+ years of hands-on experience and 6+ years of overall professional experience. Expert in PySpark, SQL, and Azure Data Factory (ADF), with a focus on building cost-efficient ETL/ELT pipelines within the Databricks environment.

Core Expertise

A self-motivated problem solver dedicated to squeezing every bit of performance out of Spark to deliver high-velocity, actionable intelligence.

Skills

Python • PySpark • SQL • Databricks • Azure Data Factory • Azure Data Lake • Delta Table

Experience

Tredence Analytics

Senior Data Engineer
28 Mar 2024 - Present
  • Led the end-to-end migration of enterprise master and transactional data from Teradata to Microsoft Azure, ensuring data integrity and alignment with cloud architecture best practices for high availability and performance.
  • Engineered automated Azure Data Factory (ADF) ingestion and transformation pipelines for complex CPG data, using Delta Lake and the Medallion Architecture (Bronze, Silver, Gold) to provide curated, high-performance datasets.
  • Developed a Python-based automation tool that converts Teradata DDL into Databricks-compatible notebooks, reducing manual development time by 60% and enforcing cross-team standardization.
  • Built a modular framework of reusable Python functions for common operations (e.g., surrogate key generation, business key hashing), increasing code consistency and development efficiency.
  • Optimized Databricks notebooks for high-performance processing by implementing Z-Ordering and Liquid Clustering to accelerate data skipping.
  • Implemented dynamic SCD Type 1, Type 2, and fact processing logic within Databricks, improving data accuracy and reducing operational overhead by 50%.
  • Developed and optimized PySpark notebooks in Databricks, implementing data cleansing, schema validation, and performance tuning to move data from the Bronze layer to the Silver layer for analytics-ready reporting.
  • Spearheaded the data modeling phase by translating complex functional requirements into comprehensive Source-to-Target Mapping (STTM) documents for enterprise-scale datasets.

R1RCM

Data Engineer
10 Nov 2023 - 27 Mar 2024
  • Collected and standardized patient data from healthcare facilities, categorizing accounts as bad debt, zero balance, or payment due.
  • Managed the end-to-end recovery process by analyzing and securing payments from insurance companies and patients.
  • Orchestrated and monitored complex workflows with Airflow to ensure pipeline reliability.
  • Optimized Snowflake data warehousing for high-performance storage, retrieval, and advanced analytics.
  • Developed robust pipelines using Python and SQL scripts.
  • Used PySpark for scalable big data processing and complex data analysis.

Infosys Limited

Data Engineer
17 Feb 2022 - 08 Nov 2023
  • Led data refinement projects to deliver precise sales insights through thorough statistical analysis (mean, median, mode) and rigorous data cleaning.
  • Designed and implemented efficient ETL processes to extract, transform, and load data into Azure data warehouses.
  • Used SQL and Spark to query and manipulate data within the data warehouse environment.
  • Played a key role in the development and management of end-to-end data pipelines in Azure Data Factory, ensuring smooth data flow from source to destination.
  • Applied industry best practices for data warehouse design, including dimensional modeling and schema optimization, to enhance storage and retrieval efficiency.

Takhil Technologies

Data Engineer
01 Feb 2021 - 15 Feb 2022
  • Developed ETL workflows using PySpark and SQL to process structured data and load it into data lake/storage systems for reporting and analysis.
  • Assisted in monitoring and troubleshooting data pipelines, ensuring data quality and timely data delivery for business use cases.

Pratham Education Foundation

Trainer
01 Dec 2019 - 12 Nov 2020
  • Worked as an Electrical Trainer, delivering Assistant Electrician (NSQF Level 3) training to underprivileged students.

Contact

Get in touch at surajkr1000000@gmail.com

Call me at +91-8210891342