• 4+ years of relevant experience in PySpark and Azure Databricks.
• Proficiency in integrating, transforming, and consolidating data from various structured and unstructured data sources.
• Good experience with SQL or native query languages.
• Strong experience implementing Databricks notebooks using Python.
• Good experience with Azure Data Factory, ADLS, Azure storage services, serverless architecture, and Azure Functions.
• Advanced working knowledge of building robust data pipelines using ADF.
• Experience building processes supporting data transformation, data structures, metadata, dependency management, and workload management.
• Exposure to a cloud-based data warehouse such as Snowflake.
• Familiarity with the common issues of working within a cloud (Azure) environment.
• Advanced working SQL knowledge and experience with relational and non-relational databases, including query authoring.
• Exposure to Jira and Confluence required.
• Experience working in an Agile project management environment.
• A successful history of manipulating, processing, and extracting value from large, disconnected datasets.
• Build data pipeline infrastructure for the optimal extraction, transformation, and loading of data from a wide variety of sources using SQL, cloud-based relational or non-relational databases, and/or scripting languages such as Perl/Python.
• Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
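For context, the extract-transform-load work described above can be sketched in plain Python. This is a minimal, illustrative example only: the record names, fields, and cleaning rules are hypothetical, and a real pipeline would use PySpark or ADF rather than the standard library. It shows the typical transformation concerns (typing, de-duplication, dropping malformed rows) followed by a load into a relational target.

```python
import sqlite3

# Hypothetical raw records, standing in for data pulled from a source system
# (file drop, API, etc.); the names and fields are illustrative only.
raw_orders = [
    {"order_id": "1", "amount": "19.99", "region": " east "},
    {"order_id": "2", "amount": "5.00",  "region": "WEST"},
    {"order_id": "2", "amount": "5.00",  "region": "WEST"},   # duplicate
    {"order_id": "3", "amount": "bad",   "region": "east"},   # malformed
]

def transform(rows):
    """Clean, type, and de-duplicate raw rows before loading."""
    seen, out = set(), []
    for r in rows:
        try:
            amount = float(r["amount"])   # enforce a numeric type
        except ValueError:
            continue                      # drop malformed records
        key = r["order_id"]
        if key in seen:                   # de-duplicate on the business key
            continue
        seen.add(key)
        out.append((key, amount, r["region"].strip().lower()))
    return out

def load(rows, conn):
    """Load cleaned rows into a relational target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id TEXT PRIMARY KEY, amount REAL, region TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(raw_orders), conn)
```

The same extract/transform/load split maps directly onto an ADF or Databricks pipeline, where each stage becomes an activity or notebook task.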