- Strong hands-on experience in Python programming and PySpark
- Experience using AWS services (Redshift, Glue, EMR, S3, and Lambda)
- Experience working with Apache Spark and the Hadoop ecosystem
- Experience writing and optimizing SQL for data manipulation
- Good exposure to scheduling tools; Apache Airflow preferred
- Must have data warehouse experience with Amazon Redshift or Hive
- Experience implementing security measures for data protection
- Expertise in building and testing complex ETL data pipelines (batch and near-real-time)
- Ability to produce readable documentation for all components developed
- Knowledge of database technologies for OLTP and OLAP workloads