Responsibilities:
* Design, develop, and maintain data pipelines using Spark, Python, Scala, and Java.
* Write efficient, optimized SQL queries for data extraction, transformation, and loading (ETL) processes.
* Work with DataFrames to manipulate and analyze large datasets.
* Implement data storage and processing solutions using cloud technologies (preferably AWS or GCP).
* Build and maintain real-time data streaming pipelines using Apache Kafka (Amazon MSK).
* Use Amazon S3 for data storage and retrieval.
* Work with data lake table formats such as Apache Iceberg.
* Ensure data quality, integrity, and security.
* Collaborate with data scientists and other engineers to understand data requirements and deliver solutions.
* Participate in code reviews and contribute to improving our development processes.
* Troubleshoot and resolve issues in data pipelines and back-end systems.
Qualifications:
* Bachelor's degree in Computer Science or a related field (or equivalent experience).
* 5+ years of experience in back-end development with a focus on data engineering.
Required Skills:
* Strong proficiency in Spark, Python, Scala, and Java.
* Expertise in SQL and working with relational databases.
* Experience with DataFrames for data manipulation and analysis.
* Experience with cloud technologies (preferably AWS or GCP).
* Experience with streaming platforms such as Apache Kafka (including Amazon MSK).
* Experience with Amazon S3 or similar object storage.
* Experience with data lake table formats such as Apache Iceberg.
* Solid understanding of data warehousing concepts.
* Excellent communication and collaboration skills.
Bonus Points:
* Experience with data modeling and schema design.
* Experience with data governance and data quality management.
* Experience with DevOps practices and CI/CD pipelines.
* Relevant certifications (e.g., AWS Certified Data Engineer, Google Cloud Certified Professional Data Engineer).