Job Description: Data Engineer (PySpark)
Experience: 3-8 years
Location: Pune, Indore
Interview Date: 29th Jan
Role Overview
- A Data Engineer with PySpark expertise is responsible for designing, building, and optimizing scalable data pipelines for large-scale data processing. The role involves working with distributed systems, big data technologies, and cloud platforms to enable analytics and data-driven decision-making.
Key Responsibilities
- Design, develop, and maintain scalable data pipelines using PySpark
- Process large volumes of structured and semi-structured data
- Optimize Spark jobs for performance, scalability, and reliability
- Develop batch and streaming data processing solutions
- Integrate data from databases, APIs, files, and event streams
- Collaborate with analytics and data science teams
- Prepare technical documentation and participate in code reviews
Required Skills & Qualifications
- Strong hands-on experience with PySpark
- Proficiency in Python programming
- Strong understanding of Apache Spark architecture
- Advanced SQL skills
- Experience with ETL / ELT frameworks
- Hands-on experience with big data technologies
Good to Have Skills
- Databricks platform experience
- Apache Kafka or other streaming technologies
- Airflow or similar workflow orchestration tools
- CI/CD pipelines for data engineering
- Knowledge of data modeling and data warehousing
- Exposure to cloud data platforms such as Snowflake
Cloud & Platform Experience
- Experience with AWS, Azure, or GCP
- Hands-on experience with EMR, Azure Databricks, Synapse, or similar services
- Working knowledge of cloud storage (S3, ADLS, GCS)
Experience Requirements
- 3-8 years of overall experience in Data Engineering
- 2+ years of hands-on PySpark development
- Experience working with large and complex datasets
Soft Skills
- Strong analytical and problem-solving skills
- Good communication and stakeholder management
- Ability to work in an Agile environment
- Strong ownership and attention to detail