Big Data Spark Developer with Python

Job Details

Role: Big Data Spark Developer with Python

Location: HP Vancouver, Washington, US

Years of experience needed: 10+ years

Technical Skills:

Design and implement distributed data processing pipelines using Spark, Python, SQL and other tools and languages prevalent in the Big Data/Lakehouse ecosystem.
Analyzes design and determines coding, programming, and integration activities required based on general objectives.
Play the technical lead role representing deliverables from vendor team resources at onsite and offshore locations.
Lead the technical co-ordination and Business Knowledge transition activities to offshore team.
Reviews and evaluates designs and project activities for compliance with architecture, security and quality guidelines and standards
Writes and executes complete testing plans, protocols, and documentation for assigned portion of data system or component; identifies defects and creates solutions for issues with code and integration into data system architecture.
Collaborates and communicates with project team regarding project progress and issue resolution.
Works with the data engineering team for all phases of larger and more-complex development projects and engages with external users on business and technical requirements.
Collaborates with peers, engineers, data scientists and project team.
Typically interacts with high-level Individual Contributors, Managers and Program Teams on a daily/weekly basis.

Certifications Needed:

Bachelor's or Master's degree in Computer Science, Information Systems, Engineering or equivalent.
6+ years of relevant experience with detailed knowledge of data warehouse technical architectures, infrastructure components, ETL/ ELT and reporting/analytic tools.
3+ years of experience with Cloud based DW such as Redshift, Snowflake etc.
3+ years experience in Big Data Distributed ecosystems (Hadoop, SPARK, Unity Catalog & Delta Lake)
3+ years experience in Workflow orchestration tools such as Airflow etc.
3+ years experience in Big Data Distributed systems such as Databricks, AWS EMR, AWS Glue etc.
Leverage monitoring tools/frameworks, like Splunk, Grafana, CloudWatch etc.
Experience with container management frameworks such as Docker, Kubernetes, ECR etc.
3+ year s working with multiple Big Data file formats (Parquet, Delta Lake)
Experience working on CI/CD processes such as Jenkins, Codeway etc. and source control tools such as GitHub, etc.
Strong experience in coding languages like Python, Scala & Java