Big Data Spark Developer with Python
Natsoft
10 months ago

Job Details

Role: Big Data Spark Developer with Python

Location: HP Vancouver, Washington, US

Years of experience needed: 10+ years

Technical Skills:

  • Design and implement distributed data processing pipelines using Spark, Python, SQL, and other tools and languages prevalent in the Big Data/Lakehouse ecosystem.
  • Analyze designs and determine the coding, programming, and integration activities required based on general objectives.
  • Serve as technical lead, representing deliverables from vendor team resources at onsite and offshore locations.
  • Lead technical coordination and business knowledge transition activities for the offshore team.
  • Review and evaluate designs and project activities for compliance with architecture, security, and quality guidelines and standards.
  • Write and execute complete testing plans, protocols, and documentation for the assigned portion of a data system or component; identify defects and create solutions for issues with code and integration into the data system architecture.
  • Collaborate and communicate with the project team regarding project progress and issue resolution.
  • Work with the data engineering team on all phases of larger, more complex development projects and engage with external users on business and technical requirements.
  • Collaborate with peers, engineers, data scientists, and the project team.
  • Interact with high-level individual contributors, managers, and program teams on a daily/weekly basis.

Qualifications Needed:

  • Bachelor's or Master's degree in Computer Science, Information Systems, Engineering, or equivalent.
  • 6+ years of relevant experience with detailed knowledge of data warehouse technical architectures, infrastructure components, ETL/ELT, and reporting/analytic tools.
  • 3+ years of experience with cloud-based data warehouses such as Redshift, Snowflake, etc.
  • 3+ years of experience with Big Data distributed ecosystems (Hadoop, Spark, Unity Catalog & Delta Lake).
  • 3+ years of experience with workflow orchestration tools such as Airflow, etc.
  • 3+ years of experience with Big Data distributed systems such as Databricks, AWS EMR, AWS Glue, etc.
  • Experience leveraging monitoring tools/frameworks such as Splunk, Grafana, CloudWatch, etc.
  • Experience with container management frameworks such as Docker, Kubernetes, ECR, etc.
  • 3+ years working with multiple Big Data file formats (Parquet, Delta Lake).
  • Experience with CI/CD tools such as Jenkins, Codeway, etc., and source control tools such as GitHub, etc.
  • Strong coding experience in languages such as Python, Scala, and Java.