Senior Data Engineer - Python, Hadoop, Spark
Harvey Nash
Senior Data Engineer
Python/Hadoop/Spark – sought by a leading investment bank based in London – Hybrid – Contract (Inside IR35, Umbrella)
Key Responsibilities:
- Design and implement scalable data pipelines that extract, transform and load data from various sources into the data lakehouse (an illustrative sketch follows this list).
- Help teams push the boundaries of analytical insights, creating new product features using data.
- Develop and automate large-scale, high-performance data processing systems (batch and real-time) to drive growth and improve product experience.
- Develop and maintain infrastructure tooling for our data systems.
- Collaborate with software teams and business analysts to understand their data requirements and deliver quality, fit-for-purpose data solutions.
- Ensure data quality and accuracy by implementing data quality checks, data contracts and data governance processes.
- Contribute to the ongoing development of our data architecture and data governance capabilities.
- Develop and maintain data models and data dictionaries.
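For illustration only, a minimal sketch of the kind of batch ETL pipeline described above, assuming PySpark with Delta Lake; the source path, schema and table location are all hypothetical:

```python
# Minimal sketch of a batch ETL job landing data in a Delta Lake table.
# Assumes a Spark build with the Delta Lake extensions available;
# paths, column names and the partitioning scheme are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("orders_etl")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Extract: read raw CSV drops from a landing zone (hypothetical path).
raw = spark.read.option("header", True).csv("s3://landing/orders/")

# Transform: basic typing plus a simple data-quality filter.
clean = (
    raw
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .filter(F.col("order_id").isNotNull())
)

# Load: append into the curated Delta table, partitioned by date.
(
    clean
    .withColumn("order_date", F.to_date("order_ts"))
    .write.format("delta")
    .mode("append")
    .partitionBy("order_date")
    .save("s3://lakehouse/curated/orders")
)
```

In practice a job like this would also emit data-quality metrics and register the output table in the governance catalogue, per the responsibilities above.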
Skills & Qualifications:
- Significant experience with data modelling, ETL processes, and data warehousing.
- Significant exposure to, and hands-on experience with, at least two of the following programming languages: Python, Java, Scala, Go.
- Significant experience with Hadoop, Spark and other distributed processing platforms and frameworks.
- Experience working with open table/storage formats such as Delta Lake, Apache Iceberg or Apache Hudi.
- Experience developing and managing real-time data streaming pipelines using change data capture (CDC), Kafka and Apache Spark (an illustrative sketch follows this list).
- Experience with SQL and database management systems such as Oracle, MySQL or PostgreSQL.
- Strong understanding of data governance, data quality, data contracts, and data security best practices.
- Exposure to data governance, cataloguing and lineage tools.
- Experience setting up SLAs and data contracts with interfacing teams.
- Experience working with and configuring data visualisation tools such as Tableau.
- Ability to work independently and as part of a team in a fast-paced environment.
- Experience working in a DevOps culture and willingness to drive it: comfortable with CI/CD tools (ideally IBM UrbanCode Deploy, TeamCity or Jenkins), monitoring tools and log-aggregation tools. Ideally, you will have worked with VMs and/or Docker, and with orchestration systems such as Kubernetes/OpenShift.
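For illustration only, a minimal sketch of the kind of real-time CDC pipeline listed above, assuming Spark Structured Streaming reading Debezium-style change events from Kafka; the topic, payload schema, broker address and output paths are all hypothetical:

```python
# Minimal sketch of a streaming pipeline reading CDC events from Kafka
# with Spark Structured Streaming. Assumes the spark-sql-kafka connector
# is on the classpath; all names below are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("orders_cdc_stream").getOrCreate()

# Assumed shape of the Debezium-style CDC payload.
cdc_schema = StructType([
    StructField("op", StringType()),        # c = create, u = update, d = delete
    StructField("order_id", StringType()),
    StructField("amount", StringType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "cdc.orders")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers bytes; parse the JSON value into typed columns.
parsed = (
    events
    .select(F.from_json(F.col("value").cast("string"), cdc_schema).alias("e"))
    .select("e.*")
)

# Write the parsed change stream to a Delta table. A production job
# would merge by key to apply updates and deletes rather than append.
query = (
    parsed.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://lakehouse/_checkpoints/orders_cdc")
    .outputMode("append")
    .start("s3://lakehouse/raw/orders_cdc")
)
query.awaitTermination()
```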
Please apply within for further details – Matt Holmes – Harvey Nash