Analytics Engineer Python & PySpark

Schwarz Unternehmenskommunikation GmbH & Co. KG
Berlin, Germany
2 days ago

Role details

Contract type
Permanent contract
Employment type
Part-time / full-time
Working hours
Regular working hours
Languages
English

Job location

Berlin, Germany

Tech stack

Query Performance
Java
API
Artificial Intelligence
Data analysis
Cloud Computing
Code Review
Computer Security
ETL
Digital Technology
Dimensional Modeling
E-Business
Python
Standard SQL
SQL Databases
Data Streaming
Git
Data Layers
PySpark
Git Flow
Integration Tests
Optimization Algorithms
Data Pipelines
Databricks

Job description

Schwarz Digits creates the technological foundation for digital sovereignty in Europe. As the IT and digital division of the Schwarz Group, we develop and manage the IT infrastructures for the retail divisions Lidl and Kaufland, as well as Schwarz Production and PreZero. At the same time, we operate as an independent provider in the external market to support companies across Europe in their digital transformation. We bundle our core services in the areas of Cloud, Cyber Security, Data & AI, Communication, and Workspace.

Join us and contribute to digital sovereignty in Europe. With us, you will work at the intersection of agility and security: You will benefit from fast decision-making processes, enjoy genuine creative freedom in your projects, and be able to build upon the stable foundation of the Schwarz Group.

As a digital system provider for Lidl's online business, we at Schwarz Digits create efficient, scalable digital products and services and develop them for the future. We are organized into four divisions: Commerce Platforms, Data Tech & Enablement, New Digital Business Models, and SCRM Loyalty Platforms.

Play your part in our team's success. You will join the Article Intelligence and Finance team, which is responsible for building and running data products. We are committed to delivering high-quality data products to our stakeholders and to building collaboration across all the teams we work with.

Your tasks

You will work on:

  • Scalable, production-ready APIs using Python and, optionally, Java
  • ETL and CI/CD pipelines
  • Data product templates
  • Monitoring dashboards for the usage of our data products

Requirements

  • A passion for diving into complex datasets to extract insights.
  • A proactive approach to optimizing data flows and improving overall data quality.
  • Ability to bridge the gap between technical infrastructure and business needs, translating "messy" requirements into clean data products.
  • Expert-level knowledge of SQL, including complex window functions, CTEs, and query performance tuning.
  • Deep understanding of dimensional modeling and practical experience implementing SCD Type 1, Type 2, and Type 3 logic.
  • Practical experience with PySpark, specifically focusing on optimization techniques such as broadcasting, salting, and partitioning.
  • Proven experience designing and implementing data layers (Bronze, Silver, Gold) within a Medallion Architecture.
  • Experience writing and maintaining unit and integration tests for data pipelines to ensure reliability.
  • Mastery of Git, including branching strategies, pull request workflows, and collaborative code reviews.
  • Experience using CI/CD pipelines to automate data deployments.
  • Experience with Databricks.
