We are looking for a Senior Data Engineer with extensive experience in the big data ecosystem to join our Data Pipeline team. As part of the Epsilon DMS Data Organization, the Data Pipeline team’s responsibilities include, but are not limited to, maintaining streaming and batch data pipelines that collect hundreds of billions of data rows daily, performing near-real-time aggregation of a variety of data sets (using Spark Structured Streaming), and maintaining system-of-record raw and aggregate data sets (HDFS and Hive). Additionally, as part of the Data Pipeline team, you will work closely with the Real Time Bidding (RTB), data warehousing, ETL and decision science teams in building, maintaining and optimizing solutions.
The candidate must be proficient in Scala or Java and able to be a key contributor in building and maintaining Spark jobs and Airflow DAGs. This role is hands-on in code development and data architecture, with the goal of optimizing the platform for growth and maximum efficiency. The person in this role will need to work both independently and as part of a team to meet the required specifications for solution delivery.
Responsibilities
- Design and code data pipelines connecting the Real Time Bidding platform, HDFS, Elastic, MPP and Postgres databases, utilizing Flume, Kafka, Spark, Cassandra/Scylla and other technologies as needed.
- Maintain and extend the Spark framework used by the DMS Data Organization to support various aspects of the business.
- Build and maintain Airflow DAGs for job management.
- Work closely with infrastructure teams in capacity planning, hardware procurement and build outs.
- Build and maintain metrics collection to help with identifying production issues, optimizing job performance and alerting on error conditions (PagerDuty).
- Develop test cases to demonstrate new code meets functional requirements.
- Ideal candidates can lead in one or more of the following areas: design, code development, data modeling, cross-team communication, application maintenance, and identifying opportunities to improve code quality or performance through refactoring and/or incorporating new technologies.
Required qualifications
- Bachelor’s Degree in Computer Science or equivalent degree is required.
- 4+ years of experience developing in Java and/or Scala
- Spark experience a big plus
- Strong experience in SQL, bash and Python
- Experience with Hadoop Stack (HDFS, Hive, YARN, HBase)
- Experience with Kafka and the Kafka producer and consumer APIs (Kafka Connect, Kafka Streams and KSQL a plus)
- Experience with Postgres and Cassandra/Scylla a big plus
- Docker and Kubernetes experience a big plus
- ELK stack experience a plus
- Experience with scheduling applications with complex interdependencies
- Experience working with geographically and culturally diverse teams
- Excellent written and verbal communication skills
- Excellent analytical and problem-solving skills
- Ability to diagnose and troubleshoot problems quickly
Epsilon is the leader in outcome-based marketing. We enable marketing that’s built on proof, not promises.™ Through Epsilon PeopleCloud, the marketing platform for personalizing consumer journeys with performance transparency, Epsilon helps marketers anticipate, activate, and prove measurable business outcomes.
Powered by CORE ID®, the most accurate and stable identity management platform representing 200+ million people, Epsilon’s award-winning data and technology are rooted in privacy by design and underpinned by powerful AI. With more than 50 years of experience in personalization and performance working with the world’s top brands, agencies, and publishers, Epsilon is a trusted partner leading CRM, digital media, loyalty, and email programs. Positioned at the core of Publicis Groupe, Epsilon is a global company with over 8,000 employees in over 40 offices around the world. For more information, visit epsilon.com.
Follow us on Twitter at @EpsilonMktg.
We see a world where modern marketing is built on truth, trust and transparency, not smoke and mirrors. We want to be part of a world where consumers are recognized and respected, privacy is protected and integrity is expected.
We enable marketing built on proof, not promises. We create robust customer experiences that drive performance at the individual level, and help brands make smarter decisions that drive real business outcomes.