ETL/DQ Developer (with experience in Apache Spark & Java)

12 Oct 2024

Vacancy expired!

Responsibilities:

  • Hands-on architecture/development of ETL pipelines using our internal framework written in Apache Spark & Java
  • Hands-on development consuming Kafka, REST APIs, or other streaming sources using Spark and persisting data in graph or other NoSQL databases
  • Implement DQ metrics and controls for data in a big data environment
  • Interpret data, analyze results using statistical techniques and provide ongoing reports
  • Develop and implement databases, data collection systems, data analytics and other strategies that optimize statistical efficiency and quality
  • Acquire data from primary or secondary data sources and maintain databases/data systems
  • Identify, analyze, and interpret trends or patterns in complex data sets
  • Filter and clean data by reviewing reports and performance indicators to locate and correct problems
  • Work with management to prioritize business and information needs
  • Locate and define new process-improvement opportunities; provide architectural and best-practice suggestions to improve the current setup
Technical Skill Set:
  • Hands-on Spark/Java development experience; Kafka and Spark Streaming are a must
  • Hands-on development experience with Hadoop ecosystem tools (Hive, Parquet, Sqoop, Presto, DistCp) is a must
  • Development experience with big data in the cloud, specifically AWS (S3, Glue)
  • AWS certification is preferable: AWS Developer/Architect/DevOps/Big Data
Additional Requirements:
  • Technical expertise in data models, database design and development, data mining, and segmentation techniques
  • Good experience writing complex SQL and ETL processes
  • Excellent coding and design skills, particularly in Java, Scala, and/or Python
  • Experience working with large data volumes, including processing, transforming, and transporting large-scale data
  • Excellent working knowledge of Apache Hadoop, Apache Spark, Kafka, Scala, Python, etc.
  • Strong analytical skills with the ability to collect, organize, analyze, and disseminate significant amounts of information with attention to detail and accuracy
  • Good understanding and usage of algorithms and data structures
  • Good experience building reusable frameworks
  • Experience working in an Agile team environment
  • Excellent communication skills, both verbal and written
Qualifications:
  • At least 8 years of experience architecting and implementing complex ETL pipelines, preferably with the Spark toolset
  • At least 4 years of experience with Java, particularly within the data space

  • ID: #46401411
  • State: Pennsylvania, Philadelphia 19019, USA
  • City: Philadelphia
  • Salary: Depends on experience
  • Job type: Contract
  • Showed: 2022-10-12
  • Deadline: 2022-11-28
  • Category: Et cetera