ETL/DQ Developer (with experience in Apache Spark & Java)

12 Oct 2024

Vacancy expired!

Responsibilities:

  • Hands-on architecture/development of ETL pipelines using our internal framework written in Apache Spark & Java
  • Hands-on development consuming Kafka, REST APIs, or other streaming sources using Spark and persisting data in graph or other NoSQL databases
  • Implement DQ metrics and controls for data in a big data environment
  • Interpret data, analyze results using statistical techniques and provide ongoing reports
  • Develop and implement databases, data collection systems, data analytics and other strategies that optimize statistical efficiency and quality
  • Acquire data from primary or secondary data sources and maintain databases/data systems
  • Identify, analyze, and interpret trends or patterns in complex data sets
  • Filter and clean data by reviewing reports and performance indicators to locate and correct problems
  • Work with management to prioritize business and information needs
  • Locate and define new process-improvement opportunities; provide architectural and best-practice suggestions to improve the current setup
Technical Skill Set:
  • Hands-on Spark/Java development experience; Kafka and Spark Streaming are a must
  • Hands-on development experience with Hadoop ecosystem tools (Hive, Parquet, Sqoop, Presto, DistCp) is a must
  • Development experience with big data in the cloud, specifically AWS (S3, Glue)
  • AWS certification is preferable: AWS Developer/Architect/DevOps/Big Data
Additional Requirements:
  • Technical expertise in data models, database design and development, data mining, and segmentation techniques
  • Good experience writing complex SQL and ETL processes
  • Excellent coding and design skills, particularly in Java, Scala, and/or Python
  • Experience working with large data volumes, including processing, transforming, and transporting large-scale data
  • Excellent working knowledge of Apache Hadoop, Apache Spark, Kafka, Scala, Python, etc.
  • Strong analytical skills with the ability to collect, organize, analyze, and disseminate significant amounts of information with attention to detail and accuracy
  • Good understanding and usage of algorithms and data structures
  • Good experience building reusable frameworks
  • Experience working in an Agile team environment
  • Excellent communication skills, both verbal and written
Qualifications:
  • At least 8 years of experience architecting and implementing complex ETL pipelines, preferably with the Spark toolset
  • At least 4 years of experience with Java, particularly within the data space

  • ID: #46401411
  • State: Pennsylvania, Philadelphia 19019, USA
  • City: Philadelphia
  • Salary: Depends on experience
  • Job type: Contract
  • Showed: 2022-10-12
  • Deadline: 2022-11-28
  • Category: Et cetera