- Build big data pipelines and structures for large-scale banking systems
- Implement and manage large-scale ETL jobs on Hadoop/Spark clusters on the Cloudera and Hortonworks platforms
- Interface with internal teams and data-consumer teams to understand their data needs
- Own data quality throughout all stages of acquisition and processing, including data collection, ETL/wrangling, and normalization
- 4+ years of experience working with large data sets using open-source technologies such as Spark, Hadoop, and Kafka on one of the major big data stacks (Cloudera, Hortonworks) or a cloud platform such as EMR
- Strong SQL (Hive, MySQL, etc.) and NoSQL (HBase, etc.) skills, including writing complex queries and performance tuning
- A good command of Python, Spark, and big data techniques (Hive/Pig, MapReduce, Hadoop Streaming, Kafka)
- Excellent communication and relationship skills; a strong team player

Preferred Qualifications
- Experience developing and productionizing real-world, large-scale data pipelines
- Experience with Apache Airflow, Apache NiFi, Kafka, Apache Atlas, and Schema Registry
- Experience with DevOps tools such as GitLab, GitHub, and Ansible, and with automation
- Expertise in Python scripting; Java is a big plus
- Bachelor's/Master's degree in Computer Science, Engineering, or a related field.