Vacancy expired!
- 6+ years of overall IT experience
- 3+ years of experience with high-velocity, high-volume stream processing using Apache Kafka and Spark Streaming
- Experience with real-time data processing and streaming techniques using Spark Structured Streaming and Kafka
- Deep knowledge of troubleshooting and tuning Spark applications
- 3+ years of experience with data ingestion from message queues (Tibco, IBM, etc.) and various file formats such as JSON, XML, and CSV across different platforms
- 3+ years of experience with Big Data tools/technologies such as Hadoop, Spark, Spark SQL, Kafka, Sqoop, Hive, S3, and HDFS, or cloud platforms (e.g., AWS, Google Cloud Platform)
- 3+ years of experience building, testing, and optimizing Big Data ingestion pipelines, architectures, and data sets
- 2+ years of experience with Python (and/or Scala) and PySpark
- 2+ years of experience with NoSQL databases, including HBase and/or Cassandra
- Knowledge of Unix/Linux platform and shell scripting is a must
- Strong analytical and problem-solving skills
- Experience with Cloudera/Hortonworks HDP and HDF platforms
- Experience with NiFi, Schema Registry, and NiFi Registry
- Strong SQL skills with the ability to write queries of intermediate complexity
- Strong understanding of relational and dimensional modeling
- Experience with Git version control
- Experience with REST APIs and web services
- Strong business analysis and requirements gathering/writing skills