- Implement ETL processes using the defined framework
- Monitor performance and advise on any necessary infrastructure changes
- Create and modify tables and views in Hive
- Write shell scripts to execute Hive-on-Spark jobs
- Automate the shell scripts with the Autosys job scheduling tool
- Improve job performance through Hive parameter tuning, Spark configuration changes, and Spark optimization techniques
- Create and modify HQL scripts to retrieve data from Hive tables and to perform data processing
- Work with the team to define data retention logic per business requirements
- Perform and oversee tasks such as writing scripts, writing T-SQL queries, and calling APIs
- Customize and oversee integration tools, warehouses, databases, and analytical systems
- Design the data flow, create data flow diagrams, and implement design-level changes
- Design and implement data stores that support the scalable processing and storage of our high-frequency data
- Work with the admin/support team to resolve any ongoing issues with operating the cluster
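The Hive-on-Spark execution and tuning duties above can be sketched as a shell wrapper of the kind the role would produce. This is a minimal sketch, not a definitive implementation: the JDBC URL, queue-less connection, tuning values, and the default script name `daily_load.hql` are illustrative assumptions, not details from the posting.

```shell
#!/usr/bin/env bash
# Sketch: wrapper that runs an HQL script as a Hive-on-Spark job via beeline.
set -u

# HQL script to execute; daily_load.hql is a hypothetical default.
HQL_FILE="${1:-daily_load.hql}"

# Hive parameters and Spark configuration changes of the sort used for job
# tuning; the specific values here are placeholders, not recommendations.
HIVE_OPTS=(
  --hiveconf hive.execution.engine=spark
  --hiveconf spark.executor.memory=4g
  --hiveconf spark.executor.cores=2
  --hiveconf hive.exec.dynamic.partition.mode=nonstrict
)

# Full beeline invocation, built as an array so it can be inspected or run.
CMD=(beeline -u "jdbc:hive2://localhost:10000/default" "${HIVE_OPTS[@]}" -f "$HQL_FILE")

# DRY_RUN=1 (the default here, so the sketch works off-cluster) prints the
# command; a scheduler such as Autosys would invoke it with DRY_RUN=0.
if [[ "${DRY_RUN:-1}" == "1" ]]; then
  echo "${CMD[*]}"
else
  "${CMD[@]}"
fi
```

In practice a scheduler-facing wrapper like this would also redirect logs and propagate the beeline exit code so the scheduling tool can detect failures.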
- Bachelor’s or master’s degree in computer science, data science, or a related technical field, or equivalent experience
- 7+ years of hands-on data engineering experience with data warehouse, data lake, and enterprise big data platforms required
- Experience working in an agile/iterative methodology required
- Working experience with the Hadoop big data ecosystem (Hive, Impala, Spark, Scala, shell scripting) and RDBMS (MS SQL Server) required
- Experience integrating data from multiple data sources via full, incremental, and real-time loads
- Working experience with development and deployment tools: Jira, Bitbucket, Jenkins, RLM
- Experience with Spark, Hadoop v2, MapReduce, HDFS required
- Good knowledge of Big Data querying tools, such as Pig, Hive, and Impala required
- At least 2 years of relevant experience with real-time data stream platforms such as Flume, Kafka, and Spark Streaming
- Experience with various ETL techniques and frameworks required
- Excellent analytical, problem-solving, and communication skills
- Ability to solve any ongoing issues with operating the cluster