- Develop end-to-end data pipelines using Spark, Hive, and Impala
- Write Spark jobs to fetch large data volumes from source systems
- Understand business needs, analyze functional specifications, and map them to the design and development of Apache Spark programs and algorithms
- Optimize Spark code, Impala queries, and Hive partitioning strategies for better scalability, reliability, and performance
- Work with leading BI technologies such as MicroStrategy (MSTR) and Tableau over the Hadoop ecosystem through ODBC/JDBC connections
- Apply Hive performance optimizations such as the distributed cache for small datasets, partitioning, bucketing, and map-side joins (see the first sketch after this list)
- Build machine learning algorithms using Spark
- Design and deploy scalable, enterprise-wide operations
- Wrangle data into workable datasets, working with file formats such as Parquet, ORC, and SequenceFile and serialization formats such as Avro (see the second sketch after this list)
- Build applications using Maven or SBT and integrate them with continuous integration servers such as Jenkins
- Execute Hadoop ecosystem applications through Apache Hue
- Perform feasibility analysis for deliverables, evaluating requirements against complexity and timelines
- Tune the performance of Impala queries
- Document operational problems, following standards and procedures, using the issue-tracking tool JIRA
- Install, configure, and use Hadoop components such as Spark, Spark Job Server, Spark Thrift Server, Phoenix on HBase, Flume, and Sqoop
- Apply expertise in shell scripts, cron automation, and regular expressions
- Coordinate development, integration, and production deployments
- Use REST services to access HBase data and feed it to downstream systems for further processing
- Debug issues using Hadoop and Spark log files
- Prepare technical specifications, analyze functional specs, and develop and maintain code
- Create mapping documents to outline data flow from source to target
- Migrate data from legacy RDBMS systems to the Hadoop ecosystem (see the third sketch after this list)
- Use Cloudera Manager, an end-to-end tool for managing Hadoop operations in a Cloudera cluster
- Create database objects such as tables, views, functions, and triggers using SQL
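The Hive tuning bullet above bundles several distinct techniques. As a rough illustration only (the table and column names below, such as `sales` and `customers`, are hypothetical, and `customers` is assumed to already exist in the metastore), a Spark job with Hive support enabled might create a date-partitioned table and force a map-side join by broadcasting the small dimension table:

```scala
import org.apache.spark.sql.SparkSession

object HivePerfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-perf-sketch")
      .enableHiveSupport() // talk to the Hive metastore
      .getOrCreate()

    // Partitioning: queries that filter on sale_date read only the
    // matching partition directories instead of scanning the whole table.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales (id BIGINT, customer_id BIGINT, amount DOUBLE)
        |PARTITIONED BY (sale_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Map-side join: broadcasting the small (hypothetical) customers table
    // ships it to every executor, so the large sales table is never shuffled.
    spark.sql(
      """SELECT /*+ BROADCAST(c) */ s.sale_date, c.region, SUM(s.amount) AS total
        |FROM sales s
        |JOIN customers c ON s.customer_id = c.id
        |GROUP BY s.sale_date, c.region""".stripMargin).show()
  }
}
```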
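For the file-format bullet, switching between the formats named above is mostly a matter of the DataFrame writer's format. A minimal sketch, assuming hypothetical input/output paths and the external spark-avro package on the classpath:

```scala
import org.apache.spark.sql.SparkSession

object FileFormatSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("file-format-sketch").getOrCreate()

    // Hypothetical input; any DataFrame is written the same way.
    val df = spark.read.option("header", "true").csv("/data/raw/events.csv")

    // Columnar formats: good compression and column pruning for analytics.
    df.write.mode("overwrite").parquet("/data/out/events_parquet")
    df.write.mode("overwrite").orc("/data/out/events_orc")

    // Row-oriented Avro: suited to record-at-a-time pipelines and schema
    // evolution; requires the external spark-avro package.
    df.write.mode("overwrite").format("avro").save("/data/out/events_avro")
  }
}
```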
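The migration bullet is classically handled with Sqoop; an equivalent Spark-only sketch (the connection URL, credentials, and table names here are made up) reads the legacy table over JDBC in parallel and lands it as a partitioned Hive table:

```scala
import org.apache.spark.sql.SparkSession

object RdbmsToHadoopSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdbms-to-hadoop-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read the legacy table over JDBC, split into parallel partitions on a
    // numeric key so the extract is not single-threaded. All connection
    // details below are hypothetical.
    val legacy = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//legacy-db:1521/ORCL")
      .option("dbtable", "SALES.ORDERS")
      .option("user", sys.env.getOrElse("DB_USER", "etl_user"))
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .option("partitionColumn", "ORDER_ID")
      .option("lowerBound", "1")
      .option("upperBound", "100000000")
      .option("numPartitions", "16")
      .load()

    // Land the data as a date-partitioned, Parquet-backed warehouse table.
    legacy.write
      .mode("overwrite")
      .partitionBy("ORDER_DATE")
      .format("parquet")
      .saveAsTable("warehouse.orders")
  }
}
```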
- Experience with Spark and Spark SQL
- Must have hands-on experience in Java, Spark, Scala, Akka, Hive, Maven/SBT, and Amazon S3
- Experience in Kafka and REST services is a plus
- Experience in Hadoop, HBase, MongoDB, or other NoSQL platforms
- Good knowledge of Big Data querying tools, such as Pig, Hive, and Impala
- Knowledge of Sqoop and Flume preferred
- Excellent communication skills with both technical and business audiences
- Experience in Apache Phoenix and text search (Solr, Elasticsearch, CloudSearch)
- 3+ years of strong native SQL skills
- 3+ years of strong experience with database and data warehousing concepts and techniques; must understand relational and dimensional modeling, star/snowflake schema design, BI, data warehouse operating environments and related technologies, ETL, MDM, and data governance practices
- 3+ years of experience working in Linux
- 3+ years of experience with Spark
- 3+ years of experience with Scala
- 1+ years of experience with Hadoop, Hive, Impala, HBase, and related technologies
- 1+ years of strong experience with low-latency (near-real-time) systems and TB-scale data sets, loading and processing billions of records per day
- 1+ years of experience with MapReduce/YARN
- 1+ years of experience with Lambda architectures
- 1+ years of experience with MPP, shared-nothing database systems, and NoSQL systems
- Ability to work in a fast-paced, team-oriented environment
- Ability to complete the full lifecycle of software development and deliver on time
- Ability to work with end-users to gather requirements and convert them to working documents
- Strong interpersonal skills, including a positive, solution-oriented attitude
- Must be passionate, flexible, and innovative in using tools, experience, and any other resources to deliver effectively against challenging and constantly changing business requirements
- Must be able to interface with various solution and business areas to understand requirements and prepare documentation to support development
- Healthcare and/or reference data experience is a plus
- A willingness and ability to travel
- Right to work in the recruiting country
- ID: #49497996
- State: Pennsylvania 19462, USA
- City: Plymouth Meeting
- Salary: Competitive
- Job type: Contract
- Posted: 2023-03-19
- Deadline: 2023-05-17
- Category: Et cetera