ETL QA

01 Dec 2024

Vacancy expired!

Hi, have a look over the JD and let me know what you think.

Title: ETL QA

Location: Tempe, AZ / San Ramon, CA

Duration: 12+ months

Interview: WebEx

Description:
  • Experience writing complex SQL and Python/shell scripts to test a data ingestion framework against the data mapping and requirements provided, and performing extensive data analysis to identify defects.
  • Strong grasp of data analytics, ETL, data warehouse, data virtualization, and BI dashboard concepts.
  • Experience working on large-scale big data / enterprise data warehouse, data integration, data migration, and upgrade projects.
  • Experience testing complex data systems and data ingestion pipelines through batch and real-time/streaming frameworks.
  • Experience building/updating automation frameworks in languages such as Python, Java, or shell, or proven prior programming experience in any relevant scripting language.
  • Experience setting up test data in various file formats and databases (see the Python sketch below).
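For a flavor of the test data setup described above, here is a minimal Python sketch that writes the same records as newline-delimited JSON and as a pipe-delimited file, including one negative-scenario record; all file names, fields, and values are hypothetical.

    # Minimal test-data setup sketch: one positive and one negative record,
    # written out in two file formats. Names and values are hypothetical.
    import csv
    import json

    rows = [
        {"id": 1, "name": "alice", "amount": "10.50"},        # positive case
        {"id": 2, "name": "bob", "amount": "not_a_number"},   # negative case
    ]

    # Newline-delimited JSON.
    with open("test_data.json", "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

    # Pipe-delimited file with a header row.
    with open("test_data.psv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "name", "amount"], delimiter="|")
        writer.writeheader()
        writer.writerows(rows)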
Mandatory skills:
  • Testing data ingestion pipelines through batch and real-time/streaming frameworks implemented with Spark or NiFi.
  • Testing different types of dimension and fact tables, with in-depth data warehousing knowledge.
  • Working in a UNIX environment, writing HDFS and shell commands for job execution, file validation, etc.
  • Using a programming language (Python, shell scripts, or Scala) to understand data ingestion functionality implemented in Spark and Python scripts, and analyzing the logs for failures.
  • Hive – understanding the mapping/requirements document and writing medium- to complex-level HiveQL for data validation between tables, plus DDL and DML operations.
  • Different file formats – validating data in different file formats (JSON, XML, Parquet, delimited, fixed width) against another file or a Hive/HBase table using SparkSQL or Python/shell scripts (see the PySpark sketch after this list).
  • Setting up test data in different file formats for positive and negative scenarios.
  • Integration testing of the E2E data ingestion pipeline across the different tools it integrates.
  • YARN – monitoring Spark jobs running in cluster mode and checking the logs for issues (good to have).
  • Developing automation scripts to validate data between tables and files.
  • Airflow or other scheduling tools – executing the E2E jobs, monitoring the data ingestion process, and checking the logs for issues (see the DAG sketch after this list).
  • HBase – shell commands for data validation.
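As an illustration of the file-vs-table validation and automation scripting above, a minimal PySpark sketch might look like the following; the paths, database/table name, and the assumption that exact row-level equality is the acceptance criterion are all hypothetical.

    # Hypothetical PySpark sketch: validate a parquet file against a Hive
    # table by row count and row-level diff (schemas assumed to match).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    file_df = spark.read.parquet("/data/landing/orders.parquet")
    table_df = spark.table("warehouse.orders")

    # Row counts should match after ingestion.
    assert file_df.count() == table_df.count(), "row count mismatch"

    # Rows present on one side but not the other. Note that subtract()
    # is an EXCEPT DISTINCT; use exceptAll() if duplicates matter.
    missing = file_df.subtract(table_df)
    extra = table_df.subtract(file_df)
    assert missing.count() == 0 and extra.count() == 0, "row-level mismatch"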
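For the scheduling bullet, here is a minimal Airflow DAG sketch (assuming Airflow 2.x; the DAG id, commands, and schedule are hypothetical) that runs an ingestion job and only then its validation job:

    # Hypothetical Airflow 2.x DAG: ingest, then validate.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="ingest_and_validate",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # Airflow 2.4+; use schedule_interval on older 2.x
        catchup=False,
    ) as dag:
        ingest = BashOperator(
            task_id="ingest",
            bash_command="spark-submit /jobs/ingest_orders.py",
        )
        validate = BashOperator(
            task_id="validate",
            bash_command="spark-submit /jobs/validate_orders.py",
        )
        # Validation runs only after ingestion succeeds.
        ingest >> validate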

  • ID: #23701519
  • State: California, USA
  • City: San Ramon, 94582
  • Salary: Depends on Experience
  • Job type: Contract
  • Posted: 2021-12-01
  • Deadline: 2022-01-23
  • Category: Software/QA/DBA/etc