Vacancy expired!
Hi, have a look over the JD and let me know what you think.
Title: ETL QA
Location: Tempe, AZ / San Ramon, CA
Duration: 12+ Months
Interview: WebEx
Description:
- Experience writing complex SQL and Python/shell scripts to test a data ingestion framework against the data mapping and requirements provided, and performing extensive data analysis to identify defects.
- Strong grasp of data analytics, ETL, data warehouse, data virtualization, and BI dashboard concepts.
- Experience working on large-scale big data/enterprise data warehouse, data integration, data migration, and upgrade projects.
- Experience testing complex data systems and data ingestion pipelines through batch and real-time/streaming frameworks.
- Experience building/updating automation frameworks using programming languages such as Python, Java, or shell, or prior proven programming experience in any relevant scripting language.
- Experience setting up test data in various file formats and databases.
- Testing data ingestion pipelines through batch and real-time/streaming frameworks implemented using Spark or NiFi.
- Testing different types of dimension and fact tables with in-depth data warehousing knowledge.
- Working in a UNIX environment, writing HDFS and shell commands for job execution, file validation, etc.
- Using a programming language (Python, shell scripts, or Scala) to understand data ingestion functionality implemented with Spark and Python scripts, and analyzing logs for failures.
- Hive – understanding the mapping/requirement document and writing medium- to complex-level HiveQL for data validation between different tables, plus DDL and DML operations.
- Different file formats – validating data in different file formats (JSON, XML, Parquet, delimited, fixed width) against another file or a Hive/HBase table using SparkSQL or Python/shell scripts.
- Test data setup in different file formats for positive and negative scenarios.
- Integration testing of the E2E data ingestion pipeline, integrating different tools.
- YARN – monitoring Spark jobs running in cluster mode and checking the logs for any issues. (Good to have)
- Developing automation scripts to validate data between tables and files.
- Airflow or other scheduling tools to execute E2E jobs, monitor the data ingestion process, and check the logs for any issues.
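As a rough, hypothetical sketch of the file-vs-table validation scripting described in the bullets above: the snippet below compares a delimited file against a database table, using an in-memory SQLite table in place of a Hive table and illustrative names throughout (in a real engagement this comparison would typically run via SparkSQL or HiveQL).

```python
import csv
import io
import sqlite3

def validate_file_vs_table(csv_text, conn, table):
    """Compare row count and per-row values between a delimited file and a table.

    Returns counts for both sides plus the rows present on one side only.
    """
    file_rows = [tuple(r) for r in csv.reader(io.StringIO(csv_text))]
    # Cast DB columns to str so they compare cleanly against parsed CSV fields.
    db_rows = [tuple(str(c) for c in r) for r in conn.execute(f"SELECT * FROM {table}")]
    mismatches = [r for r in file_rows if r not in db_rows] + \
                 [r for r in db_rows if r not in file_rows]
    return {
        "file_count": len(file_rows),
        "table_count": len(db_rows),
        "mismatches": mismatches,
    }

# Demo: an in-memory SQLite table standing in for a Hive target table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id TEXT, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [("1", "alice"), ("2", "bob")])

result = validate_file_vs_table("1,alice\n2,bob\n", conn, "customers")
# Counts match and no mismatching rows: the ingestion check passes.
```

The same shape of check (counts, then row-level diffs, for positive and negative scenarios) extends to Parquet or JSON inputs by swapping the CSV parsing for the appropriate reader.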
- ID: #23701519
- State: California, San Ramon 94582, USA
- City: San Ramon
- Salary: Depends on Experience
- Job type: Contract
- Showed: 2021-12-01
- Deadline: 2022-01-23
- Category: Software/QA/DBA/etc