ETL QA

01 Dec 2024

Vacancy expired!

Hi, have a look over the JD and let me know what you think.

Title: ETL QA

Location: Tempe, AZ / San Ramon, CA

Duration: 12+ months

Interview: WebEx

Description:
  • Experience writing complex SQL and Python/shell scripts to test a data ingestion framework against the data mapping and requirements provided, and performing extensive data analysis to identify defects.
  • Strong grasp of data analytics, ETL, data warehouse, data virtualization, and BI dashboard concepts.
  • Experience working on large-scale big data / enterprise data warehouse, data integration, data migration, and upgrade projects.
  • Experience testing complex data systems and data ingestion pipelines through batch and real-time/streaming frameworks.
  • Experience building/updating automation frameworks in languages such as Python, Java, or shell, or proven prior programming experience in any relevant scripting language.
  • Experience setting up test data in various file formats and databases (see the Python sketch below).
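For a flavor of the test data setup described above, here is a minimal Python sketch that writes the same records as newline-delimited JSON and as a pipe-delimited file, including one negative-scenario record; all file names, fields, and values are hypothetical.

    # Minimal test-data setup sketch: one positive and one negative record,
    # written out in two file formats. Names and values are hypothetical.
    import csv
    import json

    rows = [
        {"id": 1, "name": "alice", "amount": "10.50"},        # positive case
        {"id": 2, "name": "bob", "amount": "not_a_number"},   # negative case
    ]

    # Newline-delimited JSON.
    with open("test_data.json", "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

    # Pipe-delimited file with a header row.
    with open("test_data.psv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "name", "amount"], delimiter="|")
        writer.writeheader()
        writer.writerows(rows)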
Mandatory skills:
  • Testing data ingestion pipelines through batch and real-time/streaming frameworks implemented with Spark or NiFi.
  • Testing different types of dimension and fact tables, with in-depth data warehousing knowledge.
  • Working in a UNIX environment, writing HDFS and shell commands for job execution, file validation, etc.
  • Using a programming language (Python, shell scripts, or Scala) to understand data ingestion functionality implemented in Spark and Python scripts, and analyzing the logs for failures.
  • Hive – understanding the mapping/requirements document and writing medium- to complex-level HiveQL for data validation between tables, plus DDL and DML operations.
  • Different file formats – validating data in different file formats (JSON, XML, Parquet, delimited, fixed width) against another file or a Hive/HBase table using SparkSQL or Python/shell scripts (see the PySpark sketch after this list).
  • Setting up test data in different file formats for positive and negative scenarios.
  • Integration testing of the E2E data ingestion pipeline across the different tools it integrates.
  • YARN – monitoring Spark jobs running in cluster mode and checking the logs for issues (good to have).
  • Developing automation scripts to validate data between tables and files.
  • Airflow or other scheduling tools – executing the E2E jobs, monitoring the data ingestion process, and checking the logs for issues (see the DAG sketch after this list).
  • HBase – shell commands for data validation.
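As an illustration of the file-vs-table validation and automation scripting above, a minimal PySpark sketch might look like the following; the paths, database/table name, and the assumption that exact row-level equality is the acceptance criterion are all hypothetical.

    # Hypothetical PySpark sketch: validate a parquet file against a Hive
    # table by row count and row-level diff (schemas assumed to match).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    file_df = spark.read.parquet("/data/landing/orders.parquet")
    table_df = spark.table("warehouse.orders")

    # Row counts should match after ingestion.
    assert file_df.count() == table_df.count(), "row count mismatch"

    # Rows present on one side but not the other. Note that subtract()
    # is an EXCEPT DISTINCT; use exceptAll() if duplicates matter.
    missing = file_df.subtract(table_df)
    extra = table_df.subtract(file_df)
    assert missing.count() == 0 and extra.count() == 0, "row-level mismatch"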
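For the scheduling bullet, here is a minimal Airflow DAG sketch (assuming Airflow 2.x; the DAG id, commands, and schedule are hypothetical) that runs an ingestion job and only then its validation job:

    # Hypothetical Airflow 2.x DAG: ingest, then validate.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="ingest_and_validate",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # Airflow 2.4+; use schedule_interval on older 2.x
        catchup=False,
    ) as dag:
        ingest = BashOperator(
            task_id="ingest",
            bash_command="spark-submit /jobs/ingest_orders.py",
        )
        validate = BashOperator(
            task_id="validate",
            bash_command="spark-submit /jobs/validate_orders.py",
        )
        # Validation runs only after ingestion succeeds.
        ingest >> validate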

  • ID: #23701519
  • State: California, USA
  • City: San Ramon, 94582
  • Salary: Depends on Experience
  • Job type: Contract
  • Posted: 2021-12-01
  • Deadline: 2022-01-23
  • Category: Software/QA/DBA/etc