- 5+ years of experience processing large volumes and a wide variety of data (structured and semi-structured data; writing code for parallel processing, shredding XML and JSON, and reading PDFs; see the parallel-parsing sketch after this list) – Mandatory.
- 3+ years of development experience in Python for data processing and analysis – Mandatory.
- 3+ years of experience using the Hadoop platform for analysis work, including familiarity with Hadoop cluster environments and resource-management configuration – Mandatory.
- Strong SQL experience – Mandatory.
- Detail-oriented, with excellent verbal and written communication skills.
- Must be able to manage multiple priorities and meet deadlines.
- 2+ years of experience with Snowflake, preferably parsing JSON and XML files using SnowSQL or Snowpark (see the Snowflake parsing sketch after this list).
- 2+ years of programming experience in PySpark for data processing and analysis (see the PySpark sketch after this list) – Optional.
- Programming experience in Python, PySpark, and SQL for data ingestion.
- Present ideas and recommendations on data-handling and data-parsing technologies to management.
- Cleanse, manipulate, and analyze large datasets (semi-structured and unstructured data: XML, JSON, CSV, PDF) using Python and the Snowflake database.
- Develop Python scripts to filter/cleanse/map/aggregate data.
- Manage and implement data processes (data quality reports).
- Develop data profiling, deduplication, and record-matching logic for analysis (see the dedup/matching sketch after this list).
- Python development experience is a must-have, along with SQL, PySpark, Hadoop, XML, and JSON.
- Bachelor’s degree in Computer Science or Engineering, or equivalent work experience.
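
To make the parallel-parsing expectation concrete, here is a minimal Python sketch of shredding XML and JSON files across worker processes. The `data/` directory, the `<record>` element name, and the file layout are hypothetical illustrations, not details from the posting.

```python
# Minimal sketch: parallel parsing ("shredding") of XML and JSON files.
# The data/ directory and the <record> element layout are hypothetical.
import json
import xml.etree.ElementTree as ET
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def parse_file(path: Path) -> list[dict]:
    """Flatten one XML or JSON file into a list of plain dicts."""
    if path.suffix == ".json":
        with path.open() as f:
            payload = json.load(f)
        return payload if isinstance(payload, list) else [payload]
    if path.suffix == ".xml":
        root = ET.parse(path).getroot()
        # Shred each <record> element into a dict of its child tags.
        return [{child.tag: child.text for child in rec} for rec in root.iter("record")]
    return []

if __name__ == "__main__":
    files = list(Path("data").glob("*.*"))
    # Fan the files out across worker processes.
    with ProcessPoolExecutor() as pool:
        for records in pool.map(parse_file, files):
            print(len(records), "records parsed")
```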
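A minimal sketch of the Snowflake JSON-parsing work, using the `snowflake-connector-python` package; the connection parameters, the `raw_events` table, and the `payload` fields are all hypothetical, not from the posting.

```python
# Minimal sketch: shredding JSON held in a Snowflake VARIANT column.
# Connection parameters, table name, and payload fields are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # hypothetical credentials
    user="my_user",
    password="...",
    warehouse="ANALYTICS_WH",
    database="RAW",
    schema="PUBLIC",
)

# LATERAL FLATTEN explodes a JSON array in the VARIANT column into rows.
sql = """
SELECT
    v.value:id::number        AS event_id,
    v.value:user.name::string AS user_name
FROM raw_events,
     LATERAL FLATTEN(input => payload:events) v
"""
for event_id, user_name in conn.cursor().execute(sql):
    print(event_id, user_name)
conn.close()
```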
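A minimal PySpark sketch of the filter/cleanse/aggregate workflow; the `events.json` input and the `amount` and `country` columns are hypothetical.

```python
# Minimal sketch: filter/cleanse/aggregate with PySpark.
# The events.json path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cleanse-and-aggregate").getOrCreate()

df = spark.read.json("events.json")  # semi-structured input

clean = (
    df.filter(F.col("amount").isNotNull())               # drop incomplete rows
      .withColumn("country", F.upper(F.trim("country")))  # normalize a key column
)

# Aggregate per country for downstream analysis.
summary = clean.groupBy("country").agg(
    F.count("*").alias("events"),
    F.sum("amount").alias("total_amount"),
)
summary.show()
```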
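A minimal sketch of deduplication and record-matching logic in plain Python: exact dedup on a normalized key, then fuzzy matching for near-duplicates. The sample records and the 0.9 similarity threshold are hypothetical.

```python
# Minimal sketch: exact dedup on a normalized key, plus fuzzy matching
# for near-duplicates. Records and the 0.9 threshold are hypothetical.
from difflib import SequenceMatcher

records = [
    {"id": 1, "name": "Acme Corp."},
    {"id": 2, "name": "ACME Corp"},
    {"id": 3, "name": "Globex LLC"},
]

def normalize(name: str) -> str:
    return "".join(ch for ch in name.lower() if ch.isalnum())

# Exact dedup: keep the first record seen per normalized key.
seen: dict[str, dict] = {}
for rec in records:
    seen.setdefault(normalize(rec["name"]), rec)
deduped = list(seen.values())

# Fuzzy matching: flag remaining pairs that are nearly identical.
for i, a in enumerate(deduped):
    for b in deduped[i + 1:]:
        score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
        if score >= 0.9:
            print(f"possible match: {a['id']} ~ {b['id']} (score={score:.2f})")
```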