- Cleanse, manipulate, and analyze large datasets (semi-structured and unstructured data – XML, JSON, CSV, PDF) using Python and the Snowflake database.
- Develop Python scripts to filter, cleanse, map, and aggregate data.
- Manage and implement data processes, including Data Quality reports.
- Develop data-profiling, deduplication, and record-matching logic for analysis.
- Programming-language experience in Python, PySpark, and SQL for data ingestion.
- Present ideas and recommendations on data handling and data parsing technologies to management.
- 5+ years of experience processing large volumes and varieties of data (structured and semi-structured data; writing code for parallel processing; shredding XML and JSON; reading PDFs) – Mandatory.
- 3+ years of development experience in Python for data processing and analysis – Mandatory.
- 3+ years of experience using the Hadoop platform for analysis, including familiarity with Hadoop cluster environments and resource-management configuration for analysis workloads – Mandatory.
- Strong SQL experience – Mandatory.
- Detail-oriented, with excellent verbal and written communication skills.
- Must be able to manage multiple priorities and meet deadlines.
- 2+ years of experience with Snowflake, preferably parsing JSON and XML files using SnowSQL or Snowpark.
- 2+ years of programming experience in PySpark for data processing and analysis.
- Degree in Computer Science, Statistics, Mathematics, or related field.
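For candidates wondering what "shredding" semi-structured data into tabular rows looks like in practice, here is a minimal stdlib-only Python sketch. The sample payloads, field names, and functions are illustrative assumptions, not data or code from this employer; production work of this kind would typically run in Snowflake, Snowpark, or PySpark rather than plain Python.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample payloads standing in for the XML/JSON/CSV feeds
# named in the posting (not real employer data).
JSON_DOC = '{"customer": {"id": 7, "name": "Acme", "tags": ["retail", "emea"]}}'
XML_DOC = "<orders><order id='1' total='19.99'/><order id='2' total='5.00'/></orders>"
CSV_DOC = "id,name\n1,Acme\n2,Globex\n"

def parse_json(text: str) -> dict:
    """Load a JSON document into nested dicts/lists."""
    return json.loads(text)

def shred_xml(text: str) -> list:
    """'Shred' an XML document into flat rows, one per <order> element."""
    root = ET.fromstring(text)
    return [{"id": int(o.get("id")), "total": float(o.get("total"))}
            for o in root.iter("order")]

def read_csv(text: str) -> list:
    """Read CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

if __name__ == "__main__":
    print(parse_json(JSON_DOC)["customer"]["name"])
    print(shred_xml(XML_DOC))
    print(read_csv(CSV_DOC)[1]["name"])
```

The same shape of logic scales out under PySpark or Snowpark: the per-document parsers stay the same; only the execution engine that maps them over the dataset changes.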
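The deduplication and record-matching responsibility above can likewise be sketched with the standard library. The records, threshold, and normalization rule below are hypothetical assumptions chosen for illustration; real matching logic would be tuned to the actual data.

```python
from difflib import SequenceMatcher

# Hypothetical near-duplicate customer records, standing in for the kind
# of data the posting's deduping/matching logic would target.
RECORDS = [
    {"id": 1, "name": "Acme Corp"},
    {"id": 2, "name": "ACME Corp."},
    {"id": 3, "name": "Globex LLC"},
]

def normalize(name: str) -> str:
    """Canonical matching key: lowercase, alphanumeric characters only."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def similarity(a: str, b: str) -> float:
    """Fuzzy match score in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def dedupe(records, threshold=0.9):
    """Keep the first record seen from each fuzzy-matched cluster."""
    kept = []
    for rec in records:
        if not any(similarity(rec["name"], k["name"]) >= threshold for k in kept):
            kept.append(rec)
    return kept
```

Here "Acme Corp" and "ACME Corp." normalize to the same key and collapse into one record, while "Globex LLC" survives as a distinct entity.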