Big Data Engineer

03 Dec 2024

Vacancy expired!

Big Data/Python Software Engineer
New York, NY (Remote to start, then onsite)
6-Month Contract to Hire

We are seeking a Big Data/Python Software Engineer with expertise in the design, development and implementation of statistical modeling databases and in the implementation of statistical models in Python/R on a DataIku/Hadoop platform. The position requires working closely with the WM Strategists and Modeling Group, which is responsible for developing and implementing statistically based models covering a wide range of financial products such as bank deposits, mortgage lending and retail lending. We have an opening for a qualified individual to join our fast-paced work environment.

The right candidate will have a background in data engineering and the requisite familiarity with, and experience in, statistical modeling. The candidate should be well versed in the Hadoop ecosystem and the details of Hadoop application design. Additionally, experience with Hadoop/Spark performance tuning, including but not limited to data partitioning and indexing, is required.

Responsibilities include:
  • Work closely with members of the WM Strats and Modeling team on the design, development and implementation of large statistical databases in a DataIku/Hadoop environment
  • Work closely with members of the WM Strats and Modeling team on the implementation of statistical and econometric models in Python/PySpark/R on the DataIku platform
  • Work closely with members of the WM Strats and Modeling team to facilitate processing of large data sets in a Hadoop environment using Spark/PySpark/RSpark
  • Ensure data integrity through data quality, validation, governance and transparency
  • Handle production deployment and model monitoring to ensure stable performance and adherence to standards

Skills required:
  • Experienced professional with 10-12 years of experience developing and implementing statistical models in a Big Data ecosystem, e.g., Hadoop, Spark, HBase, Hive/Impala or any other similar distributed computing technology
  • Proficiency with Python/R and basic libraries for statistical/econometric modeling, such as scikit-learn and pandas
  • Experienced in Hadoop, Spark, HDFS, Python, R, PySpark and other leading technologies
  • Proficiency with DataIku or similar tools
  • Proficiency in data analysis using complex, optimized SQL and/or the above-mentioned technologies
  • Understanding of data architecture, data structures, data modeling, database design and performance management
  • Good written and verbal communication skills

Proficiency/experience with the following is a plus:
  • In-depth understanding of Statistics
  • Finance, Mortgages, Bank Deposit Products

  • ID: #23794274
  • State: New York, New York City 10001, USA
  • City: New York City
  • Salary: Depends on Experience
  • Job type: Contract
  • Posted: 2021-12-03
  • Deadline: 2022-01-28
  • Category: Et cetera