Sr. Data Scientist - eCommerce Data Platform Enablement

11 Feb 2025

Vacancy expired!

Position: Sr. Data Scientist - eCommerce Data Platform Enablement

Location: San Bruno, CA

Duration: Contract to Hire/Full Time

Job Description:

Responsibilities:
  • Work with Data Platform Enablement team
  • Responsible for Walmart’s data platform, data processing, data integrations and data solutions working with internal and external partners. The broader team is currently on a transformation path, and this role will be instrumental in enabling the broader team’s vision.
  • Handle system administration, security compliance, and internal tech audits
  • Responsible for operational excellence initiatives which include efficient use of data platform resources, identifying optimization opportunities, forecasting capacity, etc.
  • Design and implement different flavors of architecture to deliver better system performance and resiliency.
  • Identify opportunities to build automated processes and tools to improve efficiency.
  • Develop capability requirements and a transition plan for the next generation of data enablement technology, tools, and processes to enable Walmart to efficiently improve performance at scale.
  • Drive best practices and standards around the usage of data platforms and tools
  • Implement data governance practices. Handle business and technology issues related to management of enterprise information assets and approaches related to data protection.

Skills:
  • Administering Dataproc and Airflow. Ability to create, maintain, scale, and debug production ephemeral and long-running Dataproc clusters as a Dataproc administrator
  • Deep understanding of data center architectures, networking, storage solutions, and system performance at scale
  • Technical knowledge of big data analytics, optimization techniques, and data pipeline acceleration. Experience deploying and maintaining large-scale data pipelines in production. Experience deploying data science models and reporting solutions at scale, preferably including building data tools from the ground up
  • Understanding of cloud platforms such as Google Cloud Platform (preferred) and Azure, and of the differences between IaaS, CaaS, PaaS, etc.
  • Strong experience with the Apache ecosystem, especially Spark, Hadoop, Hive, Kafka, Tez, and Airflow, and with different data formats such as Parquet, ORC, Avro, etc.
  • Familiarity with DevOps best practices and cloud-native technologies
  • Programming experience in SQL, Python (preferred), R, Scala, Java, or Bash
  • Experience with BigQuery, Presto, Cloud SQL, MSSQL, Cassandra, and MongoDB is a plus
  • Experience with PySpark, Spark SQL, MLlib, and Spark RAPIDS on GPUs is a plus
  • Experience setting up logging and monitoring tools, and helping to debug complex data pipelines

Education & Experience:
  • 5+ years of relevant experience in roles with responsibility for data platforms and data operations, dealing with large volumes of data in cloud-based distributed computing environments.
  • Graduate degree preferred in a quantitative discipline (e.g., engineering, economics, math, operations research).
  • Proven ability to solve enterprise-level data operations problems at scale that require cross-functional collaboration for solution development, implementation, and adoption.

  • ID: #49136808
  • State: California, San Bruno 94066, USA
  • City: San Bruno
  • Salary: Depends on Experience
  • Job type: Permanent
  • Posted: 2023-02-11
  • Deadline: 2023-03-24
  • Category: Et cetera