Sr. Data Scientist - eCommerce Data Platform Enablement job vacancy

Position: Sr. Data Scientist - eCommerce Data Platform Enablement

Location: San Bruno, CA

Duration: Contract to Hire/Full Time

Job Description:

Responsibilities:

Work with the Data Platform Enablement team
Responsible for Walmart’s data platform, data processing, data integrations, and data solutions, working with internal and external partners. The broader team is currently on a transformation path, and this role will be instrumental in enabling that team’s vision.
Responsible for system administration, security compliance, and internal tech audits
Responsible for operational excellence initiatives which include efficient use of data platform resources, identifying optimization opportunities, forecasting capacity, etc.
Design and implement different flavors of architecture to deliver better system performance and resiliency.
Identify opportunities to build automated processes and tools to improve efficiency.
Develop capability requirements and a transition plan for the next generation of data enablement technology, tools, and processes to enable Walmart to efficiently improve performance at scale.
Drive best practices and standards around the usage of data platforms and tools
Implement data governance practices. Handle business and technology issues related to management of enterprise information assets and approaches related to data protection.

Skills:

Experience administering Dataproc and Airflow, including the ability to create, maintain, scale, and debug ephemeral and long-running production Dataproc clusters as a Dataproc administrator
Deep understanding of data center architectures, networking, storage solutions, and system performance at scale
Technical knowledge of big data analytics, optimization techniques, and data pipeline acceleration. Experience deploying and maintaining large-scale data pipelines in production. Experience deploying data science models and reporting solutions at scale, preferably including building data tools from the ground up
Understanding of cloud platforms such as Google Cloud Platform (preferred) and Azure, and of the differences between IaaS, CaaS, PaaS, etc.
Strong experience with the Apache ecosystem, especially Spark, Hadoop, Hive, Kafka, Tez, and Airflow, and with different data formats such as Parquet, ORC, Avro, etc.
Familiar with DevOps best practices and cloud-native technologies
Programming experience in SQL, Python (preferred), R, Scala, Java, or Bash
Experience with BigQuery, Presto, Cloud SQL, MSSQL, Cassandra, and MongoDB is a plus
Experience with PySpark, SparkSQL, MLlib, and Spark RAPIDS on GPUs is a plus
Experience setting up logging and monitoring tools, and helping to debug complex data pipelines

Education & Experience:

5+ years of relevant experience in roles with responsibility for data platforms and data operations, dealing with large volumes of data in cloud-based distributed computing environments.
Graduate degree preferred in a quantitative discipline (e.g., engineering, economics, math, operations research).
Proven ability to solve enterprise-level data operations problems at scale that require cross-functional collaboration for solution development, implementation, and adoption.