Data Science Engineer

03 Jun 2024

Vacancy expired!

The Data Science Engineer works on a team that develops massively scalable, low-latency, real-time streaming infrastructure and data lake architecture to support both internal operations and data science research. This position develops a centralized analytics data infrastructure that ingests event-based data in real time from internal systems and services as well as public data sources. The role is a key enabler for data science operations and will facilitate the creation of machine learning feature stores and data cleansing.

Primary Responsibilities
  • Create centralized data infrastructure and data cleansing processes
  • Create machine learning feature stores and views to facilitate data science research
  • Provide highly available data infrastructure that supports both real-time streaming and batch processes for customer-facing machine learning APIs and microservices
  • Manage risk, security, compliance, governance, and SSDLC within the scope of data engineering
Required Skills/Experience
  • Experienced with building and scaling highly available consumer-facing applications
  • Experienced with architecting enterprise real-time, event-driven data lake architecture on distributed file systems such as Hadoop, or on cloud buckets in AWS, Azure, Google Cloud Platform, or equivalent
  • Experienced with creating streaming data-mesh infrastructure with Kafka
  • Experienced with creating efficient, scalable, and reliable batch data lake ETL processes with Spark
  • Understands best practices regarding serialization/deserialization formats for real-time and batch data lake processing including Parquet and Avro
  • Experienced managing SQL tools for data lakes (e.g. Snowflake, Athena, Impala, Hive, or equivalents)
  • Experienced with architecting and scaling systems based on various types of data stores, including relational databases (RDBMS), key-value stores (KVS), and in-memory data stores, with experience in at least one of each type (e.g. Postgres, Cassandra, Redis, or equivalents)
  • Experienced with software application development and associated best practices including dev, stage, prod environments, code reviews, unit tests, regression tests, e2e tests, library and artifact packaging, etc.
  • Understands software design patterns and has strong proficiency in at least one modern language (Python, Scala, Java, etc.)
  • Experienced with Git version control and CI/CD pipelines with continuous release cycles
  • Experienced working in an Agile/Scrum project management environment
  • Experienced with creating Docker containers and associated shell scripts
  • Experience with data orchestration (Airflow, Dagster), versioning (Dolt), or quality testing (Great Expectations) is a plus
  • Real estate industry experience is a plus

  • ID: #42323117
  • State: Maryland (Bethesda 20810, USA)
  • City: Bethesda
  • Salary: BASED ON EXPERIENCE
  • Job type: Contract
  • Posted: 2022-06-03
  • Deadline: 2022-08-01
  • Category: Et cetera