Our team is at the forefront of applying Machine Learning (ML) to interpret and process complex chemical data, specifically chromatograms and environmental testing results. We're seeking a skilled Data Engineer to design, build, and maintain the robust data pipelines and infrastructure necessary to support our ML model training, deployment, and analysis workflows. The ideal candidate has a strong foundation in data engineering, understands the unique challenges of scientific data, and is eager to work closely with both Machine Learning Engineers (MLE) and Chemists.

Key Responsibilities

Data Pipeline Development: Design, construct, and manage scalable and reliable ETL/ELT pipelines to ingest, clean, transform, and store raw chemistry data (e.g., CSV, JSON, and proprietary instrument formats).

Data Modeling & Warehousing: Develop optimized data models and manage a data warehouse (or data lake) to support fast querying and ML feature engineering on complex datasets, including time-series and spectral data from chromatograms.

ML Infrastructure: Collaborate with MLE to containerize and deploy ML models and build automated model retraining and monitoring pipelines.

Data Quality & Governance: Implement robust data quality checks, validation, and monitoring to ensure the integrity and reproducibility of chemical experiment data used for ML.

Tooling: Develop internal tools and APIs to facilitate data access for MLE and provide standardized interfaces for data submission from chemistry lab systems.