We have been retained by our client in Houston, Texas to deliver a Data Engineer on a long-term contract basis. The team is growing, and its big data practice is evolving and improving quickly. The company has low turnover and a nice culture.

This role is part of a multi-faceted, multi-disciplinary 7-person team within a much larger big data department. Your contributions and accomplishments are tied to specific, measurable goals and results. The problems are real-world problems, and the results are real-world results.

We seek a data engineer to create Python data frameworks. Candidates will have 3+ years of focused work experience programming data frameworks or full stack applications with Python, and experience with some of these:
- Parquet, Delta Lake, Apache Iceberg, data lake, data lakehouse, Dremio
- Python: pandas, NumPy, pytest, scikit-learn, TensorFlow, Keras, Matplotlib, spaCy, NLTK, Theano, PyTorch, Caffe, Caffe2
- Apache Airflow, Kubernetes, distributed file systems, and Massively Parallel Processing (MPP)
- PySpark, Apache Spark, big data analytics, machine learning (ML), artificial intelligence (AI)

You will be treated as a first-class data engineer and a valued member of the data engineering team, providing analytical and technical work that includes delivering the data streams needed by data scientists. This big data practice has a huge amount of sensor data to work with.

- Design and implement reliable Python data pipelines to integrate disparate data sources into a single Data Lakehouse or data lake
- Design and implement data quality pipelines to ensure data correctness and build trusted data sets
- Design and implement a Data Lakehouse solution to accurately reflect business operations
- Assist with data platform performance tuning and with physical data model design and support, including partitioning and compaction or compression of data
- Provide guidance in data visualizations and reporting efforts to ensure solutions are aligned to business objectives
- 3+ years of experience as a Data Engineer designing and maintaining data pipeline architectures, not necessarily only in Python; other languages count, but some Python is required and heavy Python is preferred
- Extensive experience in SQL of any dialect, such as ANSI SQL, PL/SQL, or T-SQL (Transact-SQL), including stored procedures, on Oracle or SQL Server
- Experience in various data integration patterns including ETL, ELT, Pub/Sub (publication/subscription), and Change Data Capture
- Experience in data management practices including data catalog, data lineage, and master data management
- Experience in business analysis and defining business performance metrics
- Experience in software development practices such as Design Principles and Patterns, Testing, CI/CD, and version control
- Experience in implementing a data lake or Data Lakehouse using Apache Iceberg or Delta Lake
- Knowledge of common data visualization tools such as Power BI and TIBCO Spotfire
- Experience with Dremio is preferred