We have been retained by our client in Houston, Texas to deliver a Data Engineer on a long-term contract basis. The team is growing, and its big data practice is evolving and improving quickly. We are seeking data engineering candidates who are looking for Python data framework design and development opportunities. The company has low turnover and a nice culture.

This role is part of a multi-faceted, multi-disciplinary 7-person team, where you are the data engineer working with data scientists whose models need the right data streams. Your team is part of a much larger big data practice. Your contributions and accomplishments are tied to specific, measurable goals and results. The problems are real-world problems, and the results are real-world results.

We seek a data engineer to create Python data frameworks. The candidates we seek will have 3+ years of focused work experience programming data frameworks or developing full-stack applications with Python, and will have experience with some of these:
- Parquet, Delta Lake, Apache Iceberg, data lake, data lakehouse, data lake formats, Dremio
- Python Pandas, NumPy, Pytest, scikit-learn, TensorFlow, Keras, Matplotlib, spaCy, NLTK, Theano, PyTorch, Caffe, Caffe2
- Apache Airflow, Kubernetes, Distributed File Systems, and Massively Parallel Processing (MPP)
- PySpark, Apache Spark big data analytics, machine learning (ML), Artificial Intelligence (AI)

You will be treated as a first-class citizen and data engineer and as a valuable member of the data engineering team. You will provide analytical and technical work, including delivering the data streams that data scientists need, and there is a huge amount of sensor data to work with in this big data practice.

- Design and implement reliable Python data pipelines to integrate disparate data sources into a single Data Lakehouse or data lake
- Design and implement data quality pipelines to ensure data correctness and build trusted data sets
- Design and implement a Data Lakehouse solution to accurately reflect business operations
- Assist with data platform performance tuning and with physical data model design and support, including partitioning and compaction or compression of data
- Provide guidance in data visualizations and reporting efforts to ensure solutions are aligned to business objectives
- 3+ years of experience as a Data Engineer designing data pipeline architectures. Experience may include other languages, but 3 years of Python is required and heavier Python experience is preferred.
- Extensive experience in SQL of any dialect, such as ANSI SQL, PL/SQL, or T-SQL (Transact-SQL), including stored procedures (Oracle or SQL Server)
- Experience in various data integration patterns including ETL, ELT, Pub/Sub (publish/subscribe), and Change Data Capture
- Experience in data management practices including data catalog, data lineage, and master data management
- Experience in business analysis and defining business performance metrics
- Experience in software development practices such as Software Design Principles and Software Design Patterns, Testing, CI/CD, and version control
- Knowledge of common data visualization tools such as Power BI, TIBCO Spotfire, Tableau, or others
- Experience implementing data lakes is a big plus, as is any experience with data lake design, data lakehouses, data lake use cases, data lake formats, or tools such as Dremio, Apache Iceberg, or Delta Lake