Data Engineer job vacancy

Position: Data Engineer

Location: West Chester, PA

Required Skill Set: Big Data, Hadoop, Spark, Kafka, Python, Java, Scala, Redshift, EC2

Education Required: Bachelor's degree in Information Technology/Electronics/Electrical Engineering/Computer Science

Years of Experience: 5+ Years

Job Description:

5+ years of experience in a Data Engineer role.
Advanced working knowledge of SQL, including query authoring, and experience with relational databases, as well as working familiarity with a variety of other databases.
Experience building and optimizing 'big data' data pipelines, architectures and data sets.
Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
Strong analytic skills related to working with unstructured datasets.
Ability to build processes supporting data transformation, data structures, metadata, dependency and workload management.
A successful history of manipulating, processing and extracting value from large, disconnected datasets.
Working knowledge of message queuing, stream processing and highly scalable 'big data' data stores.
Experience with big data tools: Hadoop, Spark, Kafka, etc.
Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
Experience with AWS cloud services such as EC2, EMR, RDS and Redshift.
Experience with stream-processing systems such as Storm, Spark Streaming, etc.
Experience with object-oriented and functional programming or scripting languages such as Python, Java, C, Scala, etc.

Responsibilities:

Understand the business requirements and prepare the design documents.
Provide effort estimates and prepare user stories and technical design documents.
Create and maintain optimal data pipeline architecture.
Assemble large, complex data sets that meet functional/non-functional business requirements.
Identify, design and implement internal process improvements such as automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Build the infrastructure required for optimal extraction, transformation and loading of data from a wide variety of data sources using SQL and AWS 'big data' technologies.
Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
Implement an end-to-end continuous integration and deployment flow in Jenkins using the Build Pipeline, remote SFTP upload, 'Send files or execute commands over SSH' and Sonar plugins.
Participate in data modelling to build a canonical model.
Implement data quality checks.
Automate tasks with Perl and shell scripts to make the production system more stable.