Ideal Candidate:
- An undergraduate or Master's degree in Computer Science, or equivalent engineering experience
- 6+ years of professional software engineering and programming experience (Java, Python), with a focus on designing and developing complex data-intensive applications
- 3+ years of architecture and design (patterns, reliability, scalability, quality) of complex systems
- Advanced coding skills and practices (concurrency, distributed systems, functional principles, performance optimization)
- Professional experience working in an agile environment
- Strong analytical and problem-solving ability
- Strong written and verbal communication skills
- Experience operating and maintaining production-grade software
- Comfortable tackling loosely defined problems; thrives on a team with autonomy in its day-to-day decisions

Preferred Skills:
- In-depth knowledge of software and data engineering best practices
- Experience mentoring and leading junior engineers
- Experience serving as the technical lead for complex software development projects
- Experience with large-scale distributed data technologies and tools
- Strong experience with multiple database models (relational, document, in-memory, search, etc.)
- Strong experience with data streaming architecture (Kafka, Spark, Airflow, SQL, NoSQL, CDC, etc.)
- Strong knowledge of cloud data platforms and technologies such as GCS, BigQuery, Cloud Composer, Pub/Sub, Dataflow, Dataproc, Looker, and other cloud-native offerings
- Strong knowledge of Infrastructure as Code (IaC) and associated tools (Terraform, Ansible, etc.)
- Experience pulling data from a variety of source types, including mainframe (EBCDIC), fixed-length and delimited files, and databases (SQL, NoSQL, time-series)
- Strong coding skills for analytics and data engineering (Java, Python, and Scala)
- Experience performing analysis with large datasets in a cloud-based environment, preferably with an understanding of Google Cloud Platform (GCP)
- Understands how to translate business requirements into technical architectures and designs
- Comfortable communicating with various stakeholders (technical and non-technical)

Experience with Airflow and Spark:
- Airflow: Proven experience using Apache Airflow to orchestrate and schedule workflows. Ability to design, implement, and manage complex data pipelines. Understanding of DAGs (including how to create them dynamically), task dependencies, and error handling within Airflow.
- Spark: Hands-on experience with Apache Spark for large-scale data processing and analytics. Proficiency in writing Spark jobs in Java (PySpark is also fine, as we are moving in that direction), optimizing performance, and handling data transformations and aggregations at scale.

Familiarity with GCP Services:
- BigQuery: Experience with Google BigQuery for running SQL queries on large datasets, optimizing queries for performance, and managing data warehousing solutions.
- Composer: Knowledge of Google Cloud Composer for managing and orchestrating workflows.
- Dataproc: Experience with Dataproc for managing and scaling Spark clusters, including configuring clusters, running jobs, and integrating with other GCP services.

Proficiency in Python, Java, and SQL:
- Python: Strong foundation in Python, with experience writing clean, efficient code and using libraries such as Pandas and NumPy for data manipulation. Proficient in debugging, testing, and using Python for API interactions and external service integration.
- Java: Proficiency in Java, especially for integrating with data processing frameworks. Experience with Java-based libraries and tools relevant to data engineering is a plus.
- SQL: Experience writing and optimizing complex SQL queries for data extraction, transformation, and analysis.

Knowledge of Terraform (Optional but Preferred):
- Terraform: Familiarity with Terraform for automating the provisioning and management of cloud resources. Ability to write and maintain Terraform scripts to define and deploy GCP resources, ensuring infrastructure consistency and scalability.

Nice-to-have Skills (not required):
- Exposure to data-science or machine-learning packages (Pandas, PyTorch, Keras, TensorFlow, etc.)
- Contributions to open-source software (code, docs, or mailing-list posts)
- GCP Professional Data Engineer Certification
Job Title: Senior Data Engineer