Key Responsibilities and Required Skills for Fly Tier
💰 $90,000 - $160,000
Data Engineering · Analytics · Cloud · Machine Learning
🎯 Role Definition
Fly Tier is a hands-on data engineering role responsible for architecting, building, and maintaining robust, scalable data infrastructure and pipelines that power analytics, reporting, and machine learning. The Fly Tier owns end-to-end data delivery, from ingestion and transformation to monitoring and optimization, collaborating cross-functionally to translate business requirements into production-ready data solutions. This role emphasizes reliability, performance, and data governance while enabling fast experimentation and self-service analytics across the organization.
📈 Career Progression
Typical Career Path
Entry Point From:
- Junior Data Engineer / Data Analyst with strong ETL experience and SQL proficiency
- BI Engineer or Analytics Engineer transitioning to platform and infrastructure work
- Software Engineer with experience in backend systems and data processing
Advancement To:
- Senior Data Engineer / Lead Data Engineer
- Data Platform Architect
- Head of Data Engineering / Data Infrastructure Manager
Lateral Moves:
- Machine Learning Engineer (data infrastructure for ML)
- Analytics Engineering / BI Lead
- Site Reliability Engineer specializing in data platforms
Core Responsibilities
Primary Functions
- Design, implement, and maintain reliable, scalable ETL/ELT data pipelines using industry-leading frameworks (e.g., Apache Airflow, dbt, Apache Spark, Flink) to support analytics, reporting, and machine learning workloads across batch and real-time use cases (see the orchestration sketch after this list).
- Own the end-to-end lifecycle of data ingestion from diverse sources (APIs, event streams, data lakes, transactional databases), ensuring data is captured, validated, transformed, and persisted with strong SLAs and observability.
- Build and optimize data models and transformations that enable self-service analytics and accelerate time-to-insight for product, finance, marketing, and operations teams.
- Implement data partitioning, partition-pruning strategies, and efficient file formats (Parquet/ORC/Avro) to reduce storage costs and improve query performance for large-scale analytical workloads (see the partitioning sketch after this list).
- Design and maintain data schemas, metadata catalogs, and data dictionaries to improve discoverability, lineage, and governance across the data platform.
- Develop and maintain CI/CD pipelines for data code, infrastructure-as-code (Terraform/CloudFormation), and deployment automation to ensure reproducible, auditable, and low-risk releases.
- Implement monitoring, alerting, and observability (Datadog/Prometheus/Grafana) for data pipelines, job latency, and SLA adherence; diagnose and resolve production incidents and performance regressions.
- Collaborate with data scientists and ML engineers to productionize feature stores, training datasets, model scoring pipelines, and online/offline inference systems with reproducible, versioned data assets.
- Optimize distributed compute and storage costs by tuning Spark/Flink jobs, configuring autoscaling, choosing appropriate instance types, and leveraging cloud-native serverless options when appropriate.
- Design and enforce data quality checks, validation rules, and anomaly detection for incoming and transformed datasets using framework-agnostic tooling or in-house solutions (see the validation sketch after this list).
- Lead schema evolution strategies and backward/forward compatibility practices for event-driven systems and streaming topics to avoid consumer breakage during product changes.
- Partner with product and analytics stakeholders to translate business KPIs into technical requirements, create reliable data contracts, and prioritize data platform roadmap items that deliver measurable business value.
- Implement role-based access controls, encryption, and secure data handling procedures to ensure compliance with privacy regulations (GDPR, CCPA) and internal security policies.
- Evaluate and integrate new data technologies, managed services, and open-source tools to improve developer productivity, platform reliability, and total cost of ownership.
- Create and maintain clear technical documentation, runbooks, and onboarding guides for data consumers, data producers, and engineering peers to reduce knowledge silos and speed up adoption.
- Mentor junior data engineers and analytics engineers, conduct code reviews, promote engineering best practices, and contribute to hiring and team-building efforts.
- Design and manage data retention, archiving, and lifecycle policies in the data lakehouse and data warehouse to balance regulatory needs and cost-efficiency.
- Build robust change-data-capture (CDC) pipelines (Debezium, AWS DMS) to stream transactional updates into analytical systems with low latency and schema consistency.
- Implement and maintain table partitioning, clustering, and materialized views to accelerate common analytical queries and reduce compute overhead for downstream BI tools.
- Lead post-mortems and continuous improvement initiatives for outages and pipeline failures, driving root-cause analysis and long-term remediation plans.
- Establish patterns for reproducible data experiments, A/B test logging, and metric reconciliation across distributed systems to ensure experimental rigor and trust in metrics.
- Act as a subject matter expert on data engineering standards, advocating for scalable architecture, modular ETL patterns, and reusable transformation libraries.
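The sketches below illustrate three of the responsibilities above in code; they are minimal, hedged examples rather than prescriptions of specific tooling. First, the orchestration sketch: a daily pipeline written against the Airflow TaskFlow API (assuming a recent Airflow 2.x release). The DAG name, bucket paths, and target table are hypothetical placeholders, and the extract/transform bodies are elided.

```python
# Minimal daily ELT sketch using the Airflow 2.x TaskFlow API.
# DAG name, paths, and table names are hypothetical placeholders.
from datetime import datetime

from airflow.decorators import dag, task
from airflow.operators.python import get_current_context


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["example"])
def orders_daily_pipeline():
    @task
    def extract() -> str:
        # Stage one day of raw order events keyed by the logical run date.
        ds = get_current_context()["ds"]
        raw_path = f"s3://example-raw/orders/{ds}/orders.json"
        # ... call the source API and write its payload to raw_path ...
        return raw_path

    @task
    def transform(raw_path: str) -> str:
        # Validate, deduplicate, and rewrite the extract in a columnar layout.
        curated_path = raw_path.replace("example-raw", "example-curated")
        # ... transformation logic ...
        return curated_path

    @task
    def load(curated_path: str) -> None:
        # Load the curated files into the warehouse table BI tools read from.
        print(f"COPY analytics.orders FROM '{curated_path}'")

    load(transform(extract()))


orders_daily_pipeline()
```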
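Second, the partitioning sketch: rewriting a raw event feed as date-partitioned Parquet with PySpark so downstream engines can prune partitions at query time. Paths and column names are hypothetical.

```python
# Sketch: compact raw JSON events into date-partitioned Parquet with PySpark.
# Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events_compaction").getOrCreate()

events = spark.read.json("s3://example-raw/events/")  # row-oriented raw input

(
    events
    .withColumn("event_date", F.to_date("event_ts"))  # derive the partition key
    .repartition("event_date")                        # group rows per partition before writing
    .write
    .mode("overwrite")
    .partitionBy("event_date")                        # one directory per date
    .parquet("s3://example-curated/events/")          # columnar, splittable output
)

# Queries that filter on event_date now scan only the matching partition
# directories instead of the whole dataset (partition pruning).
```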
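Third, the validation sketch: a framework-agnostic quality gate expressed in plain pandas. The thresholds, columns, and input path are hypothetical, and the same checks translate directly into Great Expectations suites or dbt tests.

```python
# Framework-agnostic data quality gate; thresholds, columns, and the input
# path are hypothetical placeholders.
import pandas as pd


def validate_orders(df: pd.DataFrame) -> list[str]:
    """Return human-readable failures; an empty list means the batch passes."""
    failures = []
    if df.empty:
        failures.append("batch is empty")
    if df["order_id"].duplicated().any():
        failures.append("duplicate order_id values")
    if (df["amount"] < 0).any():
        failures.append("negative order amounts")
    null_rate = df["customer_id"].isna().mean()
    if null_rate > 0.01:  # tolerate at most 1% missing customer ids
        failures.append(f"customer_id null rate {null_rate:.2%} exceeds 1%")
    return failures


failures = validate_orders(pd.read_parquet("curated/orders/"))
if failures:
    # In production this would fail the pipeline run and page the on-call.
    raise ValueError("; ".join(failures))
```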
Secondary Functions
- Support ad-hoc data requests and exploratory data analysis to help stakeholders validate hypotheses and unblock decision-making.
- Contribute to the organization's data strategy and roadmap, aligning technical investments with company goals and measurable outcomes.
- Collaborate with business units to translate data needs into engineering requirements and enforce data contracts and SLAs with partner teams.
- Participate in sprint planning and agile ceremonies within the data engineering team, owning deliverables and coordinating cross-team dependencies.
- Provide capacity planning and cost forecasting for data workloads, advising on budget trade-offs between performance and cost.
- Coordinate cross-functional onboarding sessions and office hours to train internal teams on data platform usage, best practices, and query optimization techniques.
- Prototype and validate new ingestion patterns and transformation methodologies to reduce time-to-value for new data sources.
- Assist compliance and privacy teams during audits by providing lineage, retention, and access logs for critical data assets.
Required Skills & Competencies
Hard Skills (Technical)
- Advanced SQL proficiency for complex analytical queries, window functions, CTEs, query optimization, and performance tuning on modern data warehouses (Snowflake, BigQuery, Redshift); see the windowed SQL sketch after this list.
- Strong programming skills in Python and/or Scala/Java for data pipeline development, automation, and unit/integration test coverage.
- Experience building production data pipelines using orchestration tools such as Apache Airflow, Prefect, or cloud-native schedulers.
- Deep knowledge of distributed data processing frameworks (Apache Spark, Flink) and hands-on experience optimizing jobs, caching, and resource utilization.
- Experience with cloud platforms and services (AWS, GCP, Azure) including S3/GCS, IAM, Lambda/Cloud Functions, EMR/Dataproc, Glue, BigQuery, Redshift, Athena.
- Familiarity with streaming technologies and message brokers (Kafka, Kinesis, Pub/Sub) and implementing low-latency streaming pipelines and exactly-once semantics where required.
- Proficiency with data modeling patterns for analytics (star/snowflake schemas), OLAP systems, and data lakehouse architectures (Delta Lake, Apache Hudi, Iceberg).
- Practical experience with ETL/ELT tools and transformation frameworks (dbt, Matillion, Fivetran) and building modular, testable transformation pipelines.
- Infrastructure-as-code experience (Terraform, CloudFormation) to provision and manage data infrastructure reliably and reproducibly.
- Experience implementing data quality frameworks, unit testing for data pipelines, and automating validation checks and alerting for data anomalies (see the unit-test sketch after this list).
- Knowledge of containerization and orchestration (Docker, Kubernetes) for deploying scalable data services and microservices.
- Experience setting up logging, monitoring, metrics, and tracing (Prometheus, Grafana, Datadog, OpenTelemetry) for data applications.
- Familiarity with data governance, metadata management and data catalogs (Amundsen, Apache Atlas), data loss prevention (DLP), and privacy-by-design principles to support compliance.
- Skilled at SQL-based BI tool integrations (Looker, Tableau, Power BI) and optimizing semantic layers and derived tables for reporting performance.
- Understanding of cost optimization techniques for cloud data workloads, including storage formats, partitioning, and compute scaling strategies.
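Two short sketches illustrate the bar implied by the skills above. First, the windowed SQL sketch: a running-total query executed here against an in-memory SQLite database only so the snippet is self-contained; in practice the same pattern targets Snowflake, BigQuery, or Redshift. Table and column names are hypothetical.

```python
# Windowed SQL sketch, executed against in-memory SQLite (3.25+ supports window
# functions) so it runs anywhere; table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, '2024-01-01', 120.0),
        (1, '2024-02-01',  80.0),
        (2, '2024-01-15', 200.0);
    """
)

# Running revenue per customer: a typical window-function building block for
# cohort, retention, and revenue reporting.
query = """
SELECT
    customer_id,
    order_date,
    SUM(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS running_revenue
FROM orders
ORDER BY customer_id, order_date;
"""

for row in conn.execute(query):
    print(row)
```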
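Second, the unit-test sketch: a pytest-style test for a small, pure transformation written with pandas; function and column names are hypothetical, and the same pattern carries over to PySpark or dbt unit tests.

```python
# Unit-test sketch for a pipeline transformation (pytest-style); names are
# hypothetical placeholders.
import pandas as pd


def deduplicate_latest(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the most recent row per order_id (a typical upsert-style transform)."""
    return (
        df.sort_values("updated_at")
          .drop_duplicates(subset="order_id", keep="last")
          .reset_index(drop=True)
    )


def test_deduplicate_latest_keeps_newest_row():
    raw = pd.DataFrame(
        {
            "order_id": [1, 1, 2],
            "status": ["pending", "shipped", "pending"],
            "updated_at": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01"]),
        }
    )
    out = deduplicate_latest(raw)
    assert len(out) == 2
    assert out.loc[out["order_id"] == 1, "status"].item() == "shipped"
```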
Soft Skills
- Excellent stakeholder management with the ability to translate ambiguous business questions into technical solutions and timelines.
- Strong written and verbal communication skills for clear documentation, runbooks, and cross-functional collaboration.
- Problem-solving mindset with attention to detail, bias for shipping, and a strong sense of ownership for production systems.
- Collaborative team player who mentors peers, conducts effective code reviews, and contributes to a culture of engineering excellence.
- Prioritization and time-management skills to balance high-impact feature work, operational responsibilities, and technical debt remediation.
- Adaptability to rapidly changing business priorities and the ability to learn new technologies and frameworks quickly.
- Data-driven decision making and ability to measure impact through SLAs, KPIs, and post-deployment analytics.
Education & Experience
Educational Background
Minimum Education:
- Bachelor's degree in Computer Science, Engineering, Mathematics, Statistics, Information Systems, or a related technical field (or equivalent practical experience).
Preferred Education:
- Master's degree in Computer Science, Data Science, Engineering, or Business Analytics, or relevant certifications in cloud platforms or data engineering.
Relevant Fields of Study:
- Computer Science / Software Engineering
- Data Science / Statistics
- Information Systems / Data Analytics
- Applied Mathematics / Operations Research
Experience Requirements
Typical Experience Range:
- 3–8 years of professional experience in data engineering, analytics engineering, or backend/data platform development; for senior Fly Tier roles, 6+ years with demonstrable leadership on large-scale data systems.
Preferred:
- Demonstrated track record delivering production-grade data pipelines and platform components in a cloud environment, experience mentoring engineers, and driving cross-functional data initiatives that improved business metrics.