
Key Responsibilities and Required Skills for Fly Tier

💰 $90,000 - $160,000

Data Engineering · Analytics · Cloud · Machine Learning

🎯 Role Definition

Fly Tier is a hands-on data engineering role responsible for architecting, building, and maintaining robust, scalable data infrastructure and pipelines that power analytics, reporting, and machine learning. The Fly Tier owns end-to-end data delivery, from ingestion and transformation to monitoring and optimization, collaborating cross-functionally to translate business requirements into production-ready data solutions. This role emphasizes reliability, performance, and data governance while enabling fast experimentation and self-service analytics across the organization.


📈 Career Progression

Typical Career Path

Entry Point From:

  • Junior Data Engineer / Data Analyst with strong ETL experience and SQL proficiency
  • BI Engineer or Analytics Engineer transitioning to platform and infrastructure work
  • Software Engineer with experience in backend systems and data processing

Advancement To:

  • Senior Data Engineer / Lead Data Engineer
  • Data Platform Architect
  • Head of Data Engineering / Data Infrastructure Manager

Lateral Moves:

  • Machine Learning Engineer (data infrastructure for ML)
  • Analytics Engineering / BI Lead
  • Site Reliability Engineer specializing in data platforms

Core Responsibilities

Primary Functions

  • Design, implement, and maintain reliable, scalable ETL/ELT data pipelines using industry-leading frameworks (e.g., Apache Airflow, dbt, Apache Spark, Flink) to support analytics, reporting, and machine learning workloads across batch and real-time use cases (a minimal Airflow DAG sketch follows this list).
  • Own the end-to-end lifecycle of data ingestion from diverse sources (APIs, event streams, data lakes, transactional databases), ensuring data is captured, validated, transformed, and persisted with strong SLAs and observability.
  • Build and optimize data models and transformations that enable self-service analytics and accelerate time-to-insight for product, finance, marketing, and operations teams.
  • Implement data partitioning, partition pruning strategies, and efficient file formats (Parquet/ORC/Avro) to reduce storage costs and improve query performance for large-scale analytical workloads (see the PySpark partitioning sketch after this list).
  • Design and maintain data schemas, metadata catalogs, and data dictionaries to improve discoverability, lineage, and governance across the data platform.
  • Develop and maintain CI/CD pipelines for data code, infrastructure-as-code (Terraform/CloudFormation), and deployment automation to ensure reproducible, auditable, and low-risk releases.
  • Implement monitoring, alerting, and observability (Datadog/Prometheus/Grafana) for data pipelines, job latency, and SLA adherence; diagnose and resolve production incidents and performance regressions.
  • Collaborate with data scientists and ML engineers to productionize feature stores, training datasets, model scoring pipelines, and online/offline inference systems with reproducible, versioned data assets.
  • Optimize distributed compute and storage costs by tuning Spark/Flink jobs, configuring autoscaling, choosing appropriate instance types, and leveraging cloud-native serverless options when appropriate.
  • Design and enforce data quality checks, validation rules, and anomaly detection for incoming and transformed datasets using framework-agnostic tooling or in-house solutions (a simple validation sketch follows this list).
  • Lead schema evolution strategies and backward/forward compatibility practices for event-driven systems and streaming topics to avoid consumer breakage during product changes.
  • Partner with product and analytics stakeholders to translate business KPIs into technical requirements, create reliable data contracts, and prioritize data platform roadmap items that deliver measurable business value.
  • Implement role-based access controls, encryption, and secure data handling procedures to ensure compliance with privacy regulations (GDPR, CCPA) and internal security policies.
  • Evaluate and integrate new data technologies, managed services, and open-source tools to improve developer productivity, platform reliability, and total cost of ownership.
  • Create and maintain clear technical documentation, runbooks, and onboarding guides for data consumers, data producers, and engineering peers to reduce knowledge silos and speed up adoption.
  • Mentor junior data engineers and analytics engineers, conduct code reviews, promote engineering best practices, and contribute to hiring and team-building efforts.
  • Design and manage data retention, archiving, and lifecycle policies in the data lakehouse and data warehouse to balance regulatory needs and cost-efficiency.
  • Build robust change-data-capture (CDC) pipelines (Debezium, AWS DMS) to stream transactional updates into analytical systems with low latency and schema consistency.
  • Implement and maintain partitioned, clustered, and materialized views to accelerate common analytical queries and reduce compute overhead for downstream BI tools.
  • Lead post-mortems and continuous improvement initiatives for outages and pipeline failures, driving root-cause analysis and long-term remediation plans.
  • Establish patterns for reproducible data experiments, A/B test logging, and metric reconciliation across distributed systems to ensure experimental rigor and trust in metrics.
  • Act as a subject matter expert on data engineering standards, advocating for scalable architecture, modular ETL patterns, and reusable transformation libraries.
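
To make the orchestration bullet above concrete, here is a minimal sketch of a daily ELT DAG in Apache Airflow (Python). The DAG id, task names, and the extract/load callables are hypothetical placeholders, and the `schedule` argument assumes a recent Airflow 2.x release; this illustrates the pattern rather than prescribing an implementation.

```python
# Minimal sketch of a daily ELT DAG. The dag_id, task names, and the
# extract/load callables are hypothetical placeholders for illustration.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders(**context):
    # Placeholder: pull the previous day's records from a source system.
    print("extracting orders for", context["ds"])


def load_orders(**context):
    # Placeholder: write validated records into the warehouse.
    print("loading orders for", context["ds"])


with DAG(
    dag_id="orders_daily_elt",               # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                       # `schedule_interval` on older Airflow 2.x
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
    load = PythonOperator(task_id="load_orders", python_callable=load_orders)

    extract >> load
```

Retries and a disabled `catchup` are deliberate defaults in this sketch: they absorb transient failures and keep a newly deployed DAG from backfilling months of historical runs unintentionally.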
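
The partitioning and file-format bullet can likewise be sketched in a few lines of PySpark: write the curated table as Parquet partitioned by a date column so that queries filtering on that column prune partitions instead of scanning the full dataset. The bucket paths and column names below are assumptions for illustration.

```python
# Sketch: land an events table as date-partitioned Parquet so downstream
# queries that filter on event_date can prune partitions. Paths and
# column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("partitioned_parquet_sketch").getOrCreate()

events = spark.read.json("s3://example-raw-bucket/events/")  # hypothetical source

(
    events
    .withColumn("event_date", F.to_date("event_timestamp"))  # derive the partition column
    .repartition("event_date")                                # avoid many tiny files per partition
    .write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-curated-bucket/events/")           # hypothetical destination
)

# Readers that filter on event_date only touch the matching partitions:
recent = spark.read.parquet("s3://example-curated-bucket/events/").where(
    F.col("event_date") >= "2024-01-01"
)
```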
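
The data quality bullet usually reduces to simple, automated assertions that fail the pipeline loudly before bad data reaches dashboards or models. Below is a framework-agnostic Python sketch; the column names, thresholds, and sample DataFrame are assumptions for illustration.

```python
# Framework-agnostic sketch of post-load data quality checks.
# Column names, thresholds, and the sample DataFrame are hypothetical.
import pandas as pd


class DataQualityError(Exception):
    """Raised when a dataset fails a validation rule."""


def validate_orders(df: pd.DataFrame) -> None:
    checks = {
        "no_duplicate_order_ids": df["order_id"].is_unique,
        "no_null_customer_ids": df["customer_id"].notna().all(),
        "non_negative_amounts": (df["amount"] >= 0).all(),
        "row_count_above_floor": len(df) > 1_000,  # crude volume-anomaly guard
    }
    failures = [name for name, passed in checks.items() if not passed]
    if failures:
        # Failing fast keeps bad data out of downstream models and dashboards.
        raise DataQualityError(f"orders dataset failed checks: {failures}")


if __name__ == "__main__":
    # The tiny sample intentionally trips the row-count floor to show the failure path.
    sample = pd.DataFrame(
        {"order_id": [1, 2, 3], "customer_id": [10, 11, 12], "amount": [9.99, 0.0, 42.5]}
    )
    try:
        validate_orders(sample)
    except DataQualityError as err:
        print(err)
```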

Secondary Functions

  • Support ad-hoc data requests and exploratory data analysis to help stakeholders validate hypotheses and unblock decision-making.
  • Contribute to the organization's data strategy and roadmap, aligning technical investments with company goals and measurable outcomes.
  • Collaborate with business units to translate data needs into engineering requirements and enforce data contracts and SLAs with partner teams.
  • Participate in sprint planning and agile ceremonies within the data engineering team, owning deliverables and coordinating cross-team dependencies.
  • Provide capacity planning and cost forecasting for data workloads, advising on budget trade-offs between performance and cost.
  • Coordinate cross-functional onboarding sessions and office hours to train internal teams on data platform usage, best practices, and query optimization techniques.
  • Prototype and validate new ingestion patterns and transformation methodologies to reduce time-to-value for new data sources.
  • Assist compliance and privacy teams during audits by providing lineage, retention, and access logs for critical data assets.

Required Skills & Competencies

Hard Skills (Technical)

  • Advanced SQL proficiency for complex analytical queries, window functions, CTEs, query optimization, and performance tuning on modern data warehouses (Snowflake, BigQuery, Redshift); see the window-function sketch after this list.
  • Strong programming skills in Python and/or Scala/Java for data pipeline development, automation, and unit/integration test coverage.
  • Experience building production data pipelines using orchestration tools such as Apache Airflow, Prefect, or cloud-native schedulers.
  • Deep knowledge of distributed data processing frameworks (Apache Spark, Flink) and hands-on experience optimizing jobs, caching, and resource utilization.
  • Experience with cloud platforms and services (AWS, GCP, Azure) including S3/GCS, IAM, Lambda/Cloud Functions, EMR/Dataproc, Glue, BigQuery, Redshift, Athena.
  • Familiarity with streaming technologies and message brokers (Kafka, Kinesis, Pub/Sub) and implementing low-latency streaming pipelines and exactly-once semantics where required.
  • Proficiency with data modeling patterns for analytics (star/snowflake schemas), OLAP systems, and data lakehouse architectures (Delta Lake, Apache Hudi, Iceberg).
  • Practical experience with ETL/ELT tools and transformation frameworks (dbt, Matillion, Fivetran) and building modular, testable transformation pipelines.
  • Infrastructure-as-code experience (Terraform, CloudFormation) to provision and manage data infrastructure reliably and reproducibly.
  • Implementing data quality frameworks, unit testing for data pipelines, and automating validation checks and alerting for data anomalies.
  • Knowledge of containerization and orchestration (Docker, Kubernetes) for deploying scalable data services and microservices.
  • Experience setting up logging, monitoring, metrics, and tracing (Prometheus, Grafana, Datadog, OpenTelemetry) for data applications; see the prometheus_client sketch after this list.
  • Familiarity with data governance, metadata management (data catalogs such as Amundsen or Apache Atlas), DLP, and privacy-by-design principles to support compliance.
  • Skilled at SQL-based BI tool integrations (Looker, Tableau, Power BI) and optimizing semantic layers and derived tables for reporting performance.
  • Understanding of cost optimization techniques for cloud data workloads, including storage formats, partitioning, and compute scaling strategies.
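
As an illustration of the SQL skills listed above, the query below uses a CTE and a window function to pick each customer's most recent order. It is run through PySpark here only to keep the example self-contained; the same SQL pattern applies unchanged on Snowflake, BigQuery, or Redshift, and the table and column names are made up.

```python
# Sketch: window function + CTE to select the latest order per customer.
# Executed via Spark SQL for a self-contained example; the SQL itself is
# the point and ports directly to warehouse engines. Names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("window_function_sketch").getOrCreate()

spark.createDataFrame(
    [(1, "2024-01-01", 20.0), (1, "2024-01-05", 35.0), (2, "2024-01-03", 12.5)],
    ["customer_id", "order_date", "amount"],
).createOrReplaceTempView("orders")

latest_orders = spark.sql(
    """
    WITH ranked AS (
        SELECT
            customer_id,
            order_date,
            amount,
            ROW_NUMBER() OVER (
                PARTITION BY customer_id
                ORDER BY order_date DESC
            ) AS rn
        FROM orders
    )
    SELECT customer_id, order_date, amount
    FROM ranked
    WHERE rn = 1  -- latest order per customer
    """
)
latest_orders.show()
```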
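
For the monitoring and observability bullet, here is a minimal sketch of instrumenting a batch job with the prometheus_client Python library. The metric names and the job body are assumptions; a real deployment would expose or push these metrics for Prometheus to scrape and Grafana to chart, typically alerting on staleness of the last-success timestamp.

```python
# Minimal sketch of exposing pipeline health metrics with prometheus_client.
# Metric names and the job body are hypothetical placeholders.
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

ROWS_PROCESSED = Counter("pipeline_rows_processed_total", "Rows processed by the job")
LAST_SUCCESS = Gauge("pipeline_last_success_timestamp", "Unix time of the last successful run")
JOB_DURATION = Gauge("pipeline_duration_seconds", "Duration of the last run in seconds")


def run_job() -> None:
    start = time.time()
    rows = random.randint(1_000, 5_000)  # placeholder for real pipeline work
    ROWS_PROCESSED.inc(rows)
    JOB_DURATION.set(time.time() - start)
    LAST_SUCCESS.set_to_current_time()


if __name__ == "__main__":
    start_http_server(8000)  # expose /metrics for Prometheus to scrape
    while True:
        run_job()
        time.sleep(60)
```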

Soft Skills

  • Excellent stakeholder management with the ability to translate ambiguous business questions into technical solutions and timelines.
  • Strong written and verbal communication skills for clear documentation, runbooks, and cross-functional collaboration.
  • Problem-solving mindset with attention to detail, bias for shipping, and a strong sense of ownership for production systems.
  • Collaborative team player who mentors peers, conducts effective code reviews, and contributes to a culture of engineering excellence.
  • Prioritization and time-management skills to balance high-impact feature work, operational responsibilities, and technical debt remediation.
  • Adaptability to rapidly changing business priorities and the ability to learn new technologies and frameworks quickly.
  • Data-driven decision making and ability to measure impact through SLAs, KPIs, and post-deployment analytics.

Education & Experience

Educational Background

Minimum Education:

  • Bachelor's degree in Computer Science, Engineering, Mathematics, Statistics, Information Systems, or a related technical field (or equivalent practical experience).

Preferred Education:

  • Master's degree in Computer Science, Data Science, Engineering, or Business Analytics, or relevant certifications in cloud platforms or data engineering.

Relevant Fields of Study:

  • Computer Science / Software Engineering
  • Data Science / Statistics
  • Information Systems / Data Analytics
  • Applied Mathematics / Operations Research

Experience Requirements

Typical Experience Range:

  • 3–8 years of professional experience in data engineering, analytics engineering, or backend/data platform development; for senior Fly Tier roles, 6+ years with demonstrable leadership on large-scale data systems.

Preferred:

  • A demonstrated track record of delivering production-grade data pipelines and platform components in a cloud environment, experience mentoring engineers, and a history of driving cross-functional data initiatives that improved business metrics.