Key Responsibilities and Required Skills for Fly Tier
💰 $90,000 - $160,000
Data Engineering · Analytics · Cloud · Machine Learning
🎯 Role Definition
Fly Tier is a hands-on data engineering role responsible for architecting, building, and maintaining robust, scalable data infrastructure and pipelines that power analytics, reporting, and machine learning. The Fly Tier owns end-to-end data delivery, from ingestion and transformation to monitoring and optimization, collaborating cross-functionally to translate business requirements into production-ready data solutions. This role emphasizes reliability, performance, and data governance while enabling fast experimentation and self-service analytics across the organization.
📈 Career Progression
Typical Career Path
Entry Point From:
- Junior Data Engineer / Data Analyst with strong ETL experience and SQL proficiency
- BI Engineer or Analytics Engineer transitioning to platform and infrastructure work
- Software Engineer with experience in backend systems and data processing
Advancement To:
- Senior Data Engineer / Lead Data Engineer
- Data Platform Architect
- Head of Data Engineering / Data Infrastructure Manager
Lateral Moves:
- Machine Learning Engineer (data infrastructure for ML)
- Analytics Engineering / BI Lead
- Site Reliability Engineer specializing in data platforms
Core Responsibilities
Primary Functions
- Design, implement, and maintain reliable, scalable ETL/ELT data pipelines using industry-leading frameworks (e.g., Apache Airflow, dbt, Apache Spark, Flink) to support analytics, reporting, and machine learning workloads across batch and real-time use cases (see the orchestration sketch after this list).
- Own the end-to-end lifecycle of data ingestion from diverse sources (APIs, event streams, data lakes, transactional databases), ensuring data is captured, validated, transformed, and persisted with strong SLAs and observability.
- Build and optimize data models and transformations that enable self-service analytics and accelerate time-to-insight for product, finance, marketing, and operations teams.
- Implement data partitioning, partition-pruning strategies, and efficient file formats (Parquet/ORC/Avro) to reduce storage costs and improve query performance for large-scale analytical workloads (see the partitioning sketch after this list).
- Design and maintain data schemas, metadata catalogs, and data dictionaries to improve discoverability, lineage, and governance across the data platform.
- Develop and maintain CI/CD pipelines for data code, infrastructure-as-code (Terraform/CloudFormation), and deployment automation to ensure reproducible, auditable, and low-risk releases.
- Implement monitoring, alerting, and observability (Datadog/Prometheus/Grafana) for data pipelines, job latency, and SLA adherence; diagnose and resolve production incidents and performance regressions.
- Collaborate with data scientists and ML engineers to productionize feature stores, training datasets, model scoring pipelines, and online/offline inference systems with reproducible, versioned data assets.
- Optimize distributed compute and storage costs by tuning Spark/Flink jobs, configuring autoscaling, choosing appropriate instance types, and leveraging cloud-native serverless options when appropriate.
- Design and enforce data quality checks, validation rules, and anomaly detection for incoming and transformed datasets using framework-agnostic tooling or in-house solutions (see the validation sketch after this list).
- Lead schema evolution strategies and backward/forward compatibility practices for event-driven systems and streaming topics to avoid consumer breakage during product changes.
- Partner with product and analytics stakeholders to translate business KPIs into technical requirements, create reliable data contracts, and prioritize data platform roadmap items that deliver measurable business value.
- Implement role-based access controls, encryption, and secure data handling procedures to ensure compliance with privacy regulations (GDPR, CCPA) and internal security policies.
- Evaluate and integrate new data technologies, managed services, and open-source tools to improve developer productivity, platform reliability, and total cost of ownership.
- Create and maintain clear technical documentation, runbooks, and onboarding guides for data consumers, data producers, and engineering peers to reduce knowledge silos and speed up adoption.
- Mentor junior data engineers and analytics engineers, conduct code reviews, promote engineering best practices, and contribute to hiring and team-building efforts.
- Design and manage data retention, archiving, and lifecycle policies in the data lakehouse and data warehouse to balance regulatory needs and cost-efficiency.
- Build robust change-data-capture (CDC) pipelines (Debezium, AWS DMS) to stream transactional updates into analytical systems with low latency and schema consistency.
- Implement and maintain table partitioning, clustering, and materialized views to accelerate common analytical queries and reduce compute overhead for downstream BI tools.
- Lead post-mortems and continuous improvement initiatives for outages and pipeline failures, driving root-cause analysis and long-term remediation plans.
- Establish patterns for reproducible data experiments, A/B test logging, and metric reconciliation across distributed systems to ensure experimental rigor and trust in metrics.
- Act as a subject matter expert on data engineering standards, advocating for scalable architecture, modular ETL patterns, and reusable transformation libraries.
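The sketches below illustrate three of the responsibilities above in code; they are minimal, hedged examples rather than prescriptions of specific tooling. First, the orchestration sketch: a daily pipeline written against the Airflow TaskFlow API (assuming a recent Airflow 2.x release). The DAG name, bucket paths, and target table are hypothetical placeholders, and the extract/transform bodies are elided.

```python
# Minimal daily ELT sketch using the Airflow 2.x TaskFlow API.
# DAG name, paths, and table names are hypothetical placeholders.
from datetime import datetime

from airflow.decorators import dag, task
from airflow.operators.python import get_current_context


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["example"])
def orders_daily_pipeline():
    @task
    def extract() -> str:
        # Stage one day of raw order events keyed by the logical run date.
        ds = get_current_context()["ds"]
        raw_path = f"s3://example-raw/orders/{ds}/orders.json"
        # ... call the source API and write its payload to raw_path ...
        return raw_path

    @task
    def transform(raw_path: str) -> str:
        # Validate, deduplicate, and rewrite the extract in a columnar layout.
        curated_path = raw_path.replace("example-raw", "example-curated")
        # ... transformation logic ...
        return curated_path

    @task
    def load(curated_path: str) -> None:
        # Load the curated files into the warehouse table BI tools read from.
        print(f"COPY analytics.orders FROM '{curated_path}'")

    load(transform(extract()))


orders_daily_pipeline()
```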
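Second, the partitioning sketch: rewriting a raw event feed as date-partitioned Parquet with PySpark so downstream engines can prune partitions at query time. Paths and column names are hypothetical.

```python
# Sketch: compact raw JSON events into date-partitioned Parquet with PySpark.
# Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events_compaction").getOrCreate()

events = spark.read.json("s3://example-raw/events/")  # row-oriented raw input

(
    events
    .withColumn("event_date", F.to_date("event_ts"))  # derive the partition key
    .repartition("event_date")                        # group rows per partition before writing
    .write
    .mode("overwrite")
    .partitionBy("event_date")                        # one directory per date
    .parquet("s3://example-curated/events/")          # columnar, splittable output
)

# Queries that filter on event_date now scan only the matching partition
# directories instead of the whole dataset (partition pruning).
```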
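Third, the validation sketch: a framework-agnostic quality gate expressed in plain pandas. The thresholds, columns, and input path are hypothetical, and the same checks translate directly into Great Expectations suites or dbt tests.

```python
# Framework-agnostic data quality gate; thresholds, columns, and the input
# path are hypothetical placeholders.
import pandas as pd


def validate_orders(df: pd.DataFrame) -> list[str]:
    """Return human-readable failures; an empty list means the batch passes."""
    failures = []
    if df.empty:
        failures.append("batch is empty")
    if df["order_id"].duplicated().any():
        failures.append("duplicate order_id values")
    if (df["amount"] < 0).any():
        failures.append("negative order amounts")
    null_rate = df["customer_id"].isna().mean()
    if null_rate > 0.01:  # tolerate at most 1% missing customer ids
        failures.append(f"customer_id null rate {null_rate:.2%} exceeds 1%")
    return failures


failures = validate_orders(pd.read_parquet("curated/orders/"))
if failures:
    # In production this would fail the pipeline run and page the on-call.
    raise ValueError("; ".join(failures))
```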
Secondary Functions
- Support ad-hoc data requests and exploratory data analysis to help stakeholders validate hypotheses and unblock decision-making.
- Contribute to the organization's data strategy and roadmap, aligning technical investments with company goals and measurable outcomes.
- Collaborate with business units to translate data needs into engineering requirements and enforce data contracts and SLAs with partner teams.
- Participate in sprint planning and agile ceremonies within the data engineering team, owning deliverables and coordinating cross-team dependencies.
- Provide capacity planning and cost forecasting for data workloads, advising on budget trade-offs between performance and cost.
- Coordinate cross-functional onboarding sessions and office hours to train internal teams on data platform usage, best practices, and query optimization techniques.
- Prototype and validate new ingestion patterns and transformation methodologies to reduce time-to-value for new data sources.
- Assist compliance and privacy teams during audits by providing lineage, retention, and access logs for critical data assets.
Required Skills & Competencies
Hard Skills (Technical)
- Advanced SQL proficiency for complex analytical queries, window functions, CTEs, query optimization, and performance tuning on modern data warehouses (Snowflake, BigQuery, Redshift); see the windowed SQL sketch after this list.
- Strong programming skills in Python and/or Scala/Java for data pipeline development, automation, and unit/integration test coverage.
- Experience building production data pipelines using orchestration tools such as Apache Airflow, Prefect, or cloud-native schedulers.
- Deep knowledge of distributed data processing frameworks (Apache Spark, Flink) and hands-on experience optimizing jobs, caching, and resource utilization.
- Experience with cloud platforms and services (AWS, GCP, Azure) including S3/GCS, IAM, Lambda/Cloud Functions, EMR/Dataproc, Glue, BigQuery, Redshift, Athena.
- Familiarity with streaming technologies and message brokers (Kafka, Kinesis, Pub/Sub) and implementing low-latency streaming pipelines and exactly-once semantics where required.
- Proficiency with data modeling patterns for analytics (star/snowflake schemas), OLAP systems, and data lakehouse architectures (Delta Lake, Apache Hudi, Iceberg).
- Practical experience with ETL/ELT tools and transformation frameworks (dbt, Matillion, Fivetran) and building modular, testable transformation pipelines.
- Infrastructure-as-code experience (Terraform, CloudFormation) to provision and manage data infrastructure reliably and reproducibly.
- Experience implementing data quality frameworks, unit testing for data pipelines, and automating validation checks and alerting for data anomalies (see the unit-test sketch after this list).
- Knowledge of containerization and orchestration (Docker, Kubernetes) for deploying scalable data services and microservices.
- Experience setting up logging, monitoring, metrics, and tracing (Prometheus, Grafana, Datadog, OpenTelemetry) for data applications.
- Familiarity with data governance, metadata management and data catalogs (Amundsen, Apache Atlas), data loss prevention (DLP), and privacy-by-design principles to support compliance.
- Skilled at SQL-based BI tool integrations (Looker, Tableau, Power BI) and optimizing semantic layers and derived tables for reporting performance.
- Understanding of cost optimization techniques for cloud data workloads, including storage formats, partitioning, and compute scaling strategies.
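Two short sketches illustrate the bar implied by the skills above. First, the windowed SQL sketch: a running-total query executed here against an in-memory SQLite database only so the snippet is self-contained; in practice the same pattern targets Snowflake, BigQuery, or Redshift. Table and column names are hypothetical.

```python
# Windowed SQL sketch, executed against in-memory SQLite (3.25+ supports window
# functions) so it runs anywhere; table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, '2024-01-01', 120.0),
        (1, '2024-02-01',  80.0),
        (2, '2024-01-15', 200.0);
    """
)

# Running revenue per customer: a typical window-function building block for
# cohort, retention, and revenue reporting.
query = """
SELECT
    customer_id,
    order_date,
    SUM(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS running_revenue
FROM orders
ORDER BY customer_id, order_date;
"""

for row in conn.execute(query):
    print(row)
```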
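Second, the unit-test sketch: a pytest-style test for a small, pure transformation written with pandas; function and column names are hypothetical, and the same pattern carries over to PySpark or dbt unit tests.

```python
# Unit-test sketch for a pipeline transformation (pytest-style); names are
# hypothetical placeholders.
import pandas as pd


def deduplicate_latest(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the most recent row per order_id (a typical upsert-style transform)."""
    return (
        df.sort_values("updated_at")
          .drop_duplicates(subset="order_id", keep="last")
          .reset_index(drop=True)
    )


def test_deduplicate_latest_keeps_newest_row():
    raw = pd.DataFrame(
        {
            "order_id": [1, 1, 2],
            "status": ["pending", "shipped", "pending"],
            "updated_at": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01"]),
        }
    )
    out = deduplicate_latest(raw)
    assert len(out) == 2
    assert out.loc[out["order_id"] == 1, "status"].item() == "shipped"
```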
Soft Skills
- Excellent stakeholder management with the ability to translate ambiguous business questions into technical solutions and timelines.
- Strong written and verbal communication skills for clear documentation, runbooks, and cross-functional collaboration.
- Problem-solving mindset with attention to detail, bias for shipping, and a strong sense of ownership for production systems.
- Collaborative team player who mentors peers, conducts effective code reviews, and contributes to a culture of engineering excellence.
- Prioritization and time-management skills to balance high-impact feature work, operational responsibilities, and technical debt remediation.
- Adaptability to rapidly changing business priorities and the ability to learn new technologies and frameworks quickly.
- Data-driven decision making and ability to measure impact through SLAs, KPIs, and post-deployment analytics.
Education & Experience
Educational Background
Minimum Education:
- Bachelor's degree in Computer Science, Engineering, Mathematics, Statistics, Information Systems, or a related technical field (or equivalent practical experience).
Preferred Education:
- Master's degree in Computer Science, Data Science, Engineering, or Business Analytics, or relevant certifications in cloud platforms or data engineering.
Relevant Fields of Study:
- Computer Science / Software Engineering
- Data Science / Statistics
- Information Systems / Data Analytics
- Applied Mathematics / Operations Research
Experience Requirements
Typical Experience Range:
- 3–8 years of professional experience in data engineering, analytics engineering, or backend/data platform development; for senior Fly Tier roles, 6+ years with demonstrable leadership on large-scale data systems.
Preferred:
- Demonstrated track record delivering production-grade data pipelines and platform components in a cloud environment, experience mentoring engineers, and driving cross-functional data initiatives that improved business metrics.