Data Architect - Certifications, Portfolio, Leadership | 20 quick-reference sections
1. Certifications - Full Comparison
| Certification | Body | Price | Duration | Questions | Passing score | Difficulty | Market ROI | Priority |
|---|---|---|---|---|---|---|---|---|
| CDMP Fundamentals | DAMA International | $311 | 90 min | 100 MCQ | 400/700 (Associate) | High | Very high | P1 |
| SnowPro Core (COF-C02) | Snowflake | $175 | 115 min | 100 MCQ | 750/1000 | Medium | High | P1 |
| GCP Prof. Data Engineer | Google Cloud | $200 | 120 min | 50-60 MCQ | ~70% | High | Very high | P1 |
| Databricks DE Associate | Databricks | $200 | 90 min | 45 MCQ | 70% | Medium | High | P2 |
| AWS Data Engineer Associate | AWS | $150 | 170 min | 65 MCQ | 720/1000 | High | High | P2 |
| Confluent Kafka Dev | Confluent | $150 | 90 min | 60 MCQ | 70% | Medium | Medium | P3 |
| dbt Analytics Engineer | dbt Labs | Free | Self-paced | 65 MCQ | 63% | Low | Medium | P3 |
Optimal strategy: take the CDMP first (vendor-agnostic, covers the whole DMBOK), then one cloud certification aligned with your target market (Snowflake + GCP is a strong combination in France).
2. CDMP - 14 DMBOK Domains
| # | Domain | Weight |
|---|---|---|
| 1 | Data Governance (central) | ~11% |
| 2 | Data Architecture | ~11% |
| 3 | Data Modeling & Design | ~11% |
| 4 | Data Storage & Operations | ~6% |
| 5 | Data Security | ~6% |
| 6 | Data Integration & Interop | ~6% |
| 7 | Document & Content Mgmt | ~6% |
| 8 | Reference & Master Data | ~6% |
| 9 | Data Warehousing & BI | ~6% |
| 10 | Metadata Management | ~6% |
| 11 | Data Quality | ~6% |
| 12 | Big Data & Data Science | ~6% |
| 13 | Data Management Process | ~6% |
| 14 | Data Ethics | ~6% |
Levels: Associate ≥ 400, Practitioner ≥ 500, Master ≥ 600 (out of 700)
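The level thresholds can be encoded as a small revision helper; this is a sketch using the cutoffs listed here, not an official DAMA tool:

```python
def cdmp_level(score: int, max_score: int = 700) -> str:
    """Map a CDMP score to the level it qualifies for, using the
    thresholds above: Associate >= 400, Practitioner >= 500, Master >= 600."""
    if not 0 <= score <= max_score:
        raise ValueError(f"score must be between 0 and {max_score}")
    for threshold, level in ((600, "Master"), (500, "Practitioner"), (400, "Associate")):
        if score >= threshold:
            return level
    return "Below Associate"

print(cdmp_level(455))  # Associate
```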
3. Snowflake SnowPro Core
| Domain | Weight |
|---|---|
| Snowflake Cloud Data Platform Features | 25% |
| Account Access & Security | 20% |
| Performance Concepts | 15% |
| Data Loading & Unloading | 10% |
| Data Transformation | 20% |
| Data Protection & Sharing | 10% |
Common pitfalls
- Time Travel: 1 day (Standard), up to 90 days (Enterprise)
- Clustering: only worthwhile on tables > 1 TB; avoid cluster keys on high-cardinality columns that are rarely used in filters
- Warehouses: minimum auto-suspend of 1 min; know the Standard vs Economy scaling policies
- Zero-copy clone: does NOT clone privileges
4. GCP Prof. Data Engineer
Key services to master
| Need | GCP Service |
|---|---|
| Data Warehouse | BigQuery |
| Stream Processing | Dataflow (Apache Beam) |
| Batch ETL | Dataproc (Spark) / Dataflow |
| Messaging | Pub/Sub |
| Orchestration | Cloud Composer (Airflow) |
| ML Platform | Vertex AI |
| Storage | Cloud Storage (GCS) |
| NoSQL | Bigtable / Firestore |
Decision tree: structured data → BigQuery. Streaming → Pub/Sub + Dataflow. ML → Vertex AI. Legacy Hadoop → Dataproc.
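The decision tree condenses to a lookup you can drill with. The service names are real GCP products, but the mapping is a revision aid and a simplification, not official Google guidance:

```python
# Simplified need -> service mapping mirroring the decision tree above.
GCP_SERVICE_BY_NEED = {
    "data warehouse": "BigQuery",
    "stream processing": "Dataflow",
    "batch etl": "Dataflow",
    "legacy hadoop": "Dataproc",
    "messaging": "Pub/Sub",
    "orchestration": "Cloud Composer",
    "ml platform": "Vertex AI",
    "object storage": "Cloud Storage",
    "nosql wide-column": "Bigtable",
}

def pick_gcp_service(need: str) -> str:
    return GCP_SERVICE_BY_NEED.get(need.lower(), "unknown -- check the table above")

print(pick_gcp_service("Stream processing"))  # Dataflow
```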
5. Databricks & AWS Certifications
Databricks DE Associate - Domains
- Databricks Lakehouse Platform (24%)
- ELT with Spark SQL & Python (29%)
- Incremental Data Processing (22%)
- Production Pipelines (25%)
AWS Data Engineer Associate - Domains
- Data Ingestion & Transformation (34%)
- Data Store Management (26%)
- Data Operations & Support (22%)
- Data Security & Governance (18%)
Tip: for Databricks, master Delta Live Tables and Unity Catalog. For AWS, focus on Glue, Redshift, Kinesis, and Lake Formation.
6. Confluent & dbt Certifications
Confluent Certified Developer
- Kafka Fundamentals (15%)
- Kafka Architecture (20%)
- Producers (20%)
- Consumers (20%)
- Kafka Streams & ksqlDB (15%)
- Connect & Schema Registry (10%)
dbt Analytics Engineering
- dbt Models & Materializations
- Tests & Documentation
- Jinja & Macros
- Packages & Seeds
- Snapshots & Incremental
The dbt exam is free and can be taken online at your own pace.
7. Preparation Strategies
Typical plan (6 weeks)
| Week | Activity |
|---|---|
| 1-2 | Study the official guide + take notes |
| 3-4 | Hands-on labs + exercises |
| 5 | Mock exams (aim for > 80%) |
| 6 | Review weak areas + sit the exam |
Effective techniques
- Spaced repetition: Anki with your own flashcards
- Active recall: close the book and write from memory
- Pomodoro: 25 min focus + 5 min break
- Feynman: explain it to a beginner to test your understanding
- Practice tests: at least 3 full mock exams before the real one
The 80/20 rule: 80% of your score comes from 20% of the topics. Identify the heavily weighted domains and concentrate your effort there.
8. Data Architect Portfolio
Ideal portfolio checklist
| Item | Minimum | Ideal |
|---|---|---|
| GitHub projects | 2 repos | 4-5 documented repos |
| Architecture diagrams | C4 Level 1 | C4 L1-L3 + decision records |
| Blog articles | 3 articles | 1-2 per month, consistently |
| Certifications | 1 certification | 2-3 strategic certifications |
| OSS contributions | 1 merged PR | 5+ PRs, maintainer status |
| Talks / meetups | 0 | 2+ talks per year |
Typical README structure
# Project Name
## Architecture Overview (C4 diagram)
## Tech Stack & Justification
## Setup & Installation
## Data Flow Description
## Key Design Decisions (ADR links)
## Results & Metrics
## Lessons Learned
9. End-to-End Project - Reference Architecture
Sources Ingestion Storage Transform Serving
═══════ ═════════ ═══════ ═════════ ═══════
┌──────────┐ ┌─────────────┐ ┌────────────┐ ┌───────────┐ ┌──────────┐
│ REST API │──────────│ │ │ │ │ │ │ Metabase │
└──────────┘ │ Airbyte / │ │ S3 / GCS │ │ dbt │ │ Superset │
┌──────────┐ │ Fivetran │───────→│ (Iceberg) │──────→│ Medallion │─────→│ Looker │
│ Database │──────────│ │ │ │ │ │ │ │
└──────────┘ └─────────────┘ │ Bronze → │ │ Silver → │ └──────────┘
┌──────────┐ ┌─────────────┐ │ Raw data │ │ Gold │ ┌──────────┐
│ CSV/JSON │──────────│ Custom │───────→│ │ │ │ │ Data │
│ Files │ │ Python │ └────────────┘ └───────────┘ │ Quality │
└──────────┘ └─────────────┘ │ │ (GX) │
│ │ └──────────┘
┌──────▼──────┐ ┌──────▼──────┐
│ Airflow / │ Orchestration │ Tests dbt │
│ Dagster │◄──────────────────────────→│ CI/CD │
└─────────────┘ └─────────────┘
Recommended stack: Airbyte + S3/Iceberg + dbt + Airflow + Great Expectations + Metabase | Duration: 6-8 weeks
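The Bronze → Silver → Gold flow above can be sketched in plain Python. In the real project these layers would be dbt models on Iceberg tables; the record shapes below are invented for illustration:

```python
from collections import defaultdict

# Bronze: raw ingested records, duplicates and bad rows included (illustrative data).
bronze = [
    {"order_id": 1, "country": "FR", "amount": "120.5"},
    {"order_id": 1, "country": "FR", "amount": "120.5"},   # duplicate
    {"order_id": 2, "country": "DE", "amount": "80"},
    {"order_id": 3, "country": "FR", "amount": None},      # invalid row
]

# Silver: deduplicated on the business key, typed, invalid rows dropped.
seen, silver = set(), []
for row in bronze:
    if row["amount"] is None or row["order_id"] in seen:
        continue
    seen.add(row["order_id"])
    silver.append({**row, "amount": float(row["amount"])})

# Gold: business-level aggregate (revenue per country).
gold = defaultdict(float)
for row in silver:
    gold[row["country"]] += row["amount"]

print(dict(gold))  # {'FR': 120.5, 'DE': 80.0}
```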
10. Real-time Pipeline - Stack
Producers → Kafka → Flink/Spark → ClickHouse → Dashboard
│ │ │ │ │
└──────────┴─────────┴──────────────┴────────────┘
Schema Registry Alerting
(Avro/Protobuf) (Grafana)
Components
| Layer | Tool | Alternatives |
|---|---|---|
| Messaging | Apache Kafka | Redpanda, Pulsar |
| Processing | Apache Flink | Spark Streaming, Beam |
| OLAP Store | ClickHouse | Apache Druid, Pinot |
| Dashboard | Grafana | Superset, custom |
| Alerting | Grafana Alerts | PagerDuty, OpsGenie |
11. Data Mesh - Implementation
4 founding principles
- Domain Ownership: each domain owns its data
- Data as a Product: SLAs, quality, documentation
- Self-serve Platform: self-service infrastructure
- Federated Governance: global standards, local autonomy
Centralized vs Mesh
| Aspect | Centralized | Data Mesh |
|---|---|---|
| Ownership | Central data team | Business domains |
| Scalability | Central bottleneck | Scales per domain |
| Governance | Centralized | Federated |
| Complexity | Simple to start | Requires org maturity |
| Ideal size | < 100 data people | > 100 data people |
12. Open Source Contribution
Contribution workflow
1. Fork the repo on GitHub
2. git clone + feature branch
3. Code + tests + docs
4. git push + open a Pull Request
5. Code review + iterations
6. Merge (by a maintainer)
Where to contribute (data ecosystem)
- dbt: packages (dbt-utils, dbt-expectations), adapters
- Airflow: providers, operators, hooks
- Great Expectations: custom expectations
- Airbyte: source/destination connectors
- Apache Iceberg: docs, bug fixes
Tip: start with "good first issue" labels and documentation improvements. That is the fastest way to get accepted by maintainers.
13. C4 Model - Architecture Documentation
| Level | Name | Audience | Content |
|---|---|---|---|
| 1 | Context | Executives, PMs, everyone | The system in its environment |
| 2 | Container | Tech leads, architects | Apps, DBs, queues, APIs |
| 3 | Component | Developers | Internal components of a container |
| 4 | Code | Developers | Classes, interfaces (rarely needed) |
Tools
- Structurizr: DSL + automatic rendering (recommended)
- PlantUML: C4-PlantUML extension
- Mermaid: built into GitHub Markdown
- draw.io: drag-and-drop, AWS/GCP/Azure shapes
Structurizr DSL
workspace {
    model {
        user = person "Data Analyst"
        platform = softwareSystem "Data Platform" {
            ingestion = container "Airbyte"
            warehouse = container "Snowflake"
            transform = container "dbt"
            bi = container "Metabase"
        }
        user -> bi "Views dashboards"
        bi -> warehouse "SQL queries"
        transform -> warehouse "Transformations"
        ingestion -> warehouse "Raw data load"
    }
    views {
        systemContext platform "Context" {
            include *
            autoLayout
        }
        container platform "Containers" {
            include *
            autoLayout
        }
    }
}
14. ADR - Architecture Decision Record
# ADR-NNN: [Decision title]
## Status
[Proposed | Accepted | Deprecated | Superseded]
## Date
YYYY-MM-DD
## Context
The business and technical problem to solve.
Constraints: budget, timeline, team, legacy.
## Decision
The decision made, with its justification.
## Alternatives considered
| Option | Pros | Cons | Rejected because |
|--------|------|------|------------------|
| Option A | ... | ... | ... |
| Option B | ... | ... | ... |
## Consequences
### Positive
+ [Benefit 1]
+ [Benefit 2]
### Negative
- [Drawback 1]
- [Risk mitigated by ...]
## Related ADRs
- ADR-XXX (depends on)
- ADR-YYY (supersedes)
CLI tool: npm install -g adr-log to auto-generate an index of your ADRs. Store them in docs/adr/ in the repo.
15. Stakeholder Management
Power / Interest grid
            High Interest       Low Interest
        ┌─────────────────┬─────────────────┐
 High   │ MANAGE CLOSELY  │ KEEP SATISFIED  │
 Power  │ (CEO, CTO,      │ (CFO, Legal,    │
        │  VP Data)       │  Compliance)    │
        ├─────────────────┼─────────────────┤
 Low    │ KEEP INFORMED   │ MONITOR         │
 Power  │ (Data Analysts, │ (Dev Teams,     │
        │  Data Scientists│  Ops Teams)     │
        └─────────────────┴─────────────────┘
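The grid collapses to a minimal lookup; the quadrant names follow the classic power/interest model:

```python
def stakeholder_strategy(power: str, interest: str) -> str:
    """Return the engagement strategy from the power/interest grid above.
    power and interest are each 'high' or 'low'."""
    grid = {
        ("high", "high"): "Manage closely",
        ("high", "low"): "Keep satisfied",
        ("low", "high"): "Keep informed",
        ("low", "low"): "Monitor",
    }
    key = (power.lower(), interest.lower())
    if key not in grid:
        raise ValueError("power and interest must each be 'high' or 'low'")
    return grid[key]

print(stakeholder_strategy("high", "low"))  # Keep satisfied
```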
Communication by audience
| Audience | Format | Frequency | Content |
|---|---|---|---|
| Board / C-level | 3 slides max | Quarterly | ROI, risks, roadmap |
| VPs / Directors | Dashboard + recap | Monthly | KPIs, progress, blockers |
| Tech leads | ADRs + diagrams | Weekly | Tech decisions, trade-offs |
| Data team | Stand-up + wiki | Daily | Tasks, impediments |
16. Data Platform ROI - Formulas
ROI = (annual gains - total cost) / total cost x 100
Typical gains:
- Reporting time saved: hours x hourly_cost x num_analysts
- Customer churn reduction: % improvement x average_revenue x num_customers
- Automation: num_processes x manual_time_saved
- Time-to-insight: days_before - days_after → business value
Typical costs:
- Licenses: Snowflake, dbt Cloud, BI tools
- Cloud infra: compute + storage + network
- Team: salaries + training + recruiting
- Implementation: consulting fees if external
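Plugging illustrative numbers into the ROI formula above (every figure is a made-up assumption, not a benchmark):

```python
def roi_percent(annual_gains: float, total_cost: float) -> float:
    """ROI = (annual gains - total cost) / total cost x 100, as above."""
    return (annual_gains - total_cost) / total_cost * 100

# Example gain: reporting time saved = hours x hourly_cost x num_analysts
reporting_gain = 200 * 60 * 10   # 120,000 EUR/year (illustrative)
total_cost = 80_000              # licenses + infra + team share (illustrative)

print(f"ROI = {roi_percent(reporting_gain, total_cost):.0f}%")  # ROI = 50%
```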
Executive committee presentation - 3-slide template
- Slide 1 - The problem: "We lose X EUR/year because..." (business language, not technical)
- Slide 2 - The solution: one C4 Level 1 diagram, 3 bullet points max
- Slide 3 - The ROI: investment X EUR, return Y EUR/year, payback in Z months
17. Data Team - Roles & Structure
| Role | Focus | Main tools | Output | FR salary (2026) |
|---|---|---|---|---|
| Data Engineer | Pipelines, data infra | Python, Spark, Airflow, dbt | Reliable pipelines, data quality | 50-85K EUR |
| Analytics Engineer | Transformation, modeling | dbt, SQL, Git | Analytical models, docs | 50-80K EUR |
| Data Analyst | Analysis, reporting | SQL, BI tools, Excel | Dashboards, insights | 40-65K EUR |
| Data Scientist | ML, statistics | Python, Jupyter, sklearn | Predictive models | 55-90K EUR |
| Data Architect | Design, strategy | C4, ADRs, multi-cloud | Architecture, standards | 70-130K EUR |
| Platform Engineer | Data infrastructure | Terraform, K8s, Docker | Self-serve platform | 55-90K EUR |
Recommended ratios
| Company size | DE | AE | DA | DS | Architect |
|---|---|---|---|---|---|
| Startup (< 100) | 1-2 | 1 | 1-2 | 0-1 | 0 (covered by DE) |
| Mid-market (100-1K) | 3-5 | 2-3 | 3-5 | 1-3 | 1 |
| Enterprise (1K+) | 5-15 | 3-8 | 10+ | 5+ | 2-3 |
18. Vendor Evaluation - Scoring Matrix
| Criterion | Weight | Score (1-5) | Weighted |
|---|---|---|---|
| Features | 25% | _ | _ |
| Performance / scalability | 20% | _ | _ |
| Cost (3-year TCO) | 20% | _ | _ |
| Ease of integration | 15% | _ | _ |
| Support & community | 10% | _ | _ |
| Lock-in risk | 10% | _ | _ |
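The weighted total can be computed like this; the criterion keys and sample scores are illustrative, but the weights match the matrix above:

```python
# Weights from the scoring matrix above (must sum to 1.0).
WEIGHTS = {
    "features": 0.25,
    "performance": 0.20,
    "cost_tco_3y": 0.20,
    "integration": 0.15,
    "support_community": 0.10,
    "lock_in_risk": 0.10,
}

def weighted_score(scores: dict) -> float:
    """scores: criterion -> rating on a 1-5 scale; returns the weighted total."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)

vendor_a = {"features": 4, "performance": 5, "cost_tco_3y": 3,
            "integration": 4, "support_community": 5, "lock_in_risk": 2}
print(round(weighted_score(vendor_a), 2))  # 3.9
```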
Build vs Buy
| Factor | Favors Build | Favors Buy |
|---|---|---|
| Core business | Competitive advantage | Commodity |
| Team | Strong in-house expertise | Limited team |
| Timeline | No urgency | Time-to-market critical |
| Budget | CapEx preferred | OpEx preferred |
| Maintenance | Long-term capacity | Prefer to delegate |
19. Data Strategy Roadmap - Template
4-phase structure
| Phase | Duration | Focus | Deliverables |
|---|---|---|---|
| Discovery | 2-4 wks | Current-state assessment | Data audit, interviews, pain points |
| Design | 4-6 wks | Target architecture | ADRs, C4 diagrams, budget |
| Build | 3-6 months | MVP implementation | Platform v1, 2-3 use cases |
| Scale | 6-12 months | Adoption, optimization | Self-serve, FinOps, governance |
Prioritization frameworks
- RICE: Reach x Impact x Confidence / Effort
- MoSCoW: Must / Should / Could / Won't
- Value vs Effort: 2x2 matrix (quick wins first)
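RICE scoring as defined above, applied to a made-up backlog (the feature names and numbers are illustrative):

```python
def rice(reach: float, impact: float, confidence: float, effort: float) -> float:
    """RICE = Reach x Impact x Confidence / Effort."""
    return reach * impact * confidence / effort

backlog = {
    "self-serve dashboards": rice(reach=500, impact=2, confidence=0.8, effort=4),
    "realtime alerts": rice(reach=50, impact=3, confidence=0.5, effort=6),
}
# Highest RICE score first = highest priority.
for name, score in sorted(backlog.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```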
Golden rule: start with 1-2 high-value, low-effort use cases (quick wins) to prove the value of the data platform before investing in heavy infrastructure.
20. RACI & Leadership Checklist
Data platform RACI matrix
| Activity | Data Architect | DE Lead | PM | Business |
|---|---|---|---|---|
| Architecture design | A/R | C | I | I |
| Tech decisions (ADRs) | A | R | I | I |
| Pipeline development | C | R | A | I |
| Data quality rules | C | R | I | A |
| Budget & roadmap | R | C | A | I |
| Vendor selection | R | C | A | I |
R = Responsible, A = Accountable, C = Consulted, I = Informed
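A quick sanity check for RACI rows, encoding the common convention of exactly one Accountable and at least one Responsible per activity (the helper name and the check are mine, not from the matrix):

```python
def check_raci_row(activity: str, assignments: dict) -> None:
    """assignments: role name -> RACI letters (e.g. 'A', 'R', 'A/R').
    Raises AssertionError if the row breaks the one-A / at-least-one-R convention."""
    letters = [x for a in assignments.values() for x in a.split("/")]
    assert letters.count("A") == 1, f"{activity}: need exactly one Accountable"
    assert letters.count("R") >= 1, f"{activity}: need at least one Responsible"

# Checking one row from the matrix above:
check_raci_row("Tech decisions (ADRs)",
               {"Data Architect": "A", "DE Lead": "R", "PM": "I", "Business": "I"})
print("RACI row OK")
```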
Data architect leadership checklist
- ADRs up to date for every major decision
- Monthly Architecture Review Board
- Data quality dashboard visible to everyone
- Data roadmap refreshed quarterly
- Weekly 1:1 with every tech lead
- Monthly internal data blog / newsletter
- Team skills matrix kept current
- FinOps budget reviewed monthly