Data Warehouse (DWH) Design for a Large IT Company

We created an analytical environment to automate the collection, processing, and visualization of information. This included designing a Data Warehouse (DWH), configuring ETL processes in Airflow, and developing dashboards in Superset.

Customer

Large IT company.

Task 

To automate internal work processes, the client needed an analytical environment: a set of software tools that simplifies the collection, processing, and visualization of data.

Business Goals

  • Obtain a tool for quick data collection and analysis.

  • Create a single reliable source of data for all users.

  • Train users to work with data analytics systems.

Technical Goals

  • Design a scalable DWH architecture.
  • Automate the loading and transformation of data from various sources.
  • Ensure data security through a role‑based access model.
  • Optimize query performance for large data volumes.

15+ DAGs in Airflow
100% elimination of manual data processing errors

Solution

1. Analysis and Design

In three weeks, we analyzed how other companies in the market build their infrastructure, identified the IT products the system would need, and compared the available solutions to find the most effective ones. Based on the collected data, we prepared a technical specification for DevOps engineers to deploy the required infrastructure.

We selected the optimal solutions: a database management system (DBMS), ETL tools, BI systems, and additional software products. We chose Greenplum for storing historical data and ClickHouse for real‑time analytics.

ClickHouse is currently one of the fastest analytical DBMSs and can reduce query execution time severalfold.

2. ETL Process Development

We developed ETL (Extract, Transform, Load) processes to extract data from various sources, transform it into a format suitable for analytics, and load it into the target system.
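The three ETL stages can be illustrated with a minimal sketch. It assumes a hypothetical CSV export as the source and uses an in-memory SQLite database as a stand-in for the target system; the table name, columns, and sample rows are illustrative and are not taken from the project.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the real sources are not described in this case.
RAW_CSV = """user_id,amount,currency
1, 10.50 ,usd
2,,usd
3,7.25,USD
"""

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows without an amount, normalize types and casing."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        clean.append(
            (int(row["user_id"]), float(amount), row["currency"].strip().upper())
        )
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(user_id INTEGER, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the real warehouse
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM payments").fetchone()[0]
```

Keeping each stage in its own function makes it possible to test the transformation separately from the source and the target.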

Over the next six weeks, we created more than 15 DAGs in Airflow. DAGs, or Directed Acyclic Graphs, define the logic and sequence of extracting, transforming, and loading data. For complex transformations, we implemented data processing using Python (Pandas) and set up monitoring and failure notifications.
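The idea behind a DAG can be shown with a short sketch in plain Python. This is not Airflow's API: it uses the standard library's `graphlib` to order tasks by their dependencies, and the task names, the `run_pipeline` helper, and the `notify` callback are hypothetical stand-ins for the real pipelines and notification channel.

```python
from graphlib import TopologicalSorter

def run_pipeline(tasks, dependencies, notify):
    """Run tasks in dependency order; report the first failure and stop."""
    # dependencies maps each task name to the set of tasks it depends on.
    order = list(TopologicalSorter(dependencies).static_order())
    completed = []
    for name in order:
        try:
            tasks[name]()
        except Exception as exc:
            notify(f"Task {name!r} failed: {exc}")
            break  # downstream tasks must not run after a failure
        completed.append(name)
    return completed

log = []
alerts = []  # collects notifications instead of sending them anywhere
tasks = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
}
dependencies = {"transform": {"extract"}, "load": {"transform"}}

completed = run_pipeline(tasks, dependencies, alerts.append)

def broken_transform():
    raise ValueError("unexpected column")

# Same graph with a failing step: "load" is skipped and an alert is recorded.
failing_tasks = dict(tasks, transform=broken_transform)
partial = run_pipeline(failing_tasks, dependencies, alerts.append)
```

In Airflow the scheduler handles the ordering and failure handling, so a DAG file mostly declares the tasks and the dependencies between them.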

Automation through ETL reduced errors from manual data processing to zero.

3. Visualization

Over the following four weeks, we developed and refined dashboards for key metrics in Superset.

4. Implementation

In two weeks, we conducted four training webinars for more than 20 employees and set up role‑based access to the system.
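A role-based access model maps each role to a set of permissions and checks a user's roles against that mapping. The sketch below is a minimal illustration; the role names, permission names, and the `is_allowed` helper are assumptions and do not describe the access model actually configured for the client.

```python
# Hypothetical roles and permissions; the actual access model is not described.
ROLE_PERMISSIONS = {
    "viewer": {"view_dashboard"},
    "analyst": {"view_dashboard", "run_query"},
    "admin": {"view_dashboard", "run_query", "manage_users"},
}

def is_allowed(user_roles, permission):
    """Return True if any of the user's roles grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set()) for role in user_roles
    )

viewer_can_query = is_allowed(["viewer"], "run_query")
analyst_can_query = is_allowed(["analyst"], "run_query")
```

Unknown roles grant nothing in this sketch, so access is denied by default.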

Result

The client received a ready‑to‑use analytical system designed to work with various data formats and sources.

Services provided:

  • Gathering requirements for the analytics platform

  • Designing the DWH architecture (Greenplum, ClickHouse)

  • Developing ETL pipelines in Airflow and writing DAGs in Python 

  • Training the team to work with the new tools

  • Implementing version control with GitLab

  • Developing interactive dashboards in Superset

  • Integrating with external APIs

  • Creating data marts and optimizing SQL queries

The SimbirSoft project team included:

  • 6 Data Analysts

  • 2 System Analysts

  • 2 DWH Analysts

  • 1 DevOps Engineer

Technologies 

  • Databases: Greenplum, ClickHouse
  • ETL: Airflow (Python), Pandas
  • Visualization: Superset
  • Infrastructure: Docker, k9s
  • Version control: GitLab
  • IDE: PyCharm, DBeaver
