Creation of an IT Data Processing System for the Genealogical Sector

Design, development, and support of an internal automated system for building configurable, production-ready data processing pipelines with integrated machine learning models. The solution provides scalability and fault tolerance and manages the complete data lifecycle, from ingestion and preprocessing to inference and monitoring.

Customer

The client is a company specializing in genealogical research, processing, and providing access to historical archival data.

Task

The SimbirSoft team faced the following challenges:

  • Minimize errors in the integration processes of ML models,

  • Reduce manual labor,

  • Shorten the time to market for ML models,

  • Scale the infrastructure,

  • Organize monitoring and alert mechanisms.

  • 100 million images processed

  • 90% reduction in deployment time for new models

Solution

1. Minimization of Errors in ML Model Integration Processes:

Templates were developed for generating data processing pipelines. Docker containers were used to isolate models with different dependencies, reducing the risks of conflicts and deployment errors. The pipelines were designed to handle images and document scans and support high-load operations: preprocessing, OCR, and GPU computations.
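
The template idea can be sketched as follows. This is a minimal illustration, not the project's actual template format: the `StepSpec` and `PipelineTemplate` classes, the config keys, and the `example/...` image names are all hypothetical. It shows one fixed stage order being reused, with each stage pinned to its own container image so that dependencies stay isolated.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class StepSpec:
    """One pipeline stage, run in its own container image (hypothetical schema)."""
    name: str
    image: str              # Docker image that carries this step's dependencies
    needs_gpu: bool = False


@dataclass
class PipelineTemplate:
    """A reusable template: a fixed stage order with per-pipeline image overrides."""
    name: str
    steps: list[StepSpec] = field(default_factory=list)

    def render(self, overrides: dict[str, str] | None = None) -> dict:
        """Return a plain config dict, swapping images for the steps named in overrides."""
        overrides = overrides or {}
        unknown = set(overrides) - {s.name for s in self.steps}
        if unknown:
            # Fail early instead of silently producing a misconfigured pipeline.
            raise ValueError(f"unknown steps in overrides: {sorted(unknown)}")
        return {
            "pipeline": self.name,
            "steps": [
                {"name": s.name, "image": overrides.get(s.name, s.image), "gpu": s.needs_gpu}
                for s in self.steps
            ],
        }


# Illustrative template for scanned documents; image names are placeholders.
SCAN_TEMPLATE = PipelineTemplate(
    name="scan-processing",
    steps=[
        StepSpec("preprocess", "example/preprocess:1.0"),
        StepSpec("ocr", "example/ocr:1.0", needs_gpu=True),
    ],
)

config = SCAN_TEMPLATE.render({"ocr": "example/ocr:2.0"})
```

Rejecting unknown step names at render time is one way a template can catch integration mistakes before deployment rather than after.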

2. Reduction of Manual Labor through Automation of ETL/ML Processes:

The pipelines are built on Python applications, allowing for flexible integration of both ready-made ML libraries and custom logic. This approach significantly reduced the volume of manual operations in data processing and model handling.
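
As a rough illustration of why plain Python applications make this flexible, the sketch below treats every stage as an ordinary function over a record. The step names and the record layout are assumptions made for the example; `count_words` merely stands in for a call into an ML library or model.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Step = Callable[[Record], Record]


def trim_text(record: Record) -> Record:
    """Custom logic: strip surrounding whitespace from the text field."""
    return {**record, "text": record["text"].strip()}


def count_words(record: Record) -> Record:
    """Stand-in for a library or model call; adds a derived field."""
    return {**record, "word_count": len(record["text"].split())}


def run_pipeline(steps: List[Step], record: Record) -> Record:
    """Apply each step in order, passing the output of one to the next."""
    for step in steps:
        record = step(record)
    return record


result = run_pipeline([trim_text, count_words], {"text": "  parish register 1874  "})
```

Because each step has the same signature, a ready-made library call and a piece of custom logic can be mixed in one list without special handling.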

3. Reduction of Time-to-Market:

To shorten implementation timelines and speed up development, the system was integrated with AWS services, including SageMaker. Running on AWS also enabled dynamic scaling of compute resources.
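
The source does not describe the scaling policy, so the following is only a sketch of one common approach: sizing the worker pool from the queue backlog. The function name, its parameters, and the default limits are assumptions for illustration.

```python
import math


def desired_workers(
    backlog: int,
    per_worker_rate: float,
    target_seconds: float,
    min_workers: int = 0,
    max_workers: int = 20,
) -> int:
    """Estimate how many workers are needed to drain the backlog in target_seconds.

    backlog: number of items waiting in the queue
    per_worker_rate: items one worker processes per second (assumed measured)
    """
    if per_worker_rate <= 0 or target_seconds <= 0:
        raise ValueError("per_worker_rate and target_seconds must be positive")
    needed = math.ceil(backlog / (per_worker_rate * target_seconds))
    # Clamp to the configured pool limits.
    return max(min_workers, min(max_workers, needed))


workers = desired_workers(backlog=3000, per_worker_rate=5.0, target_seconds=60)
```

With these example numbers one worker clears 300 items per minute, so 3000 waiting items call for 10 workers; an empty queue scales the pool down to `min_workers`.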

Additionally, monitoring was organized to track performance drops and processing errors.
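
One simple way to detect a rise in processing errors is a sliding-window error rate. The sketch below is an assumption about how such a check could look, not a description of the project's monitoring; the `ErrorRateMonitor` class, window size, and threshold are hypothetical.

```python
from collections import deque


class ErrorRateMonitor:
    """Track the outcome of the last `window` items and flag a high error rate."""

    def __init__(self, window: int = 100, threshold: float = 0.05) -> None:
        self.outcomes = deque(maxlen=window)   # True = processed OK, False = failed
        self.threshold = threshold

    def error_rate(self) -> float:
        """Share of failures among the recorded outcomes (0.0 when empty)."""
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def record(self, ok: bool) -> bool:
        """Record one outcome; return True when an alert should be raised."""
        self.outcomes.append(ok)
        # Only alert on a full window, so a single early failure is not noise.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.error_rate() > self.threshold


monitor = ErrorRateMonitor(window=10, threshold=0.2)
alerts = [monitor.record(True) for _ in range(10)]
alerts += [monitor.record(False) for _ in range(3)]
```

In this run the alert fires only on the third consecutive failure, when 3 of the last 10 items have failed and the rate exceeds the 0.2 threshold.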

Result

  • Speed of Implementation: Deployment time for new models was reduced from weeks to hours.

  • Resource Savings: Automatic scaling in AWS optimized compute costs.

  • Reliability: The system remains fault tolerant while processing millions of images.

The automated ML pipeline system also enabled us to take part in developing an ML solution for extracting data from historical handwritten texts.

Challenges

During the project, the team dealt with the added complexity of integrating custom ML models in the absence of widely accepted standards.

Technologies

  • AWS (SQS, SNS, EC2, Lambda, S3, SageMaker, ASG, etc.)

  • Python

  • Terraform

  • Jenkins

  • Docker

  • Harness

  • BentoML
