Creation of an IT Data Processing System for the Genealogical Sector
Design, development, and support of an internal automated system for creating configurable and production-ready data processing pipelines with integrated machine learning models. The solution ensures scalability, fault tolerance, and management of the complete data lifecycle — from ingestion and preprocessing to inference and monitoring.
Customer
The client is a company specializing in genealogical research, processing, and providing access to historical archival data.
Task
The SimbirSoft team faced the following challenges:
- Minimize errors in the processes of integrating ML models,
- Reduce manual labor,
- Shorten the time to market for ML models,
- Scale the infrastructure,
- Organize monitoring and alerting mechanisms.
Solution
1. Minimization of Errors in ML Model Integration Processes:
Templates were developed for generating data processing pipelines. Docker containers were used to isolate models with different dependencies, reducing the risks of conflicts and deployment errors. The pipelines were designed to handle images and document scans and support high-load operations: preprocessing, OCR, and GPU computations.
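The templated-pipeline idea can be illustrated with a minimal sketch. The `Stage`/`Pipeline` names and the preprocess/OCR steps below are illustrative assumptions, not the project's actual API; they only show how a pipeline can be assembled from configurable, interchangeable stages.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Stage:
    """One named step in a data processing pipeline."""
    name: str
    run: Callable[[Any], Any]


class Pipeline:
    """Runs items through an ordered list of stages."""

    def __init__(self, stages: List[Stage]):
        self.stages = stages

    def process(self, item: Any) -> Any:
        for stage in self.stages:
            item = stage.run(item)
        return item


# Illustrative stages for an image/document-scan workload.
def preprocess(image: str) -> dict:
    return {"image": image, "normalized": True}


def ocr(data: dict) -> dict:
    # Placeholder standing in for a real OCR call.
    data["text"] = f"recognized:{data['image']}"
    return data


pipeline = Pipeline([Stage("preprocess", preprocess), Stage("ocr", ocr)])
result = pipeline.process("scan_001.png")
```

A template like this lets each model ship with the same pipeline skeleton while only the stage list changes, which is what keeps integration errors down.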
2. Reduction of Manual Labor through Automation of ETL/ML Processes:
The pipelines are built on Python applications, allowing for flexible integration of both ready-made ML libraries and custom logic. This approach significantly reduced the volume of manual operations in data processing and model handling.
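One way to mix ready-made library calls with custom logic in a single configurable pipeline is a stage registry. This is a hypothetical sketch; the registry, stage names, and config format are assumptions for illustration only.

```python
from typing import Callable, Dict, List

# Registry mapping stage names (as they appear in a pipeline config)
# to the functions that implement them.
STAGE_REGISTRY: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that adds a stage function to the registry."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        STAGE_REGISTRY[name] = fn
        return fn
    return wrap


@register("lowercase")  # thin wrapper over a built-in (a "library" step)
def lowercase(text: str) -> str:
    return text.lower()


@register("strip_noise")  # custom project-specific logic
def strip_noise(text: str) -> str:
    return " ".join(tok for tok in text.split() if tok.isalnum())


def run(config: List[str], text: str) -> str:
    """Apply the stages named in the config, in order."""
    for name in config:
        text = STAGE_REGISTRY[name](text)
    return text


cleaned = run(["lowercase", "strip_noise"], "OCR Output: Smith 1874 !!")
```

Because the pipeline is driven by a list of stage names, swapping a library-backed step for custom logic is a config change rather than a code change, which is where the reduction in manual operations comes from.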
3. Reduction of Time-to-Market:
To shorten implementation timelines, the pipelines were integrated with AWS services, including AWS SageMaker, which enabled dynamic scaling of compute resources.
Additionally, monitoring was set up to detect performance degradation and processing errors.
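Monitoring for processing errors can be sketched as a sliding-window error-rate check. The class name, window size, threshold, and alert condition below are illustrative assumptions, not the project's actual alerting configuration.

```python
from collections import deque


class ErrorRateMonitor:
    """Fires an alert when the failure rate over a sliding window
    of recent processing outcomes exceeds a threshold."""

    def __init__(self, window: int = 100, threshold: float = 0.05):
        self.events = deque(maxlen=window)
        self.threshold = threshold

    def record(self, ok: bool) -> bool:
        """Record one outcome; return True if an alert should fire."""
        self.events.append(ok)
        # Only alert once the window is full, to avoid noisy startup alerts.
        if len(self.events) < self.events.maxlen:
            return False
        failure_rate = self.events.count(False) / len(self.events)
        return failure_rate > self.threshold


monitor = ErrorRateMonitor(window=10, threshold=0.2)
# Seven successes followed by three failures: 30% failure rate.
alerts = [monitor.record(ok) for ok in [True] * 7 + [False] * 3]
```

In practice such a check would sit behind a metrics service (e.g. CloudWatch alarms feeding SNS), but the threshold-over-a-window logic is the same.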
Result
- Speed of implementation: deployment time for new models was reduced from weeks to hours.
- Resource savings: compute costs were optimized through automatic scaling in AWS.
- Reliability: fault tolerance when processing millions of images.
Thanks to the developed automated ML pipeline system, we also participated in the development of an ML solution for extracting data from historical handwritten texts.
Challenges
During the project, the team addressed the complexity of integrating custom ML models, an area in which no widely accepted standards exist.
Technologies
- AWS (SQS, SNS, EC2, Lambda, S3, SageMaker, ASG, etc.)
- Python
- Terraform
- Jenkins
- Docker
- Harness
- BentoML