Development of a Data Processing IT System for an International Company
SimbirSoft has worked with the client for over five years, delivering projects that combine Artificial Intelligence technologies with our backend development expertise.
Customer
The client is a company specializing in genealogical research, processing, and providing access to historical archival data.
Task
The initial project involved developing a genealogy research service using Machine Learning and Data Science expertise.
Machine Learning-based solution: from family tree creation to photo restoration
We then focused on the following tasks:
- Deployment and testing of individual ML pipelines
- Creation of a flexible chain of ML pipelines
- Maintenance and enhancement of existing ML pipelines
- Supporting Data Science specialists in model debugging
- Cost analysis and optimization
- Development of an internal web service to simplify debugging of DS models and model chains
Solution
- ML processing of a large volume of handwritten book pages
- Reduced Time-to-Market for deploying machine learning models into production
- Maintenance and modification of existing ML pipelines
- Efficient use of computational resources through automatic infrastructure scaling
- Support for Data Science specialists in troubleshooting and resolving issues
Project Phases
Phase 1
Deployment and debugging of a group of ML pipelines with various models, integrating them into a unified processing chain. An example of the implemented pipeline chain is shown in the diagram below.
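As a rough code-level illustration of what a unified processing chain can look like, here is a minimal Python sketch. It is not the project's actual implementation: the `PipelineChain` class, the stage functions `detect_layout` and `recognize_text`, and the payload fields are all hypothetical, and each stage is a plain function standing in for a deployed ML pipeline.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types: a "stage" stands in for one ML pipeline or container.
Payload = Dict[str, Any]
Stage = Callable[[Payload], Payload]


@dataclass
class PipelineChain:
    """Runs named stages in order, feeding each stage's output to the next."""
    stages: List[Tuple[str, Stage]] = field(default_factory=list)

    def add(self, name: str, stage: Stage) -> "PipelineChain":
        self.stages.append((name, stage))
        return self  # returning self allows fluent chaining

    def run(self, payload: Payload) -> Payload:
        for name, stage in self.stages:
            payload = stage(payload)
            # Record which stages processed the payload, to help debugging.
            payload.setdefault("trace", []).append(name)
        return payload


# Placeholder stages; real ones would invoke deployed models.
def detect_layout(p: Payload) -> Payload:
    return {**p, "regions": ["header", "body"]}


def recognize_text(p: Payload) -> Payload:
    return {**p, "text": [f"<text of {r}>" for r in p["regions"]]}


chain = PipelineChain().add("layout", detect_layout).add("ocr", recognize_text)
result = chain.run({"page_id": "page-001"})
print(result["trace"])
```

Keeping each stage behind the same payload-in, payload-out interface is one way to make stages easy to add, remove, or reorder without touching the rest of the chain.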
Phase 2
Support of pipelines during testing and real data processing, as well as validation of new hypotheses proposed by Data Science specialists.
Phase 3
Project evolution: testing different models and changing the number of ML containers within the pipeline chain.
Additionally, input and output data validation mechanisms were implemented.
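One possible shape for such validation is sketched below in Python, using only the standard library. This is an assumption-laden illustration rather than the mechanism used in the project: `FieldSpec`, `validate`, `run_validated`, the two schemas, and the example S3 URI are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


class ValidationError(ValueError):
    """Raised when a payload does not match the expected shape."""


@dataclass(frozen=True)
class FieldSpec:
    name: str
    type_: type
    required: bool = True


def validate(payload: Dict[str, Any], schema: List[FieldSpec]) -> Dict[str, Any]:
    """Return the payload unchanged if it matches the schema, else raise."""
    errors = []
    for spec in schema:
        if spec.name not in payload:
            if spec.required:
                errors.append(f"missing field: {spec.name}")
            continue
        value = payload[spec.name]
        if not isinstance(value, spec.type_):
            errors.append(
                f"{spec.name}: expected {spec.type_.__name__}, "
                f"got {type(value).__name__}"
            )
    if errors:
        raise ValidationError("; ".join(errors))
    return payload


# Hypothetical schemas for a single model container.
INPUT_SCHEMA = [FieldSpec("image_uri", str), FieldSpec("page_number", int, required=False)]
OUTPUT_SCHEMA = [FieldSpec("text", str), FieldSpec("confidence", float)]


def run_validated(model: Callable[[Dict[str, Any]], Dict[str, Any]],
                  payload: Dict[str, Any]) -> Dict[str, Any]:
    """Check the input, call the model, then check the output."""
    validate(payload, INPUT_SCHEMA)
    return validate(model(payload), OUTPUT_SCHEMA)


# Usage with a stand-in model that returns a fixed answer.
ok = run_validated(
    lambda p: {"text": "abc", "confidence": 0.9},
    {"image_uri": "s3://example-bucket/page.png"},  # hypothetical URI
)
```

Checking both sides of every container means a malformed record fails at the stage that produced it instead of surfacing several stages later in the chain.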
An internal web service was also developed to facilitate debugging of DS models and model chains.
Results
- Faster Deployment. Reduced model deployment time from weeks to hours.
- Resource Optimization. Lower computational costs through automatic scaling in AWS.
- Reliability. High fault tolerance while processing millions of images.
Technologies
- AWS SageMaker
- EC2
- S3
- SNS
- SQS
- Firehose
- Terraform
- Harness
- New Relic
- Jenkins
- FastAPI
- Angular