The system for gathering data from classifieds sites

  • Java
  • Python
  • QAA
  • Retail

A program for gathering data from classifieds sites. Crawler robots imitate the actions of a website user and collect the required information. In addition to text, the robots also recognize information in images, such as addresses and telephone numbers.
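One part of the image-recognition step can be illustrated with OCR post-processing. This is a minimal sketch that assumes an OCR engine such as Tesseract has already turned a listing image into raw text. The hypothetical `extract_phones` function pulls out candidate phone numbers, repairs common letter/digit confusions (O→0, I/l→1, S→5, B→8) and keeps only runs with a plausible number of digits. The confusion table and digit limits are assumptions, not the project's actual rules.

```python
import re

# Assumed mapping of characters that OCR often confuses with digits.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})

# A candidate run: optional "+", then digit-like characters and separators
# (spaces, brackets, dots, slashes, dashes), not glued to surrounding words.
CANDIDATE = re.compile(
    r"(?<![A-Za-z0-9])\+?[0-9OoIlSB][0-9OoIlSB ()./-]{6,}[0-9OoIlSB](?![A-Za-z0-9])"
)


def extract_phones(ocr_text: str, min_digits: int = 10, max_digits: int = 12) -> list[str]:
    """Return normalized phone numbers found in raw OCR output (hypothetical helper)."""
    phones = []
    for match in CANDIDATE.finditer(ocr_text):
        raw = match.group()
        # Skip runs with too few real digits so ordinary words are not "repaired".
        if sum(ch.isdigit() for ch in raw) < min_digits - 2:
            continue
        digits = re.sub(r"\D", "", raw.translate(OCR_DIGIT_FIXES))
        if min_digits <= len(digits) <= max_digits:
            phones.append(("+" if raw.startswith("+") else "") + digits)
    return phones


print(extract_phones("Call: +7 (912) 345-67-O8 or visit us. Price 1500"))
# → ['+79123456708']
```

Long digit runs such as dates or prices can still pass this filter; a production version would validate candidates against country-specific phone formats.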

We implemented automated tests that check the functionality of the sites. Because collecting information from one resource takes 3-6 days, it is necessary to check before each run whether the site's functionality or the location of its page blocks has changed; otherwise the robots may get "lost".
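The pre-run check can be sketched as a PyTest-style test that confirms every page block the robots depend on can still be located by its XPath selector. This is a minimal sketch under assumptions: the selectors, class names and sample page are hypothetical, and it uses the standard library's `xml.etree.ElementTree`, which parses only well-formed markup and supports a limited XPath subset. A real check would load live pages through Selenium or a full HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical selectors for the blocks a robot relies on, written in the
# XPath subset supported by ElementTree.
EXPECTED_BLOCKS = {
    "title": ".//h1[@class='ad-title']",
    "price": ".//span[@class='ad-price']",
    "phone_image": ".//img[@class='ad-phone']",
}


def missing_blocks(page_markup: str, selectors: dict[str, str]) -> list[str]:
    """Return the names of blocks whose selector no longer matches the page."""
    root = ET.fromstring(page_markup)
    return [name for name, xpath in selectors.items() if root.find(xpath) is None]


SAMPLE_PAGE = """<html><body>
  <h1 class="ad-title">2-room flat</h1>
  <span class="ad-price">50 000</span>
  <img class="ad-phone" src="phone.png"/>
</body></html>"""


def test_layout_unchanged():
    # Fails before a multi-day crawl if any expected block has moved or been renamed.
    assert missing_blocks(SAMPLE_PAGE, EXPECTED_BLOCKS) == []
```

If a redesign renames the price block's class, `missing_blocks` returns `["price"]`, so the crawl can be postponed until the robot's selectors are updated.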

Project in figures

  • 10 robots developed
  • 1 000 000 records per day
  • 7 months of development
  • 90% of image data successfully recognized

Technologies

Development: Scrapy, Spark, Scala, Java, Python, Tesseract
Testing tools: XPath, Selenium, PyTest, JSON, Requests
