Development of a System of Five Neural Networks for TV Broadcast Analysis

The goal

Our client develops digital tools for marketers. This time, our task was to build a service that analyzes TV broadcasts, so that advertising can be tuned to the content viewers are watching.

Timeline: 6 months

Year: 2024

What We Decided to Analyze and Which Neural Networks We Used

For video content analysis, we integrated five neural networks into the project:

The first YOLOv8 model detects logos.
The second YOLOv8 model identifies common objects (umbrella, ball, person, sneakers, dog, etc.).
Rev AI transcribes speech and sends the text to ChatGPT.
ChatGPT extracts mentions of brands, cities, and celebrities from the transcript and determines sentiment (positive/negative attitude of the speaker toward an entity).
Tesseract recognizes static text within the video frame, such as subtitles, captions, and text-based logos.
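The five components above report different kinds of results, so it can help to normalize them into one record type before matching ads to content. The following is a minimal sketch of that idea, not the project's actual data model: the `Finding` class, its field names, and the source names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    source: str                      # hypothetical analyzer name, e.g. "logo_detector"
    label: str                       # what was found, e.g. "Nike"
    time_s: float                    # position in the video, in seconds
    sentiment: Optional[str] = None  # set only for entities taken from the transcript


def timeline_by_label(findings):
    """Group findings by case-insensitive label, ordered by time within each label."""
    timeline = defaultdict(list)
    for f in sorted(findings, key=lambda f: f.time_s):
        timeline[f.label.lower()].append((f.time_s, f.source))
    return dict(timeline)


# Made-up sample data: the same brand seen by two analyzers, plus one OCR hit.
findings = [
    Finding("transcript", "Nike", 12.0, sentiment="positive"),
    Finding("logo_detector", "nike", 3.5),
    Finding("ocr", "Adidas", 7.0),
]
for label, hits in timeline_by_label(findings).items():
    print(label, hits)
```

Grouping by label makes it easy to see when a brand is both shown on screen and mentioned in speech, which is the kind of signal an ad-timing decision could use.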

How the Project Works: A Look at Microservices

TV broadcast analysis needs to be fast, making performance a major challenge. To ensure smooth operation, we implemented a microservice architecture.

The product consists of multiple services, each handling a specific task. This makes the project highly scalable: the owner can add new services to improve performance and increase the system's capacity as needed.
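To illustrate why independent services help with speed, here is a minimal sketch in which one video chunk is sent to several analyzers at once and a failure in one does not block the others. It is an assumption-laden toy: the source does not describe how the services communicate, so in-process coroutines stand in for separate services, and all function names are hypothetical.

```python
import asyncio


async def detect_logos(chunk_id):
    await asyncio.sleep(0.01)  # stands in for a call to a separate logo service
    return {"service": "logos", "chunk": chunk_id}


async def detect_objects(chunk_id):
    await asyncio.sleep(0.01)  # stands in for a call to a separate object service
    return {"service": "objects", "chunk": chunk_id}


async def read_text(chunk_id):
    await asyncio.sleep(0.01)  # stands in for a call to a separate OCR service
    return {"service": "ocr", "chunk": chunk_id}


async def analyze_chunk(chunk_id, services):
    """Run all services concurrently and keep the results of those that succeeded."""
    results = await asyncio.gather(
        *(service(chunk_id) for service in services),
        return_exceptions=True,  # a failing service yields an exception object instead of raising
    )
    return [r for r in results if not isinstance(r, Exception)]


if __name__ == "__main__":
    services = [detect_logos, detect_objects, read_text]
    print(asyncio.run(analyze_chunk("chunk-1", services)))
```

Because the analyzers run concurrently, the time per chunk is bounded by the slowest service rather than by the sum of all of them.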

Where We Got the Datasets for Object and Logo Detection

To train YOLOv8 to recognize standard objects like people, cars, animals, and clothing, we used the COCO dataset without making any modifications.

Logo detection was more challenging. We ultimately chose the open-source OpenLogo dataset, which covers 352 logo classes in roughly 27,000 labeled images.
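YOLO-style training expects each object as one text line: a class index followed by the box center, width, and height, all normalized to the image size. The source does not say how the annotations were prepared, so the sketch below only shows the general conversion, assuming the boxes arrive as pixel corner coordinates. The function names are hypothetical.

```python
def to_yolo_box(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert pixel corner coordinates to normalized (x_center, y_center, width, height)."""
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def yolo_label_line(class_id, box):
    """Format one object as a YOLO label line: class index, then the four box values."""
    return f"{class_id} " + " ".join(f"{value:.6f}" for value in box)


if __name__ == "__main__":
    # A 200x100 px box in a 400x200 px image, labeled with a made-up class index 7.
    box = to_yolo_box(100, 50, 300, 150, img_w=400, img_h=200)
    print(yolo_label_line(7, box))
```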

Augmentation and Balancing

After discussions with the client, we decided to add 40 more logos to our dataset. To train the neural network to detect each logo, we gathered at least 50 images per logo and augmented them threefold.
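If "threefold" means the set ended up three times its original size, 50 source images per logo become at least 150 training images. The source does not name the augmentations used, so the sketch below shows just one common option, a horizontal flip, as an assumption. The key point for detection data is that the box labels must be transformed together with the pixels. The image is a plain list of rows here to keep the example dependency-free.

```python
def hflip_image(rows):
    """Mirror an image left to right; the image is a list of pixel rows."""
    return [list(reversed(row)) for row in rows]


def hflip_labels(labels):
    """Mirror YOLO-format boxes: x_center becomes 1 - x_center, everything else is unchanged."""
    return [(cls, 1.0 - xc, yc, w, h) for cls, xc, yc, w, h in labels]


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6]]
    labels = [(0, 0.25, 0.4, 0.1, 0.2)]  # (class, x_center, y_center, width, height)
    print(hflip_image(image))
    print(hflip_labels(labels))
```

Flipping may not suit every logo, since mirrored lettering does not appear on real broadcasts; crops, brightness changes, or small rotations are alternatives that keep the text readable.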

We also had to adjust the original OpenLogo dataset. The number of images per logo was uneven: some logos had only 20 images, while others had 150.

To compensate, we balanced the training. Before training, we identified the logos that had fewer images than the rest. During training, we penalized the model more heavily when it missed one of these under-represented logos.
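One common way to penalize misses on rare classes more heavily is to weight each class inversely to its image count. The source does not state the exact formula, so the inverse-frequency scheme and the function names below are assumptions. The counts of 20 and 150 come from the text, and the class names are made up.

```python
def class_weights(image_counts):
    """Inverse-frequency weights: total / (number_of_classes * images_in_class)."""
    total = sum(image_counts.values())
    n_classes = len(image_counts)
    return {name: total / (n_classes * count) for name, count in image_counts.items()}


def weighted_mean_loss(losses, labels, weights):
    """Average per-sample losses, scaling each by the weight of its class."""
    weighted = [weights[label] * loss for loss, label in zip(losses, labels)]
    return sum(weighted) / sum(weights[label] for label in labels)


if __name__ == "__main__":
    weights = class_weights({"rare_logo": 20, "common_logo": 150})
    print(weights)
    # The same error of 1.0 counts for more when it happens on the rare logo.
    print(weighted_mean_loss([1.0, 0.0], ["rare_logo", "common_logo"], weights))
    print(weighted_mean_loss([0.0, 1.0], ["rare_logo", "common_logo"], weights))
```

With these counts the rare logo gets a weight of 4.25 and the common one about 0.57, so a missed rare logo costs the model roughly 7.5 times as much.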

Project Status and Next Steps

Currently, our service analyzes video files uploaded by users. It takes 5 minutes to process a 10-minute video. In the near future, we plan to integrate WebSocket support and expand the gateway to enable live TV broadcast analysis.
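The processing figure above can be expressed as a real-time factor: processing time divided by video duration. A value below 1 means the pipeline runs faster than playback, which is the basic precondition for live analysis. The sketch below only does this arithmetic. Whether the batch figure carries over to streamed chunks is an open assumption, and the function names are hypothetical.

```python
def real_time_factor(processing_s, video_s):
    """Processing time divided by video duration; below 1.0 is faster than playback."""
    if video_s <= 0:
        raise ValueError("video duration must be positive")
    return processing_s / video_s


def keeps_up_with_live(rtf):
    """A single pipeline can follow a live stream only if it is not slower than playback."""
    return rtf <= 1.0


if __name__ == "__main__":
    rtf = real_time_factor(processing_s=5 * 60, video_s=10 * 60)
    print(rtf, keeps_up_with_live(rtf))
```

For the figures in the text the factor is 0.5, so one pipeline would have about twice the throughput a single live channel needs, before accounting for streaming overhead.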

Next, we will integrate the project's backend as an API into the client’s digital products. This will allow European marketers to use our service to launch ads at the most relevant moments.

For example, if the main character in an action movie is wearing Nike sneakers, why not reinforce the impact with a Nike commercial right after the film?

Project team

Danila Skablov, Head of AI Projects

Daniil Semenov, Project Manager

Yuri Umnov, ML Engineer

Ivan Petrov, Backend Developer
