BVKS — A video hosting platform with automatic transcription of videos into text

The goal

The goal of this project is to give followers of the missionary Bhakti Vikasa Swami access to his lectures in various formats: develop a platform for viewing videos, automate the transcription of lectures, and implement word search across all videos.

Timeline

2 months

Year

2024

Technologies

Firebase, Whisper AI, ChatGPT, Elasticsearch

Customer and His Platform

Bhakti Vikasa Swami is one of the organization's leading gurus, personally trained by Prabhupada, the founder of the movement. The preacher constantly delivers lectures in various countries and actively runs his YouTube channel: it currently has over 120,000 subscribers and more than 2,500 uploaded videos.

Two years ago, we developed a separate website for Bhakti and his lectures, a mini-YouTube for followers and disciples. We built everything on Firebase, Google's backend-as-a-service platform that lets web services and applications run without a custom backend. Lectures in video and audio formats are published in large numbers, both on YouTube and on the proprietary platform.

Automating Video Transcription

Two years after the platform was developed, the customer returned to us with a new idea: publishing transcriptions of the video lectures on the website. It turned out that a significant portion of Bhakti's audience prefers the textual format. However, manually transcribing hundreds or thousands of lectures is an arduous task, so our job was to automate the process.

How did we do it? If the phrase "neural networks" crossed your mind, congratulations, you are correct. Further in the case study, we explain in detail how we automated the conversion of video to text, the nuances involved, and why our pipeline outperforms off-the-shelf transcription services.

Integrating with Elasticsearch

Another of the customer's ideas was to help users search the platform more precisely. A typical use case: a follower visits the YouTube channel to see what his spiritual teacher thinks about relationships in a couple. The search returns videos, but not all of them match the query: some are about the relationship with the guru or with friends, while others are about the relationship with God.

Another issue: even when the user finds the desired video, it may run two or three hours and cover many topics. Together with the customer, we decided to help Bhakti's followers find answers to their specific questions.

Whisper AI and ChatGPT

To transcribe the videos, we decided to use the specialized neural network Whisper AI. It copes well with the transcription task, but the resulting text is usually not presentable enough. The material still requires manual editing, and in our case, given the huge number of videos, that was not feasible: producing clean drafts of the lectures manually would have required several dozen employees working for a month.
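
Below is a minimal sketch of the transcription step using the open-source Whisper library; the model size and file name are illustrative assumptions, not the project's actual settings.

```python
# Minimal sketch of the transcription step with open-source Whisper.
# The model size and file name are illustrative assumptions.
import whisper

model = whisper.load_model("medium")           # larger models are slower but more accurate
result = model.transcribe("lecture_0001.mp4")  # Whisper extracts the audio track itself

raw_text = result["text"]                      # the full transcript as one string
print(raw_text[:500])
```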

To process the text after transcription, we implemented an algorithm that runs each transcript through ChatGPT. The result is a higher-quality transcription of the lecture: stylistically coherent and largely free of errors.
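
Here is a sketch of what that post-processing step can look like via the OpenAI chat API; the model name and prompt wording are our illustration, not the production configuration.

```python
# Sketch of the cleanup pass over a raw Whisper transcript.
# The model name and prompt are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def polish(raw_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Fix punctuation, casing and obvious mis-hearings in this "
                        "lecture transcript. Do not change the meaning."},
            {"role": "user", "content": raw_text},
        ],
    )
    return response.choices[0].message.content
```

In practice, a multi-hour transcript exceeds the model's context window, so the text would be split into chunks and polished piece by piece.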

The script processed the lectures for several months. Yes, that's a long time, but still a thousand times faster and cheaper than doing it manually.

Moderation Feature

When text is processed through ChatGPT, there is still a chance of errors, both stylistic and factual. We decided to let users point these out: visitors to the platform can report any error they find to the administrator, who then either corrects the text or rejects the report. We are currently finalizing the technical implementation of this feature.
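
Since the feature is still being finalized, here is only a hypothetical sketch of what a correction report could look like; all field names and statuses are our illustration, not the platform's actual schema.

```python
# Hypothetical shape of a user-submitted correction report.
# Field names and statuses are illustrative, not the platform's schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TranscriptReport:
    lecture_id: str          # which lecture the report refers to
    quoted_text: str         # the fragment the user believes is wrong
    suggested_fix: str       # the user's proposed correction
    status: str = "pending"  # "pending" | "accepted" | "rejected" by the admin
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```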

Search Within Videos

Having transformed audio and video into text, we faced another challenge: helping users find specific words within lectures. As a solution, we opted for Elasticsearch, a tool built for full-text search across vast datasets.

Since Elasticsearch cannot search for words within audio or video directly, the search on the platform operates based on the transcriptions of lectures, which we automated in the previous stage of our work. Each transcription is linked to its corresponding video/audio version, allowing Elasticsearch to determine how many times the user's desired word was mentioned in a particular lecture.

Within the UI, we split the search into two modes: standard search and Deepsearch, which searches within the lectures themselves. Users choose whether to look for a lecture by its title or by the words mentioned in it.
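
A sketch of how the two modes can map onto Elasticsearch queries follows; the index name, field names, and sample document are assumptions for illustration.

```python
# Sketch of the two search modes on top of Elasticsearch.
# Index name, fields, and the sample document are illustrative assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Each transcript is indexed together with the ID of its video/audio version,
# so every hit can be traced back to a concrete lecture.
es.index(index="lectures", id="lec-0001", document={
    "video_id": "yt-abc123",
    "title": "Lecture on relationships",
    "transcript": "...full polished transcript...",
})

def standard_search(phrase: str):
    # Standard search: match against lecture titles only.
    return es.search(index="lectures", query={"match": {"title": phrase}})

def deep_search(phrase: str):
    # Deepsearch: match inside the transcripts; highlighting shows where
    # (and how many times) the phrase occurs in each lecture.
    return es.search(index="lectures",
                     query={"match": {"transcript": phrase}},
                     highlight={"fields": {"transcript": {}}})
```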

Outcome and further plans

We successfully tackled both tasks for the client, relying on AI-generated text transcriptions. Users of the platform now have the ability to read the preacher's lectures and search for desired videos based on their content, rather than just their titles.

In the near future, we plan to enhance the Deepsearch feature by displaying the exact timestamps at which Bhakti says the searched word in the video. These timestamps, too, will be pulled from the textual version.
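
This is feasible because Whisper already emits per-segment timestamps alongside the text. A sketch of the idea, reusing the transcription call from earlier; the search word is illustrative.

```python
# Sketch of the planned timestamp lookup: Whisper returns per-segment
# start/end times, so each matching snippet knows its position in the video.
import whisper

model = whisper.load_model("medium")
result = model.transcribe("lecture_0001.mp4")

query = "relationship"  # illustrative search word
for seg in result["segments"]:
    if query in seg["text"].lower():
        print(f"{seg['start']:.0f}s to {seg['end']:.0f}s: {seg['text'].strip()}")
```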

Project team

Ilya Sokolov

Project manager

Ivan Petrov

Backend developer

Yan Bortsov

Backend developer

Ilya Vylegzhanin

Frontend developer

Rostislav Petrov

QA

Ready to discuss your project?

Our contacts

Fill out the form below or write to us:

Email: business@unistory.org
Telegram: unistoryapp

We'll get back to you shortly!

By clicking the button, you consent to the processing of personal data and agree to the privacy policy.

Almaty office

289/1 Rozybakieva St., office 36,
Almaty, Kazakhstan, 050060

Integrating the future


© 2025 Unistory