BVKS — A video hosting platform with automatic transcription of videos into text

The goal

The goal of this project is to give followers of the missionary Bhakti Vikasa Swami access to his lectures in various formats: develop a platform for viewing videos, automate the transcription of lectures, and implement word search across all videos.

Timeline

2 months

Year

2024

Technologies

Firebase, Whisper AI, ChatGPT, Elasticsearch

Customer and His Platform

Bhakti Vikasa Swami is one of the organization's leading gurus, personally trained by Prabhupada, the founder of the movement. The preacher constantly delivers lectures in various countries and actively runs his YouTube channel: it currently has over 120,000 subscribers and more than 2,500 uploaded videos.

Two years ago, we developed a separate website for Bhakti and his lectures, a mini-YouTube for followers and disciples. We built everything on Firebase, Google's backend-as-a-service platform that lets web services and applications run without a custom backend. Lectures in video and audio formats are published in large numbers, both on YouTube and on the proprietary platform.

Automating Video Transcription

Two years after the platform was developed, the customer returned to us with a new idea: publishing transcriptions of the video lectures on the website. It turned out that a significant portion of Bhakti's audience prefers the textual format. However, manually transcribing hundreds or thousands of lectures is an arduous task, so our job was to automate the process.

How did we do it? If the phrase "neural networks" crossed your mind, congratulations, you are correct. Further in the case study, we explain in detail how we automated the conversion of video to text, the nuances involved, and why our pipeline outperforms off-the-shelf transcription services.

Integrating with Elasticsearch

Another of the customer's ideas was to help users search the platform more precisely. A typical use case: a follower visits the YouTube channel to see what his spiritual teacher thinks about relationships in a couple. The search returns videos, but not all of them match the query: some are about the relationship with the guru or with friends, while others are about the relationship with God.

Another issue: even when the user finds the desired video, it may run two or three hours and cover many topics. Together with the customer, we decided to help Bhakti's followers find answers to their specific questions.

Whisper AI and ChatGPT

To transcribe the videos, we decided to use the specialized neural network Whisper AI. It copes well with the transcription task, but the resulting text is usually not presentable enough. The material still requires manual editing, and in our case, given the huge number of videos, that was not feasible: producing clean drafts of the lectures manually would have required several dozen employees working for a month.
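
Below is a minimal sketch of the transcription step using the open-source Whisper library; the model size and file name are illustrative assumptions, not the project's actual settings.

```python
# Minimal sketch of the transcription step with open-source Whisper.
# The model size and file name are illustrative assumptions.
import whisper

model = whisper.load_model("medium")           # larger models are slower but more accurate
result = model.transcribe("lecture_0001.mp4")  # Whisper extracts the audio track itself

raw_text = result["text"]                      # the full transcript as one string
print(raw_text[:500])
```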

To process the text after transcription, we implemented an algorithm that runs each transcript through ChatGPT. The result is a higher-quality transcription of the lecture: stylistically coherent and largely free of errors.
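
Here is a sketch of what that post-processing step can look like via the OpenAI chat API; the model name and prompt wording are our illustration, not the production configuration.

```python
# Sketch of the cleanup pass over a raw Whisper transcript.
# The model name and prompt are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def polish(raw_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Fix punctuation, casing and obvious mis-hearings in this "
                        "lecture transcript. Do not change the meaning."},
            {"role": "user", "content": raw_text},
        ],
    )
    return response.choices[0].message.content
```

In practice, a multi-hour transcript exceeds the model's context window, so the text would be split into chunks and polished piece by piece.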

The script processed the lectures for several months. Yes, that's a long time, but still a thousand times faster and cheaper than doing it manually.

Moderation Feature

When text is processed through ChatGPT, there is still a chance of errors, both stylistic and factual. We decided to let users point these out: visitors to the platform can report any error they find to the administrator, who then either corrects the text or rejects the report. We are currently finalizing the technical implementation of this feature.
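
Since the feature is still being finalized, here is only a hypothetical sketch of what a correction report could look like; all field names and statuses are our illustration, not the platform's actual schema.

```python
# Hypothetical shape of a user-submitted correction report.
# Field names and statuses are illustrative, not the platform's schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TranscriptReport:
    lecture_id: str          # which lecture the report refers to
    quoted_text: str         # the fragment the user believes is wrong
    suggested_fix: str       # the user's proposed correction
    status: str = "pending"  # "pending" | "accepted" | "rejected" by the admin
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```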

Search Within Videos

Having transformed audio and video into text, we faced another challenge: helping users find specific words within lectures. As a solution, we opted for Elasticsearch, a tool built for full-text search across vast datasets.

Since Elasticsearch cannot search for words within audio or video directly, the search on the platform operates based on the transcriptions of lectures, which we automated in the previous stage of our work. Each transcription is linked to its corresponding video/audio version, allowing Elasticsearch to determine how many times the user's desired word was mentioned in a particular lecture.

Within the UI, we split the search into two modes: standard search and Deepsearch, which searches within the lectures themselves. Users choose whether to look for a lecture by its title or by the words mentioned in it.
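
A sketch of how the two modes can map onto Elasticsearch queries follows; the index name, field names, and sample document are assumptions for illustration.

```python
# Sketch of the two search modes on top of Elasticsearch.
# Index name, fields, and the sample document are illustrative assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Each transcript is indexed together with the ID of its video/audio version,
# so every hit can be traced back to a concrete lecture.
es.index(index="lectures", id="lec-0001", document={
    "video_id": "yt-abc123",
    "title": "Lecture on relationships",
    "transcript": "...full polished transcript...",
})

def standard_search(phrase: str):
    # Standard search: match against lecture titles only.
    return es.search(index="lectures", query={"match": {"title": phrase}})

def deep_search(phrase: str):
    # Deepsearch: match inside the transcripts; highlighting shows where
    # (and how many times) the phrase occurs in each lecture.
    return es.search(index="lectures",
                     query={"match": {"transcript": phrase}},
                     highlight={"fields": {"transcript": {}}})
```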

Outcome and further plans

We successfully tackled both tasks for the client, relying on AI-generated text transcriptions. Users of the platform now have the ability to read the preacher's lectures and search for desired videos based on their content, rather than just their titles.

In the near future, we plan to enhance the Deepsearch feature by displaying the exact timestamps at which Bhakti says the searched word in the video. These timestamps, too, will be pulled from the textual version.
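
This is feasible because Whisper already emits per-segment timestamps alongside the text. A sketch of the idea, reusing the transcription call from earlier; the search word is illustrative.

```python
# Sketch of the planned timestamp lookup: Whisper returns per-segment
# start/end times, so each matching snippet knows its position in the video.
import whisper

model = whisper.load_model("medium")
result = model.transcribe("lecture_0001.mp4")

query = "relationship"  # illustrative search word
for seg in result["segments"]:
    if query in seg["text"].lower():
        print(f"{seg['start']:.0f}s to {seg['end']:.0f}s: {seg['text'].strip()}")
```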

Project team

Ilya Sokolov

Project manager

Ivan Petrov

Backend developer

Yan Bortsov

Backend developer

Ilya Vylegzhanin

Frontend developer

Rostislav Petrov

QA

Ready to discuss your project?

Our contacts

Fill out the form below or write to us:

Email: business@unistory.org
Telegram: unistoryapp

We'll get back to you shortly!

By clicking the button, you consent to the processing of personal data and agree to the privacy policy.

Almaty office

289/1 Rozybakieva St., office 36,
Almaty, Kazakhstan, 050060

Integrating the future


© 2025 Unistory