Building a YouTube Video Summarizer with Gemini and Python


Highly skilled Data Test Automation professional with over 10 years of experience in data quality assurance and software testing. Proven ability to design, execute, and automate testing across the entire SDLC (Software Development Life Cycle) utilizing Agile and Waterfall methodologies. Expertise in End-to-End DWBI project testing and experience working in GCP, AWS, and Azure cloud environments. Proficient in SQL and Python scripting for data test automation.

In today's fast-paced digital world, we're constantly bombarded with video content. While YouTube has become an incredible resource for learning and entertainment, finding time to watch lengthy videos can be challenging. What if you could get the key points from any YouTube video in just seconds?

In this blog post, I'll walk you through how I built a simple yet powerful YouTube video summarizer using Google's Gemini AI. This project showcases how modern AI tools can be leveraged to solve practical, everyday problems.


Step 1: Setting Up the Environment

Before we start coding, we need to set up our environment. Here’s what you’ll need:

  • Python installed on your machine.

  • A Gemini API key, which you can create in Google AI Studio. Store it in a .env file as GENAI_API_KEY, the variable name the code reads.

  • The required Python libraries.

Run the following command to install the libraries:

pip install streamlit python-dotenv google-generativeai youtube_transcript_api

Step 2: Writing the Code

Here’s the complete code for our YouTube summarizer app:

from dotenv import load_dotenv
load_dotenv()
import streamlit as st
import os
import google.generativeai as genai
from youtube_transcript_api import YouTubeTranscriptApi

flag=0
genai.configure(api_key=os.getenv("GENAI_API_KEY"))

prompt_text="""You are a YouTube video summarizer. Your task is to analyze the transcript of a video and create a concise summary, highlighting the key points within a 250-word limit. 
Please provide the important details from the text provided.  """

# Prepare the transcript from the video
def transcript_extractor(video_string):
    video_id=video_string.split("=")[1]
    transcript_text=YouTubeTranscriptApi.get_transcript(video_id)
    transcript = ""
    for i in transcript_text:
        transcript += " " + i["text"]
    return transcript

#gemini call
def get_gemini_response(prompt_text,transcript_text):
    model = genai.GenerativeModel('gemini-2.0-flash')
    response = model.generate_content(prompt_text+transcript_text)
    return response.text

##initialize streamlit app
st.set_page_config(page_title="Youtube transcriber")
st.header("Youtube transcriber")
youtube_link=st.text_input("Enter Youtube video URL: ",key="input")
if youtube_link:
    video_id = youtube_link.split("=")[1]
    st.image(f"http://img.youtube.com/vi/{video_id}/0.jpg", use_container_width=True)
    flag=1
submit=st.button("Prepare summary of the video")

## Button click
if submit:
    if flag==1:
        transcript_text=transcript_extractor(youtube_link)
        response=get_gemini_response(prompt_text,transcript_text)
        st.subheader("Video summary:")
        st.write(response)
    else:
        st.warning("Please enter a YouTube video URL.")

GitHub:

https://github.com/vipinputhanveetil/gemini_youtube_transcriber


Step 3: Breaking Down the Code

1. Imports and Configuration

from dotenv import load_dotenv
load_dotenv()
import streamlit as st
import os
import google.generativeai as genai
from youtube_transcript_api import YouTubeTranscriptApi

flag=0
genai.configure(api_key=os.getenv("GENAI_API_KEY"))

The API key is read from the GENAI_API_KEY environment variable, which load_dotenv() populates from a local .env file. Keeping the key out of the source code is good practice for any application that uses API credentials.
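If the variable is missing, os.getenv returns None and the problem only shows up later, when the first Gemini request fails. Here is a minimal sketch of an early check. The helper name require_api_key is my own and is not part of the original app or of any library.

```python
import os


def require_api_key(name: str = "GENAI_API_KEY") -> str:
    """Return the API key stored in the environment variable `name`.

    Hypothetical helper: raises RuntimeError with a readable message when
    the variable is missing or blank, so the app fails at startup rather
    than on the first model call.
    """
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add a line like {name}=<your key> to a .env file."
        )
    return value
```

In the app, this would replace the direct lookup: genai.configure(api_key=require_api_key()).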

2. Extracting the Transcript

# Prepare the transcript from the video
def transcript_extractor(video_string):
    video_id=video_string.split("=")[1]
    transcript_text=YouTubeTranscriptApi.get_transcript(video_id)
    transcript = ""
    for i in transcript_text:
        transcript += " " + i["text"]
    return transcript

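This function takes everything after the first "=" in the URL as the video ID, so it expects the standard https://www.youtube.com/watch?v=<video_id> form with no extra query parameters. It then uses the YouTube Transcript API to fetch the transcript segments and joins their text into a single string.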
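Splitting on "=" breaks as soon as the URL carries a timestamp (for example &t=42s) or uses the short youtu.be form. Below is one possible sketch of a more tolerant parser built on the standard library's urllib.parse. The function name extract_video_id and the set of URL shapes it handles are my assumptions, not something the original code does.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a common YouTube URL form, or None.

    Handles (as an assumption about which forms matter):
      https://www.youtube.com/watch?v=<id>&t=42s
      https://youtu.be/<id>?t=42
      https://www.youtube.com/embed/<id> and /shorts/<id>
    """
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    # Treat www.youtube.com and m.youtube.com like youtube.com.
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None

    if host == "youtube.com":
        if parsed.path == "/watch":
            # parse_qs maps each query key to a list of values.
            return parse_qs(parsed.query).get("v", [None])[0]
        parts = parsed.path.split("/")
        if len(parts) >= 3 and parts[1] in ("embed", "shorts"):
            return parts[2] or None

    return None
```

Returning None for anything unrecognized lets the caller show a warning instead of crashing on an IndexError.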

3. Generating the Summary with Gemini AI

#gemini call
def get_gemini_response(prompt_text,transcript_text):
    model = genai.GenerativeModel('gemini-2.0-flash')
    response = model.generate_content(prompt_text+transcript_text)
    return response.text

Here, I'm using the Gemini 2.0 Flash model, which is optimized for quick responses while maintaining high quality. The function prepends a fixed prompt to the transcript, asking for a concise summary:

prompt_text="""You are a YouTube video summarizer. Your task is to analyze the transcript of a video and create a concise summary, highlighting the key points within a 250-word limit. 
Please provide the important details from the text provided.  """
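The app joins the prompt and the transcript with plain string concatenation, so the only separator between them is the prompt's trailing whitespace, and the 250-word limit is fixed. As a rough sketch, a small helper can make the boundary explicit and the limit adjustable. The name build_prompt, the word_limit parameter, and the "Transcript:" label are illustrative choices of mine.

```python
def build_prompt(transcript: str, word_limit: int = 250) -> str:
    """Combine the summarizer instructions with a transcript.

    Hypothetical helper: same instruction text as the app's prompt, with a
    configurable word limit and a labeled transcript section.
    """
    if word_limit <= 0:
        raise ValueError("word_limit must be positive")
    instructions = (
        "You are a YouTube video summarizer. Your task is to analyze the "
        "transcript of a video and create a concise summary, highlighting "
        f"the key points within a {word_limit}-word limit. "
        "Please provide the important details from the text provided."
    )
    # A blank line and a label keep the instructions and the transcript apart.
    return f"{instructions}\n\nTranscript:\n{transcript.strip()}"
```

The result would be passed to model.generate_content in place of prompt_text+transcript_text.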

4. Building the User Interface with Streamlit

##initialize streamlit app
st.set_page_config(page_title="Youtube transcriber")
st.header("Youtube transcriber")
youtube_link=st.text_input("Enter Youtube video URL: ",key="input")
if youtube_link:
    video_id = youtube_link.split("=")[1]
    st.image(f"http://img.youtube.com/vi/{video_id}/0.jpg", use_container_width=True)
    flag=1
submit=st.button("Prepare summary of the video")

## Button click
if submit:
    if flag==1:
        transcript_text=transcript_extractor(youtube_link)
        response=get_gemini_response(prompt_text,transcript_text)
        st.subheader("Video summary:")
        st.write(response)
    else:
        st.warning("Please enter a YouTube video URL.")
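As written, any failure inside the button handler (a video without captions, a network error, a rejected API request) surfaces as a raw traceback in the page. Here is a minimal sketch of a wrapper that turns those failures into a message the UI can display. The name summarize_safely and its (ok, message) return shape are assumptions of mine. The two callables are passed in, so the wrapper itself does not depend on Streamlit, Gemini, or the transcript library.

```python
from typing import Callable, Tuple


def summarize_safely(
    video_url: str,
    fetch_transcript: Callable[[str], str],
    summarize: Callable[[str], str],
) -> Tuple[bool, str]:
    """Fetch a transcript and summarize it, returning (ok, message).

    Hypothetical helper: on success, message is the summary. On failure,
    message describes which step went wrong instead of raising.
    """
    try:
        transcript = fetch_transcript(video_url)
    except Exception as exc:  # broad on purpose: error types vary by library
        return False, f"Could not fetch a transcript: {exc}"

    if not transcript.strip():
        return False, "The transcript is empty."

    try:
        return True, summarize(transcript)
    except Exception as exc:
        return False, f"The summary request failed: {exc}"
```

In the app, the call could look like summarize_safely(youtube_link, transcript_extractor, lambda t: get_gemini_response(prompt_text, t)), followed by st.write(message) when ok is true and st.error(message) otherwise.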

Step 4: Running the App

To run the app, save the code in a file (e.g., gemini_youtube_transcriber.py) and run the following command in your terminal:

streamlit run gemini_youtube_transcriber.py

This will start the Streamlit app, and you can access it in your browser at http://localhost:8501, Streamlit's default address.


Step 5: Testing the App

  1. Paste any YouTube URL into the input box and press the Tab key.

  2. The app displays the video thumbnail for confirmation.

  3. When you click the "Prepare summary of the video" button, the app processes the transcript and displays the AI-generated summary.

Benefits and Use Cases

This YouTube summarizer offers several benefits:

  • Time Saving: Get the key points of a video without watching the entire thing

  • Content Filtering: Quickly determine if a video contains the information you need

  • Study Aid: Create summaries of educational videos for review

  • Accessibility: Make video content more accessible to those who prefer reading

Future Improvements

While the current version works well, there are several enhancements I plan to implement:

  1. Better URL Parsing: Support various YouTube URL formats (shortened links, timestamps, etc.)

  2. Multi-language Support: Add options for summarizing videos in different languages

  3. Customizable Summary Length: Allow users to specify how detailed they want the summary to be

  4. Timestamp Linking: Include timestamps in the summary that link to the relevant parts of the video

  5. Error Handling: More robust handling of videos without available transcripts
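To give a feel for the timestamp idea, here is a rough sketch of the formatting side. It assumes each transcript segment is a dict with a "text" key (which the app already reads) and a "start" key holding the offset in seconds. The "start" key and all three function names are assumptions on my part and should be checked against the transcript library version in use.

```python
from typing import Dict, List, Union

Segment = Dict[str, Union[str, float]]


def format_timestamp(start_seconds: float) -> str:
    """Format an offset in seconds as M:SS, or H:MM:SS past one hour."""
    total = max(0, int(start_seconds))
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"


def timestamp_link(video_id: str, start_seconds: float) -> str:
    """Build a watch URL that starts playback at the given offset."""
    seconds = max(0, int(start_seconds))
    return f"https://www.youtube.com/watch?v={video_id}&t={seconds}s"


def transcript_with_timestamps(segments: List[Segment]) -> str:
    """Join segments into lines like '[1:15] some text'.

    Feeding this to the model instead of the bare text gives it the
    offsets it would need to cite timestamps in the summary.
    """
    return "\n".join(
        f"[{format_timestamp(float(seg['start']))}] {seg['text']}"
        for seg in segments
    )
```

The prompt would also need to ask the model to keep the bracketed timestamps next to each key point, which is not shown here.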

Conclusion

This project demonstrates how AI tools like Google Gemini can be combined with existing APIs to create practical applications that solve real-world problems. The YouTube summarizer is just one example of how AI can help us consume information more efficiently in our content-rich digital landscape.

By leveraging the power of large language models and complementary tools, even relatively simple applications can provide significant value. The code for this project is straightforward, yet the resulting tool can save hours of time for students, researchers, professionals, and casual YouTube viewers alike.