Automatic generation of comic videos by GPT-4

06.04.2023 admin

A multi-modal Telegram bot I recently made was a resounding success 😊 I was surprised by how many people used it and forked or starred it on GitHub. But I wanted something more.

I decided to create a service where people can create their own comics, fairy tales, and indeed any stories. Preferably with the push of a button.

My idea was to create a program that could generate stories from a small number of parameters: the language, the seed for generating the text, the visual setting, and so on. I knew that for this I would need GPT-4, some kind of API for pictures, a translator, and a speech synthesizer. After a quick check, it turned out that all of this is available and not that expensive!
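As a rough illustration of what "a small number of parameters" could look like in code, here is a minimal sketch. The class name, field names, default scene count, and prompt wording are all assumptions made for this example; they are not the project's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoryParams:
    """Hypothetical container for the generator's inputs."""
    language: str        # target language of the story, e.g. "en"
    seed_text: str       # short idea the story grows from
    visual_setting: str  # visual style chosen by the user (used for images, not here)
    steps: int = 5       # assumed number of scenes in the story


def build_story_prompt(params: StoryParams) -> str:
    """Turn the parameters into one text prompt for the language model."""
    return (
        f"Write a story in {params.steps} short scenes, in language '{params.language}'. "
        f"Story idea: {params.seed_text}. "
        "Give each scene a one-paragraph narration and a one-line visual description."
    )
```

The resulting string would then be sent to GPT-4; the API call itself is left out on purpose.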

The following picture popped into my head:

Some technical points will be described below.

Images

I decided to use good old Stable Diffusion because it is cheap (it is even open source, though I use the API) and draws pretty well, whereas MidJourney is still closed.

I generate an image from the description of each step of the scene. In addition, I added various visual styles and settings to make the images more appealing and more relevant to the context of the scene. For example, images can be rendered in the style of Star Wars, Disney, Marvel, etc. All of this is the user's choice.

As a result, I get a set of images in the same style, which are ready for video generation.
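One simple way to keep every image in the same style is to append the same style phrase to each scene description before sending it to the image API. The sketch below assumes that approach; the style keys and the wording of the suffixes are hypothetical, and the call to the Stable Diffusion API is omitted.

```python
from typing import Dict, List

# Hypothetical style phrases; the real project may word these differently.
STYLE_SUFFIXES: Dict[str, str] = {
    "star_wars": "in the style of Star Wars",
    "disney": "in the style of a Disney animated film",
    "marvel": "in the style of a Marvel comic book",
}


def build_image_prompts(scene_descriptions: List[str], style: str) -> List[str]:
    """Return one image prompt per scene, all sharing the same style suffix."""
    if style not in STYLE_SUFFIXES:
        raise ValueError(f"Unknown style: {style!r}")
    suffix = STYLE_SUFFIXES[style]
    return [f"{description.strip()}, {suffix}" for description in scene_descriptions]
```

Because the suffix is identical for every scene, the generated images tend to share one look, which is what makes them usable as consecutive frames of a single story.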

Recently, someone in one community suggested an almost brilliant idea: instead of generating pictures, search for them on Google Images. It's free, fast, and in some cases, such as news, even better. I will definitely implement it.

Sound

When I first started working on the project, I ran into a problem: how could users not only read the generated stories but also listen to them?

Then it occurred to me to voice pieces of the text with Google Text-to-Speech. It lets you create realistic narration in different languages and with different voices.

You just need to break the text generated by GPT-4 into paragraphs and send each paragraph for voiceover. Thus, users can read the story and listen to the voiced version of it at the same time. This makes the reading process more interesting and fun, and also helps people who prefer to listen to text rather than read it.
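The splitting step can be sketched as follows, assuming paragraphs in the GPT-4 output are separated by blank lines. The function name is made up for this example, and the actual request to Google Text-to-Speech is not shown.

```python
import re
from typing import List


def split_into_paragraphs(story_text: str) -> List[str]:
    """Split the generated story on blank lines and tidy up whitespace."""
    chunks = re.split(r"\n\s*\n", story_text)
    # Collapse line breaks inside a paragraph and drop empty chunks.
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]
```

Each returned paragraph would then go to the speech synthesizer as a separate request, which gives one audio clip per piece of text.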

Video

The most difficult part was assembling the story into a video with videoshow.js. Quite a lot of time went into debugging all of this. Here, for example, is one of the resulting stories:

Globalization

The story generator is not tied to any one language; it is completely global. In fact, it works in any language from the Google Text-to-Speech list.

So my plans include entering the US market, ProductHunt, Y Combinator and all that 😏 I would be glad of any support in this direction.

Features

Share from Video: We have added a new feature that allows users to share videos directly from the video player. This feature makes it easier for users to share their content with their audience and promote their work.
Search UI: We have redesigned the search interface on our platform, making it more intuitive and user-friendly. This update improves the overall user experience and helps users to find the content they are looking for more quickly.
Order by Best 50 Newest 50 Interlaced: We have added a new sorting option that allows users to order their videos by the best 50, newest 50, or interlaced. This feature provides users with greater flexibility and control over how their content is organized.
Fade Out 2 Seconds Before End Black Screen: We have introduced a new feature that fades out the video 2 seconds before the end and displays a black screen. This helps to create a more polished and professional finish to videos.
Temperature Selection: We have added a new feature that allows users to adjust the temperature of their videos. This feature enables users to create unique visual effects and enhance the overall look and feel of their content.
PWA + Apple Icon: We have added a progressive web app (PWA) to our platform, making it easier for users to access our services from their mobile devices. Additionally, we have added an Apple icon to improve the user experience on Apple devices.
Show in Gallery First Stories in Browser Language, Filter by Language: We have added a new feature that shows the first stories in the gallery based on the user’s browser language. Additionally, we are exploring the possibility of adding a language filter to help users find content in their preferred language more easily.
Android App: We have launched a new Android app for our platform, making it easier for users to create and upload videos from their mobile devices. This update improves the overall user experience and makes our platform more accessible to a wider audience. https://play.google.com/store/apps/details?id=shop.mangatv.twa&hl=en_US
Show Subscription Date on Gold Plan: We have added a new feature that displays the subscription date for users on the Gold plan. This update provides users with more information about their subscription and helps them to manage their account more effectively.
Vertical Videos (YouTube Shorts): We have added support for vertical videos (YouTube Shorts) to our platform, allowing users to create content for this popular format. This update enables our users to reach a wider audience and stay up-to-date with the latest trends in video creation.
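The feature list does not say how the "interlaced" ordering works. One plausible reading is that it alternates between the best and the newest videos while skipping duplicates. Here is a minimal sketch under that assumption; the function name, the default limit, and the use of plain video IDs are all hypothetical.

```python
from itertools import zip_longest
from typing import Hashable, List, Sequence


def interlace(best: Sequence[Hashable], newest: Sequence[Hashable], limit: int = 50) -> List[Hashable]:
    """Alternate items from the top `limit` of each list, keeping first occurrences only.

    Assumes the IDs are hashable and never None.
    """
    seen = set()
    result = []
    for pair in zip_longest(best[:limit], newest[:limit]):
        for video_id in pair:
            if video_id is not None and video_id not in seen:
                seen.add(video_id)
                result.append(video_id)
    return result
```

A video that appears in both lists is shown once, at the position where it first occurs.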

Philosophical questions

Finally, the use of AI-generated content raises several philosophical questions. For example, what is the human role in creating and using such content? What are the ethical issues associated with using artificial intelligence to create content that can mimic the human mind and behavior? What is the future of AI-generated content creation and use, and how will this affect our culture and society as a whole? These questions require serious discussion and reflection so that we can make the most of the potential of artificial intelligence in our world.

But I decided to do it first, and then think about it 😊

Will the automatic content be of sufficient quality?

Today, there are algorithms that are able to create sufficiently high-quality texts, sound, and images. However, so far they cannot replace human creativity and create something completely new and original.

The story editing feature can help make the content better and more interesting. Editing lets you improve and refine individual slides, correct errors, add new elements, and place emphasis where it is needed. In addition, the editor can always make a creative contribution.

What do you think? Is the project interesting? Would you use it? What monetization methods would you recommend?

You can see the project at this link

Or use this Android App: Manga TV
