Channel: Machine Learning | Towards AI

AI-Generated Animations Are Here (Almost…)

Author(s): Shahzeb Naveed

Originally published on Towards AI.

AI bot sliding on a sine curve. (Source: Author/Adobe Firefly)

Last week, as I scrolled through my Instagram feed, an animation popped up in the famous 3Blue1Brown style explaining how Convolutional Neural Networks work. It suddenly occurred to me: could we automate this entire animation process, from concept to visualization, down to a Manim-based Python script that generates the video automatically? A quick Google search revealed that the author of 3Blue1Brown had already open-sourced his Python-based engine, Manim, which creates programmatic animations. At this point, everything felt sorted out.

Setting Up

I installed Manim on my Mac following the conda method. Instead of installing the full LaTeX distribution, I installed BasicTeX, since I couldn’t afford a 4 GB installation on my already jam-packed Mac. Additionally, I had to install some missing modules like standalone and preview as follows:

```shell
sudo tlmgr install standalone
sudo tlmgr install preview
```

Showtime!

Now, since ChatGPT has been my go-to AI for routine tasks, I went straight to it with an initial prompt (an awfully vague one, though):

“I need to create an animation explanation video using the Manim python library, similar to 3Blue1Brown. First, generate a 2-minute transcript explaining the transformers architecture at a high level.
Then for that transcript, create a python program using the manim package to create a scene or multiple scenes to create animation visualizing the transcript.”

It generated the following Manim-based Python script:

```python
# Import necessary modules from Manim
from manim import *

class EmbeddingScene(Scene):
    def construct(self):
        # Create a matrix representing the embedding matrix
        embedding_matrix = Matrix([[1, 0.5, 0], [0.2, 0.8, 0.3], [0, 0.4, 1]])
        embedding_matrix.scale(0.8)
        self.play(Create(embedding_matrix))

        # Create arrows to represent the transformation process
        arrow_query = Arrow(embedding_matrix.get_center(), embedding_matrix.get_center() + RIGHT)
        arrow_key = Arrow(embedding_matrix.get_center(), embedding_matrix.get_center() + UP)
        arrow_value = Arrow(embedding_matrix.get_center(), embedding_matrix.get_center() + DOWN)
        self.play(Create(arrow_query), Create(arrow_key), Create(arrow_value))

        # Add labels for Query, Key, and Value
        label_query = Text("Query").next_to(arrow_query, RIGHT)
        label_key = Text("Key").next_to(arrow_key, UP)
        label_value = Text("Value").next_to(arrow_value, DOWN)
        self.play(Create(label_query), Create(label_key), Create(label_value))

        # Show the attention mechanism (optional)
        attention_arrow = Arrow(arrow_query.get_end(), arrow_key.get_start())
        self.play(Create(attention_arrow))

        # Show the final embedding
        final_embedding = Circle(color=BLUE).next_to(embedding_matrix, RIGHT)
        self.play(TransformFromCopy(embedding_matrix, final_embedding))

        # Fade out everything
        self.play(FadeOut(embedding_matrix), FadeOut(arrow_query), FadeOut(arrow_key),
                  FadeOut(arrow_value), FadeOut(label_query), FadeOut(label_key),
                  FadeOut(label_value), FadeOut(attention_arrow))
        self.wait(1)  # Pause for a moment
```

Then, I issued the following command:

```shell
manim -p transformers.py
```

which rendered the video and opened the file upon completion:

AI-Assisted Manim animation trying to explain Transformers. (Source: Author)

What a bummer!
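For reference, the Query/Key/Value mechanism the generated scene tries (and fails) to depict boils down to scaled dot-product attention. A minimal NumPy sketch of that computation follows; the shapes, random weights, and token count are toy assumptions of mine, not from the generated script:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row now sums to 1
    return weights @ V                              # weighted mix of value vectors

# Toy example: 3 tokens with embedding dimension 4
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))                         # stand-in for embedded tokens
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)  # one contextualized vector per token
```

A coherent animation would show exactly this flow: embeddings projected into queries, keys, and values, and each output row formed as a convex combination of the value rows.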
As you can see, the result was as awful as my prompt: it lacked any logical visualization, with random arrows pointing in random directions, incorrect concepts, an empty circle representing nothing, and no sense of storytelling whatsoever. This served as the first reality check and pushed me to do better at prompt engineering. Over repeated failed experiments, I arrived at the following prompt, with more explicit concepts, a simpler problem, and clearer instructions. Asking the AI to first come up with a “plan of action” yielded significant improvements.

“Need to create an animation for an explanation video to explain how curve fitting works in ML.

# Visualization Concept
1. A set of dots roughly scattered like a sine curve on a graph (not a pure sine curve but with some noise)
2. Then another solid curve (depicting the curve defined by an ML model) appears. It starts as a straight line, but as the training progresses, the straight line fits into the set of dots and ultimately transforms into a sine curve.
3. Then create similar scenes for 2 additional curves

# Instructions
1. Be creative to add animations as needed and as possible in the Manim python library.
2. First, create a proper plan of action to think about what exactly you will be visualizing (graphic elements, formulas, etc.) to explain which concepts.
3. Then translate that action plan into a python script.
4. Do not add any full sentences on the screen. You can, however, use labels as text if needed.
5. Generate code for all scenes.
6. Code should be complete with all modules imported and all variables defined. It should run as is.”

AI explains the concept of curve fitting in ML (Source: Author)

This seemed pretty neat to me, although in the second half of the video the text overlaps with the graph, the arrow points to nothing, and so does the circle.
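The prompt’s visualization concept — a straight line gradually bending to fit a noisy sine cloud — is ordinary curve fitting. A minimal NumPy sketch of the underlying math, where the polynomial degrees and noise level are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)  # the noisy sine "dots"

# Fit polynomials of increasing degree: the model curve starts as a
# straight line (degree 1) and progressively bends toward the sine shape,
# which is what the animation renders frame by frame.
for degree in (1, 3, 5, 7):
    coeffs = np.polyfit(x, y, degree)
    fit = np.polyval(coeffs, x)
    mse = np.mean((fit - y) ** 2)
    print(f"degree {degree}: MSE = {mse:.4f}")
```

The mean squared error shrinks as the degree grows, mirroring the “training progresses” step of the storyboard.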
In another task, I asked the LLM to animate a hypothetical feature matrix alongside a column matrix representing labels, with a rectangle highlighting each of the columns one by one. The results were pretty neat, but only after hours of prompt tuning, manual debugging, and being as explicit as possible.

AI visualizes a feature matrix. (Source: Author)

Issues:

Deprecated Code: Manim appears to be a rapidly evolving library, with many methods/attributes deprecated or renamed. Secondly, the public codebase using the Manim package seems to be limited. These factors led to the LLMs generating code that didn’t run on the first go, which made the debugging process extremely cumbersome.

ChatGPT 3.5: Since I initially used ChatGPT 3.5, I realized it wasn’t up to date and lacked the latest changes in the Manim package. Therefore, I decided to give its competitors a try.

Google Gemini: I’ve never been a fan of Google’s Gemini, and this experiment made my opinion even stronger. On several occasions, it generated “templates” and asked the user to fill in the details, despite being explicitly asked to suggest complete, ready-to-run code. It also generated unnecessary explanations of the code post-generation. I eventually dumped Gemini, but did use it to enhance my prompt and my understanding of what a good “plan of action” might look like.

Anthropic’s Claude: I then tried Anthropic’s Claude for the first time. Surprisingly, it gave a decent result on the first go. However, on many occasions, it still suggested code with deprecated Manim functionality, even though it was trained more recently.

GPT4: I also tried GPT-4 Turbo via the OpenAI API (and also […]
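As an aside, the feature-matrix scene described above amounts to stepping over the columns of a design matrix sitting next to a label vector. A plain-NumPy stand-in for what the animated rectangle highlights; the matrix contents here are invented, not the ones the LLM produced:

```python
import numpy as np

# Hypothetical 4-sample, 3-feature matrix and a label column, as in the scene
features = np.array([[5.1, 3.5, 1.4],
                     [4.9, 3.0, 1.4],
                     [6.2, 3.4, 5.4],
                     [5.9, 3.0, 5.1]])
labels = np.array([0, 0, 1, 1])

# The rectangle in the animation focuses one feature column at a time;
# here we simply walk the columns and show each next to the labels.
for j in range(features.shape[1]):
    column = features[:, j]
    print(f"feature {j}: {column}  labels: {labels}")
```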
