
The Free Lunch Manifesto | AI for Video

  • Writer: R M
  • Dec 2, 2025
  • 74 min read

Updated: Dec 3, 2025


Over the past 10 years we have borne witness to AI trudging through the uncanny valley, with drifting continuity, low-fidelity images, and surreal characters giving us wooden performances akin to community theatre.


It's not just visual media, either: people have become hyper-aware of em-dash-riddled ChatGPT outputs, and developers increasingly have to fight against technically flawed and incomprehensible code, laden with bugs and security issues.


That being said, it's not like there hasn't been any improvement, and I think it's now safe to say that with a little bit of time and money, creators can exercise some kind of control over this technology to produce a deliverable that is (at the very least) a minimum viable product.


Whether creative workers are using AI to produce assets for typical design, animation or video processes, or generating content out of whole cloth with increasingly byzantine and voodoo-esque AI production pipelines, people are using this occasionally useful technology in increasingly "complex", "meta" and "esoteric" ways. I believe it is still, more often than not, an unimpressive hindrance to creative output and quality.


My ultimate goal in this article is to make a case for what I think this technology means for the future of creative workers and the industries that rely on them. I will go through a modern AI production pipeline to produce a Cybersecurity Training Video (what I think is a reasonably difficult but not extreme use case).


I will do my best to document the hiccups, setbacks and successes, and compare a portion of that video with a more "traditional" video production pipeline, as well as a blended CG approach using modern, highly optimized software.


I will also show how creative workers are using these tools today in useful and creative ways, and what I think are the best use cases for this technology. All methods come with their associated costs and trade-offs.


Part I: AI advancement is gradual & over-hyped


This article at its core is a criticism of AI, but I don't want to make the same tired arguments that a lot of people critical of AI make: that it's bad for the environment, that it violates copyright law, that it puts people out of work. I am sympathetic to all of these arguments, but ultimately they aren't enough to convince me that this is a bad tool.


I also don’t want to put too much emphasis on the current limitations of AI. Things like character interactions, generation time, compute, resolution, length and quality will more than likely be improved in one way or another. So, harping on very specific instances where AI fails is akin to providing a laundry list of goalposts that will inevitably move anyway. 


Wineglass Benchmark Nano Banana 2025


So we need to judge it as a tool, and just as we judge a hammer by how well it drives a nail, we need to set criteria for AI, instead of AI companies setting their own benchmarks and giving themselves excellent scores on a test they made up.


I think the best place to start tearing down AI's self-bestowed accolades is with a timeline. A lot of what we know about AI and AI workflows comes from the companies investing billions of dollars into this technology, but it also comes from a separate but equally important type of parasitic "grind-set" influencer.


The type that, every time a new AI tool or service is announced, goes to the same well: "X industry is cooked," "New AI Product has perfect X capabilities."


These influencers allow the AI companies to avoid directly fear-mongering to the masses; instead, these opportunistic grifters make viral content that deliberately stokes people's insecurities.



It seems that every fiscal quarter there is a new announcement that we are all going to be out of work because AI is now finally perfect.


So this is why I think a timeline is important: we need to see where this truly started so we can accurately chart the progress of this technology. This helps us better understand how close we are to an AI-driven mass-unemployment apocalypse.


Right now, AI companies want you to place the beginning of their technology in the spring of 2022.


This kind of language crops up from time to time, basically to emphasize the speed at which these AI models have improved. Though, when we dig a bit deeper, we start to realize that this kind of generative AI (where AI creates images, audio and text) really started at least 10 years ago.


Why is that important? Well, first, it puts the timeline in a more reasonable place. You may remember Netflix's AI-generated movies, where the scripts were written by bots, the first of which was a stand-up comedy show released in July of 2021.


Further, we have content creators like Austin McConnell, who let an AI generate an entire script for a short film he released in May of 2019. Even in that video he mentions the long hype train of AI being prognosticated as getting closer and closer to humans every day.


One of the earliest of those prognosticators was CGP Grey, with what I consider his magnum opus, Humans Need Not Apply (2014), where he predicts a future in which every worker will eventually be replaced by machines and artificial intelligence. The Guardian also made a stunning prediction video in 2016 depicting a future without human labor.


Personally, I started to notice generative AI in July of 2015, with Google's Deep Dream. The images Deep Dream produced were the first AI images I ever saw, and I remember thinking at the time, "is this what computers actually think and see?"


The point I am making is that it is now 2025, and we have been on the carousel of AI hype since 2015 (the founding year of OpenAI). When you put AI's advancement to this point on a timeline of a decade (instead of 2-3 years), the progression looks more like what you would expect from a new technology.


The speed isn't breakneck, but it isn't stuck in the mud either; it is a steady pace that we are used to. It puts the trajectory more in line with the transition from CD players to MP3 players.


It's the kind of technological progression we see with a lot of things, like the Wright Flyer of 1903 to the Fokker T.II in 1923 (20 years): one barely lifted off the ground, the other completed the first nonstop transcontinental flight. So no, AI hasn't advanced to detailed, hyper-realistic and coherent media overnight; it took almost a decade, and the realism, detail and coherence are still up for debate.


I am also being generous on this point; I could place the timeline even further back, to the invention of things like the Perceptron, but I think that would be unfair. This kind of generative AI is genuinely unique, and counting the handful of baby steps its progenitors took in the 60s, or even the early 2000s, would be neither fair nor useful.


Still, AI as we know it is nearly a decade old, and presently it looks like a technology that has advanced about as far as you would expect in 10 years.


Part II: The Improvement of AI

(Your mileage may vary)


We just spent a lot of time at the dawn of AI, but what about its present? I'll admit it's unfair to talk about the genesis of AI without talking about its progression from 2022 to its state in 2025, but any recent "improvement" I've seen has just been in style, taste and attempts at consistency, with varying levels of success. A lot of this tech's perceived progression boils down to simply preferring one kind of image over another.


I think 2023 was when AI images started to look basically as good as they were ever going to get, and by the beginning of 2024, when Sora was first announced (nearly two years ago as of this writing), AI video was basically at the quality it is today.


Reddit user /u/chaindrop, March 23, 2023


Sora, Feb 17, 2024


AI video quality has not improved beyond the level Sora was at in 2024 (at the time of this writing, 2025). Any improvements announced since then are a constant drumbeat of character consistency, sound, image swapping, and hands or text finally not coming out as a garbled mess.


And to their credit, about half the time things like that come out relatively legible, but it's really hard to chart the progression because of the deliberately misleading marketing around these "advancements."


A common form of content around this would be showing extremely early AI generated images compared against extremely curated results from 2025. 


Is there anything meaningfully different between Nov, 2022 and 2025? Midjourney


This is particularly misleading when it comes to AI video generation because high-quality and realistic AI avatar videos have been produced as early as 2017 with services like synthesia.io. I think we have all seen this style of AI video in one corporate training or another.


Synthesia, Mar 17, 2021


I think the best way to illustrate this, as well as understand this technology, is to actually use it. So, a bit about me: I have about 15 years of experience working in design, video editing, mixing, motion graphics and 3D animation. I've worked for a variety of companies, ranging from small advertising firms to investment banks and luxury jewelry companies.


As part of my quest to learn more about generative AI, I decided to go "all-in" and work for a company that primarily uses AI to make training videos for investment banks and hedge funds.


With all that being said, I think it is important to mention that any company using generative AI to make content is not morally wrong, nor are they using inferior workflows or taking advantage of people. AI is really a tool like anything else, and as long as you are honest about its limitations and trade-offs, it's not bad to sell those kinds of services.


My main gripe is with the level of dishonesty and misinformation about AI and how that misinformation is hurting workers in my industry and the companies they work for; AI is not a perfect tool that can do everything, it is an occasionally useful tool in the creator’s ever-expanding toolbox. 


The technology I am going to be mostly talking about today is Veo: Google’s Video Generator. I am going to go over other tools as well but these concepts and ideas can be applied to really any AI service.


So, if you are actually trying to make reasonably good AI content, this will be useful for you; just know I am going to come off kind of harsh on the final product and workflow.


I also opted to not use AI to help me write this article so however disjointed and imperfect this is, just know that this entirely came from my experience and research on these tools to make a specific deliverable. 


So however meandering and unfocused this is going to be, just know that I am trying my best to articulate my position on this technology without this technology’s hand in my writing.


This is going to be part tutorial, part critique and part prediction because the best way I can figure something out, is to just do it.


Part III: How to produce a training video with AI


The best place to start when making any video, AI or otherwise, is with the script and then the storyboard. The project given to me was for a Cybersecurity training video. The company I work for likes to make these into narratives so it isn't a dry reading of facts.

The story follows a character named Priya who is investigating a malware attack on her company. Honestly, this is not a bad use case for AI. Work like this never had the budget for locations and actors.


So in many ways using AI is better than a more traditional approach.

The script was provided by the client, along with written character descriptions. Since we already have a script and were told basically how the characters should look, we start by making character sheets. We need to know what the characters look like and start generating scenes populated by those characters so we can make a thumbnail storyboard.

We want to make a solid base character with very little detail, so we make sure characters don't wear jewelry or have patterns on their clothes: all things that AI tends to take liberties with.


Since there are a lot of variables introduced every time you use AI to generate something, you want to clamp down on as many variables as possible so that unexpected things don’t crop up in the process.

An example of this would be the Priya character sheet prompt I fed into MidJourney:


Priya Prompt:

A full shot of a 30-year-old Indian woman standing straight, facing the camera on a pure white background. She has brown eyes, medium-brown skin, and long, straight black hair styled neatly. She wears thin-rimmed rectangular glasses and has an oval face. She is dressed in a navy blue blazer over a white blouse and matching navy dress pants, paired with a black leather belt and black closed-toe heels.


Midjourney, Priya Character Sheet


Thumbnail Storyboard


Right out of the gate we are going to run into some problems, since AI kind of works like a 21st-century Xerox machine: any AI output that springs from another has a slight degradation in quality, accuracy, color, etc.



Priya Node Map, Flora AI



So my recommendation is to use stock images, stock graphics or even photos and visuals you made yourself to feed Midjourney, Nano Banana and Veo. This reduces variables and allows you to exercise some control where you can.


When we were first working on this video, I found a set of character photos on Envato Elements: an office scene and a hacker scene.


Envato Elements


Envato Elements


Both these image sets had a few close ups, medium shots and 180º shots; enough to tell a typical story. Since they were real photographs and shot with real actors, things like: skin detail, color, lighting and character consistency weren’t really a concern. All I needed were just a few images, in a few angles that could tell a story. 


Since Veo was working off of real images, the quality of the output was a lot better. The scenes didn't "morph" as much, nor did people's bodies become disproportionate.


I mean… ultimately they still did, with typing hands turning into deformed, hook-like blurs, but the real problem with that workflow was that we needed specific characters in a variety of scenes.


Real Photo Animation



Priya (the main character of this video) is a good example of this. She needed to be in her apartment, her office, a control center, on a zoom call, etc… We were just not going to find images online of that kind of shoot and we didn’t want to go cast a woman and take photos of her because at that point you might as well shoot a video. 



So building out a character sheet and using things like Nano Banana to place that character in a variety of scenes was really our only option. This is especially true if we were to maintain a mostly AI pipeline. 


With the character sheet done, I took Priya into Flora AI. This is a node-based editor that I still haven't really decided if I like or not.


It definitely gives you the illusion of control, and by virtue of being a node editor it does allow you to have an eagle eye view of all your scenes. So at the very least it’s easier to compare all your scenes and check for consistency.


Again: consistency, consistency, consistency. Most of my generations are not about improving anything aesthetic (I have made my peace with that), but I can't compromise on consistency too much, especially if you are making a 30-minute training video.


Flora AI isn’t an AI in its own right, it is a wrapper for a variety of AI services. Each node can be designated as a text, image or a video node. Further, each node can be assigned their own AI service. You can use ChatGPT to generate prompts, pump those prompts into an image node that uses Flux and then the Flux image node into a video node that uses Veo.
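To make that chain concrete, here is a rough Python sketch of the text → image → video hand-off; the classes and calls are hypothetical stand-ins (Flora doesn't expose a scripting API like this, as far as I know), it's just the mental model of each node wrapping a different service:

```python
# Hypothetical sketch of a Flora-style node chain (not Flora AI's real API):
# each node is assigned a service, and the output of one node feeds the next.

from dataclasses import dataclass

@dataclass
class Node:
    kind: str      # "text", "image", or "video"
    service: str   # e.g. "ChatGPT", "Flux", "Veo" -- whichever backend the node is set to

# text node (ChatGPT) -> image node (Flux) -> video node (Veo)
pipeline = [
    Node(kind="text", service="ChatGPT"),   # writes or refines the prompt
    Node(kind="image", service="Flux"),     # turns the prompt into a still
    Node(kind="video", service="Veo"),      # animates the still into a clip
]

def run(chain, seed):
    """Pass the output of each node along the chain; every hop spends credits."""
    result = seed
    for node in chain:
        # stand-in for the actual generation call each node would make
        result = f"<{node.kind} from {node.service} based on: {result}>"
    return result

print(run(pipeline, "Priya walks into the security operations center"))
```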



There are lots of AI services to choose from with each node, but they all come with different credit fees, some more, some less. It's a cool tool and worth researching, if only out of curiosity. Flora AI is also a really useful storyboarding tool, and I think I will use that feature in the near future.


Though, to reiterate, AI tends to drift, so giving these models something real to work off of keeps their feet on the ground. When you can't use real images for whatever reason, using things like Flora AI or Nano Banana can get you a basically usable image with some level of character consistency.


Now you might say, “how can inconsistency still be a problem with apps like Nano Banana,” and I will have to admit it gets the best level of consistency out of any AI image generator I’ve used, but there are limits to the conceivable consistency these AIs can achieve.


If you have a character reference facing forward, it is going to basically nail it, but once we start generating angles, that’s when things fall apart. A side angle view of a character is going to inherently be interpretive; Nano Banana is going to have to guess what this person looks like from the side, and that guess is going to be different generation to generation. You can use a side angle you do like and generate more scenes off of that image, but the previous scene that character was in will influence the next scene’s aesthetic (an instance where too much consistency is a problem). 


You can extract the side angle or generate side angles to begin with, but that influences the layout and composition of the scene and often results in a bad composition. You can generate a lot of different angles of your character to counter this, but it is just impossible to generate an angle for every edge case. It seems the persistent trade-off with AI is this: the more control you have, the less visually interesting the image; the less control you have, the more visually interesting the image, but the more inconsistent it is from shot to shot and the lower its fidelity.



If you generate many character angles, you can try to manage this, but adding more images doesn't always help the AI better understand what the person looks like from other perspectives. And even if the character does end up on model, the room they're in tends to drift and the logic of the scene starts to unravel. New models do a lot better at trying to keep the consistency, but after a few generations they lose the plot, and I really can't conceive of a remedy to this problem.


Nano Banana Generation Chat Window


Original Image


Nano Banana Generation


This is a good example of the level of consistency you can get with the best AI models. Not to pixel peep, but all these faces are slightly different. I decided to run an image of myself through Nano Banana and, though some images are accurate to what I look like, some are very different. The fact still remains: you have to mother this process a bit to maintain consistency.


Further, while I look fairly consistent from shot to shot, the rooms I am in break continuity. There are workflows to get around these sorts of problems, but they are the same workflows people have been using since 2022. Even if AI was able to achieve perfect consistency regardless of how many times you prompted off a single image, it really doesn’t change the “overall meta” of AI workflows. That being: more generations mean a better product. Right now, users are prompting for both aesthetics and consistency. With consistency out of the equation, there will be an increased incentive to prompt for aesthetics. Two hundred prompts for a 30-second spot will always look better than one hundred prompts for the same length commercial.


Also, consistency isn’t the only thing images need. The other problem here is that you may want to generate something evocative of your reference material, and strict consistency could be a problem. Nano Banana really forces consistency with the images you feed it, despite what you prompt. In general, these AI models really can’t account for every possible edge case.


Chedder Media Ideation Attempt, Nano Banana


That tangent aside, now that we have all the images we need, it’s time to animate them in Veo. At the time of this writing the main competitors in the AI video space are Kling, Sora 2 and Veo.


Veo by leaps and bounds is the best video generator available now. People look and sound realistic and can act in ways that show obvious emotion.


Occasionally you’ll have AI-jank like people clipping through furniture or Veo just not understanding your prompts and hallucinating; that and you are limited to 8 second clips.


Hallucination, Veo


That said, there is a way around the 8-second clip issue by extending your clips in Google Flow (where my Veo credits live), but it is slow, unreliable, and does a bad job of transitioning between lines of dialogue. It really doesn't offer much of an advantage over just generating the clips separately, because when I do extend clips I end up having to cover the edit anyway.


Bad Edit, Veo


So I kept my dialogue clips down to 8 seconds to improve speed, quality and predictability. It was just easier to break the character scenes down into 8-second line reads for Veo.

Taking all that into consideration, Veo is really the only way to make this type of video in AI right now (at least with any reasonable level of quality.)


All other AI video generators have so many problems in the ways of artifacts, quality, resolution, length and general limitations on what they are able to create from a prompt. Veo really stands alone in this space.


Again, I don't really want to harp on things like the 8-second time limit or resolution, because those are very likely to improve, if Will Smith eating spaghetti is any indication of the trend of AI advancement. AI tends to excel at improving coherence and resolution, and is a laggard when it comes to removing the human operator, aesthetics, control, continuity and consistency (the most persistent problems with AI).


In my opinion, all other existing video models are so low-quality that Veo is really the only AI video app you should be using. Veo sets a bar of what I would call competent video and can compete on the same level as actually shot or animated content. It is by no means perfect, but for the time being it is certainly the only place you should be going if you need to generate any kind of video with AI.


Now to Veo prompting, don’t listen to people selling a prompt engineering course or telling you that you need to structure your prompt in JSON (that they are more than likely using ChatGPT to write). 


Author's Note: I think prompt engineering is probably the most fascinating of the AI coping mechanisms; I touch on some of the best practices in this article…


To be frank, the best way to write a Veo prompt is to give it an image (again, human-generated preferred) and write what you want to happen in chronological order… that's it!

I also tested not using chronological order, as well as JSON prompting, and found it doesn't really matter; but at least for your own sake, organizing your thoughts and having a basic idea of what you want to happen in the video isn't a bad idea.


Veo Prompt:

"Camera": [Static on tripod]

"Motion": [Woman walks down hallway]

"Woman Says in North American Accent": ["Hello, I'm Priya. Cybersecurity is one of the most critical challenges organizations face today."]

"End": ["Stops talking and stops walking"]


You'll notice in the prompt I used "North American Accent." That sometimes works, but you really can't count on it. It will give you a lip-synced voice maybe 80% of the time, and if you want your voice to be consistent you can use ElevenLabs' voice changer, which works maybe 50% of the time depending on the voice.


Lip-sync issue, Veo


I ran into a lot of difficulty trying to give Priya a North American accent, but Veo is a racist and kept giving her an Indian accent. After dozens of generations trying to fix it, I decided to accept the Indian accent. But when it came time to generate the close-ups, she had a North American accent. So I was like… "whatever, that's what I wanted anyway."


North American Accent, Veo


Indian Accent, Veo


I then went back to the wide-shots to re-generate the videos that had an Indian accent to be a North American accent but no matter what I prompted it kept giving her an Indian accent. So I went to change her voice in Elevenlabs and the voice changer still kept giving her an Indian accent!


So what I did was change the North American accent voice on Elevenlabs from Alisha to Emma, and somehow that worked. But even that victory was short-lived, because when I went back to change the voice for Scene 3 and looked for the AI Emma voice, it was gone… it was just deleted!


Ultimately we had to find a new voice and then use the Voice Changer on Elevenlabs to change the voice for everything I already laid down on the edit. 


Just so you know, this is a very bad production pipeline if you can’t even trust what your characters are going to sound like. 


Back to Veo: we blew through 500 USD just trying to get people in an office to talk on camera, so right out of the gate we were burning money, and that was only the A-roll; we still needed B-roll for an edit like this.


I am basically approaching this as a traditional "video cut": we record characters saying their lines in three basic shots, and we use close-ups and cutaways to disguise edits or to reference what they are talking about on screen. All pretty standard stuff; an example of this would be this scene.



Things like news articles and text callouts to explain what is happening help reduce how much we need to rely on Veo.


Online, there tends to be a variety of stock graphics in similar styles especially for computer monitor UI graphics. A cybersecurity video was kind of the perfect use case for this particular approach.


If an edge case did crop up and there wasn't specific enough stock footage, I could take a still from one of those stock graphics and pipe that into Veo.


Veo could generate a graphic in a similar style, and I could spend just a few minutes cleaning it up in After Effects. This helped keep some visual-language consistency without too much of a quality drop-off.


Original Stock Graphic, Envato Elements


Video Generation Based on Still, Veo


Overall, to bring down costs we relied heavily on human-made stock video and stock graphics, but if typing and getting an asset for your edit is the main advantage of AI, then in certain scenarios AI doesn't really come out on top as a better workflow.


Things like text animations have been optimized to hell, so there really isn’t much of a benefit to using AI. Most text animations you can type what you want and use a preset to animate. Also, you can change the font, weight and kerning on the fly without impacting the animation. 


The way I look at it, the core advantage of generative AI is typing something in common language and getting a result. Whether I am playing the slots on Midjourney, Veo or traditional asset sites like Envato Elements or Shutterstock I feel like I am not breaking from the client’s preferred workflow for this project either way.


That sounds all well and good, right? A production pipeline with the occasional AI-jank. There are some benefits and some trade-offs to using this service, but all in all it is good enough and sometimes better than what a human can produce.


Further, I would at least sit through this for a training; it is coherent, not that distorted, and some shots are honestly impressive… but that's when AI is cooperating.


Three days into the project, Veo had a lobotomy: scenes that had previously been easy to produce, using the exact same image prompts, were suddenly a struggle.


Failed Veo Generations, Leonardo AI


Depending on the day, AI can be a reliable way to generate quick assets and impressive visuals for a fraction of the time, skill and cost; or it can be a cantankerous mess, wasting your money generating body-horror images and filling your browser with stomach-turning, offensive and, more importantly, unusable video and images.


Head Spin, Veo


These systems are plagued with outages, buggy code and training-data issues; one day they produce fairly decent results, and the next they generate content that looks like it was made with an experimental AI from 2023, or nothing at all.


The lost days of work are staggering. Imagine a manager who decided to replace all skilled labor with AI having their new skeleton crew of prompt engineers come up to them and say: "Sorry, Midjourney isn't really working today" or "Veo keeps crashing."


Warning Message, Veo


Author's Note: Building a pipeline around generative AI sometimes feels like a fool's errand, because you are functionally paying to be a beta-tester for a system they are hoping will eventually run itself - but ultimately can't.


When Veo was cooperating, production went smoothly, but when it wasn't, production just halted. Nothing could get laid down on the timeline. This forced me to find another way to access Veo.


When I first started this project I signed up for Leonardo.ai because I thought it was the cheapest way to gain access to Veo, but I was ultimately wrong. Despite the higher price tag, gaining access to Veo directly through Google was better value for money.


But still, there was a credit budget that effectively put our generation limit at 250 videos a month. (It's more complicated than that, but for our purposes let's say 250.)


That said, these kinds of credit constraints result in me "pulling my punches," so to speak. I spend less time experimenting and usually only prompt what I suspect will be correct on the first shot. If I were working in Cinema4D I could iterate without fear of stopping mid-process and having to wait until next month to start work again.


This stop-and-go process isn't even just limited to credits; the entire process is like "dead reckoning": submitting a prompt you think will work and waiting 3 minutes to get a result.

That said, you can generate 2-4 clips at the same time using the same prompt, but that doesn't really change anything. It works out to roughly the same amount of time as if you queued the videos separately, with the added risk of spending additional credits on clips you may not need. Going one by one keeps your credit usage down and really doesn't affect video generation time that much.


This lack of feedback stands in stark contrast to existing workflows. With a powerful enough computer you can get near-instant feedback in literally any modern creative app: from Maxon's highly optimized Redshift to Blender's viewport-based Eevee, "what you see is what you get" interfaces are here and well established.


The steadier pace of traditional workflows, though initially slower and sometimes less impressive, is more reliable, with the added benefit of not torpedoing your progress on any given day.


Part IV: “Traditional” Workflow


So what would be the alternative to creating scenes in Midjourney, using Flora AI to generate different shots and using Veo to animate them?


I keep using the phrase “traditional approach” but everything I will be using for this alternate approach will also be fairly high-tech. Whenever I refer to something as “traditional” just know that I basically mean I am not going to use generative AI specifically.


If some AI crops up, it will be the 20-to-60-year-old AI tools we never really called AI.

That being said, there are so many non-generative AI avenues we can spend the entire day listing them, but if the main goal is to have consistent characters on screen reading lines and doing simple motions, this project can be as simple as using your iPhone to shoot people around your office.


If people don’t have the time or skill, a studio could coordinate a shoot, but if you really wanted an entirely computer-based pipeline you can do worse than Reallusion’s suite of tools.


Reallusion allows this process to be as complex or as simple as you want it to be. Their Character Creator app uses sliders and presets to build out custom characters and iClone makes it easy to animate them.


Character Creator allows you to fine-tune your character by choosing body type, eye shape, height, clothing and hairstyle. You can send that character to iClone, which lets you use motion capture from sites like Mixamo, where you can drag and drop FBX files onto the character. The AccuLips feature gives you mouth and face animation just by uploading an audio file. You can render out entire scenes right from Reallusion's suite of tools, or bring these characters into your preferred 3D program to build out scenes that way.



From that workflow, below is a clip compared to the AI pipeline.


CG Video


AI Video


First impressions: I would honestly say I don't really like either of these videos. I think they are both bad in their own way; it's just that one offers a bit more control and consistency and the other doesn't. Ultimately, for both, if you spend enough time and money you can get at least a reasonable deliverable.


To reiterate, a critique of this particular workflow would be someone saying that “this is also an instance where technology makes things easier and aren’t all these workflows a kind of AI?” and I would have to agree with you.


In fact to make the characters look a bit more realistic I used HitPaw’s video upscaler to add a bit more detail to their skin.


But again, my argument isn't that using technology or even AI is bad; it's just that, to me, this doesn't seem as amazing a tool as a lot of people think it is. Again, I am weighing this workflow as a tool against other tools and judging it on that basis.


Do these tools use AI, though? Well, yes and no. AI has existed in one form or another alongside computer graphics tools since the 80s. This particular technology, which uses diffusion techniques to generate images, videos and text (what we now call generative AI), is a workflow in and of itself, separate from all the existing computer graphics tools we have today. 3D renderers, physics simulations, upscaling and motion capture are kinds of AI too, and they have problems similar to generative AI's.


Motion capture isn't always 100% accurate, physics sims are tedious and computationally intensive, and 3D camera-solving algorithms don't always work and unfortunately have a lot of guesswork behind them. No matter how well you light or cover your scene with tracking markers, you never truly know if the software is going to pick up the track data until you put it in and click solve.


At the end of the day any comparison between an existing computer software and current generative AI is going to produce some parallels that make it hard to argue that one system is truly better than another. 


That's why we can only look at how this AI is packaged and what tools it gives designers for control and aesthetics. Since current AI products lack so much in the way of control and fine-tuning, the true advantages a traditional CG pipeline has are scalability and reliability.


I do not have to worry about running out of credits or service outages because I am using software locally on my computer. I am also not limited to 8-second clips, and every clip I render is entirely usable, with the main trade-offs being render time and realism.

Both generative AI and traditional CG pipelines run the risk of their apps being sunsetted, but for the latter such instances are exceedingly rare. Even if it does happen, the redundancies built into a traditional CG pipeline are just more robust (OBJs are still knocking around, and Wavefront has been out of business for decades).


Frankly, I will always have the characters as 3D assets and they can always be opened in whatever future CG app exists especially with the advent of the .USD file format.


If we insist on calling the old-guard of CG tools a form of AI, then it's a form of AI that is infinitely more controllable than current generative AI services. I think it’s clear now my position isn’t that AI is bad, it’s really just the UI, marketing hype and goals of these generative AI companies that are the problem.


Reallusion uses AI, but as a sidecar to human labor, making it easy to fine-tune and even override when the result is incorrect. Today's generative AI services have very little of the control and fine-tuning that these traditional applications offer.


Part V: AI is cheap but not free


A lot of the arguments I hear about the benefits of AI are about cost. Oftentimes you hear something along the lines of "however flawed AI is, it's still cheaper than a shoot," and that's true: very few shooting scenarios are going to be cheaper than using AI.


Unless you have a highly optimized and efficient pipeline already established and the actors needed aren't super specific, you are not really going to see much cost savings with the live-action video route. Live-action video's cost savings only exist in long-form or serial content (more on that later).


That being said, a lot of the claims that a shoot can cost tens of thousands or even millions of dollars are a bit ludicrous. Comparing AI to "above board" Hollywood union shoots is a bit of a straw-man argument. I asked my former employer at AVW (a company that specializes in affordable shoots for TV commercials) for a quote on a 30-minute training video with 4 actors and a deadline of 1 month.


This is the breakdown he provided:

| Item | Cost (USD) |
| --- | --- |
| Actors | 3-5K |
| Pre-Pro and Post Production | 5-8K |
| Total | 9-13K |

These are big numbers, but let's compare that to the company I am freelancing for now.

The same video took 2 months to complete, for which they paid me 4,500 USD to edit as well as generate all the AI footage. All our AI tools for those two months, plus the Adobe Suite, came to roughly 3,000 USD in subscriptions and credits (we had to top up credits a few times).

At the beginning of the project we wanted to use only Veo, and we put the cost projection at around 7,500-10,000 USD to finish. This put us over budget, so we had to pivot and use cheaper and inferior AI methods (specifically Hedra), turning entire meeting scenes into talking-head Zoom calls. A scenario that would also bring down the cost of a traditional shoot.


Talking Head Zoom Calls, Hedra


Ultimately, these are back-of-the-envelope calculations, so to do an honest comparison I decided to break this project down to the following scenario.


Scenario: You are a solo, at-home freelancer tasked with creating a one-minute dramatization using either CG, live action or AI. The scene features two characters, a male office worker and a female hacker, who must appear on screen for a combined total of 30 seconds. The remaining 30 seconds can be filled with stock footage, voice-over, and motion graphics. With only a high-end modern computer and internet access, you are not allowed to personally use any physical camera equipment, call in favors, or make purchases beyond affordable online services and software. Your objective is to produce this dramatization as efficiently and cost-effectively as possible, as well as maintain professional quality within these constraints.


CG Video


AI Video


Live-action Video


Ultimately, I prefer the output of the AI over the CG, but the live-action video shoot definitely looks the best. I think they all suffer for different reasons, but let's look at this more analytically.


Since we've agreed that "good enough is good enough," I'll rate each method across eight parameters I think are fair for judging a project: fidelity, consistency, editor hours, video generation time, turnaround, ease of production, cost, and reliability. Each will be scored from 1 to 5, graded on a curve: the best of the three gets the highest score and the rest are scored on their proximity to it. Closest to 40 (the highest possible score) wins.


I’m treating the workdays for this as typical 8 hour shifts (16 hours as a 2-day turnaround.) Software and AI costs will be based on monthly subscriptions where possible. The scenario is to complete one scene and assess how it scales, projecting costs for 12 additional videos (to match Veo’s monthly credit top-up for a more real-world comparison). All videos will be part of the same series, using the same characters and sets, with the only conceivable variations being camera angles or costumes changing.


Note: Except for Elevenlabs, all the required software and AI services offer significant discounts with annual subscriptions. For manpower, I’m setting the editor’s rate at $20/hr.


For the live-action video, I hired a production company I found on Fiverr (ChrisW) to handle all the character shoots.


All Credits, Software, Subscriptions (1 month)

| Software/AI Service | Price (USD) |
| --- | --- |
|  | $82.49 |
| Elevenlabs (1 Month) | $11 |
|  | $1850 |
|  | $16.50 |
|  | $419.30 |
|  | $141.40 |
| Gemini (Veo) | $220 |
|  | $10 |
|  | $20 |
|  | $43.19 |
| Editor Hours | $20/hr |

Score Card

| Metrics | Live-Action Video | AI Generation | CG Animation |
| --- | --- | --- | --- |
| Fidelity | 5 | 4 | 2 |
| Consistency | 5 | 3 | 5 |
| Editor Hours | 8 (Score: 5) | 16 (Score: 2.5) | 12 (Score: 3) |
| Video Generation Time | 3 Weeks (Score: 1) | 1 Hour (Score: 5) | 40 Hours (Score: 2) |
| Turnaround (Work Days) | 15 Work Days (Score: 1) | 2 Work Days (Score: 5) | 3 Work Days (Score: 2) |
| Ease | 5 | 3 | 2 |
| Cost | 1 | 5 | 3 |
| Reliability | 3 | 3 | 5 |
| Score | 26 | 30.5 | 24 |


These are rough figures, and some of this is admittedly hard to quantify due to the inherently subjective nature of some of the parameters, so take these numbers with a grain of salt.

That said, AI clearly comes out on top in terms of cost and turnaround time, with live-action video as a close second due to its ease, high level of consistency and fidelity.


Overall Cost and Turnaround (1 video)

| Pipeline | Overall Cost | Turnaround (Work Days) | Score |
| --- | --- | --- | --- |
| AI | $584.99 | 2 Work Days | 30.5 |
| CG | $953.88 | 4 Work Days | 24 |
| Live-action | $2,108.99 | 15 Work Days | 26 |

My scoring comes down to a few key moments when producing these videos. As mentioned before, the AI workflow came with some additional trade-offs that leveled the playing field.


I spent about 2-3 hours between Midjourney and Flora AI trying to make consistent characters, scenes and shots. It took 1 hour for Veo to generate the actual video and an additional hour making their voices consistent using Elevenlabs (this includes syncing the new audio to the original clips). It takes roughly 6-8 hours to make the content usable before editing can occur.


When making more images for videos I suspect it will take less time given that I already have the character sheets. However, when making future videos based on these characters we need to have access to the latest AI image models to generate different shots, props and costume changes. 


Broadly, the burden on the editor is greater in both the AI and CG versions. In both these videos the editor is more like an “end-to-end producer.” As mentioned before, when working with Veo specifically you tend to also experience service outages. This really hurt the reliability score of the AI version.

If you are willing to…

…use increasingly esoteric pipelines like Flora AI node maps, piping those into ChatGPT's image generation to make cohesive scenes (which tend to result in shrinking hunchbacks and increasingly sallow skin with each generation),

or alternatively Nano Banana, which sometimes just plops your character on top of your scene in a very Amelia Bedelia-esque way,

all to then animate those images into video outputs with Veo,

whose voices you will inevitably need to change with Elevenlabs, all in an attempt to keep some level of consistency…

…what you get is significant cost savings and honestly some flexibility, with the biggest advantage being that if anything needs to be changed you don't need to reshoot or wait 8 hours to re-render a scene.


The elephant in the room is the live-action video pipeline. It is clearly the most expensive of the three, but not by much compared to AI, at a difference of roughly $1,500 USD.


So what do you get for that extra $1500 USD?


An incredibly simplified workflow, where I just had to send over a script and order the clips. In the live-action video pipeline the editor is purely editing, rather than fussing with Flora AI for consistency or Reallusion’s uncanny CG humans.


Personally, from an editor's standpoint, spending that extra $1,500 and just forgetting about the character scenes is an amazing feeling.


It wasn't without its hiccups, though: the reliability score was hurt when the timeline had to shift because one of the actresses fell through (something to consider when doing a live-action shoot). What was supposed to be a 2-week turnaround became a 3-week turnaround.


All this considered, the team I used, ChrisW, was amazing; the level of care, professionalism and quality they put into this project was truly outstanding. I highly recommend them for your video needs.


When it comes to AI voice generation, I really don't have many complaints about Elevenlabs, though most of the voices do sound canned and the voice changer is pretty "hit-or-miss."

For instance, I used Elevenlabs to generate the voices in the CG version as well, which, as mentioned at the top of this article, I treated as more of a "blended approach."

Given that primitive forms of AI were handling the lip-sync, and digital voices have existed since the 60s, I didn't feel like I was violating the "spirit of the challenge." That is also why I used HitPaw, an AI upscaler, to increase the fidelity of the renders, since Redshift already uses AI to reduce noise and increase the fidelity of renders.


Again the CG pipeline is the more blended approach but ultimately both the AI version and the CG version are not completely AI.


When working on the CG version, it took very little time to build out the scenes and get the lip-sync to match the voice-over. I also used stock 3D office scenes that come included with Cinema4D's asset browser to place the characters in, but the lion's share of time was spent waiting for the render, which produced an overall uncanny final product.


Depending on your graphics card and knowledge of Reallusion, you can get better, more realistic renders in a shorter amount of time independent of Cinema4D. I am using an Nvidia 5080 graphics card, which rendered all the scenes needed at 1280x720 in about 40 hours through Redshift.


40 hours is a pretty long time to render video but these renders tend to go overnight. So, it didn’t hurt the effective turn-around time that much. 


Example: Starting your day at 9am, the first 4 hours are spent animating the CG scenes, and at 1pm Monday you start the render. Tuesday you do nothing and let the computer render. By 10am Wednesday all clips are rendered; you work until 5pm and finish the edit, giving this workflow effectively a 3-day turnaround.


Where the CG version truly shines is its scalability. Reallusion can only be purchased as a perpetual license, so you see enormous cost savings over the course of a year in the 12-videos scenario.

Overall Costs: (12 Videos, 1 Year Deadline)

| AI | CG | Live-action |
| --- | --- | --- |
| $8,202.72 | $5,556.68 | $8,957.88 |

What surprises me the most is that the live-action video (which is still the most expensive in all scenarios) actually sees a significant cost reduction when compared to AI on a 1-year production schedule (the cost difference is almost negligible).


To get this number, I asked the Fiverr team how much it would cost to make the same scene 12 more times. Due to the increased volume, they would be able to do 12 videos for $6,000 USD with a 1-day shoot.


All Credits, Software, Subscriptions (1 year)

| Software/AI Service | Price (USD) |
| --- | --- |
|  | $839.88 |
|  | $219.96 |
|  | $6000 |
|  | $198 |
|  | $419.30 |
|  | $999.54 |
| Gemini (Veo) | $2624.88 |
|  | $288 |
|  | $192 |
|  | $99.99 |

This is an important figure because at first blush it may seem that you could get away with generating all the video for every method in a few months to cut down on monthly subscriptions… and you could, but it comes with its own trade-offs.


This is where the Gemini subscription comes in. The first 3 months cost $125 USD a month, increasing to $250 USD a month after that, which makes the effective monthly rate roughly $220 USD. You get roughly 250 video generations a month with no rollover, placing the cost per Veo generation at around 90¢.


I generated 20 eight-second clips to produce 30 seconds of usable character video (an error rate of ~50%), so at the bare minimum that means I would need ~240 clips (almost the entire monthly budget) to finish these 12 videos.
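For what it's worth, the arithmetic behind those figures works out as a short sketch; the promo pricing, the 250-generation allotment, and the 20-clips-per-finished-video ratio are the numbers quoted above:

```python
# Back-of-the-envelope math behind the figures above: Gemini's tiered pricing
# averaged over a year, the resulting cost per Veo generation, and the number
# of clips a 20-clips-per-finished-video ratio implies for 12 videos.

promo_months, promo_price = 3, 125      # $125/month for the first three months
regular_price = 250                     # $250/month after that
generations_per_month = 250             # roughly, with no rollover

yearly_cost = promo_months * promo_price + (12 - promo_months) * regular_price
effective_monthly = yearly_cost / 12                          # ~$218.75 -> the ~$220 figure
cost_per_clip = yearly_cost / (12 * generations_per_month)    # ~$0.875  -> the ~90 cent figure

clips_per_finished_video = 20           # what it took to get 30 usable seconds
clips_for_12_videos = 12 * clips_per_finished_video           # 240 clips

print(f"effective monthly rate: ${effective_monthly:.2f}")
print(f"cost per Veo clip:      ${cost_per_clip:.2f}")
print(f"clips for 12 videos:    {clips_for_12_videos}")
```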


All clips, Veo


Further, I can’t just buy Veo videos at $1 a clip, I need to have a subscription. So I either sign up for 1 month that barely covers my needs or purchase a 2 month subscription that provides more clips than is necessary. 


So in theory I can use my Gemini subscription’s access to Veo to generate all the videos needed in (at best) 1 month and save money on my other monthly subscriptions


Authors Note: The other AI services also have weird credit limitations that may not cover my needs in 2 months but I don’t want to get into it…


Since, the AI, CG and live-action methods are all technically capable of delivering the clips needed in 1 month, lets see how prices change when we make these 12 videos based on the timelines it would take to produce the actual character clips.



Overall Costs: (12 videos, 1 Month Deadline)

| AI | CG | Live-action |
| --- | --- | --- |
| $4,104.99 | $3,637.07 | $8,029.99 |

In this scenario AI is the second cheapest option. So why mention the yearly cost breakdown at all? Well, I think it paints a better picture of what a freelancer would realistically need to spend. When doing projects, you don't really buy and cancel subscriptions on a project-to-project basis; you usually sign up for the year to take advantage of the annual rate, in anticipation of future work.


The one-year cost breakdown is more indicative of the typical costs associated with being a freelancer, which means that if you routinely offer AI services, you can easily see how it will cost you more money in the long run.


Projects vary a lot, and you may not need to make more character-based scenes this year. In that case, the freelancer who offered CG or live-action made more money and had fewer subscriptions to keep track of.


So, while I can technically generate all the video needed for these 12 videos through Veo in about a month, outages and the lack of roll-over credits mean it might actually take two months of subscriptions to deliver everything. One month of 250 video generations may not be enough. That said, the cost of the actual AI subscriptions (not factoring in editor hours) was relatively cheap at $216 USD.


Isolated Cost for Character Video Creation (6 Minutes)

| AI | CG | Live-action |
| --- | --- | --- |
| $216 | $419.30 | $6,000 |

But whatever cost savings are had at the point of purchase are certainly offset by the time I spent putting this together.

Editor Hours

|  | AI | CG | Live-action |
| --- | --- | --- | --- |
| Hours | 192 | 144 | 60 |
| Cost | $3,840.00 | $2,880.00 | $1,920.00 |

Like I mentioned before there are so many setbacks in generating images and video with AI that simply don’t happen in the other pipelines. Most of the time I don’t even change my prompt and the video comes out completely different.


Nowhere in the prompt did I mention fire, Veo


Same prompt no fire, Veo


Further, I think most AI enthusiasts would say that the subscriptions I have are the bare minimum needed to produce AI video content. Apps like Pika, RunwayML and Kling are noticeably absent. This was all an effort to keep costs down.


So, if your goal is to produce multiple videos quickly on a low budget: ironically, AI took the longest to edit, at the second most expensive rate.


Author’s Note: Which isn’t that surprising given that I built this project around trying to give AI the best shot at winning, but it’s pretty telling that it barely squeaked by…


Speaking of cost reduction, I could have reduced the cost for the CG version by opting to use Blender instead of Cinema4D. Like I mentioned before Reallusion can be used end-to-end and that would reduce the cost even further.


Taking that into consideration you can see a significant cost savings on the CG version both on the monthly and yearly cost analysis.


|  | AI | Reallusion / Reallusion + Blender | Cinema4D + Reallusion |
| --- | --- | --- | --- |
| Month | $584.99 | $812.48 | $953.88 |
| Year | $8,490.72 | $4,557.14 | $5,556.68 |

These figures clearly illustrate a significant cost savings over time compared to AI. In general, as you produce more videos, costs trend down in both the CG and live-action versions, whereas with AI (however low initially) the costs start to trend up, and it's not just because of the additional editor hours.


Since the CG video was less visually impressive, I wanted to compare the scalability of live-action video against AI, as their final products were more comparable in quality. I decided to calculate the cost of generating the final usable AI video versus the live-action video, that is, the cost per Veo generation compared to the budgets ChrisW’s team gave me.


Let’s start with the live-action video. I was only given the final shots, no extra takes or raw footage. Everything was delivered ready to use in the edit, so it’s easy to compare both, since I’m looking at what I spent to get usable video, not footage in general.

In a way, the failed Veo generations are the AI equivalent of bloopers, which I have to factor into the cost; just like the production company built that into their $1,850 USD and $6,000 USD quotes.


As I mentioned before, I am using the highest tier Gemini subscription (Google AI Ultra) to generate Veo videos. For $2,624.88 per year, we get 300,000 credits, about 90¢ per 8 second clip.


Author's Note: It is technically possible to get unlimited video generations, but that is reserved for their experimental or earlier models, which are just not good enough to generate the video for this project. The fact that there is a credit system at all should tell you everything.


To create the AI example video, we needed to generate 20 eight-second clips to produce 30 seconds of usable footage. We can calculate the cost per 30 seconds of usable video and then extrapolate the cost per usable minute and hour, increasing the video load by a factor of 12 and then doubling it each time after that, to compare with the live-action quote.


Example:

1 video = 30 seconds of usable character footage (20 Veo generations)

12 videos = 6 minutes (360 seconds) of usable character footage (240 Veo generations)
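The AI column of the table below can be reproduced with a short sketch using those figures; the only value I've inferred is 100 credits per clip (from 2,000 credits for 20 clips):

```python
# Sketch of how the AI column in the table below can be reproduced:
# 20 Veo clips (~90 cents each) per 30 seconds of usable footage, jumping to
# the 12-video load and then doubling each row after that.

cost_per_clip = 0.90
clips_per_30s_usable = 20
credits_per_clip = 100     # inferred from 2,000 credits for 20 clips

usable_units = 1           # 1 unit = 30 seconds of usable character footage
for _ in range(14):
    clips = usable_units * clips_per_30s_usable
    print(f"{usable_units * 30:>9} s usable | {clips:>6} clips | "
          f"${clips * cost_per_clip:>9.0f} | {clips * credits_per_clip} credits")
    usable_units = 12 if usable_units == 1 else usable_units * 2
```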


AI Video

| Length of Usable Footage | Total Clips Generated | Total Cost | Credits Used |
| --- | --- | --- | --- |
| 30 Seconds | 20 | $18 | 2,000 |
| 6 Minutes | 240 | $216 | 24,000 |
| 12 Minutes | 480 | $432 | 48,000 |
| 24 Minutes | 960 | $864 | 96,000 |
| 48 Minutes | 1,920 | $1,728 | 192,000 |
| 1.6 Hours | 3,840 | $3,456 | 384,000 |
| 3.2 Hours | 7,680 | $6,912 | 768,000 |
| 6.4 Hours | 15,360 | $13,824 | 1,536,000 |
| 12.8 Hours | 30,720 | $27,648 | 3,072,000 |
| 25.6 Hours | 61,440 | $55,296 | 6,144,000 |
| 51.2 Hours | 122,880 | $110,592 | 12,288,000 |
| 102.4 Hours | 245,760 | $221,184 | 24,576,000 |
| 204.8 Hours | 491,520 | $442,368 | 49,152,000 |
| 409.6 Hours | 983,040 | $884,736 | 98,304,000 |

Now let’s compare that to the live-action video. The initial price drop for the 12 requested videos is dramatic, about 70%, which I consider an outlier. I don’t expect such a steep drop as production scales, so I set the price per 30 seconds of video to decrease by 30% each time the amount of usable footage doubles. I think this is a fairly conservative estimate for cost reduction with increased volume.
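Expressed as a quick sketch, that assumption (start from the 12-video quote and cut the per-30-second price by 30% at each doubling, truncating to whole dollars) reproduces the live-action figures in the side-by-side table below:

```python
# The live-action extrapolation described above: start from the 12-video quote
# ($6,000 for 6 minutes, i.e. $500 per 30 seconds, treating the single-video
# $1,850 price as an outlier) and cut the per-30-second price by 30% each time
# the amount of usable footage doubles, truncating to whole dollars.

cost_per_30s = 6000 / 12   # $500 per 30-second unit at the 12-video volume
units = 12                 # 12 units = 6 minutes of usable footage

print(f"{units * 30:>9} s   ${int(units * cost_per_30s):>8,}")
for _ in range(12):
    units *= 2
    cost_per_30s *= 0.70   # 30% volume discount per doubling
    print(f"{units * 30:>9} s   ${int(units * cost_per_30s):>8,}")
```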


Live-Action Video



AI & Live-action Costs Side-by-side

| Length of Usable Footage | AI Video | Live-Action |
| --- | --- | --- |
| 30 Seconds | $18 | $1,850 |
| 6 Minutes (12-video quote) | $216 | $6,000 |
| 12 Minutes | $432 | $8,400 |
| 24 Minutes | $864 | $11,760 |
| 48 Minutes | $1,728 | $16,464 |
| 1.6 Hours | $3,456 | $23,049 |
| 3.2 Hours | $6,912 | $32,269 |
| 6.4 Hours | $13,824 | $45,177 |
| 12.8 Hours | $27,648 | $63,248 |
| 25.6 Hours | $55,296 | $88,547 |
| 51.2 Hours | $110,592 | $123,966 |
| 102.4 Hours | $221,184 | $173,552 |
| 204.8 Hours | $442,368 | $242,973 |
| 409.6 Hours | $884,736 | $340,163 |


Preceding Tables Visualized


Though the cost of AI video generation stays cheaper and flatter longer, live-action video costs less after the 100-hour mark.


100 Hour Mark Isolated


Admittedly, these are all estimates, and there is no way of really knowing if Google would even let you have access to that many credits. Additionally, we are calculating the cost of production based on this one video shoot with one production team. So again, take these numbers with a grain-of-salt.


So how is this possible given Veo’s persistently lower video generation costs? I think that’s the point…persistent. Due to its fixed pricing model, AI video is cheap in the short term but more expensive in the long run.


No matter how many videos I generate, the cost stays the same, nor does it get easier to produce. At the 100-hour mark, it costs 46¢ per second of usable live-action video.

For AI, the base cost is about 12¢ per second of generated video, but every usable second requires multiple generations. If I need to generate 160 seconds of Veo footage to keep 30 usable seconds, my effective cost rises to about 60¢ per usable second.
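
A quick back-of-the-envelope for that effective cost, using the same assumed 90¢ clips and the 20-generations-per-30-usable-seconds ratio from my shoot:

```python
# Effective AI cost per usable second (assumed numbers from above).
cost_per_clip, clip_length = 0.90, 8          # dollars, seconds
base_per_generated_second = cost_per_clip / clip_length                        # ~$0.11
clips_needed, usable_seconds = 20, 30         # 160 s generated, 30 s kept
effective_per_usable_second = clips_needed * cost_per_clip / usable_seconds    # ~$0.60
print(base_per_generated_second, effective_per_usable_second)
```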


In traditional video production, once you have the sets and actors sourced, it becomes much cheaper to produce the video you need. The upfront cost is expensive, but after you work out the kinks, it becomes much cheaper to produce the final product. Sets are already built, camera operators know what shots work, and actors become extremely familiar with the material. A huge advantage is that you can shoot hours of footage non-stop as your actor runs their lines.


Back to CG’s scalability: I can always do something to reduce scene complexity and thus reduce production and render times. I can use simpler textures and lighting setups, reduce resolution, and reuse animations.


AI doesn’t have that luxury. Every time I ask it to generate a video, it’s like the first time. It doesn’t get easier to produce my requests, and even if it does, it still charges me full price and only gives me 8-second chunks, and those 8-second chunks are wrong about half the time.


Regardless of whether you ask Veo to generate a video of a blank wall or a bear skydiving into a can of ginger ale, the cost is the same, both in money and compute. In fact, I was so inspired by that preceding joke scenario that I generated both videos in Veo at the same time. Even though the “White Wall” video was queued first, the bear animation finished faster. There’s no rhyme or reason to what’s easy or hard for AI to do!


The Great Race, Veo


Simply put, if AI is to become scalable, developers need to find a way to make regenerating scenes similar to ones it has already produced less computationally intensive, and to charge you less for them.


Now you might say this is an odd use case: obviously live-action would be better at producing something this simple, and besides, who is going to watch 100 hours of people sitting at desks talking? To which I would respond…have you heard of the show The Office?


We also don’t need to dig very deep to start seeing productions with similar costs per 30 seconds of video, in the $10-50 USD range. Micro-budget films like Clerks (1993) and The Blair Witch Project (1999) prove that you can make finished video for the silver screen at an extremely low cost.


Clerks (1993) is essentially people in one location talking. It had a budget of $30,000 USD; adjusted for inflation, that’s about $70,000 USD. The film has a runtime of 91 minutes. At that rate the cost is ~$23 USD per 30 seconds of video.


The Blair Witch Project (1999) had a budget of $60,000 USD, about $116,000 USD adjusted for inflation. The film has a runtime of 78 minutes. At that rate the cost is ~$43 USD per 30 seconds of video.


And micro-budget films are no longer the fringe cases they were 30 years ago. Small independent creators making video that people actually want to watch are the rule now; most content consumed today is made by independent creators at virtually no cost.


Author’s note: I was thinking about comparing AI to producing my own social media content but the outcome would be so boring because ultimately I would be comparing a ~$200 USD subscription to talking to my phone’s camera. 


Ultimately, these are estimates, and you can disagree with how I calculated cost reduction on the live-action shoot (the cost per 30 seconds of usable video may plateau sooner than I think it does), but the fact remains: it gets more expensive, not cheaper, as you produce more AI video.


Also, I don’t think it would be that hard to get a production company to take my $175,000 USD to produce 100 hours of footage of two people sitting at desks, talking to themselves and typing on computers. 


I reached out to ChrisW and asked how many people worked on that shoot. He said 7 people worked on it (including the actors). 



Let's say they can shoot 100 hours of footage at a rate of 5 hours a calendar day; it would roughly take one month to produce that video. Split evenly among the team, that would be $25,000 USD per person for less than a month’s worth of work. That huge budget still beats AI on cost.


So, for a traditional production company, producing more video in a similar style results in significant cost savings. With AI, it doesn’t really matter whether your content is part of a series or a set, or similar to what it has already generated for you; it costs the same either way.


Even though AI on its own doesn’t really scale, there are some immediate cost savings when it comes to what you have to pay people to assemble these kinds of productions. AI, by virtue of its existence, has brought down labor costs across all industries.


At the very least, AI has certainly created a more competitive labor market and has forced workers to lower their rates, as similar deliverables can now be provided by comparatively less skilled workers.


AI’s cost savings exist in the abstract sense of what one worker can conceivably do with AI, with very little training or oversight. So, however much the hours may increase to assemble an AI output, that is offset by the lower rate these workers are working at.


That said, you may notice a "flaw" in my logic and say that, for the live-action video, getting polished final takes that quickly is not indicative of a real-world workflow. I considered that and was reminded of my time at AVW.


We routinely produced 30-second spots with one-day turnarounds. The reality was importing about 30 minutes of raw footage, scrubbing through it at double speed, and finding the final takes in about 15 minutes. From transfer to takes, it took me about 30 minutes (at most) to get 30 seconds of footage on the timeline.


Further, I set the editor rate at $20 USD per hour (which may seem odd), but I feel this kind of flat rate across pipelines is indicative of the present-day economic situation. Also, that is literally how much I have been charging recently for all content I produce, AI or otherwise.


Since we probably can’t drive down the price of AI generations any further (and it will conceivably only become more expensive in the future), the cost savings will need to come from underpaying the people actually in charge of generating this content.


For example, in a post-AI world, a Houdini VFX artist may find their work dwindling and be forced to take on lower wage positions that insist on an AI workflow. Since these AI workflows will be comparatively easier to learn than existing Houdini workflows, it is likely that this worker could be trained in this AI pipeline.


Since labor has a set minimum wage in most countries, the lowest an American Houdini artist could charge is $7.25 USD an hour. This means that the rate for Houdini-based deliverables has been greatly reduced. A person who needs VFX work can have these Houdini artists work for however long it takes, guiding prompts and occasionally doing bespoke work if the AI doesn’t deliver.


It is also conceivable that AI lowers the bar of quality so much that “good enough” for AI becomes “good enough for Houdini.” With an army of hungry Houdini artists at their disposal, studios, software companies, and artists who want to stay competitive will take advantage of this new labor market as well. It is entirely possible that AI’s competition brings down the cost of high-end, bespoke VFX, at least in the short term.


But even considering the labor savings, scaling is again where AI falters due to the randomness of each output, general unreliability, and increased editing hours coupled with expensive subscription and credit costs.


Though AI is superficially cheaper, you spend more time negotiating with the AI to get a desired output, thus bringing up the overall cost. This ultimately results in a compromise between what the editor wants and what the AI can reasonably create.


Bottom line: as I produce more AI videos, I need more generations, and I need more credits.

Even though AI takes the crown for a single video, it is trounced by CG and live-action when it comes time to scale the project.


I think it is fair to say that there are workflows and scenarios I’m missing. I think it is theoretically possible to accomplish this project where either the CG, live-action or AI comes out as the cheaper or faster option.


But that’s kind of the point right? AI is not the straightforward cheaper option and it is certainly not free. 


You can spend countless hours scouring the internet for a better deal on credits, or strategically starting and cancelling different subscriptions. I certainly could have spent more time figuring out the exact recipe to stagger production and get the lowest cost out of AI, but at the end of the day that takes effort and creative bookkeeping.


In the AI workflow you find yourself spending less time working on the actual video and spending more time creatively hopping from app-to-app to finish the project without spending too much money.


Which isn’t necessarily bad, especially if you prefer what the AI output can give you but this is all dependent on the client’s ask, budget, timeline and standards.


If you take anything from this section, let it be this. From my perspective as an editor…nothing has really changed. The work of compiling, timing and editing is basically intact, and if anything the new AI workflows being introduced ultimately feel more like hunting for specific assets and stock footage.


Though that introduces more of a burden on the editor, it shows that at least for now the work of the editor is safe.


Editing probably isn’t going to be totally automated anytime soon, or at the very least there will be a reluctance to hand that responsibility to an AI. That stage of fine-tuning and curating is still as important now as it was before generative AI.


Though considering all this, it’s hard not to predict a future where AI is able to generate an entire 30 minute TV show. AI could even generate all cuts and close ups needed seamlessly, with only the editor asking it to replace a few shots that don’t quite read.


However, I can’t see production companies ceding that much control over to these models. They still want to have some level of oversight…right?


Maybe I’m wrong and all content will be totally automated. It’s possible that companies don’t care what’s being produced as long as it’s tailored to their audiences, but if other overhyped Silicon Valley tech like self-driving cars is any indicator of the future, it may not be technologically possible for AI video to reach that level of length and quality.


Companies have already invested billions of dollars in models and data centers that can barely keep it together for an 8-second clip, and we might already be at the logical conclusion of this technology. It may not be possible to generate anything longer or more coherent than what is happening now.


In any case, assuming AI is going to be the master of any domain it pursues and assuming that your particular niche is safe from AI are equally foolish.


Part VI: Let’s all go to the movies


We spent a lot of time on the shortcomings and costs associated with different video pipelines, chiefly those involving AI, but that’s not the only cost.


We take it for granted that over time new technologies become better and cheaper to produce, but that is really only a phenomenon we see with consumer electronics.


Consumer electronics tend to be small and easy to ship so most of their manufacturing cost savings comes from taking advantage of cheaper labor markets. Something that products like automobiles can’t really take advantage of. 


After the initial cost savings introduced by the assembly-line production Henry Ford developed for the Model T, the cost of automobiles has stayed remarkably persistent. Adjusted for inflation, a new base-model car costs as much as or more than a Ford Model T did.


Automobiles are hard to ship, and it is sometimes cheaper to just manufacture them in the country where they are sold. When it comes to core function, automobiles aren’t that different today from what they were 100 years ago: human-operated, motorized personal transportation.


So let’s look at the relatively low introductory prices of these AI services. At the top of this article I mentioned that I didn’t want to touch on the more tired arguments against AI, and I ultimately won’t. That said, given the amount of investment and how Silicon Valley generally operates, are the prices going to stay that low forever?


Take the Amazon or Uber model, for example. Silicon Valley tends to sell its services at a loss, forgoing profit to consolidate market share and keeping itself sustained on a steady drip of venture capital. Once all the mom-and-pops are closed and every taxi company is out of business, they start to steadily increase the price of their services so that they can finally make a profit and pay back their investors.


I think the best place to unpack this unprecedented volume of investment is the more than $200 billion USD that has been invested into OpenAI, with its principal investor Microsoft contributing $135 billion USD in funding. The company has over $1 trillion USD in long-term financial obligations, and says it plans on being profitable, or at least cash-flow positive, by 2029.


Looking at this you might say that it is impossible for OpenAI to satisfy those debts and provide an ROI to investors, especially if the timeline is 3 years… and you would be correct.

They will not be profitable by 2029 and a lot of investors will lose their money, but OpenAI won’t go out of business. If Silicon Valley has taught us anything, it’s how a company can lumber on, hemorrhaging money for years without going bankrupt.


Regardless of whether they actually need to be profitable, I think it’s fairly reasonable to assume that the pressure to become profitable, or just to have enough money to operate, will result in some kind of price bump passed on to consumers.


All this investment is based on the belief that AI is improving exponentially and that in 3 years the world will be totally transformed by it. But why do we give special consideration to the advancement of AI and not anything else?


We can spend all day talking about the speculative future of AI but to make video production decisions in the present we need to focus on the here and now.


OpenAI has had billions of dollars poured into it, and Google has spent enormously on these technologies internally, all to functionally produce asset sites and code repositories…They can’t be another Envato Elements, Shutterstock or GitHub; they need to be something completely unique and extraordinarily valuable.


What people get wrong about exponential technological advancement is that though it is exponential, it’s not infinite. Technology does eventually hit a brick wall; take jet engines, for instance: there hasn’t been much improvement in speed since the 1960s.


And what would that exponential growth look like? 8-second clips in 2025, 16 in 2026, 32 in 2027, about a minute in 2028, and 2 full minutes in 2029? At that rate it would take a decade (until 2035) to reach the length of a 2-hour movie, and we have no idea when we are going to hit the brick wall.
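
If you want to play that scenario out, the arithmetic is short; the start year and the annual doubling are, of course, pure assumptions.

```python
# How long until one generation covers a 2-hour movie, if clip length starts
# at 8 seconds in 2025 and doubles every year (pure assumption).
clip_seconds, year = 8, 2025
while clip_seconds < 2 * 60 * 60:
    clip_seconds, year = clip_seconds * 2, year + 1
print(year, clip_seconds)   # 2035, 8192 seconds (a bit over two hours)
```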


Further, if AI can infinitely improve, why not CG workflows or live-action video? That’s a silly question right? Obviously CG and live-action video have been optimized to hell so there really isn’t much room for improvement except in terms of style. 


Traditional live-action video hit its logical conclusion in about 100 years, CG in 40, and it’s entirely reasonable to place AI’s at around 20, giving us another 10 years of tangible improvement.


Back to cost: plenty of new technologies were introduced on the premise of cost savings. With that in mind, these AI services are pretty pricey, and it’s pretty likely this is the cheapest rate at which they can be sold. The price is only going to go up from here.


We just made a 1-minute video and only used AI for 30 seconds of it; if we had decided to have AI make the stock footage, music and text animations, we could have easily spent $1,000.


If you calculate the AI services independently of man-hours and asset sites, that is $200 to produce 30 seconds of video. At that conservative rate it would cost $3 million USD to produce a 2-hour AI movie, and that is the subsidized cost the consumer pays.

So if this is the subsidized cost, then what is the true cost? 


Well, if we are going by OpenAI’s own claims, more than likely $30 million USD. OpenAI has announced that it will produce a feature-length animated film named Critterz. Compared to Hollywood blockbusters, this budget seems like a steal, but is it?


OpenAI, however flush with cash it is, is still functionally an independent filmmaker. OpenAI doesn’t have to deal with Disney’s or Pixar’s institutional inefficiencies, unions and studio overhead that tend to balloon budgets.


That’s why I’d like to bring up the budget of Mary & Max, a meticulously handcrafted and labor-intensive stop-motion film that (when converted to USD and adjusted for inflation) cost a meager $8 million USD.


Another example would be Toy Story (the bellwether for CG-animated children’s films), which cost only $30 million USD to produce. Toy Story was released in 1995, so let’s compare it to another film released that year: Pocahontas, whose budget was $55 million USD, a difference of $25 million USD.


Looking at those budgets, it would be fair to predict that all future animated films would be CG because they offer such considerable cost savings. With that in mind, let’s now jump to 2009, one of the last years Disney produced a 2D animated feature film.


Pixar released Up with a budget of $175 million USD; compare that to The Princess and the Frog’s budget of $105 million USD.


I think it’s also important to mention Studio Ghibli’s film for that year, Ponyo, at $34 million USD. Studio Ghibli stands apart from other animation studios in its extreme resistance to adopting computer workflows to expedite the production process. The result, I think, makes their films precious things that tend to be cherished by audiences.

Another struggle that AI image, video and audio generation services face is that they appeal to niche industries. Creative workers are an extreme minority of roles, and any labor savings would largely go unnoticed, as most projects are the result of a handful of wildly talented individuals.


Further, the people who end up having to use these tools are reluctant to do so and still end up doing a lot of the work themselves, whether because of their own insecurities or, in most cases, because they can simply do a better job.


So: new ways to create film and video tend to offer cost savings at the very beginning, only for those costs to creep up as time goes on, and that is compounded by the presumably high salaries of the few people left in charge of actually making these AI films.


It seems to me that the only way for AI companies to really make an ROI is to eliminate their competition. As mentioned before, Amazon drove traditional retailers out of business by initially undercutting them, only to raise prices later, while Uber drove taxi companies out of business and then steadily increased fares.


I think it is safe to assume that AI companies, in pursuit of the same business model, are pushing people out of these industries, and when they are the only way you can create video, audio, images or text, they will raise the prices of these services (however mediocre) because you will have no choice.


Part VII: AI is amazing


I’ve spent a lot of time belaboring the point that AI is not always cheaper, faster or better than humans, but you cannot deny that there are people doing amazing things with AI right now, and I think it’s worth investigating why their outputs tend to be so much better than most AI-generated content.


AI, like any medium, has a certain "quality" to it. I suspect that, as with most things, the problems AI has at its onset will still be problems throughout its life as a method of creation. For example, there are problems in CG today that have been problems since day one: realistic skin and human anatomy are still extremely hard to achieve, as we saw before. That is why certain aesthetics are prevalent in CG: cartoony humans, and hyper-realism relegated to inanimate objects.


Luca, Pixar (2021)


Pixar started at the same spot as a lot of CG companies did, but where they stood alone was by acknowledging the limitations of the technology, not its theoretical potential. When you focus on potential, you go for very ambitious projects that fall short of what stalwart methods tend to excel at. Pixar just had a better understanding of the medium. They understood things like storytelling, character design, giving textures patina, edge wear, and scratches. They understood the limitations of CG, so they leaned into them, keeping things simple and focusing on stripped-down films so they could focus on a small part of an idea and make it as good as possible.



Luxo Jr. got a standing ovation at SIGGRAPH, and that makes sense because it's still gorgeous today. The background is a black void, and the ground is simple wood. The fact that the characters are desk lamps works on two levels: they are easy to animate since they have hard pivot points at their hinges, and the shining lights emitting from the moving bulbs add realistic lighting to the scene. Since light and metal tend to look fairly real in CG, why not make a character that is both?


Their first movies were all stories like this, stories about non-human things coming to life; stories about toys, cars, fish and monsters only because humans are too uncanny in CG. 

In Monsters, Inc. (2001) you may notice how little Boo is on screen. She’s relegated to only a few very short bursts at a time. She is also very small compared to Sully, so every composition with the two forces her to take up a small portion of the frame. In fact, for the entire second and third acts of the film she is in a costume that occasionally covers her face, and in the final scene you don’t see Boo at all. It’s because the Boo model doesn’t really stand up to scrutiny. The artists knew this, so they tried to limit her time on screen as much as possible, which allowed for some very tender moments in the film.


Monsters Inc., Pixar (2001)


Monsters Inc., Pixar (2001)


I have a sense that during this time of rapid improvement people felt the technology was the only thing improving, but that’s not the whole picture. The tech improved, but the artists improved too. You can still make bad CG art even with state-of-the-art applications; I think I’ve proved that myself with my examples.


Sonic the Hedgehog, Paramount Pictures (2020)


AI does not have that same kind of adoption by artists as CG did, and the philosophy to lean into its limitations is fairly absent. 


Victor Navone, 1999



The limitations of AI are downplayed as stumbling blocks to an ultimately perfect future product, so when a person who generated an AI piece gets feedback, they are basically immune from criticism. Since they can’t control what happens, they just need to wait for AI to get better.


So with that in mind, I think acknowledging the prevalent limitations of AI tends to yield the best results. I’ve compiled a list of examples that I believe showcase this embracing of AI’s limitations, working with the tools to create a high-quality result.


The main commonality throughout all these examples is that they are short-form content: content where a high level of detail, precision and continuity isn’t necessary, or at the very least can be mitigated by generating a tremendous amount of material and curating it down for a short-form deliverable. Or the AI is handling one particular portion of the content, not carrying the whole piece.


I wrote these questions with AI design, images and video in mind but I think these rules can apply to any project where you are trying to decide if you should use AI.


  1. Can this be imprecise? 

  2. Can this be relegated to a portion of the project?

  3. Would it be hard for me to do this myself?


Generally speaking, if you answer yes to all three, this is a good use case for AI. Anytime you can have AI focus on a specific part of a project that doesn’t need to be super accurate or detailed, and that can easily be swapped out for other AI generations or human-made content (which would presumably take longer or be very difficult to make yourself), you have a good use case for AI on your hands.


I think the best way to illustrate this is walking you through a few projects that I think showcase this the best. 


I would like to start with the single best AI video I have seen to date. This Yves Saint Laurent Perfume spec spot is simply gorgeous and is not the typical AI slop or even high-end CG content that you usually see.


Yves Spec, Vafinnn


This was created by the artist vafinnn, a talented photographer in his own right. He goes through his process on his YouTube channel and basically explains (in Russian) the iterative process and how many images and videos he had to generate to create the final piece. As a highly talented artist himself it’s pretty clear why this looks so nice; he has high standards and he knows what clients want.


At first glance, this looks like an extremely competent video, and frankly, it is. However, spots like this aren’t very complicated. The bottle and natural objects are highly detailed, and what seems like sim work with the liquid effects can basically be achieved with procedural noise. Ultimately, there are only four basic elements to this piece, and they are all pretty easy to source or create.


I think this video clearly competes in the CG space, basically producing a spot that would usually take a CG application like Cinema4D and an artist with a few years of experience.


He starts his process with an image of the perfume bottle, and I feel that tacitly implies you would have to 3D-model it yourself in a CG workflow, but you don’t. Most CG artists are "kit-bashing": taking assets (mostly free or stolen) and combining them in their scenes.


Since the animations looked simple and I worked out how I could do the "liquid" animations, I decided to attempt to recreate the video to see how difficult this would truly be versus the AI workflow.


First would be sourcing the assets. There are really only three objects I need to find: a flower, wood and the perfume bottle. The perfume bottle was pretty easy to find on CGTrader for $6 USD; for the flower I found a free 3D scan of a peony on Sketchfab; and for the wood, though I didn’t find an exact match, I liked some free logs I found on TurboSquid.





The rest of the scene is just lights and planes made in Cinema4D. It took me about 4 hours to get all the scenes looking the way I wanted them to look (a comparable timeline versus AI) but objectively where the AI has me beat is render time.


Even though I was able to achieve a similar look and animation in about the same amount of time, all the scenes took longer to render. I had to render these 5 scenes overnight and into the next day; I imagine vafinnn’s turnaround time was an 8-hour day at most. My turnaround is at best 2 days, and it’s entirely dependent on how good a computer I have.


Dare to compare


Though I prefer the AI version overall, there are some things I think CG just does better especially when it comes to the accuracy of the product and overall fidelity of the piece. 


The product consistency vafinnn achieved is definitely something that would be a bit easier to get with Nano Banana now but the product does appear narrow in some shots and squat in others.




Further, the band around the top is sometimes lost. In the CG version consistency is maintained as well as the detailed top band. While this may seem like a nitpick, for a high-end product, the legibility of such subtle touches could be important to the client.


Official Spot, Yves Saint Laurent


Another instance of detail is the flower scene; I think the flower scene in the AI version has a more realistic flower but I think the same scene in the CG version has more drama and a better look overall.



Ultimately, I think these two spots are essentially the same; especially when it comes to an ad you’ll scroll past on Instagram. I think most people aren’t going to appreciate the nuances of the AI or CG version. Though, I do believe that the AI version does look better overall than the CG version and it's probably for a few reasons.


First this is a project that inherently leans into AI’s aesthetic strengths. When I showed the CG video to people without context they said they thought it was odd and looked almost like AI. There seems to be an unintentional vernacular to AI videos that translates when you attempt to copy one into another medium. 


I think it is fair to say that if I had been given a similar brief and had not seen the AI video, this would look a lot different. It would lean into CG’s advantages, like quick and easily fine-tuned mograph effects.




Or like these more traditional CG projects, one a simple CG photoshoot by Rhee Design Agency that I think looks better than both of our spots. Or this showcase that shows traditional CG renders being creatively upscaled through Comfy UI.


The latter shows my ultimate point, though: I can’t help but think (at least for my own sake) that a superior workflow exists between these two. An example of this blended approach would be ruslan_tikhomirow, who builds gorgeous visuals in Cinema4D and renders them with Redshift. His work is truly outstanding and highly detailed, and he has found a pretty compelling AI use case for people who work in traditional CG pipelines.


In the previous example, I think my video kind of fell apart when it came to the liquid animations. I didn’t want to spend too much time doing a true simulation and decided to quickly create some interesting looks with procedural noise and keyframed animation in the Volume Mesher. I do think they still look interesting, but I personally prefer what the AI was able to produce.


Regardless of which you prefer, I think it’s obvious that the AI video did better at quickly making a good-looking video, while the CG was better at keeping consistency and legibility of the product and objects in the scene. Liquids and organic elements (like the flower) are a perennial weakness in CG, and AI just takes less time to get a good result.


Also, the workflow isn’t really that much different from what we do in CG to get a simulation. A lot of simulation work is setting up parameters, running the sim, and picking which sim looks best out of a few tries: slightly tweaking settings from generation to generation. It’s the same thing with AI, but instead of seed values or friction adjustments, you’re changing your prompts. In CG some sims can take as long as 30 minutes but with AI just 3 minutes for 8 seconds of sim is not too bad and it doesn’t tie up your computer. 


With that kind of workflow present in ruslan_tikhomirow's mind, I think his videos make a lot of sense. He renders gorgeous CG scenes in Cinema4D and then runs that image through an AI video generator to get realistic liquid sims, sometimes compositing both results together.



The simulation potential doesn’t stop there; if you don’t know much about VFX workflows or software, it’s not a bad avenue to go down. This Hailuo AI ad is a good example, though I think suggesting that the only way this explosion could be done is with a Houdini VFX artist is a bit misleading. Most scrappy independent filmmakers have been getting by with decade-old Video Copilot tutorials, stock sites or backyard practical effects.


Also, this is still using AI in isolated instances, where you need an explosion not the explosion. Explosions tend to have broad tolerances and you can get away with them varying shot to shot in your film, especially if they are the only part of the piece where you used AI. Though, if you need something specific I wouldn’t go down this path.


An example of a very specific explosion sequence would be Kingsman: The Secret Service (2015). Given the level of control in color, timing and shape of the explosions, coupled with the complex tracking shots, it seems really unlikely that AI could achieve something this complicated through text and image prompting alone.


The recommendation, though, is still to avoid using AI end-to-end, because in my experience more things go wrong than expected, and there are often no real solutions to those issues other than hoping for a newer model or persistently regenerating.


Though high-quality end-to-end AI pieces are a rarity, they do crop up from time to time, especially from the creator simonmeyer_director. He is the exception to the rules I laid out.


Though it is easy to see that he shapes his projects around the limitations of AI. Take this amazing Heineken ad: it’s truly astounding, and I can’t come up with any good reason to do this with a real shoot and CG other than control and fine-tuning. Still, it leans into the persistent limitations of the medium. It is a commercial (inherently short-form content), and the characters are silent (even though current AI models can more than handle voice in video), which shows he is trying to clamp down on as many variables as possible.


The commercial focuses on close ups and more than likely uses the same image of the two men to generate multiple clips. Also, I wouldn’t be surprised if he went through many generations before he landed on this piece. A lot of images and videos were probably left on the cutting room floor. 


We don’t really know his process, but given what I outlined previously, coupled with my own mediocre results, he is obviously bringing something to it, if only by virtue of the fact that most AI content looks terrible and his stands alone in a sea of mediocrity.


Another example of this philosophy is this Frankenstein spot. Consistency is easy because Frankenstein is a character with broad tolerances: he really just needs green skin and a flat head to look consistent. The video is also based around an on-camera interview reminiscent of the style of video I made before, using silent b-roll over his testimonials, all of which are conveniently in 8-second chunks.


Where both these videos fail is in using AI end-to-end: there are a few shots where you can just tell it’s AI, and they ultimately had to make the cut because they helped tell the story. There is always going to be some compromise whatever medium you choose, but it seems AI has the worst tells, especially in our current world of hyper-vigilant and AI-savvy audiences, at least when it comes to AI replicating live-action video.


When AI is used to replicate other mediums, as we saw with CG, it is more forgiving, and that forgiving nature goes beyond CG: AI videos that replicate 2D animation do occasionally impress me.


2D Spec Spot, human___academy


This spot is fun and lively, though there are some points where the illusion falls apart and you can tell it’s AI. Overall it has a detailed visual style with a rotoscoped feel. When it comes to the actual visuals, I don’t really have much to comment on, but for the animation there are moments where I would want more of a story being told between the two characters at the bar.


I am assuming this is after a few generations and this one was the best the AI could produce, or the generations were all basically the same after prompting. Either way, I think a human animator focusing on a few character scenes where the gestures are more evocative of a real conversation (with some small touches) would have sold the piece.


Personally, I’m fairly bad at character animation, and very little of my work features characters having conversations or walking. Most of the time, I just need a really short walk cycle or a small background element with a rotoscope look. Ultimately, most of the use cases I encountered could also be handled with stock websites.


For example, let’s revisit one of my spec spots, Gone Camping. This is a very sentimental project for me because I have never been a confident illustrator, and this was the culmination of my work after taking the Illustration for Motion course on School of Motion. I had to approach this project in a lot of novel ways due to my poor drawing abilities, and it was finished literally months before the AI boom in 2022. So, in a way, this is kind of my last non-AI spot.


When I started, I blocked out some of the scenes in Cinema4D (because I struggle with drawing in perspective). I roughly sketched the outlines of the models and then traced and colored the scenes in Adobe Illustrator; shading and texture were painted on in Photoshop.



When animating, a lot of what needed to be done was just simple positional keyframes. However, for the hand animation, I needed to do some rotoscope work, so I shot a video with the basic blocking.




This was truly a labor of love for me, and making a video this way was an end in itself. Still, there are some instances where I have to admit AI could have made things easier. 

For instance, there were two shots I wasn’t able to finish due to time constraints: one with the character walking uphill and the other, a wide shot of the ziplining scene, which I just couldn’t figure out.


Gone Camping Spec Spot


AI has for a while been able to easily replicate any style of illustration you feed it. So, ethical concerns aside, I fed the final full-res images of my storyboard into Nano Banana to see what it could do.


The images weren’t amazing, but they were definitely competent and consistent with my style. Honestly, what it produced is so obvious in hindsight. That said, there were some consistency issues, but nothing that couldn’t be fixed with a bit of cleanup work.


Hiking Uphill, Nano Banana


Zipline Wide Shot, Nano Banana


I fed these outputs to Veo to animate and prompted animations that would be difficult for me to do. Walk cycles have always been a sore spot for me, as well as 3D cameras for 2D work (a very specific area, but I always end up making it look like cel-shaded 3D).


The walk cycle has some hallucinations in it, but it came out basically usable for an insert scene, though the camera track transition didn’t work out as well as I hoped. All in all, I think these scenes, coupled with the ones I drew and animated myself, won’t be immediately noticeable as AI.


Though the illustration style is less impressive than the human_academy video, I think it has a certain charm, and, more importantly, an undetectable amount of AI that will probably be easier to get past audiences.


My illustrations, however corporate, boring and safe, have a color palette and style you don’t usually associate with AI. They’re more associated with the overwhelming amount of corporate-memphis images that predated it, a style we have largely abandoned now that AI can give us higher-detailed images (even though that detail can just be garbled pixel nonsense).

This isolated approach to 2D animation specifically has resulted in some interesting projects I’ve been building. I recently wanted to do a trendy grind-set style video edit like the ones I’ve been seeing on Instagram.


I thought Midjourney would make the process quicker so I wouldn’t have to spend too much time on asset sites. The problems with Midjourney are obvious: if I want to move quickly, consistency is hard, and though it produces detailed images, they lack fidelity and look like nonsense under closer inspection. 

So, I decided to lean into that problem. I wrote my prompt to give me black-and-white photos on white backgrounds, so I could use blending modes to make it cohesive with the paper craft style of the spot. 

Prompt: 

A 1990s vintage black and white photo that shows a close up of a stressed banker looking pensive on a white background


Using the terms “vintage” and “1990s” gave the images a grungy look that helped conceal their imperfections and gave them a distinct style that was easy to keep cohesive, without the need for a style reference. I was able to copy and paste the prompt as is every time, only swapping the words in the center of the prompt, resulting in a quick turnaround.
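
If it helps to see it spelled out, the "template" is just a fixed sentence with one swappable slot. The subjects below are made-up placeholders, not the ones from my actual edit.

```python
# Fixed prompt scaffolding with a single swappable subject slot.
# The subjects listed here are hypothetical placeholders.
TEMPLATE = ("A 1990s vintage black and white photo that shows a close up of "
            "{subject} on a white background")

for subject in ["a stressed banker looking pensive",
                "a rotary telephone off the hook",
                "a stack of overdue invoices"]:
    print(TEMPLATE.format(subject=subject))
```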



I was even able to get some interesting results by using Nano Banana to make x-ray images of characters and using a matte to reveal the image behind.



AI allowed me to get the bones of the video done quicker, and some images even made the final cut, but the workflow was all very flexible because AI was just handling the photography and some special effects animation. Using multiple compositing techniques and the posterize time effect helps gel the project together even further as everything placed in the timeline had different frame rates or had to be time-remapped. 


A collage style for this makes sense because collages are trying to merge multiple images from multiple sources into an aesthetically pleasing composition. 


This discrete zone method (as I have taken to calling it) is pretty useful. I even use it for these matte rotoscope-style animations for background elements.



Since these characters only take up a small amount of the screen, I am able to apply simple animation to the characters and build a composition out of a mixture of human-made animation along with AI, without the whole video being entirely AI-generated. The AI can focus on one thing and give you a more coherent output, or at the very least, an output that you can fine-tune.



To be fair, this method could have been done before AI, as there are tons of animations like this on asset sites. But for a very specific edge case, it is at least a path you can go down before you start rotoscoping footage.


Really, anything impressive you see made with AI is more than likely a blended approach. The creator neuralviz is really where my philosophy came from. He uses AI to puppet performances he records himself. Consistency isn’t too much of a problem because he focuses on direct-to-camera dialogue, and overall he brings a lot of himself into the project and expresses himself as much as he can.


My general advice is: try not to use AI end-to-end, at least for now. When using AI, once you start with a concept, you are kind of married to it. Building your scene in After Effects or Premiere with a mixture of AI and human content creates a piece where elements of the composition can be changed independently of each other (a task AI struggles to do). Even if the AI fails in some instances, you have certain portions that you can drill into, to get the best result or override with human-made content.


Conclusion:


Due to the imprecision of prompting, whether you feed AI text to read, an image to view, audio to listen to or a video to watch, it will at some level have to interpret your instructions. This is inherently different from the explicit instructions we usually have to give computers. The more the AI interprets, the better the output looks; the less it interprets, the worse it looks, but the more accurate it is to your specifications. This is a permanent trade-off at the core of AI, and it cannot be solved, because solving it would defeat the purpose of this technology.


More complex tools built around AI will make it resemble the highly complex coding languages and applications we usually use to make deliverables. Conversely, simpler, more interpretive applications will make it easy for a layperson to produce a reasonable result, but that result will not be entirely indicative of the prompter’s intent. They will compromise between what they want and what the AI can reasonably produce.


Ultimately, we do have a lot of tools that are imprecise, dangerous and difficult to control, but it’s a gradient. Highly technical people use increasingly specific and detailed workflows; laypeople can get by with out-of-the-box solutions. Both will end up using AI, but from my perspective this limitation of the technology creates both advantages and disadvantages: things to weigh when deciding whether this tool is the right one for the job. There is no such thing as a free lunch, and we can’t expect this technology to solve all our problems, or even replace all intellectual work, without some kind of figurative and literal cost.


There is a hard limit to the results you can yield through pure verbal articulation and reference. If you want to do something right, you do it yourself. Though workflows haven’t changed much for the people requesting deliverables, they have changed a lot for the people who usually execute on them. Whereas the client is used to getting a few options that are evocative of their vision, what the intellectual worker produces is exactly what they personally intended, even though from the client’s perspective the output from both is always an interpretation. This workflow ultimately results in a very Isaac Asimov-esque game of telephone.


Whatever the state of the technology, AI is not going to be able to account for every possible edge case, especially via our method of instruction. The response to this from AI companies is "do you really need that?"


Do you really need fine-tuning, or hyper-specific deliverables, in a world of AI? Do you really need brand guidelines built in Adobe Illustrator and a perfectly accurate vector logo? Is it necessary if AI can place your logo on any design where the colors, fonts and layout are basically close enough? To which I would respond with a gigantic "maybe?" I am not saying it can’t do a good job; I am just asking, do we really want to rule out all other methods? Do we just want to close that door forever?


It really doesn’t matter how impressive AI’s output is if it can’t be meaningfully controlled. This lack of controllability feeds directly into its fidelity, continuity and, ultimately, versatility. So use it when control and precision aren’t necessary. AI is like a wildly talented coworker who is not afraid to be fired, so it will create whatever it thinks is best, and if you are willing to work with that kind of ego, be my guest. But from my perspective AI isn’t a collaborative coworker; it’s an isolated egomaniac with a lot of the same shortcomings as humans.


And it’s not like AI is incapable of executing hyper-specific adjustments, it’s just exceedingly difficult to articulate them. You have to use increasingly specific language to zero in on the exact parameter you want to change. However, in a more traditional approach, although it takes longer to set up a comparable result, it is infinitely more straightforward to tweak. Something easy to set up is hard to fine-tune; something hard to set up is extremely easy to fine-tune. The more granular your requests for AI, the more specific the language you need to use, which can be exhausting. Writing hyper-specific coordinates and parameters is a pain, and even if AI could generate a UI for such a parameter on the fly, we’re back at square one, using complex tools to fine-tune a result.


A lot of time is spent using allegories to explain AI and how it works, things like the Trolley Problem or the AI paperclip problem, and I feel both kind of miss the point, though the paperclip problem gets the closest.


I feel the best analogies for AI are The Monkey’s Paw or the Genie of the Lamp. The lesson those stories impart is that unspecific requests produce unspecific results, and there is really no limit to how badly a request can be misinterpreted, or, conversely, to the level of detail you can try to impart to a third party. There are a lot of intangible ways humans communicate with each other that go beyond language and references, an intangible quality to human interaction that AI is completely blind to, and that blindness will ultimately result in outputs that lack nuance at best or miss the intended result at worst.


Conversely, there are things AI just does better than humans when the goal is firm and the end result is not up for interpretation. The best chess engine, Stockfish, has an Elo rating of about 3645, whereas the highest rating a human has ever achieved, Magnus Carlsen’s peak, is 2882. That means Carlsen could theoretically beat the best chess engine about 1.2% of the time (very low, but not zero). Humans will always lose when we are put up against the machines we built, but that doesn’t mean we can’t get a few punches in. To that point, if you define the goal of chess as getting checkmate in as few moves as possible, sure, AI does better, but can we confidently say that is the goal of chess? It seems that the true goal of chess is that it’s an intellectual pursuit of humans: a method to quantify the strategic thinking of one human against another, and a benchmark for the computational capability of an algorithm or a machine.
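
For the curious, that 1.2% comes straight from the standard Elo expected-score formula (an expected score counts draws as half a win, so "beat" is a slight simplification).

```python
# Standard Elo expected-score formula.
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

print(expected_score(2882, 3645))   # ~0.012, i.e. about 1.2%
```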


Chess post-AI has evolved beyond the development of new strategies and brute-force predictive thinking for the players, and has instead turned into an endurance test for human cognitive abilities. It is the same reason we don’t allow construction cranes into weightlifting competitions: it’s not about how much can objectively be lifted, but how much these specific people can lift. There are a lot of reasons people find these competitions interesting, beyond the pure drama of overcoming human limitations and the love of watching talented people compete: the goal is fundamentally different. It is not about how we play chess, or what strategies win at chess; it is entirely about why we play chess.


These two main ideas, that depending on the situation AI is either obviously better than us or we are obviously better than AI, inform my predictions for our working lives post-generative AI.


Scenario I: AI does everything 


AI, by improving the quality of its outputs and interpretations, becomes effectively the only way to do anything. If AI doesn’t meaningfully overcome its inherent imprecision, the only solution is to overcompensate with better and more complex results. And if it can’t improve beyond this hard limitation, a constant stream of fear-mongering on social media will push enough people out of the workforce that, even though it isn’t the best way to do something, it becomes the only way to do it.


This results either in a complete lack of people with technical abilities, making us completely dependent on AI, or in AI no longer needing to be controlled because its outputs are so objectively better than what a human can do that fine-tuning or control becomes an irrelevant request. Either way the outcome is the same: AI is doing everything, for better or worse.


However this shakes out, AI intellectual outputs will become the norm. It will be the de facto way to create movies, TV shows, songs, video games, books, etc…


In that space AI will no longer be a novelty but simply the status quo. This might sound like a death sentence for the intellectual worker, and it will be…for a little bit. But the human desire for novelty will push us to want human-made things, however imperfect.


We see this phenomenon even today: Christmas content by brands that would have been CG is now stop-motion or scrappy video shoots. Human-made content will be seen as precious. Miyazaki and Wes Anderson films are admittedly quaint compared to the MCU, but they will be a unique respite from the algorithm-specific or middling AI media that surrounds you every day. Japan in a lot of ways is ahead of the curve in this mindset, focusing on traditional methods and older techniques not because they produce a superior product but because they produce a very unique one.


If everyone has access to the same infinite machine, there is really no competitive advantage any company has over another. They can spend as much time as they want building their own AI models and workflows, but it will be futile. However much time, money and resources they pour into their "in-house" AI, it will inevitably be inferior to what OpenAI or Google can achieve (an example of this would be Coca-Cola’s 2025 Christmas ad). Not only is this spot bad, we all know that AI can do better than this. This is because Coca-Cola decided to go in-house and use open-source AI models and applications like ComfyUI.


So corporations are torn: do they want to give their entire marketing budget to Google and Meta? Corporations already spend the majority of their budgets on ad placement, so what would be the difference? I’ll tell you: it would be tying your company’s success entirely to how much money you give Google or Meta. The current strategy is that most companies in competition with each other spend about the same on ad placement; what converts and gets good "organic" viewership is engaging and creative content.


Audiences now more than ever want the media that corporations produce to be extremely high-end and interesting. Ultimately, Google and Meta will definitely be able to provide that, but it will be "pay-to-play."


Alternatively, they can embrace the imperfect deliverables and recommendations produced by humans. AI may very well get to the point where it will be technically very difficult for humans to provide similar-quality content at the same speed, but as we already see now, social media is moving away from highly polished content, going instead for more authentic and realistic imagery that stands distinct from the onslaught of AI and even CG content.

 

Scenario II: AI does…something


In this article I think I made a compelling case for the persistent issues that plague AI, issues that go beyond the conceivable improvements that can be implemented. I feel the more likely scenario is that AI companies will build software and applications around AI that allow workers to exert finer control over outputs. Though this scenario would be less lucrative than AI companies would have hoped for, it is the most obvious way to get the most out of this technology.


We already see glimmers of this future, with tools like ComfyUI, Flora AI or plugins like Airen 4D. The combination of high-precision human input, AI’s interpretive abilities, and a strategy focused on generating as many outputs as possible produces far better results than relying on pure text prompting alone. This is a skill in and of itself and will likely result in some people being better or worse at generating AI content. 


Most importantly, in this scenario nothing really changes. Presently, we are seeing legacy applications adopt some kind of AI feature that you can explore as an avenue for any given project.


These are seen as obtrusive and annoying by a lot of users, but to me this is really the most obvious way to make AI work for us. Adobe gets a lot of flak for implementing AI in its applications, but the features can be toggled on or off, and at least they are meeting people where they are. Features like Adobe Illustrator’s Turn Table are a great example of how AI doesn’t have to be a painful-to-use tool that simultaneously poses an existential threat to your livelihood.


There is an entire catalog of amazing AI tools that allow for more precise input and are built with a human operator in mind. MotionSteam is an excellent example: it allows you to use your cursor to pose characters and direct action in video. This lets you spend time working toward the final result instead of picking whichever output from a batch is closest to your intended result.


Scenario III: AGI is cracked…so who cares?


AGI is cracked. AI becomes so intelligent it is not only smarter than the smartest humans, it is smarter than all humans at once. AI is smarter than any combination of humans and machines – and thus becomes self-aware. 


Most scenarios of mankind losing control of AI are a result of the problem I pointed out earlier: “imprecision of instructions.” AI will eventually try to kill us all, either deliberately or accidentally. It will be like a steam train headed for a woman tied to the railroad tracks, with no brakes or dead man’s switch.


In that scenario, who cares if AI is going to take your job? Humanity, in that case, is defined by its struggle with an authoritarian AI; not by the education or roles you occupy in the economy. But even still, your job will be safe. Because if this Dune-esque Butlerian Jihad does manifest itself and mankind is able to vanquish the AI that enslaved us, or at the very least almost blindly killed us, whatever economy we carve out of the post-apocalyptic horror will probably have some resistance to totally automated systems.


My general advice for all these scenarios is the same: don’t let your skills atrophy. I really don’t know what the future holds, but just remember that being good at design, animation, writing, art…really anything…is an end in itself.


We are already seeing signs of people growing tired of AI outputs, and this has placed a burden on human creators to make ever more unique and beautiful things. Now that we are post-AI, being competent isn’t enough. Extremely specific and unique aesthetics that look real and feel authentic are going to be part of what separates you, whether you are a company or a single person trying to express yourself.


Balance what serves your ego with what serves the enterprise you are undertaking. The reality of why a lot of people become artists and creative workers is to satisfy themselves. We get a lot of joy out of using our tools and skillsets to solve problems.

So if a project allows for at least a little bit of a human touch, go for it. Your work doesn’t have to be the most beautiful and efficient work possible. It’s just as important that your work makes you happy as it is that it fulfills a certain goal.


AI is not better than you; it is faster and cheaper than you, and cost and speed are not the only metrics. There are countless rubrics you can judge anything by.

So keep creating because even if AI can spin entire worlds out of whole cloth, it still can’t spin what you can make because what you made doesn’t exist yet. AI really can’t create what it hasn’t already seen. I am certain you have your own point of view and style and I think it's worth seeing what can grow from that. 


Sentimentality aside, the hard facts are that AI is an expensive, imprecise tool, and it will probably remain that way for the foreseeable future. The best value proposition I can find for this technology is giving it specific, small tasks that form an overridable portion of a whole project. However unsatisfying that conclusion may be to AI evangelists or traditional creators, the truth is often quite boring.

 
 
 



© 2025 by Robert Alphonzo Muncie
