Cover photo for Philip I. Thomas

OpenAI, the path for OpenAI-powered startups, and the AI hype cycle

Philip I. Thomas
The rise of OpenAI has kicked off a new trend in technology: artificial intelligence startups. New AI startups launch every few hours, powered by OpenAI's different services for image generation, text generation, and even code generation. With Microsoft's recent $10 billion investment into OpenAI, we are on the cusp of a new hype cycle in technology.

The market has greeted artificial intelligence with skepticism after recent trends like VR and crypto haven't yet delivered on their world-changing promises. Market skepticism centers on one truth: Most new AI startups have developed no proprietary AI and rely entirely on OpenAI for basic functionality.

I've spent the last six months helping startups develop AI products through The Contraption Company, including in-house AI systems and OpenAI-powered apps. Through this work, I've formed three perspectives that run counter to the market:
  1. Better AI systems may soon overshadow OpenAI
  2. OpenAI usage can lead to proprietary intellectual property
  3. The AI hype cycle is substantive

OpenAI alternatives - but you can't use them yet

Machine learning (aka "AI") isn't new, and people interact with mature versions every day - from ads to newsfeeds to product recommendations. Many machine learning experts downplay the accomplishments of OpenAI because its systems seem like a commodity. Their sentiment appears to be, "with enough data and computers, it's possible to rebuild this."

OpenAI's systems are not exceptionally secret - many of the techniques are public. Their founding team had the pedigree to raise enough money for a moonshot idea. And OpenAI had the risk tolerance to be the first in the market to give external developers broad access to its technologies.

Having enough data and computation to run AI is a barrier for AI startups. At the core is a chicken-and-egg problem: startups need data to train AI, but new companies typically have little data. These challenges have siloed cutting-edge machine learning to large corporations with enough data and talent to develop useful models. Google has long led the AI space - they have had substantive AI efforts for over a decade and routinely leverage AI in everything ranging from language translation to data center cooling.

Google has a ChatGPT-style tool internally called LaMDA, but has avoided publishing it externally because of the brand risk an AI platform creates. No big tech company wants to repeat Microsoft's 2016 AI launch, which quickly ended with the front-page headline "Twitter taught Microsoft's AI chatbot to be a racist asshole in less than a day." AI acts in unexpected ways, and mature businesses prefer to avoid risk.

However, OpenAI's traction seems to have compelled Google to rethink its AI services strategy. Google's forthcoming Bard announcement will likely be a competitor to OpenAI. The relative utility of ChatGPT versus Bard may be close initially. But Google has one significant advantage over OpenAI: hardware. Google's AI teams have been custom-building AI-optimized computer chips for their data centers since 2015. Proprietary Tensor Processing Units give Google a massive cost advantage over OpenAI that may help them scale and differentiate.

Google's forthcoming AI product should become a viable competitor to OpenAI and may even out-compete it. Over time, additional OpenAI competitors may emerge from companies with massive, proprietary data sets - including Amazon and Facebook. OpenAI has a first-mover advantage, but its traction will drive competitors to enter the AI market.

Generative AI solves a startup chicken-and-egg problem

Machine learning is a broad field covering many different types of problems. AI services already exist - such as Amazon's Recommender service or Google's Vision AI service. OpenAI's success is primarily due to its strategic focus on one machine learning area: generative models. Generative AI models can take a few words as inputs and synthesize high-fidelity outputs such as photos, essays, or functional code. Creating an answer with limited input data short-circuits the chicken-and-egg problem facing startups - thus enabling a new generation of startups to bootstrap AI products without proprietary data.

Generative models have a problem: accuracy. OpenAI's models can confidently return incorrect answers, and OpenAI doesn't include confidence scores signaling whether the model thinks its response is adequate.

OpenAI's models do a mediocre job of solving a broad set of problems. Their accuracy is typically high enough for initial prototypes with controlled inputs. But as customer usage grows and inputs become less predictable, startups will usually see accuracy decline over time. Measuring accuracy is an essential step for most startups - this is why you'll see thumbs-up/thumbs-down buttons in most AI apps, so that users can label each response as good or bad.
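As a sketch of that measurement step - the class and field names here are hypothetical, not any particular product's - a startup might log each AI response alongside the user's thumbs-up/thumbs-down label and track the fraction marked good:

```python
from dataclasses import dataclass, field

@dataclass
class FeedbackLog:
    """Minimal in-memory log of AI responses and user thumbs-up/down labels."""
    records: list = field(default_factory=list)

    def record(self, prompt: str, response: str, thumbs_up: bool) -> None:
        self.records.append({"prompt": prompt, "response": response, "good": thumbs_up})

    def accuracy(self) -> float:
        """Fraction of labeled responses that users marked as good."""
        if not self.records:
            return 0.0
        return sum(r["good"] for r in self.records) / len(self.records)

log = FeedbackLog()
log.record("Summarize this contract", "The contract covers...", thumbs_up=True)
log.record("Translate to French", "Bonjour (wrong text)", thumbs_up=False)
print(log.accuracy())  # 0.5
```

In production this log would live in a database, but even this tiny version gives a startup a number to watch as inputs get less predictable.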

Over time, startups build proprietary techniques to make OpenAI work better for them - often consisting of prompt engineering, vector embeddings, model tuning, and a collection of heuristics. This intellectual property typically helps startups scale their AI from a proof-of-concept to a stable product with customers. Customers flagging incorrect answers form a feedback loop that drives accuracy measurements and improvements over time.
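One of those techniques, vector embeddings, can be sketched in a few lines: embed your documents, embed the user's query, and splice the closest document into the prompt. The two-dimensional vectors below are toy stand-ins for real embeddings (which would come from an embedding model), so everything here is illustrative:

```python
import math

def cosine_similarity(a, b):
    """Standard cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_context(query_vec, docs):
    """docs: list of (text, embedding). Return the text closest to the query."""
    return max(docs, key=lambda d: cosine_similarity(query_vec, d[1]))[0]

def build_prompt(question, context):
    # Prompt engineering: constrain the model to the retrieved context
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    ("Refund policy: refunds are processed within 30 days.", [0.9, 0.1]),
    ("Shipping: orders arrive within 5 days.", [0.1, 0.9]),
]
query_vec = [0.8, 0.2]  # toy embedding of "How long do refunds take?"
print(build_prompt("How long do refunds take?", best_context(query_vec, docs)))
```

Real systems use high-dimensional embeddings and a vector database, but the shape of the technique - retrieve, then prompt - is exactly this.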

As usage grows, companies will start to see a tradeoff between speed and model accuracy. Every query to OpenAI can take a couple of seconds to process, and that bottleneck impacts the customer experience. Overcoming the accuracy/speed tradeoff typically leads to replacing OpenAI with proprietary AI systems.
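A common first mitigation for that latency is caching, so identical queries don't pay the API round-trip twice. A minimal sketch - the `time.sleep` below stands in for the OpenAI call and is an assumption, not real API code:

```python
import functools
import time

@functools.lru_cache(maxsize=1024)
def answer(prompt: str) -> str:
    """Stand-in for a slow model call; real code would hit the API here."""
    time.sleep(0.1)  # simulate network + model latency
    return f"response to: {prompt}"

answer("What is our refund policy?")  # slow: misses the cache
answer("What is our refund policy?")  # fast: served from the cache
print(answer.cache_info())  # hits=1, misses=1
```

Caching only helps with repeated inputs, which is why the deeper fix the paragraph above describes - a faster in-house model - eventually wins.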

As an application stores a log of customer queries, AI responses, and the "good"/"bad" labels customers give each answer, this data can form the basis for a reinforcement learning AI model. Entirely replacing OpenAI may be a long process. Still, in-house AI systems have the advantage of being faster and easier to tune for accuracy.
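Concretely, that log might be exported as labeled training examples, one JSON object per line - a common format for fine-tuning pipelines. The field names below are assumptions for illustration:

```python
import json

# Hypothetical logged interactions: customer query, AI response, user label
interactions = [
    {"query": "Summarize clause 4", "response": "Clause 4 limits liability.", "label": "good"},
    {"query": "List the key dates", "response": "No dates found.", "label": "bad"},
]

def to_training_jsonl(interactions, path):
    """Write labeled interactions as JSON Lines, one training example per line."""
    with open(path, "w") as f:
        for row in interactions:
            f.write(json.dumps({
                "prompt": row["query"],
                "completion": row["response"],
                "reward": 1 if row["label"] == "good" else 0,  # signal for RL-style training
            }) + "\n")

to_training_jsonl(interactions, "training_data.jsonl")
```

The point is less the format than the flywheel: every customer interaction the product logs becomes a row an in-house model can later learn from.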

Most OpenAI-powered startups will not get to the point of training their own models. But before OpenAI, training a custom model was the only option for most startups. Starting with OpenAI helps startups get a prototype sooner and bootstrap the data ecosystem toward the same custom models - but in less time.

Over time, startups that train custom models may even publish an API - competing directly with OpenAI for a particular application. A library of more specialized OpenAI alternatives will inevitably become available.

How the AI hype cycle could play out

To recap, this is my prediction for the trajectory of most AI startups:
  1. Founders identify an attractive, niche problem that would benefit from AI
  2. Startup prototypes a functional product with OpenAI
  3. Startup develops IP to improve accuracy as usage scales
  4. A proprietary data set of inputs, outputs, and customer feedback becomes the basis for internal ML models
  5. Startup sells access to its specialized AI models via an OpenAI-style API

In the past, it took millions of dollars and years of work before a startup could sell its AI products to customers. Investors recognized significant market and implementation risks in funding these early AI startups, so investments were rare. Investor appetite for AI startups flipped last year because OpenAI's generative models enabled startups to prototype products and get customers without massive funding. A decrease in market and implementation risks explains why so many venture capitalists started investing in OpenAI-powered startups - kicking off the AI hype cycle.

Historically, technology infrastructure has had high upfront costs - building an app, a data center, a fab, or an AI model. Many technological innovations enable customers to trade these upfront costs for marginal costs. This model explains the success of low-code tools, AWS, TSMC, and now OpenAI. An AI startup following the above model can deliver a product to customers sooner and cheaper while eventually converging to the same proprietary AI they would have built without OpenAI. Along the way, OpenAI has unlocked an entire market of potential customers who never would have considered developing their own AI - such as small businesses.

Decreasing the upfront costs for artificial intelligence has been the core innovation of OpenAI. Founders can now prototype products without massive upfront investments, and this lower upfront cost means faster iteration and, ultimately, more innovation. Unlocking AI as a tool to solve customer problems is why this AI hype cycle has substance and will end in many massive, innovative companies.