Disclosure: All opinions expressed in this article are my own; they represent no one but myself, and not my current or any previous employers.
I've spent the better part of this year kicking the proverbial tires of AI. I'm surely the first person to post about it, if you can ignore the nineteen bajillion other posts.
I want to talk a little about what I've done, and a bit about what I think is coming. The first part of that will probably be underwhelming, and the second part will probably just be wrong. With expectations set accordingly low, I'll get right into it.
Somewhere near the beginning of 2023, I was put into contact with a really impressive woman who had an idea about how she might be able to leverage technology to scale her job. I'm going to stay intentionally vague here, because she's nearing the launch of her product, and it's her story to tell. What I can say is that it became extremely apparent early on that generative AI was going to be a massive component of whatever that solution would be. So, I started down the generative AI trail.
At that point, we were all just hearing about ChatGPT. A handful of lucky folks (myself included) were getting our hands on it early, signing up to pay to trial it, and tricking it into being mean and dangerously wrong. What struck me very early on was just how fast the landscape was changing. It seemed like code I was cutting one week was outdated the next. Frankly, I didn't know the difference between GPT-3, BERT, RoBERTa, or any other LLM. I felt a little like some weird cross between an old tech dinosaur trying to understand emerging technologies through the patterns of everything I've learned over nearly three decades in technology, and a baby discovering the world for the first time.
I feel like we should start with a bit of a differentiation between "AI" and "Generative AI", and it feels only fitting that we should have the latter do that for us:
Artificial Intelligence (AI) refers to the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using rules to reach approximate or definite conclusions), and self-correction. AI techniques enable computers to perform tasks that typically require human intelligence, such as understanding natural language, recognizing patterns, and making decisions.
Generative AI, on the other hand, is a subset of AI that focuses on the creation of content rather than just processing existing data or making decisions based on it. Generative AI systems are capable of producing new, original content, such as images, text, music, and even videos, that mimic the style and characteristics of human-generated content. These systems often use techniques such as deep learning, neural networks, and probabilistic models to generate content autonomously, without direct human input.
Personally, I think the differentiation is much ado about nothing. Not because the differences aren't important (they are), but because I think it's okay to say "AI" and mean "generative AI". Maybe that's just me. I've read plenty of blogs by folks eager to piss and moan about the differences, and while they are important, what matters more to me is that when we say some words, the audience knows what we're talking about. Does anyone remember when "Big Data" was the topic of every conversation? By definition, that "refers to data sets that are too large or complex to be dealt with by traditional data-processing application software", and since "traditional data-processing application software" quickly caught up, by definition that meant that practically nobody really had big data. But we still used the term. Also, for transparency, I pissed and moaned about that one, so I'll shut up now.
Anywho...one more thing to call out here: when I talk about "emerging technologies", I'm not necessarily being accurate. Neural networks, which are at the base of LLM training, have been around for a long time. As have natural language processing (NLP) and the entire buffet of technologies, concepts, and techniques that go into generative AI. What I'm really talking about is the moment the processors caught up and the ecosystem started to "fill out" with supporting frameworks and libraries that let us actually leverage those existing technologies/concepts/techniques. So, yes, nothing new here technically, but from an adoption standpoint, bringing it to the masses, it's very new.
So it's early 2023, and I'm stumbling around, feeling like I'm one of the blind men trying to identify the elephant. I am sure this is a well known parable, but just in case, it's a story illustrating the concept of subjective truth and the limitations of individual perspectives. In the parable, a group of blind men encounter an elephant for the first time and attempt to understand what it is like by touching different parts of its body. Each man perceives the elephant differently based on the specific body part they touch: one describes it as a tree trunk (after feeling its leg), another as a fan (after touching its ear), and so on. The parable highlights how each person's limited perspective shapes their interpretation of reality, and emphasizes the importance of considering multiple viewpoints to gain a more complete understanding of complex phenomena. I have no idea why the generative AI keeps blindfolding the elephant. The technology is far from perfect.
At that point, it was really around prompt engineering for me. Early on, I was learning as much as I could about that, and it's really fascinating. Prompt engineering is the process of designing and refining prompts or queries in these NLP-type systems to elicit desired responses or actions from the underlying models. It involves crafting prompts that are clear, concise, and specific enough to guide users in providing relevant input or to guide models in generating accurate outputs, and it's a lot of trial-and-error. Additionally, this idea of hallucinations comes into play, which is basically when the model generates outputs that are not based on actual data but rather are a result of its internal patterns and biases. The result is the occasional response that appears coherent but is nonsensical, inaccurate, or even inappropriate. And out-of-the-box (if you will), you have very little control over how much of that you allow, though you can use certain parameters, like "temperature", to help rein it in somewhat.
The temperature setting is a parameter used during text generation to control the level of randomness or creativity in the model's outputs. By adjusting it, you can influence (not control!) the diversity and unpredictability of the generated text. Lower temperature values result in more conservative outputs, where the model tends to produce more common and plausible responses based on its learned patterns. Conversely, higher temperature values increase the likelihood of the model generating novel and diverse outputs, which may include more unconventional or imaginative responses. So, clearly we should think about the "temperature" setting the same way we think about alcohol when playing pool. You need just enough of it to be good, but too much and you're horrible.
I'm not really sure that comparison makes sense, but I think the image is funny, and somehow the stick goes through his head and out his mouth (!) so it's staying.
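For the code-minded, here's roughly what turning that dial looks like. This is a minimal sketch using the OpenAI Python client; the model name and the prompts are placeholders, not anything from the actual PoC.

```python
# A minimal sketch (not production code) of the temperature dial with the
# OpenAI Python client. Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str, temperature: float) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder; any chat model works
        messages=[
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": prompt},
        ],
        # 0.0 = sober and predictable; push past 1.0 and it gets... creative
        temperature=temperature,
    )
    return response.choices[0].message.content

# Same question, two very different personalities:
print(ask("Describe a refrigerator full of beer.", temperature=0.0))
print(ask("Describe a refrigerator full of beer.", temperature=1.2))
```

Run the same prompt at 0.0 and again at 1.2 and you'll see the difference between the designated driver and the guy who swears he plays better pool after four beers.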
For this weekend-work PoC initiative that I was working on, I decided that it would be really interesting to have multiple, competing AIs. Without divulging any details of the app, a small handful of AIs were all given the same starting information, and then the user could have independent "relationships" (yikes) with them, and anything you told one would remain between the AI and the user. I thought that was crazy novel, and I could have injured myself by walking through doors because my head was so big. Unfortunately, that's not all that novel. But still, I found it interesting, so that's something, I guess.
And, while I was being reminded that I'm not special, I also decided it best to get some learnin' in on this fascinating and emerging subject, because I kept hearing terms like "Transformer architecture", and I had no clue what that was. That led me to the paper on the subject, which is crazy interesting, but not exactly something you grab if you're a bit drowsy, if you catch my drift.
It turns out, the Transformer architecture is revolutionary, in that it uses a self-attention mechanism to capture long-range dependencies in sequences. Unlike recurrent neural networks (RNNs) and convolutional neural networks (CNNs), Transformers process input sequences in parallel rather than sequentially, making them highly efficient for both training and inference. So, in a nutshell, and this is going to get super academic for a second: it's fast AF and it be good-er.
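For anyone who wants a touch more rigour than "fast AF", here's the heart of it, scaled dot-product self-attention, sketched in a few lines of NumPy. This is purely illustrative: random stand-in weights, a single head, no masking, nothing trained.

```python
# A bare-bones sketch of scaled dot-product self-attention from
# "Attention Is All You Need" -- illustrative only, not a real implementation.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, d_k=64, seed=0):
    """x: (seq_len, d_model) token embeddings -> (seq_len, d_k) context vectors."""
    rng = np.random.default_rng(seed)
    d_model = x.shape[-1]
    # In a real model these projections are learned; random stand-ins here.
    W_q = rng.normal(size=(d_model, d_k))
    W_k = rng.normal(size=(d_model, d_k))
    W_v = rng.normal(size=(d_model, d_k))
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    # Every token scores every other token in one matrix multiply --
    # no recurrence, which is why training and inference parallelise so well.
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

tokens = np.random.default_rng(1).normal(size=(5, 128))  # 5 fake token embeddings
print(self_attention(tokens).shape)  # (5, 64)
```

The takeaway: the attention scores form a full sequence-by-sequence matrix in one shot, so every token can "see" every other token without stepping through the sequence one position at a time.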
I hope I didn't lose anyone with that incredibly scientific "fast AF" assessment. From there, I took the online course Generative AI with Large Language Models, and followed it up with another online course called Fine Tuning Large Language Models. And then, in my quest to develop a pointy head, I read (admittedly, some more thoroughly than others) the following academic papers on the subject:
- ReAct: Synergizing Reasoning and Acting in Language Models
- The Power of Scale for Parameter-Efficient Prompt Tuning
- SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
- Scaling Instruction-Finetuned Language Models
- Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
- LoRA: Low-Rank Adaptation of Large Language Models
- Chinchilla: Training Compute-Optimal Large Language Models
- KL divergence
- BloombergGPT: A Large Language Model for Finance
- ZeRO: Memory Optimizations
And if I were to proof-read this before posting it, I bet I would feel like a super nerd, so I'm not going to proof-read it. Problem solved.
After reading words by smart people and feeling dumber for it, and not accomplishing my goal of developing a pointy head, I did recognise just how little I really knew, but also just how much more there was to this. And that whole fine tuning thing started to get really interesting...
Fine-tuning is like taking a really smart but generic robot and teaching it some new tricks to make it even smarter for a specific job. Maybe you have a robot that knows a lot about different things, like animals, plants, and vehicles. It's been trained on a wide range of information. Now, if you want the robot to specialize in identifying beer in refrigerators, you can fine-tune it by giving it lots of pictures and telling it which ones are beer and which ones are just soda. The robot adjusts its settings based on these examples, kind of like tweaking its programming. This way, it becomes really good at recognizing beer, using what it already knows to get even better at this specific task. It's like giving the robot a mini upgrade to make it an expert in beer spotting while keeping its general knowledge intact.
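If the robot analogy feels too fuzzy, here's roughly what the parameter-efficient version of that "mini upgrade" looks like in code. This is a minimal sketch using Hugging Face's transformers and peft libraries with LoRA (one of the papers above); the model name, target modules, and beer/soda labels are all illustrative, not from anything I actually built.

```python
# A minimal sketch of parameter-efficient fine-tuning with LoRA.
# Illustrative only: model, modules, and labels are stand-ins.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

# Start with a generic pre-trained "robot"...
base_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2  # e.g. 0 = soda, 1 = beer
)

# ...then bolt small low-rank adapters onto its attention projections,
# instead of re-training every weight in the base model.
lora_config = LoraConfig(
    r=8,                                # rank of the low-rank update matrices
    lora_alpha=16,
    target_modules=["q_lin", "v_lin"],  # DistilBERT's query/value projections
    lora_dropout=0.05,
    task_type="SEQ_CLS",
)
model = get_peft_model(base_model, lora_config)

# Only a tiny fraction of the parameters are now trainable; the general
# knowledge frozen in the base model stays intact.
model.print_trainable_parameters()

# From here you'd hand `model` to a regular transformers Trainer along with
# your labelled beer/soda examples.
```

The whole point of the LoRA approach is that you train a tiny set of adapter weights rather than all of the base model's parameters, which is what makes fine-tuning affordable for those of us without a data centre in the garage.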
That led me to vector databases, which, I have to admit, I fell in love with, for a few reasons. First, they're a big technology unlock for fine-tuning.
Vector databases store embeddings, or numerical representations of data points, generated by pre-trained models or during the fine-tuning process. An embedding is a numerical encoding of an input, whether that be an image or audio or video or text. These embeddings capture the semantic meaning and relationships between data points in a high-dimensional vector space, allowing for efficient similarity search and retrieval. During fine-tuning, vector databases enable you to compare the embeddings of new data points with those of the existing dataset to identify similar instances or outliers. Those similarities can then be passed on to the generative AI model to create responses that are more in line with what you want (which is to say, the specialisation. In my amazing earlier example: beer).
The other reason that I fell in love was what I emphasised above: the semantic meaning and relationships between data points in a high-dimensional vector space. Why did that cause a happy heart palpitation (is that a thing)? Because it meant the outputs could be an improvement over traditional Lucene indexes! Deriving semantic meaning and relationships in a high-dimensional vector space beats Lucene indexes and simple keyword matches because it captures the nuances of language and context more effectively. Lucene indexes and keyword matches rely primarily on exact word matches or predefined rules, whereas vector representations encode semantic similarities and differences between words and documents based on their context and usage. So I could use these little sweethearts to get a more nuanced sense of similarity and relevance, even when a query and a document don't share exact keywords or phrases.
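Here's a toy sketch of that difference using the sentence-transformers library. The model name is just a popular small default, and the documents are made up:

```python
# A toy sketch of semantic similarity vs. exact keyword matching.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "Is there any beer left in the fridge?"
documents = [
    "The refrigerator still has a six-pack of lager on the bottom shelf.",
    "Our beer-brewing guide covers hops, malt, and yeast.",
    "The thermostat controls the freezer temperature.",
]

# Embed everything into the same high-dimensional vector space...
query_vec = model.encode(query, convert_to_tensor=True)
doc_vecs = model.encode(documents, convert_to_tensor=True)

# ...then rank by cosine similarity. Note that the lager sentence never uses
# the word "beer" -- exactly the kind of hit a plain keyword index would miss.
scores = util.cos_sim(query_vec, doc_vecs)[0]
for doc, score in sorted(zip(documents, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:.3f}  {doc}")
```

In a real setup those embeddings would live in a vector database instead of a Python list, but the retrieval idea is the same.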
Now, to be fair, vector databases are absolutely, positively not new, and they're not an LLM thing. But they were new to me, and if it means that ElasticSearch doesn't have to be a part of my life, I think that's a good thing (amazing technology, but what a clusterfuck. Not only have I had to stand up and maintain clusters for ElasticSearch, there was that whole AWS forking situation...).
Back to the point here...so now I've learned enough to be dangerous with prompt engineering, including understanding token limitations (and what tokens even are!) and tricks to get it to work better, and I am shaking my head that "Prompt Engineer" starts to pop up as a real job, and I'm visibly shaken that it's a job in technology, and I'm learning about fine-tuning and I am falling in love with vector databases, and learning once again that databases don't really love you back, which is not dissimilar to Lisa Bonet, who I loved when she was on that rapist's show, and she also didn't love me back. Maybe it's me.
All right, I have to keep moving things along. So then I started to build out architectures using some of these technologies. Nothing too special, but starting to tie together some of these concepts, and layer in the stuff that I knew from my other life as a normal technology person. I wanted to go serverless, because...me...and that was kinda different because (1) the vector database wasn't cloud native (it certainly wasn't AWS; AWS had been dangling Bedrock, but it wasn't generally available, and it seemed to be trying to use a graph database in place of a vector database, which I'm not sure about, and also because AWS has a bad habit of releasing things early and then waiting for their customers to frustrate the hell out of themselves trying to productionalise shit that isn't anywhere near production-ready. Yeah, I'm looking at you, Sagemaker), and (2) there was this whole token limitation thing. If I was creating conversations with multiple AIs, they had to be aware of the rest of the conversation, but the rest of the conversation led to too many tokens, so...damn you, world!
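To put a number on "too many tokens", here's roughly how you can measure it with OpenAI's tiktoken tokeniser. The context limit and the conversation below are made up; real context windows vary by model.

```python
# A quick sketch of the token-limit problem using the tiktoken tokeniser.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
CONTEXT_LIMIT = 4096  # illustrative; check your model's actual window

conversation = [
    "User: Tell me everything you remember about my beer preferences.",
    "AI: You prefer stouts in winter and pilsners in summer...",
    # ...imagine hundreds more turns here...
]

token_count = sum(len(encoding.encode(turn)) for turn in conversation)
print(f"{token_count} tokens of history "
      f"({'fits' if token_count < CONTEXT_LIMIT else 'does NOT fit'} in the window)")
```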
Then it struck me: I needed to persist the historic conversation (obviously), and I had to summarise it and have it act more like the human brain. Obviously, we remember things that happened five minutes ago far more clearly than things that happened five years ago, and I was going to replicate that by taking the historic conversation, using AI to summarise it, then feeding it back in along with the granular recent parts of the conversation via prompt engineering, and layering in the fine-tuning, and then I would be a king and people would talk about me like I was the smartest human ever and they'd just throw money at me and buy me stuff.
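Stripped of the delusions of grandeur, the idea looks roughly like this. A sketch only, and it leans on the hypothetical ask() helper from the temperature example earlier; none of this is the actual app code.

```python
# "Remember recent turns verbatim, summarise the old stuff" -- a sketch.
RECENT_TURNS = 10  # keep this many turns word-for-word

def build_context(history: list[dict]) -> list[dict]:
    """history: full list of {'role': ..., 'content': ...} messages."""
    recent = history[-RECENT_TURNS:]
    older = history[:-RECENT_TURNS]
    if not older:
        return recent

    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in older)
    summary = ask(
        "Summarise this conversation in a short paragraph, keeping any facts "
        "the user shared about themselves:\n\n" + transcript,
        temperature=0.0,  # we want a boring, faithful summary
    )
    # Fuzzy long-term memory up front, crisp short-term memory after it.
    return [{"role": "system",
             "content": f"Earlier conversation summary: {summary}"}] + recent
```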
Except that also wasn't novel. At all. And I wasn't all that smart. Again. Damn you, world. At least I have my youth...oh, wait. Shit.
Turns out this was all being done already, too. By now, I'd discovered Lamini, as well as Hugging Face and LangChain, and that ecosystem was really starting to fill in. And fill in fast, alongside hundreds or thousands of other companies. I talked to a VC friend of mine who noted that "the only thing getting venture this year is AI", so the idea that there wasn't money to be had wasn't correct, per se. It's just that where there was money to be had, it was all going into one emerging sector. These frameworks were taking all these brilliant ideas of mine, somehow going back in time and coming up with them long before me, and then building frameworks to abstract the complexities away. And that is probably the right place to transition to what I imagine the upcoming year holding, so I'll do that now.
Next Year
This might come as a shock, but I can't see the future. If I could, I might have realised that I wasn't the one coming up with disparate AIs talking to one another or having some level of shared information and then branching off independently. I might have foreseen that I wasn't the one coming up with summarisation of historic information that better represented how the human mind works in order to address token limitations (etc.). I certainly would have invested big in bitcoin, and I would have bet on the Minnesota Twins to win the World Series in 1987.
That reference definitely doesn't date me at all. Nooooo. Not at all.
But I'll take a really quick shot at what I see coming and why. The world is not in agreement about whether we're currently in a recession, or whether we're on the brink of one. I'm sure as shit not going to know. But if there is a question, it is probably reasonable to guess that the next year is going to have a lot of focus on cost cutting and saving money. While I don't think AI is going to come for your job in 2024, I think what AI can do now - and the pace at which it is moving - is enough to imagine that it is going to play a role in those cost savings. This probably manifests itself as "helpers" of some sort to do more of the things that humans are currently doing. Introducing ways for generative AI to take some of the load off your plate frees you up to focus on other work. That, in turn, means companies will be able to hire fewer people (or more accurately, get more output out of existing staff).
Hiring, then, will focus more on AI, which will get hotter, since companies will need people to build the stuff that makes existing staff more productive, which in turn lets those companies avoid adding staff in those areas.
I think these various frameworks, like LangChain and Hugging Face, will thrive. I also think that a lot of competitors won't, because while there may be thousands of them right now, a few will emerge as the industry leaders, and that will squeeze out some of the little fellas. This one is just how a free market works. The best will get more money, and that will create a larger moat and make it harder to compete with them. Does it mean that they'll end up alone, or that it'll be a handful of big dogs and nobody else? No. But I think it will mean that it will be harder to compete, right up until they get too big, and then they won't be able to move fast enough, and then the overly fat incumbents will get passed by. So, I think 2024 is going to be the year we see who those incumbents are, and they'll (rightfully) own a massive market share because they'll be far and away the best. The same will happen for vector databases, and I am going to go out on a limb here and make a bold prediction, which is that AWS will purchase Pinecone.io because their use of Titan isn't going to be good enough.
I also think we're going to see the other really cool shit that AI can do that isn't chat bots. Right now, it seems like everyone is using ChatGPT to build a better bot. But I think more use cases will emerge, specifically around using generative AI to make existing jobs more productive, and they're not going to involve chat. I'm not sure what that is going to look like, but I think that's a trend that is going to emerge and be huge. And it lends itself to the earlier comment about cost cutting next year. Also, it's not going to be all GPT-*; that family will remain the leader, but some of the other base LLMs will become increasingly prevalent, as folks realise that they can accomplish the same thing (often with open source alternatives).
Finally, hopefully, we'll stop talking about training LLMs and we'll rid ourselves of this notion that your company is going to have its own LLM. To be crystal clear on this one: it is RIDICULOUSLY expensive and compute intensive to actually build an LLM from scratch. That's why companies like Google and Microsoft and OpenAI are doing it and you're not. Yes, you can fine-tune. But that's different, and it's not the base model. So I'm predicting that the general public will (hopefully!) stop misusing terms and better understand how these things work. And that brings me to my final prediction:
I will continue to come up with brilliant ideas, which are all my own, and which nobody has ever thought of. And I will spend hours and hours appreciating my own brilliance...only to find out that they're not my ideas, that someone else already came up with them long before I did, that their solutions are far superior to my own, and that I'd be far better off being a little more humble. Especially when it comes to shit I don't really know about, like AI. Or generative AI. Or, really anything.
Thanks for reading.