Playing with AI text generators.

This blog entry will be short. Very short.

Addiction

First thing I have to say: it is addictive. Really. AI is as close as you can get today to a smart, living NPC (Non-Player Character). I discovered https://perchance.org/ai-story-generator and, well, fell a bit head over heels for it.

At first it was amazing. Then, after some time, it started to be a bit disappointing. And in the end I was left with the feeling that AI is dumb.

Really dumb.

AI is a marketing name

In reality it isn’t an “intelligence”; it is a huge word-mashing model of human language. And when we say “huge”, we mean it. A model smart enough to sound reasonable uses 7×10⁹ numbers to store its “model” of all the sentences it has seen. The more capable ones use from 13×10⁹ up to 120×10⁹ numbers.

This kind of intellect has nothing to do with reasoning. Zero. None, null. The reason is very simple.

It looks like it is thinking, but in reality it is just finding the most probable continuation of the sequences of numbers it was shown. And that is all.

The problem with AI is that when the marketing guys saw a machine able to answer questions stated in human language, they thought it was “thinking”. Piling one misunderstanding onto another, they decided that “large language model” was not a well-selling name and that “Artificial Intelligence” would sell better.

At first it was. Now… well…

Why are AI (text models) doomed to be dumb?

Let us start with a digression.

Imagine you are sitting in a small room. There are no windows; nothing can get in from the outside. Now imagine that outside this box there is a civilization of creatures which have a sense of smell ten thousand times better than a dog’s, amoeba-like bodies, and lives spent in the ground, seeping through the gaps between grains of sand.

Then, one day, one wall of your box lights up, showing you rows of black and white dots. On the opposite wall a number of buttons appear. What will you do? Well, probably nothing. But if you just sit there, the box will start giving you painful electric shocks. So you will start pressing buttons. Some sequences will stop the shocks, some will make the box give you ice cream (or whatever you find pleasurable), and some will shock you almost to death.

Why am I talking such nonsense?

Because this is what an AI’s “life” looks like.

No common experience

A “large language model” is presented with “tokens”, and a “token” is just a number assigned to a sequence of characters. This gives the first limitation. The “language model” will have a hell of a lot of trouble understanding what six-on-nine (69) means in terms of erotica. This idiom cannot be reasoned about until you can actually see the shape of the digit 6 and the shape of the digit 9. But to really guess the meaning, you need to be able to see what a human looks like. Only then can the circle in 6 and 9 be imagined as a human head.
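To make that concrete, here is a toy sketch in Python (the vocabulary below is completely made up; real tokenizers use learned sub-word units, but the principle of “text in, opaque integers out” is the same):

```
# A toy illustration of tokenization. The vocabulary is invented;
# real tokenizers (BPE and friends) use learned sub-word units.
vocab = {"Martha": 412, " likes": 87, " 69": 3051, ".": 13}

def tokenize(text, vocab):
    """Greedily match the longest known chunk at each position."""
    tokens = []
    while text:
        for chunk in sorted(vocab, key=len, reverse=True):
            if text.startswith(chunk):
                tokens.append(vocab[chunk])
                text = text[len(chunk):]
                break
        else:
            raise ValueError("unknown chunk: " + text[:10])
    return tokens

print(tokenize("Martha likes 69.", vocab))  # [412, 87, 3051, 13]
# The model sees only [412, 87, 3051, 13]. The visual shapes of
# "6" and "9" are gone before the model ever receives the input.
```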

A “language model” is shown only the numbers, not the shapes. It will also have trouble matching words which sound like other words. In languages using a phonetic alphabet, the look of a word also carries its sound. In languages like Chinese the sound is not directly stored in the symbol, so there we can have words which sound alike but look different, and words which sound different but look alike.

This shows the basic problem: our written languages are bound to our hearing and sight.

So if a “language model” can neither see nor hear, it will have a hard time dealing with obvious things.

Lack of reasoning?

The AIs, at least those I had a chance to experiment with, quickly fall into the “most probable sentence” trap. For example:

Martha has no sense of taste. Her taste buds are non-existent.

Martha was eating ice-cream. Her sister, as a joke, salted the ice-cream.

“How do you find them, Martha?” asked her sister.
AI continuation starts here:
Martha shrugged, her lips pursed around the spoon. “They’re… not bad,” she said, her words carefully enunciated. “They’re not as bad as I thought they’d be.” She took another bite, chewing slowly, and then swallowed. “They’re actually kind of salty,” she admitted, “but I can still taste the ice cream underneath.”

This is because the logic connecting “taste buds” with “salty” is located outside the language, inside our physical experience. Regardless of how strong the intellect is, unless it can actually touch that logic it will never be able to reason correctly. There is, however, a strong probabilistic chain salted → eating → question about how you find your food → tastes salty in most written texts.
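Here is a toy sketch of that trap (the probabilities are invented for illustration; a real model has billions of weights, but the mechanism of “pick the most probable next word” is the same):

```
# Toy "most probable continuation" model. A real LLM derives such
# probabilities from training text, where "salted food" is almost
# always followed by "tastes salty".
next_word_probs = {
    ("salted", "ice-cream"): {"salty": 0.80, "sweet": 0.15, "tasteless": 0.05},
}

def continue_text(context):
    """Pick the single most probable next word. No reasoning involved."""
    probs = next_word_probs[context]
    return max(probs, key=probs.get)

# The fact that Martha has no taste buds lives outside the statistics,
# so the most probable word wins anyway:
print(continue_text(("salted", "ice-cream")))  # salty
```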

Likewise, the so-called “AI-based code generation tools” are a real pain in the behind. At best they are as good as a “code template lookup engine”, which every company with its own code base can create at practically zero cost with Omega-Search, Xapian, Lucene + Tika, or just a standard “template catalog” (see the Xapian sketch later in this post). And they will never be better until they can use compilers, debuggers, GUIs and other tools to actually test the code they write.

Cost

Once I got a bit addicted to perchance, I decided to try to make it say really dirty stuff ;). Yes, I am male and males are, well…

Of course on-line services are out of the question in such a case. Privacy is important. Especially since in some countries saying certain things is illegal.

I had to get myself a decent PC able to run AI locally. Luckily my old PC was getting close to end-of-life status (15 years), so I decided to get a new one. I chose an AMD Ryzen 9 7900X (12 symmetric cores) and maxed out its memory capacity with a whopping 128GB of DDR5-3600 RAM. No dedicated GPU (graphics card), just the one inside the Ryzen 9.

Trying to run a full 70B (70×10⁹ parameter) model of course failed. Out of memory. You need about 70×2 = 140GB of RAM to even think about starting it, as it uses a 16-bit floating point value per number. One may down-sample it to a less accurate form called 70B Q8, which uses 8 bits per number. That form can run on this machine.
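The arithmetic, as a back-of-the-envelope sketch:

```
# Lower bound on RAM needed just to hold the weights. Real loaders
# add overhead (KV cache, activations), so actual use is higher.
def weights_gb(params_billion, bits_per_weight):
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for name, bits in [("fp16", 16), ("Q8", 8), ("Q4", 4)]:
    print("70B %-4s: %3.0f GB" % (name, weights_gb(70, bits)))
# 70B fp16: 140 GB  -> does not fit in 128GB
# 70B Q8  :  70 GB  -> fits
# 70B Q4  :  35 GB  -> fits easily
```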

No GPU?

You may ask now why I didn’t get myself a GPU. You might have heard that a GPU is a must for AI.

The answer is simple: money. Those 128GB cost me half the price of an entry-level 4GB GPU. And if you look at the necessary memory sizes, GPUs with 8GB of RAM are useless (128GB of on-board RAM costs about 1/4 of the price of an 8GB GPU). To be able to “talk” with a reasonably smart AI you need at least a 13B model, for which 16GB is okay. But if you would like to train it, or use smarter models, well… For training LLAMA2 70B you need, they say, about 360GB of RAM.

The largest GPU I have found has 48GB of RAM and costs about three times what this PC cost me.

AI on bare metal

So no AI for me? Is a GPU a must?

Well… it is not.

In fact a 4-core CPU is enough to use it.

The machine I got can run 70B Q8/Q4 models, generating about 1 word per second, while all cores stay ice cold.

Note: a 7B model runs as fast as you can read, and a 13B model is okay too. Perchance runs on a GPU, possibly with a 13B model (I am not sure, but I don’t think a private person would invest in a GPU powerful enough to run 70B), and it can spit out tens of words per second.

The CPU cores are assigned to the job, but there is little they can do. The 1 word per second is exactly the limit of DDR5 RAM throughput. I ran some tests and my system shows about 40…60GB/s in on-board transfers. Considering that an AI is just a neural network, and a neural network is a f*ing huge matrix, each generation step must read that matrix at least once. And this is what takes time. It really doesn’t matter whether we have the AVX2 / AVX-512 instruction set, vectorization or whatever. The CPU will sit and wait patiently until the on-board RAM fills the cache.
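A back-of-the-envelope check of that claim, using the numbers from this post:

```
# Every generated token requires streaming the whole weight matrix
# from RAM at least once, so bandwidth caps the token rate.
model_gb = 70          # 70B parameters at 8 bits (Q8) ~= 70 GB
bandwidth_gb_s = 50    # measured DDR5 throughput, middle of 40...60

tokens_per_s = bandwidth_gb_s / model_gb
print("%.2f tokens/s" % tokens_per_s)  # ~0.71, about one word per second
```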

I ran some experiments and noticed that assigning more than 4 threads to the AI doesn’t raise its speed at all. Using all 12 cores or just 4 gives practically the same result. Simply put, 4 cores can consume all the on-board RAM bandwidth without any problem.

The Ryzen 9 has 12 cores but can handle up to 24 threads. The idea of having more threads than cores normally works well because different threads use different functional units of a core, so there is a chance that two of them can run in a relatively “parallel” manner. In AI computation this is not the case. The llama.cpp executable is about 1.4MB. Yes, megabytes. I can bet that the actual computation kernel is less than 1kB, so it fits in the Level 1 cache of a Ryzen 9 core without any problem. So in the case of AI, all threads are doing exactly the same thing. I did observe that allowing the AI to use all 24 available “virtual cores” actually slows it down.
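If you want to reproduce the thread experiment, a timing loop like the one below will do. It assumes a llama.cpp main binary and a local GGUF model; the paths are placeholders, and -m/-p/-n/-t are the standard llama.cpp flags for model, prompt, token count and thread count:

```
import subprocess, time

MODEL = "models/llama-70b.Q8_0.gguf"   # placeholder path
PROMPT = "Once upon a time"

for threads in (4, 8, 12, 24):
    start = time.time()
    subprocess.run(
        ["./main", "-m", MODEL, "-p", PROMPT, "-n", "64",
         "-t", str(threads)],
        capture_output=True,
    )
    print("%2d threads: %.1fs for 64 tokens" % (threads, time.time() - start))
# Expectation from the text above: 4 and 12 threads tie, 24 is slower.
```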

Python…

Most AI tooling uses Python. Well… it doesn’t work well without a GPU. llama.cpp is written in C++ and on bare metal runs about two to three times faster than the Python version.

Of course, if you off-load everything to the GPU, Python is fine… because then it is not Python that runs the computation.

Training on bare metal?

I failed to do it. Not because it is impossible, but because of rather sketchy docs. The only example I was able to run had failed to learn anything after 18 hours of work. Possibly because it needs a few thousand steps, while each step takes about 15 minutes on bare metal.

Luckily there is a good alternative to training or fine-tuning: get yourself a model with a large context. You can find 32k-context models with ease, and they run well on bare metal. The pattern is sketched below.
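A sketch of the pattern (generate() is a placeholder for whatever local runner you use, not a real API, and the standards text here is a dummy one-liner):

```
def generate(prompt):
    # Placeholder: wire this to your local model (llama.cpp etc.).
    return "(model output would appear here)"

standards = "Constants are UPPER_SNAKE_CASE. Functions are lowerCamelCase."
question = "What is our naming convention for constants?"

# No training needed: the reference material simply rides along
# in the (large) context window.
prompt = ("Here is the company standards document:\n" + standards
          + "\n\nUsing only the document above, answer: " + question)
print(generate(prompt))
```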

Running on built-in GPU?

I tried it. In fact, it is hard to persuade llama.cpp not to use it. If it is compiled with GPU support, it will use the GPU for initial processing regardless of whether you tell it to or not.

The good side of the Ryzen 9’s built-in GPU is that you may assign it as much RAM as you like. The bad side is that, when tested with AI, its performance equals 1/2 of a single CPU core. This is why ROCm (the official AMD GPU computation library) doesn’t list the built-in GPU as a supported device, even though it can be persuaded to support it.

Use AI for your company?

No. You can use it as a first-contact chat bot, but you need to be very, very careful. In fact you need to pass the answers of your AI through a secondary AI. The first AI replies to customer requests, while the second-level AI checks whether the answer is “legally safe”, that is, doesn’t promise anything, isn’t rude, and isn’t simply untrue.
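A sketch of that two-stage setup (both model calls are placeholders to be wired to whatever chat model you run):

```
def draft_reply(customer_message):
    return "(first AI's draft answer)"            # placeholder model call

def is_legally_safe(reply):
    # Second AI acting as a classifier: no promises, no rudeness,
    # no invented facts. Placeholder rejects everything, to be safe.
    return False

def answer_customer(message):
    draft = draft_reply(message)
    if is_legally_safe(draft):
        return draft
    return "Let me connect you with a human agent."   # safe fallback

print(answer_customer("Can I get a new car for 1$?"))
```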

And even then your clients will be really pissed off.

So maybe use it to learn company standards? A kind of “smart wiki”?

My recommendation is flat: no.

You will need a hell of a lot of GPU power to make an AI learn your standards. But standards aren’t cast in stone, so they will change. The dispersed nature of AI makes it hard to “un-learn” old versions. So if you just fine-tune it each time a standard changes, you will end up with a lying mess of a fake intellect.

Compare that with Lucene+Tika or OmegaSearch+Xapian… well… My machine could index about ~10GB of text in less than a few minutes and search for a phrase in sub-second time. A 1GHz/1GB machine can do the same in about an hour for indexing and 1 second for searching. And it can easily be made to forget old standards. By the way, the total size of the Xapian index (a searchable database) for this amount of text is about 30MB. Yes, 30 megabytes for about 10 gigabytes of input data. This should show you how oversized AI is. And just for your information, this entire database can fit in the Ryzen 9 7900X’s L3 cache (the Ryzen 9 7900X has 64MB of L3 cache). This is why it can be hellishly fast.
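For reference, here is a minimal index-and-search sketch using Xapian’s Python bindings (the calls are the standard ones from the Xapian documentation; paths and text are placeholders):

```
import xapian

# Index one document.
db = xapian.WritableDatabase("standards.idx", xapian.DB_CREATE_OR_OPEN)
indexer = xapian.TermGenerator()
indexer.set_stemmer(xapian.Stem("en"))
doc = xapian.Document()
indexer.set_document(doc)
indexer.index_text("Constants must be written in UPPER_SNAKE_CASE.")
doc.set_data("coding-standards.txt")   # shown in search results
db.add_document(doc)
db.commit()

# Search it.
parser = xapian.QueryParser()
parser.set_stemmer(xapian.Stem("en"))
parser.set_database(db)
enquire = xapian.Enquire(db)
enquire.set_query(parser.parse_query("naming constants"))
for match in enquire.get_mset(0, 10):
    print(match.rank, match.document.get_data().decode("utf-8"))
```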

So, considering the costs of running, updating, ensuring response validity and so on… no. Don’t use AI.

Note: the only worthy use of AI in this case is “assisted search”, where the AI turns a user query, stated in human language, into a document database lookup and provides a short summary of the found fragments. This may be a valid use, since it doesn’t need any training. It shouldn’t, however, be exposed to your clients, since it can still be made to say really dangerous things (like, for example, a promise to sell a new car for 1$).
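That pipeline, as a sketch (all three functions are placeholders: the first and last stand in for model calls, the middle one for a search engine such as the Xapian example above):

```
def rewrite_query(user_question):
    # Model call: "turn this question into search keywords".
    return "naming convention constants"              # placeholder output

def search_documents(keywords):
    # Placeholder for a real index lookup (Xapian, Lucene, ...).
    return ["Constants must be written in UPPER_SNAKE_CASE."]

def summarize(fragments, question):
    # Model call: "answer the question using only these fragments".
    return "Our standard requires UPPER_SNAKE_CASE for constants."

def assisted_search(question):
    keywords = rewrite_query(question)
    fragments = search_documents(keywords)
    return summarize(fragments, question)

print(assisted_search("How should I name constants?"))
```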

Summary

Did I say this would be a short post? Well… I lied again.

In short:

  • AI is not “intellect”;
  • it can run quite well on a CPU alone;
  • it is “memory constrained”, so memory size and speed are what limit it;
  • 128GB of RAM is not an unreasonable size; in fact it was too small for many AI-related tasks;
  • if you can, avoid it. A search engine is more practical.

Thanks for reading.
