Nick's Path to AI enlightenment

An LLM is stochastic, so it will generate different code each time, which makes the code extremely difficult to maintain if you want to adjust it over time.
 
An LLM is stochastic, so it will generate different code each time, which makes the code extremely difficult to maintain if you want to adjust it over time.

No, an LLM is deterministic (at least in theory), but this gets a bit complicated, and the main issues w.r.t. coding are somewhat separate from that.

With the temperature set to 0, an LLM should be deterministic (it theoretically is), but in practice there are small sources of randomness. This is less of an issue with a smaller LLM on a single device, but once you're dealing with many floating-point operations running in parallel across multiple GPUs, and indeed multiple experts in an MoE setup, you introduce a small amount of non-deterministic behaviour.
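A minimal Python sketch of the underlying effect: floating-point addition isn't associative, so summing the same numbers in a different order (which is exactly what unordered parallel reductions across GPU threads do) can give a different result.

```python
# Floating-point addition is not associative: the order of operations
# changes the result. Parallel reductions don't guarantee an order,
# which is one source of non-determinism at scale.
vals = [1e16, 1.0, -1e16, 1.0]

left_to_right = ((vals[0] + vals[1]) + vals[2]) + vals[3]  # 1e16 + 1 rounds back to 1e16
reordered = (vals[0] + vals[2]) + (vals[1] + vals[3])      # the big terms cancel exactly

print(left_to_right)  # 1.0
print(reordered)      # 2.0
```

Same four numbers, two different answers, purely because of summation order.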

But the issues with coding are more to do with things like: losing context when dealing with large code bases; no knowledge of domain-specific libraries; being unable to validate outputs unless the LLM has a runtime environment (some do now)... it can also output needlessly verbose code when simpler solutions exist.

Those issues matter much more if someone expects to simply prompt an LLM and get a one-shot solution. That's not to say you can't get one (you can, or could, one-shot a Flappy Bird game, but then again the LLMs have seen multiple Flappy Bird clones). The better way to use them is as a "co-pilot", generating snippets of code while you, the coder, stay aware of the context, the other libraries etc.
 
An LLM in a generative capacity uses random noise.

An LLM that is static, without learning, results in a fixed distribution, as you've stated. I.e. as it doesn't learn, the paths don't change.

Sorry, I was thinking of agentic use with things like ChatGPT, a gen AI.

Surely the selling point of gen AI is that it's dynamic... and the reason for the painful fun.
Switching that off (temp zero) means you may as well generate the code with an AI once, then just use that.
 
To be clear, it's not so much that it uses random noise (obviously aside from changing the temperature) but that in some scenarios random noise is going to creep in.

LLMs are not dynamic; you can fine-tune them (before deploying) to change them, but that's more a matter of tweaking the top layers.
 
An LLM in a generative capacity uses random noise.

An LLM that is static, without learning, results in a fixed distribution, as you've stated. I.e. as it doesn't learn, the paths don't change.

Sorry, I was thinking of agentic use with things like ChatGPT, a gen AI.

Surely the selling point of gen AI is that it's dynamic... and the reason for the painful fun.
Switching that off (temp zero) means you may as well generate the code with an AI once, then just use that.


I think you are mixing up a few concepts.

LLMs have a training phase when they learn, and you can fine-tune them, but they don't learn anything during inference, so the learning is static.

The selling point of gen AI is certainly not to be stochastic for the most part, albeit there are some use cases, especially image generation, where that is useful. The most common way to use an LLM is likely with temperature at zero, because you get consistent results. The output will be dynamic based on the input prompt, but the same prompt should always give identical output for most use cases. As Dowie says, due to technical reasons related to floating-point arithmetic at scale this isn't guaranteed, and this is one of the biggest limitations of LLMs. Personally, this has blocked several projects from going to production.


Remember that an LLM simply produces a probability distribution over the next token. Sampling from that distribution is controlled by temperature, top-p, top-k and other factors depending on the model.
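A rough Python sketch of what such a sampler can look like (a simplified illustration under my own assumptions, not any particular model's implementation; the function name and structure are mine):

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, seed=None):
    """Sample a token index from raw logits.

    temperature -> 0 approaches greedy decoding (argmax);
    top_k restricts sampling to the k most likely tokens.
    """
    if temperature == 0:  # greedy decoding: just take the argmax
        return max(range(len(logits)), key=lambda i: logits[i])

    scaled = [l / temperature for l in logits]
    if top_k is not None:  # mask out everything below the k-th largest logit
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]

    m = max(scaled)  # softmax with max-subtraction for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    rng = random.Random(seed)
    return rng.choices(range(len(logits)), weights=probs)[0]
```

With temperature 0 the sampler degenerates to greedy decoding, which is why the same prompt keeps producing the same next token.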

If a problem can be well solved using alternative, simpler ML models when the temperature is 0, then it can almost always be solved at higher temperatures too. And indeed the most useful models tend to operate on probability distributions anyway, given that statistics and distributions of data are essentially the heart of ML.



The likes of Copilot will most likely have temperature at 0. Temperature at zero just means the most likely next token is selected. If you want your LLM to, say, answer questions about the capital city of countries, then you don't want it to respond with anything other than the most likely token.
 
I’ll recheck my notes when I have chance.

I know about the temp, top-p and top-k hyperparameters, and for non-LLM models the explore/exploit ratio.

Noise is used in training input to desensitise the model (it has a similar effect to downsizing).
Noise is also used in gen AI, where the text input is supplemented with random input to generate output.
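For the training-noise point, a minimal sketch of what input jittering looks like (my own illustrative helper and numbers, not from any specific library):

```python
import random

def add_gaussian_noise(features, sigma=0.1, seed=None):
    """Return a noisy copy of a feature vector.

    Jittering inputs this way acts as a regulariser: the model can't
    latch onto exact values, so it becomes less sensitive to small
    perturbations (the 'desensitising' effect).
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in features]

clean = [0.5, 1.2, -0.3]
noisy = add_gaussian_noise(clean, sigma=0.05, seed=0)
```

In practice you would apply this fresh on every training batch so the model never sees the exact same inputs twice.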

I’m also aware of the llm process (use of vector maps etc).

The "agentic" process of feedback isn't static compared to old agents. I can see the LLM can be set up either way with 0 temp. True, but most of the hype is people pushing for both ongoing reinforcement and also starting to look at non-zero temp (seems a bad idea to me).

I’ll have a more in-depth look again next week as I want to write my own agents.
 
Can ChatGPT code?
Try the Claude 3.5 preview; it's miles ahead of ChatGPT-4.

I'm using it as an autopilot in Visual Studio Code; it seems to always give working code and I've never had the same error twice.
If I say it's broken, it fixes it.
I just have to hit save, then run the code. It's amazing so far.

ChatGPT made me want to punch the screen. I knew about 70% of the time its code wouldn't work, and then it would only fix that code maybe 50% of the time at best...

Claude 3.5 had a model trained for me in about 10 minutes... with the simplest of instructions... I double-checked it was taking into account capitals, non-capitals, all the special symbols etc.; I didn't even need to remind the bot that you can't name images like /.png..
I never had to explain that we need to train the model with different font sizes... it just did it all properly of its own accord.

When I copy/pasted the confidence scores from the terminal... omg, it even remembers it's for reading text from game screenshots!

Yes, the OCR system is working! It's detecting characters with good confidence scores (many above 0.90). However, I notice a few areas we could improve:

1. Let's enhance the word grouping to better handle text in game screenshots:

### [ocr_front_pipeline.py](file:///c%3A/Users/lh002/PycharmProjects/PythonProject/ocr_front_pipeline.py)

[SNIPPED CODE]

These improvements will:
1. Better handle vertical alignment of text
2. Filter out single characters (likely false positives)
3. Add bounding box information
4. Improve confidence thresholds
5. Better group characters into actual words

Would you like to test these improvements with your screenshots?


I had to remind ChatGPT that I was in a Windows environment almost every 5 minutes... it kept trying to use Unix-style paths with slashes not compatible with Windows :/

it was trying to find files in like c:/dataset\synthetic_images\

which obviously isn't a valid dir... with mixed slashes / \
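pathlib sidesteps the mixed-slash problem entirely; a small sketch (the filename here is my own placeholder, the directory names just mirror the example above):

```python
from pathlib import PureWindowsPath

# Build the path by joining parts instead of concatenating strings with
# hard-coded slashes; pathlib normalises the separators for Windows.
p = PureWindowsPath("c:/dataset") / "synthetic_images" / "img_001.png"

print(str(p))        # c:\dataset\synthetic_images\img_001.png
print(p.as_posix())  # c:/dataset/synthetic_images/img_001.png
```

Because the separator is inserted by pathlib rather than typed by hand (or by the LLM), you never end up with a mixed / and \ path.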


Claude 3.5 is flawless so far. Technically it's GitHub Copilot using the Claude 3.5 Sonnet preview.
 
maybe this is interesting

It seems they had one ChatGPT instance writing the code, then another ChatGPT instance evaluating the first one.


which I guess is the only way that makes sense when ChatGPT gives so much garbage code
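The write-then-review loop described could be sketched like this (a hypothetical structure of my own; `ask_llm` is a stand-in for whatever real LLM API call you'd make, passed in as a function):

```python
def generate_with_review(task, ask_llm, max_rounds=3):
    """Have one model call write code and a second pass critique it.

    `ask_llm` is any callable taking a prompt string and returning text
    (a hypothetical stand-in for a real LLM client).
    """
    code = ask_llm(f"Write Python code for this task:\n{task}")
    for _ in range(max_rounds):
        review = ask_llm(
            f"Review this code for bugs. Reply APPROVED if it is correct.\n{code}"
        )
        if "APPROVED" in review:
            break  # the reviewer is satisfied, stop iterating
        code = ask_llm(
            f"Revise the code to address this review:\n{review}\n\nCode:\n{code}"
        )
    return code
```

The cap on rounds matters: without it, a writer and reviewer that never converge would loop forever.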
 
I tried Claude late last year; imo it was OK but didn't improve anything for me, so I went back to ChatGPT. I use the API (Assistant, Vision and Chat Completion) and the browser version. I think the O model is fantastic. OpenAI has some imo amazing features, like advanced multimodal input/output with almost zero latency in voice interactions, though it's quite expensive. But even some of the 'old', cheaper voices like Nova are miles better than most of the competition.
 
maybe this is interesting

It seems they had one ChatGPT instance writing the code, then another ChatGPT instance evaluating the first one.


which I guess is the only way that makes sense when ChatGPT gives so much garbage code

So basically an adversarial setup, by the sounds of it. Not sure I'd want both models from the same vendor.
 
So the AI ethics assignment is done and submitted. I can see the issues etc., but for some reason I found it difficult to get it down on paper.
 
Claude 3.5 is flawless so far. Technically it's GitHub Copilot using the Claude 3.5 Sonnet preview.

Interesting that Claude 3.5's data privacy policy retains your data/code, whilst Claude Pro does not unless explicitly told to do so. I did see a Samsung case where proprietary code was put into ChatGPT for debugging, only to essentially make the code public domain.

Like all vendors, it will be interesting to see the changes that Trump may make w.r.t. AI and data access. I've already seen that high-end AIs require mobile numbers for verification, and we're starting to see blocks on VPN access to AI prompts.
 
So I’ve decided against my business case for air-sea salvage using drones.

I have another idea that is better suited:
* I have had direct experience of it
* it impacts a wide market
* there's a government research paper request out right at this moment, indicating my idea is a good one and relevant

I know the "what", and so I have a plan for tomorrow to build the business case, leaving time to think on a third draft and to answer a second simple question on Tuesday.
 
So instead, I built my business case on electricity consumption optimisation.

In short, I've just submitted the Module 6 business case assignment and the course is complete. I have to wait until March to get the finalised score and certificate, but I'm currently on a pass with a 71.3% grade, with a ~22% assessment still to be graded. I'd love a pass with 90%, but I have a feeling it may land in the 80-90% range; we'll see.

Oh and I got 93.8% on the ethics mark :D
 
Ok, courses.

So one of the reasons I went for the more expensive option (~£2400) of the Oxford course was:
* a recognised course operator
* a vendor-independent framework that teaches you the principles rather than how to use one vendor's technology
* covers a wide range of subjects - the history, the technical (AI, ML, deep learning), spotting ways to develop or deploy, the ethics and preparing business cases - each with hours of learning and assignments
* drives that home with written assignments: you have to learn the materials, think about scenarios and then apply them yourself as part of the written assessment (rather than multiple choice or complete-the-sentence based on their example)
* a set of classmates - this gives you additional interesting perspectives, and the course actually had graded full-class (500+) and small-group (5 people) assignments that prompted discussions and thinking in addition to your written assignment. We have a WhatsApp group and meet-ups etc. driving some future discussions
* a recognised name badge (Saïd Business School, Oxford University, is world-known and respected) that you can put on the CV to show you've done some professional development and not simply copied someone else's responses. Only the more advanced online vendor courses that you have to do under exam conditions really instil the same reaction.

The outcome of the course is that I have a pretty good starting point for more detail, and I can easily BS-detect amid the noise around AI. I can now look at what's next using that framework, and all is good.


So I'm building a list here of courses from known free/good vendors; naturally, the vendors push their own products and services.

Starting this from scratch (I have detailed neural-network knowledge, but we'll ignore that), I do think the history side of things helped put the technology in perspective (expert systems and simple logic such as your Roomba vacuum etc.). The initial focus on non-generative AI helped me understand how data science fed into AI, then ML (supervised, unsupervised and reinforcement learning), then into Deep Learning (neural networks) and then into Generative AI itself.

I will update these later when I have more time:

Oxford
Cambridge
Harvard
Stanford
LinkedIn Learning
Google
Microsoft
AWS
NVIDIA
 