The Machine Learning Thread

Are you able to give examples of the metrics you record?

Are they traditional hardware metrics like CPU, GPU, memory usage, I/O, etc., or LLM-specific?
We record things like tokens in, tokens out, and a score for each response (using a different LLM and prompt to evaluate the first output). Low-scoring responses, plus a sample of the others, then go to human QA, where we again record the number of edits and what was edited.
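Roughly, the judge step looks like this (a minimal sketch, assuming the `openai` Python client; the model names, rubric, and threshold are illustrative placeholders, not the actual configuration):

```python
# Minimal sketch of LLM-as-judge scoring. Assumes OPENAI_API_KEY is set;
# model names, rubric, and threshold are placeholders.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = (
    "Rate the ASSISTANT answer to the QUESTION on a 1-5 scale for "
    "accuracy and completeness. Reply with the number only.\n\n"
    "QUESTION: {question}\nASSISTANT: {answer}"
)

def score_response(question: str, answer: str) -> int:
    """Score one response with a separate evaluator model."""
    judged = client.chat.completions.create(
        model="gpt-4o",  # evaluator model, kept distinct from the generator
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
    )
    return int(judged.choices[0].message.content.strip())

# Responses scoring below a threshold get routed to human QA review.
if score_response("What is MLflow?", "An open source ML lifecycle platform.") < 3:
    print("flag for human QA")
```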

In addition, we record response times and enforce retry limits plus an overall time budget for a chain of calls: each stage has a limited time to retry before the whole chain errors and must be retried from the start (for the interactive elements of the application).
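A sketch of that pattern, with per-stage retries bounded by a shared chain deadline (function names and budgets here are illustrative, not the in-house code):

```python
# Per-stage retry limit plus an overall time budget for the whole chain.
import time

class ChainTimeoutError(Exception):
    """Raised when a stage exhausts its retries or the chain deadline passes."""

def run_stage(stage_fn, *, retries=3, stage_budget_s=10.0, deadline):
    start = time.monotonic()
    for attempt in range(retries):
        # Stop early if either the stage budget or the chain deadline is gone.
        if time.monotonic() > deadline or time.monotonic() - start > stage_budget_s:
            break
        try:
            return stage_fn()
        except Exception:
            time.sleep(min(2 ** attempt, 5))  # capped exponential backoff
    raise ChainTimeoutError("stage exhausted its retries/time budget")

def run_chain(stages, overall_budget_s=30.0):
    """Run stages in order, feeding each the previous result.

    Any ChainTimeoutError fails the whole chain, which must then be retried.
    """
    deadline = time.monotonic() + overall_budget_s
    data = None
    for stage in stages:
        data = run_stage(lambda d=data: stage(d), deadline=deadline)
    return data
```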
 
It is a great topic. I've realised that, whether or not my employer invests in it, I need to spend a bit more time on APIs these days. What tools and hardware are you using?
We mostly use the OpenAI API and externally hosted, fine-tuned Databricks DBRX models. We also host and train our own traditional NLP and clustering models in AWS SageMaker.

When interacting with these, we use LangChain for LLM calls and MLflow across the board to structure and record them.
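In practice that combination can be as simple as logging token counts and latency around an invocation (a sketch assuming recent `langchain-openai` and `mlflow` packages; field names like `usage_metadata` vary across versions, and the model choice is a placeholder):

```python
# Wrap a LangChain call in an MLflow run and record the basic metrics.
import time
import mlflow
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # illustrative model choice

with mlflow.start_run(run_name="summarise-ticket"):
    start = time.monotonic()
    response = llm.invoke("Summarise: the customer cannot reset their password.")
    elapsed = time.monotonic() - start

    usage = response.usage_metadata or {}  # present on recent langchain-core
    mlflow.log_metric("tokens_in", usage.get("input_tokens", 0))
    mlflow.log_metric("tokens_out", usage.get("output_tokens", 0))
    mlflow.log_metric("latency_s", elapsed)
```

Newer MLflow releases also ship LangChain autologging (`mlflow.langchain.autolog()`) if you'd rather not log by hand.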

We have a custom in-house ML gateway API that abstracts the differences between the final endpoints away from our apps, but MLflow recently released AI Gateway, which does the same thing. That came about partly from my own discussions with their product management team last year about building that capability (long term, we didn't want to roll our own).
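The core idea behind that kind of gateway is a single interface in front of provider-specific endpoints. A hypothetical sketch (class and route names are invented for illustration; this is neither the in-house API nor MLflow AI Gateway itself):

```python
# One chat() interface routing to interchangeable provider backends.
from typing import Protocol

class ChatBackend(Protocol):
    def chat(self, prompt: str) -> str: ...

class OpenAIBackend:
    """Provider-specific details live inside the backend."""
    def chat(self, prompt: str) -> str:
        from openai import OpenAI
        resp = OpenAI().chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

class Gateway:
    """Apps address routes by name; endpoint differences stay hidden."""
    def __init__(self) -> None:
        self._routes: dict[str, ChatBackend] = {}

    def register(self, route: str, backend: ChatBackend) -> None:
        self._routes[route] = backend

    def chat(self, route: str, prompt: str) -> str:
        return self._routes[route].chat(prompt)

gw = Gateway()
gw.register("default-chat", OpenAIBackend())
print(gw.chat("default-chat", "Hello"))
```

Swapping providers then means re-registering a route, with no changes in the calling apps.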
 