LLM Rules of Thumb

KV Cache

TODO: What are good rules of thumb for KV-Cache size?
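
A possible starting point for this TODO, as a sketch rather than a settled rule: per token, the cache holds one key vector and one value vector per layer per KV head, so bytes per token = 2 x layers x KV heads x head dim x bytes per element. The llama2-70b values used below (80 layers, 8 KV heads, head dimension 128, fp16) are assumptions taken from the published model configuration, not from these notes, and the function names are hypothetical.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache needed for a single token of a single sequence."""
    # Factor of 2: one key vector and one value vector per layer per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch_size,
                   bytes_per_elem=2):
    """Total KV cache bytes for a batch of sequences of equal length."""
    per_token = kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim,
                                         bytes_per_elem)
    return per_token * seq_len * batch_size


# Assumed llama2-70b configuration (not from these notes):
# 80 layers, 8 KV heads (grouped-query attention), head_dim 128, fp16.
per_token = kv_cache_bytes_per_token(80, 8, 128)
total = kv_cache_bytes(80, 8, 128, seq_len=4096, batch_size=8)

print(f"{per_token / 1024:.0f} KiB per token")
print(f"{total / 2**30:.0f} GiB for batch 8 at 4096 tokens")
```

Under these assumptions the cache grows linearly with both sequence length and batch size, so it competes directly with the batch sizes that make serving cheaper.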

Cost of Serving

See the slide at 22:41 in the GTC video

H100-80G: $2.30 per hour on CoreWeave, if you get a reserved multi-year deal

For llama2-70b

Batch Size	Cost
1	$9.73
4	$2.48
8	$1.26
32	$0.35
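
The table suggests cost falls almost in proportion to batch size. A minimal sketch to check that, using only the figures in the table (the cost unit is whatever the source slide uses; the function name is hypothetical):

```python
# Batch size -> cost, copied from the table above.
cost_by_batch = {1: 9.73, 4: 2.48, 8: 1.26, 32: 0.35}


def batching_efficiency(costs):
    """Ratio of the ideal cost (batch-1 cost / batch size) to the observed cost.

    1.0 means cost scales perfectly as 1/batch; lower means diminishing returns.
    """
    base = costs[1]
    return {batch: (base / batch) / cost for batch, cost in costs.items()}


efficiency = batching_efficiency(cost_by_batch)
for batch, eff in efficiency.items():
    print(f"batch {batch:>2}: {eff:.2f} of ideal 1/batch scaling")
```

By this measure, batches of 4 and 8 stay within a few percent of ideal scaling, while batch 32 reaches roughly 87% of it.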

RAG

6 KB per embedding (OpenAI embeddings are 1536 float32 values at 4 bytes each)
About 3 million docs before you need a vector database (source: tweet)
- About 18 GB of storage
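
The arithmetic behind these numbers, as a quick sketch. It assumes one embedding per document and counts only the raw vectors, with no index or metadata overhead:

```python
DIMS = 1536               # OpenAI embedding dimension, from the note above
BYTES_PER_FLOAT32 = 4
N_DOCS = 3_000_000        # assumes one embedding per document

bytes_per_embedding = DIMS * BYTES_PER_FLOAT32
total_bytes = bytes_per_embedding * N_DOCS

print(f"{bytes_per_embedding} bytes per embedding "
      f"({bytes_per_embedding / 1024:.0f} KiB)")
print(f"{total_bytes / 1e9:.1f} GB ({total_bytes / 2**30:.1f} GiB) "
      f"for {N_DOCS:,} docs")
```

If documents are chunked before embedding, multiply by the average number of chunks per document.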

Tokens

1 token ~ 4 characters for English
- Seems like it's fewer for code; https://github.com/jlewi/roboweb/issues/414
  - Maybe 1 token ~ 2 characters
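
A rough estimator built on these ratios, as a sketch only. `estimate_tokens` is a hypothetical helper, not a real tokenizer; use the model's own tokenizer when an exact count matters.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Estimate the token count from character length using a fixed ratio."""
    return math.ceil(len(text) / chars_per_token)


prose = "The quick brown fox jumps over the lazy dog."
code = "for i in range(10): print(i)"

# ~4 characters per token for English prose
print(estimate_tokens(prose))
# ~2 characters per token as the tentative ratio for code
print(estimate_tokens(code, chars_per_token=2.0))
```

The 2 characters per token figure for code is the tentative guess from the note above, so treat estimates for code as looser than those for prose.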