LLMs Rules of Thumb
KV Cache
- TODO: What are good rules of thumb for KV-Cache size?
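One possible starting point for the TODO above is the standard per-token estimate: keys and values (a factor of 2) stored for every layer and every KV head. The sketch below is a back-of-the-envelope calculation, not a measured figure. The model shape (80 layers, 8 KV heads with grouped-query attention, head dimension 128), the fp16 storage, and the 4096-token context are assumptions for Llama-2-70B and are not taken from these notes. The function name is hypothetical.

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache for one token: keys and values (factor 2) in every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Assumed Llama-2-70B shape: 80 layers, 8 KV heads, head dim 128, fp16 (2 bytes).
per_token = kv_cache_bytes_per_token(n_layers=80, n_kv_heads=8, head_dim=128)
per_sequence = per_token * 4096  # one sequence at an assumed 4096-token context

print(f"{per_token / 1024:.0f} KiB per token")
print(f"{per_sequence / 2**30:.2f} GiB per 4096-token sequence")
```

Under these assumptions the cache grows linearly with both context length and batch size, so multiplying `per_sequence` by the batch size gives a rough memory budget.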
Cost of Serving
H100-80G: $2.30 per hour
- CoreWeave price, assuming a reserved multi-year deal
For llama2-70b
| Batch Size | Cost |
|---|---|
| 1 | $9.73 |
| 4 | $2.48 |
| 8 | $1.26 |
| 32 | $0.35 |
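To see how strongly batching drives cost, the table values can be compared against the batch-size-1 figure. This is a minimal sketch that only reuses the numbers in the table. The helper name `savings_vs_batch_one` is hypothetical, and the unit of the cost column is left as given in the table.

```python
# Costs copied from the table above (batch size -> cost in dollars).
cost_by_batch = {1: 9.73, 4: 2.48, 8: 1.26, 32: 0.35}

def savings_vs_batch_one(costs: dict[int, float]) -> dict[int, float]:
    """Return how many times cheaper each batch size is than batch size 1."""
    base = costs[1]
    return {batch: base / cost for batch, cost in costs.items()}

for batch, factor in savings_vs_batch_one(cost_by_batch).items():
    print(f"batch {batch:>2}: {factor:.1f}x cheaper than batch 1")
```

The ratios track the batch size closely (about 3.9x at batch 4 and 27.8x at batch 32), which suggests cost falls almost linearly with batch size over this range.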
RAG
- 6 KB per embedding (OpenAI embeddings are 1536 float32 values)
- About 3 million docs before you need a vector database (per a tweet)
  - About 18 GB of storage
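The arithmetic behind these figures is simple enough to check directly. The sketch below assumes raw float32 storage with no index or metadata overhead. The function name is hypothetical.

```python
DIMENSIONS = 1536       # OpenAI embedding size noted above
BYTES_PER_FLOAT32 = 4

def embedding_storage_bytes(num_docs: int, dims: int = DIMENSIONS) -> int:
    """Raw storage for num_docs embeddings, ignoring index and metadata overhead."""
    return num_docs * dims * BYTES_PER_FLOAT32

per_embedding = embedding_storage_bytes(1)
total = embedding_storage_bytes(3_000_000)

print(f"{per_embedding / 1024:.0f} KiB per embedding")
print(f"{total / 1e9:.1f} GB for 3 million docs")
```

At this size the whole set fits in memory on a single large machine, which is the reasoning behind deferring a vector database.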
Tokens
- 1 token ~4 characters for English
- It seems to be fewer characters per token for code; https://github.com/jlewi/roboweb/issues/414
  - Maybe 1 token ~2 characters
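The ratios above can be turned into a quick estimator for budgeting prompts. This is only a sketch: it divides character count by the rule-of-thumb ratio and is not a real tokenizer. The ratio for code is the tentative guess from these notes, and the names `CHARS_PER_TOKEN` and `estimate_tokens` are hypothetical.

```python
import math

# Rule-of-thumb ratios from the notes above; the code ratio is a tentative guess.
CHARS_PER_TOKEN = {"english": 4, "code": 2}

def estimate_tokens(text: str, kind: str = "english") -> int:
    """Estimate token count from character count; not a real tokenizer."""
    return math.ceil(len(text) / CHARS_PER_TOKEN[kind])

sample = "def add(a, b): return a + b"
print(estimate_tokens(sample, "english"))
print(estimate_tokens(sample, "code"))
```

For anything cost-sensitive, counting with the model's actual tokenizer is more reliable than either ratio.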