The 'storage tax' on cloud GPUs for short LLM runs is brutal. What's your workflow?
Summary
A user asks for cost-effective cloud GPU workflows for short LLM test sessions, citing the storage fees for preserving an environment between runs as the main pain point.
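The trade-off behind that pain point can be put into rough numbers. The sketch below is a hypothetical break-even check, not something from the original post: it compares the monthly fee for keeping a persistent volume with the GPU time spent rebuilding the environment at the start of every session. All function names, parameters, and prices are illustrative assumptions; substitute your provider's actual rates.

```python
# Hypothetical break-even sketch: keep a persistent volume, or rebuild each session?
# Every name and number here is an assumption for illustration only.

def monthly_storage_cost(volume_gb: float, price_per_gb_month: float) -> float:
    """Cost of keeping a volume around for a month, whether or not a GPU is attached."""
    return volume_gb * price_per_gb_month


def monthly_rebuild_cost(sessions_per_month: int,
                         setup_minutes: float,
                         gpu_price_per_hour: float) -> float:
    """Cost of paid GPU time spent reinstalling and re-downloading at each session start."""
    return sessions_per_month * (setup_minutes / 60.0) * gpu_price_per_hour


def cheaper_option(volume_gb: float,
                   price_per_gb_month: float,
                   sessions_per_month: int,
                   setup_minutes: float,
                   gpu_price_per_hour: float) -> str:
    """Return 'persist' or 'rebuild', whichever costs less per month."""
    storage = monthly_storage_cost(volume_gb, price_per_gb_month)
    rebuild = monthly_rebuild_cost(sessions_per_month, setup_minutes, gpu_price_per_hour)
    return "persist" if storage <= rebuild else "rebuild"


if __name__ == "__main__":
    # Placeholder inputs: 200 GB volume, 8 short sessions a month, 15 minutes of setup each.
    print(cheaper_option(volume_gb=200, price_per_gb_month=0.10,
                         sessions_per_month=8, setup_minutes=15,
                         gpu_price_per_hour=2.0))
```

The comparison ignores the value of your own time and any data-egress charges, so treat the result as a lower bound on the cost of rebuilding.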