The cost comparison between local RTX 3090 and cloud A100 clusters is useful, but I wonder if the author accounted for hidden overhead—like data transfer time for large datasets or the time spent debugging CUDA compatibility issues on local hardware.
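The overhead point can be made concrete with a back-of-envelope calculation. The sketch below folds data-transfer time into the billed cloud hours; every number in it (rates, dataset size, link speed) is an illustrative assumption, not a figure from the article.

```python
def effective_cloud_cost(compute_hours, hourly_rate, dataset_gb,
                         transfer_gbps=1.0):
    """Billed cost once upload time is added to compute time.

    Assumes you pay the instance rate while the dataset transfers,
    which is the common case if you upload from the running instance.
    """
    # GB -> gigabits, then seconds -> hours
    transfer_hours = dataset_gb * 8 / (transfer_gbps * 3600)
    return (compute_hours + transfer_hours) * hourly_rate

# Hypothetical example: 20 h of A100 time at $2/h, plus uploading a
# 500 GB dataset over a 1 Gbps link (~1.1 h of extra billed time).
print(round(effective_cloud_cost(20, 2.0, 500), 2))  # -> 42.22
```

With these assumptions the transfer adds roughly 5% to the bill; with a slower residential uplink or a multi-terabyte dataset the hidden cost grows quickly, which is exactly the kind of overhead a raw per-hour comparison misses.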
It's based on the book https://www.manning.com/books/build-a-large-language-model-f... Is it a good book?
Here's part 1 [1]. Since his archive is organized by date, it's a bit easier to guesstimate which part was written in which month.
[1] https://www.gilesthomas.com/2024/12/llm-from-scratch-1
Nice, this is a recipe for making an evil AI which will destroy humanity.