@TensorTonic: You reach for ReLU, GELU, and Softmax in almost every model you build. But could you write the forward pass and the gra…

X AI KOLs Timeline 06/27/26, 10:53 AM Tools

Summary

A tweet promoting TensorTonic, a platform where users practice implementing nine common activation functions (Sigmoid, ReLU, Tanh, Softmax, Leaky ReLU, GELU, Swish, ELU, SELU) from scratch, covering both the forward pass and the gradient computation.

You reach for ReLU, GELU, and Softmax in almost every model you build. But could you write the forward pass and the gradients from memory without opening a reference?

> Sigmoid
> ReLU
> Tanh
> Softmax
> Leaky ReLU
> GELU
> Swish
> ELU
> SELU

Nine activation functions, each explained by implementing them from scratch.

Practice all of them on TensorTonic.
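As a rough illustration of what "forward pass and gradients from memory" could look like, here is a minimal sketch in plain Python using only the standard library. It is not TensorTonic's reference code. The function names, the Leaky ReLU slope of 0.01, the ELU alpha of 1.0, the choice of 0 as the ReLU gradient at x = 0, Swish with beta = 1, and the exact erf form of GELU (rather than the tanh approximation) are all assumptions made for this example.

```python
import math

def sigmoid(x):
    # Branch on sign so exp() never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    return x if x > 0 else 0.0

def relu_grad(x):
    # Not differentiable at 0; returning 0 there is a convention (assumption).
    return 1.0 if x > 0 else 0.0

def tanh_grad(x):
    t = math.tanh(x)
    return 1.0 - t * t

def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.01):
    return 1.0 if x > 0 else alpha

def swish(x):
    # Swish with beta = 1, also known as SiLU.
    return x * sigmoid(x)

def swish_grad(x):
    s = sigmoid(x)
    return s + x * s * (1.0 - s)

def gelu(x):
    # Exact form: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_grad(x):
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf

def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def elu_grad(x, alpha=1.0):
    return 1.0 if x > 0 else alpha * math.exp(x)

# Standard SELU constants (rounded to double precision).
SELU_SCALE = 1.0507009873554805
SELU_ALPHA = 1.6732632423543772

def selu(x):
    return SELU_SCALE * elu(x, SELU_ALPHA)

def selu_grad(x):
    return SELU_SCALE * elu_grad(x, SELU_ALPHA)

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(xs):
    # J[i][j] = d softmax_i / d x_j = s_i * (delta_ij - s_j)
    s = softmax(xs)
    n = len(s)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]

def numeric_grad(f, x, eps=1e-6):
    # Central finite difference, used only to sanity-check the formulas.
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)
```

One way to check a hand-written gradient is to compare it against `numeric_grad` at a few points away from any kink, for example `abs(gelu_grad(0.7) - numeric_grad(gelu, 0.7))`, which should be close to zero. Softmax differs from the others because each output depends on every input, so its gradient is a full Jacobian matrix rather than an elementwise derivative.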


Cached at: 06/27/26, 01:56 PM


Similar Articles

@tetsuoai: The entire core of a neural network on four cards. Neuron, forward pass, activations, backprop. Learn these four and yo…

X AI KOLs Timeline

A set of four cards covering the core concepts of neural networks: neuron, forward pass, activations, and backpropagation, aimed at helping learners understand how models from perceptrons to transformers work.

@NFTCPS: You keep talking about AI, but can't even explain what a Transformer is? There's a repo that goes all out — builds a GPT from scratch without using any high-level libraries. It lays out exactly how Attention, Multi-Head, Feed-Forward, Embedding, Residual connections, and Layer Norm are pieced together. And it's not just the model; the entire pipeline is covered…

X AI KOLs Timeline

A GitHub open-source project that implements the complete GPT training pipeline from scratch, including data preprocessing, pretraining, SFT, and RLHF post-training, all based on native PyTorch. Ideal for developers who want to deeply understand the Transformer architecture.

Bug or Feature^2: Weight Drift, Activation Sparsity, and Spikes

Hugging Face Daily Papers

This paper formally proves that training neural networks with asymmetric activation functions like ReLU, GELU, or SiLU causes weights to drift negative, leading to up to 90% activation sparsity. It also shows that squared activations like ReLU² improve performance but cause activation spikes, which can be fixed by clipping, with GELU² achieving the best validation loss.

@gordic_aleksa: new in-depth blog post time: Inside the Transformer: The Life of a Token a deep dive into a modern dense transformer, i…

X AI KOLs Timeline

An in-depth blog post exploring the inner workings of modern dense transformers, covering topics such as YaRN for positional information, hybrid attention for long context lengths, soft capping, QK normalization, and transformer math including FLOPs/token formulas and cluster sizing.

@harshbhatt7585: https://x.com/harshbhatt7585/status/2063593933314113587

X AI KOLs Timeline

The author shares learnings from training a 160M parameter LLM from scratch, experimenting with architectures like multi-token prediction and hierarchical reasoning models. They emphasize the importance of fast iteration, simplifying ideas, and understanding why architectures work.
