AI with BASIC
Autodiff, Tensors, and Transformers in jdBasic
jdBasic isn’t only retro fun: it also includes a modern Tensor + Autodiff engine. You can build tiny neural networks, train them, and even experiment with a small Transformer-style LLM, all in BASIC.
What you’ll build
Neural basics
Learn how neurons, layers, and activations work — using jdBasic arrays and matrix math.
Autodiff training
Train a small network to solve XOR using TENSOR.BACKWARD and TENSOR.UPDATE.
Transformer demo
See how attention blocks stack into a Transformer that learns next-character prediction.
1) The AI building blocks in jdBasic
Plain arrays: neuron + layer
jdBasic array math is great for understanding the “math inside” a network. You can multiply arrays element-wise and reduce with SUM().
    ' --- A single neuron (array math) ---
    INPUTS = [0.5, -1.2, 0.8]
    WEIGHTS = [0.8, 0.1, -0.4]
    BIAS = 0.5

    WEIGHTED_SUM = SUM(INPUTS * WEIGHTS)
    OUTPUT = WEIGHTED_SUM + BIAS
    PRINT "Output:"; OUTPUT
Next step: do the same for a full layer using matrix multiplication (MATMUL).
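To make the arithmetic of a full layer concrete, here is a minimal reference sketch in plain Python (deliberately not jdBasic, so it makes no claims about the jdBasic API). The helper name `dense_layer` and the second column of weights and biases are made up for illustration; the first column reuses the single neuron from the example above, so its output should come out near 0.46.

```python
# Reference sketch in plain Python (not jdBasic): a dense layer is a
# matrix multiply plus a bias, i.e. several neurons evaluated at once.

def dense_layer(inputs, weights, biases):
    """inputs: n values; weights: n rows x m columns; biases: m values."""
    outputs = []
    for j in range(len(biases)):
        # Each output neuron j uses column j of the weight matrix.
        weighted_sum = sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        outputs.append(weighted_sum + biases[j])
    return outputs

INPUTS = [0.5, -1.2, 0.8]

# Column 0: the weights of the single neuron from the text.
# Column 1: hypothetical values, chosen only for illustration.
WEIGHTS = [
    [0.8, 0.2],
    [0.1, -0.5],
    [-0.4, 0.3],
]
BIASES = [0.5, -0.1]

layer_output = dense_layer(INPUTS, WEIGHTS, BIASES)
print("Layer output:", layer_output)
```

Each column of the weight matrix is one neuron, which is why a single matrix multiplication can replace a loop over neurons.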
Tensors + autodiff: training-ready
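The Tensor engine records a computation graph automatically as you apply operations to tensors. When you call TENSOR.BACKWARD, jdBasic walks that graph backwards and calculates the gradients for you, which you can then read from each tensor's .grad field.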
    ' --- Convert arrays into tensors ---
    X = TENSOR.FROM([[1, 2], [3, 4]])
    W = TENSOR.FROM([[5, 6], [7, 8]])
    Y = TENSOR.MATMUL(X, W)

    ' --- Backprop through the graph ---
    TENSOR.BACKWARD Y

    PRINT "Grad of X:"
    PRINT FRMV$(TENSOR.TOARRAY(X.grad))
That’s the core idea: forward pass builds the graph, backward pass computes gradients.
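For readers who want to see what "forward builds the graph, backward computes gradients" means mechanically, here is a tiny scalar autodiff sketch in plain Python. It is an illustration of the general reverse-mode technique only; the `Value` class is hypothetical and says nothing about how jdBasic's Tensor engine is implemented internally.

```python
# Minimal reverse-mode autodiff on scalars (illustrative sketch only).

class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents      # nodes this value was computed from
        self._backward_fn = None     # pushes self.grad into the parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Order the graph so every node comes after its parents,
        # then apply the chain rule from the output back to the inputs.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()

x = Value(2.0)
w = Value(-3.0)
b = Value(1.0)
y = x * w + b     # forward pass: the graph is recorded as a side effect
y.backward()      # backward pass: fills in .grad on every input
print(y.data, x.grad, w.grad, b.grad)
```

The gradient of `y` with respect to `x` is the value of `w`, and vice versa, which is the scalar version of what a matrix multiply produces in the tensor case.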
2) Train XOR (your first real training loop)
XOR is the “hello world” of training: a tiny dataset, but it forces the network to learn a non-linear pattern. In jdBasic, autodiff makes the loop surprisingly clean.
XOR training (autodiff)
    ' ==========================================================
    ' == Autodiff Neural Network: Learn XOR in jdBasic
    ' ==========================================================

    ' 1) Training data
    TRAINING_INPUT_DATA = [[0, 0], [0, 1], [1, 0], [1, 1]]
    TRAINING_OUTPUT_DATA = [[0], [1], [1], [0]]
    INPUTS = TENSOR.FROM(TRAINING_INPUT_DATA)
    TARGETS = TENSOR.FROM(TRAINING_OUTPUT_DATA)

    ' 2) Model definition (2 → 3 → 1)
    MODEL = {}
    HIDDEN_LAYER = TENSOR.CREATE_LAYER("DENSE", {"input_size": 2, "units": 3})
    OUTPUT_LAYER = TENSOR.CREATE_LAYER("DENSE", {"input_size": 3, "units": 1})
    MODEL{"layers"} = [HIDDEN_LAYER, OUTPUT_LAYER]

    ' 3) Optimizer + training params
    OPTIMIZER = TENSOR.CREATE_OPTIMIZER("SGD", {"learning_rate": 0.1})
    EPOCHS = 15000

    ' 4) Forward pass (sigmoid activations)
    FUNC MODEL_FORWARD(current_model, input_tensor)
        temp = input_tensor
        layers = current_model{"layers"}
        FOR i = 0 TO LEN(layers) - 1
            layer = layers[i]
            temp = MATMUL(temp, layer{"weights"}) + layer{"bias"}
            temp = TENSOR.SIGMOID(temp)
        NEXT i
        RETURN temp
    ENDFUNC

    ' 5) Loss function (MSE)
    FUNC MSE_LOSS(predicted, actual)
        err = actual - predicted
        RETURN SUM(err ^ 2) / LEN(TENSOR.TOARRAY(err))[0]
    ENDFUNC

    ' 6) Training loop
    FOR epoch = 1 TO EPOCHS
        PRED = MODEL_FORWARD(MODEL, INPUTS)
        LOSS = MSE_LOSS(PRED, TARGETS)
        TENSOR.BACKWARD LOSS
        MODEL = TENSOR.UPDATE(MODEL, OPTIMIZER)
        IF epoch MOD 1000 = 0 THEN
            PRINT "Epoch:"; epoch; " Loss:"; TENSOR.TOARRAY(LOSS)
        ENDIF
    NEXT epoch
What to try next
- Change hidden units: units: 3 → units: 6
- Swap the loss: MSE → Cross-entropy (if your output is classification)
- Print the final predictions for the 4 XOR inputs
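Before swapping the loss, it can help to see how the two losses differ numerically. The sketch below is plain Python, not jdBasic, and this section does not show how jdBasic exposes cross-entropy, so treat the function names and the sample prediction values as illustrative assumptions. It compares MSE with binary cross-entropy on the four XOR targets for one good and one confidently wrong set of predictions.

```python
import math

def mse(predictions, targets):
    """Mean squared error over a list of predictions."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def binary_cross_entropy(predictions, targets, eps=1e-12):
    """Mean binary cross-entropy; predictions are probabilities in (0, 1)."""
    total = 0.0
    for p, t in zip(predictions, targets):
        p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(targets)

TARGETS = [0, 1, 1, 0]              # XOR outputs
GOOD = [0.1, 0.9, 0.9, 0.1]         # made-up predictions close to the targets
BAD = [0.9, 0.1, 0.1, 0.9]          # made-up predictions, confidently wrong

for name, preds in (("good", GOOD), ("bad", BAD)):
    print(name, "MSE:", mse(preds, TARGETS),
          "BCE:", binary_cross_entropy(preds, TARGETS))
```

Cross-entropy grows much faster than MSE as predictions become confidently wrong, which is one reason it is the usual choice for classification outputs.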
Why XOR matters
XOR can’t be solved by a single linear layer. So if your training works here, you’ve proven your network can learn a non-linear function.
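The claim can be checked with a few lines of plain Python (a sketch, not jdBasic). For any linear model, f(0,0) + f(1,1) always equals f(0,1) + f(1,0), while XOR needs those sums to be 0 and 2. A network with one small hidden layer does not have that restriction. Note that the hand-picked weights below use ReLU rather than the sigmoid from the training example, purely because ReLU gives exact values; the weights are illustrative, not learned.

```python
XOR_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR_TARGETS = [0, 1, 1, 0]

def linear(x1, x2, w1, w2, b):
    """A single linear layer with one output and no activation."""
    return w1 * x1 + w2 * x2 + b

def relu(v):
    return max(0.0, v)

def tiny_network(x1, x2):
    """2 inputs -> 2 hidden ReLU units -> 1 output, with hand-picked weights."""
    h1 = relu(x1 + x2)          # counts how many inputs are on
    h2 = relu(x1 + x2 - 1.0)    # fires only when both inputs are on
    return h1 - 2.0 * h2

# Any choice of linear weights satisfies this identity, so it can never
# produce the XOR targets (which would need lhs = 0 and rhs = 2).
w1, w2, b = 0.7, -1.3, 0.2      # arbitrary example weights
lhs = linear(0, 0, w1, w2, b) + linear(1, 1, w1, w2, b)
rhs = linear(0, 1, w1, w2, b) + linear(1, 0, w1, w2, b)
print("linear identity holds:", abs(lhs - rhs) < 1e-12)

network_outputs = [tiny_network(x1, x2) for x1, x2 in XOR_INPUTS]
print("tiny network outputs:", network_outputs)
```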
3) Debugging: gradients in 10 lines
When training “does nothing”, it’s often a gradient issue. Here’s a minimal gradient check: multiply two tensors, run backward, print the grads.
MATMUL + backward pass
    PRINT "--- Testing MATMUL + BACKWARD ---"

    A = TENSOR.FROM([[1, 2], [3, 4]])
    B = TENSOR.FROM([[5, 6], [7, 8]])
    C = TENSOR.MATMUL(A, B)

    PRINT "C:"
    PRINT FRMV$(TENSOR.TOARRAY(C))

    TENSOR.BACKWARD C

    PRINT "Grad(A):"
    PRINT FRMV$(TENSOR.TOARRAY(A.grad))
    PRINT "Grad(B):"
    PRINT FRMV$(TENSOR.TOARRAY(B.grad))
Tip: if .grad is empty or all zeros unexpectedly, the graph might be disconnected (or you’re converting back to arrays too early).
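To know what numbers to expect from the check above, you can compute reference values by hand. The plain-Python sketch below (not jdBasic) assumes that calling backward on the non-scalar result C seeds the upstream gradient with ones, which is equivalent to differentiating the sum of all entries of C. That seeding is an assumption about the engine, not something this section confirms; if jdBasic seeds differently, the printed gradients will differ by that factor.

```python
def matmul(a, b):
    """Plain nested-list matrix multiply."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def total(m):
    return sum(sum(row) for row in m)

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul(A, B)

# Assumed upstream gradient: all ones, same shape as C.
ONES = [[1, 1], [1, 1]]

# Standard matmul gradients: dA = dC x B^T and dB = A^T x dC.
grad_A = matmul(ONES, transpose(B))
grad_B = matmul(transpose(A), ONES)

def numeric_grad_A(i, j, eps=1e-6):
    """Finite-difference estimate of d(sum of C)/dA[i][j] as a cross-check."""
    bumped = [row[:] for row in A]
    bumped[i][j] += eps
    return (total(matmul(bumped, B)) - total(C)) / eps

print("C:", C)
print("Grad(A):", grad_A)
print("Grad(B):", grad_B)
print("numeric dA[0][1]:", numeric_grad_A(0, 1))
```

If the analytic and finite-difference values agree here but the engine prints something else, the difference is most likely in how the backward pass is seeded rather than in the matmul rule itself.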
4) A tiny Transformer LLM
This demo shows a character-level Transformer that learns next-character prediction. It’s not “ChatGPT in BASIC” — but it *is* the real Transformer logic: embeddings, positional encoding, stacked attention blocks, and a cross-entropy loss.
What the demo does
- Build a character vocabulary from training text
- Embed tokens into vectors
- Run stacked self-attention blocks (pre-LN: layer normalization is applied before each attention and feed-forward sub-layer)
- Train with cross-entropy loss
- Sample one char at a time to generate text
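The first and last steps in the list above are simple enough to sketch without any tensor engine. The following plain-Python example (not jdBasic) builds a character vocabulary and samples a next character from a vector of scores. The training text, the function names, and the temperature parameter are illustrative assumptions; the actual demo may organize this differently.

```python
import math
import random

TEXT = "hello world"                     # made-up training text

# Build the vocabulary: one integer id per distinct character.
vocab = sorted(set(TEXT))
char_to_id = {ch: i for i, ch in enumerate(vocab)}
id_to_char = {i: ch for ch, i in char_to_id.items()}

encoded = [char_to_id[ch] for ch in TEXT]
decoded = "".join(id_to_char[i] for i in encoded)

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    norm = sum(exps)
    return [e / norm for e in exps]

def sample_next(logits, rng, temperature=1.0):
    """Pick one token id at random, weighted by softmax(logits / temperature)."""
    probs = softmax([v / temperature for v in logits])
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Pretend the model strongly prefers the character "o".
peak_id = char_to_id["o"]
logits = [0.0] * len(vocab)
logits[peak_id] = 50.0

rng = random.Random(0)
next_id = sample_next(logits, rng)
print("vocab size:", len(vocab))
print("sampled:", repr(id_to_char[next_id]))
```

During generation, the sampled character is appended to the context and the model is run again, one character at a time.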
Why it works in BASIC
The Tensor engine does the heavy lifting: matrix ops, softmax, layer norm, and backprop. You focus on the structure of the model.
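To show what an attention block computes underneath those tensor operations, here is a minimal scaled dot-product self-attention sketch in plain Python (not jdBasic). It makes two simplifying assumptions that are not confirmed by this section: it applies a causal mask, so each position only looks at itself and earlier positions, as is usual for next-character prediction, and it skips the learned query/key/value projections by feeding the same vectors in for all three roles.

```python
import math

def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    norm = sum(exps)
    return [e / norm for e in exps]

def causal_self_attention(queries, keys, values):
    """Scaled dot-product attention where position t sees positions 0..t."""
    dim = len(queries[0])
    outputs, all_weights = [], []
    for t, query in enumerate(queries):
        # Similarity of this query to every key up to and including position t.
        scores = [
            sum(q * k for q, k in zip(query, keys[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Output is the weighted average of the visible value vectors.
        mixed = [
            sum(w * values[s][j] for s, w in enumerate(weights))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three positions with 2-dimensional vectors (made-up numbers).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attention_weights = causal_self_attention(X, X, X)

for t, row in enumerate(attention_weights):
    print("position", t, "attends with weights", [round(w, 3) for w in row])
```

The first position can only attend to itself, so its output equals its own value vector; later positions blend in earlier ones according to the softmax weights.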
Core pieces (short excerpt)
    ' --- Create layers ---
    MODEL = {}
    MODEL{"embedding"} = TENSOR.CREATE_LAYER("EMBEDDING", {"vocab_size": VOCAB_SIZE, "embedding_dim": EMBEDDING_DIM})
    MODEL{"output_norm"} = TENSOR.CREATE_LAYER("LAYER_NORM", {"dim": HIDDEN_DIM})
    MODEL{"output"} = TENSOR.CREATE_LAYER("DENSE", {"input_size": HIDDEN_DIM, "units": VOCAB_SIZE})

    ' --- Stack Transformer blocks ---
    MODEL{"layers"} = []
    FOR i = 0 TO NUM_LAYERS - 1
        layer = {}
        layer{"attention"} = TENSOR.CREATE_LAYER("ATTENTION", {"embedding_dim": EMBEDDING_DIM})
        layer{"norm1"} = TENSOR.CREATE_LAYER("LAYER_NORM", {"dim": HIDDEN_DIM})
        layer{"ffn1"} = TENSOR.CREATE_LAYER("DENSE", {"input_size": HIDDEN_DIM, "units": HIDDEN_DIM * 2})
        layer{"ffn2"} = TENSOR.CREATE_LAYER("DENSE", {"input_size": HIDDEN_DIM * 2, "units": HIDDEN_DIM})
        layer{"norm2"} = TENSOR.CREATE_LAYER("LAYER_NORM", {"dim": HIDDEN_DIM})
        MODEL{"layers"} = APPEND(MODEL{"layers"}, layer)
    NEXT i
The full demo files live in your repo/scripts. When running them in the browser, keep the training text short.
Practical tips (especially for Web/WASM)
Keep it small
In the browser, you’ll get the best experience from small models and short datasets. For example: character-level text, a few thousand steps, tiny embedding sizes.
- Start with HIDDEN_DIM = 32 or 64
- Try NUM_LAYERS = 1 or 2 first
- Print loss every 50–200 steps (not every step)
Save and reuse
Training is slow. Save models once you like them, then load for inference.
    TENSOR.SAVEMODEL MODEL, "my_model.json"
    LOADED = TENSOR.LOADMODEL("my_model.json")
A good learning path
Step 1
Do the array-based neuron + layer examples, then add ReLU/Sigmoid.
Step 2
Then train XOR (nl05). This teaches you forward → loss → backward → update.
Step 3
Finally, explore attention blocks in the Transformer demo. Start small, then scale slowly.