
AI with BASIC
Autodiff, Tensors, and Transformers in jdBasic

jdBasic isn’t only retro fun — it has a modern Tensor + Autodiff engine. That means you can build tiny neural networks, train them, and even experiment with a small Transformer-style LLM — right in BASIC.

What you’ll build

Neural basics

Learn how neurons, layers, and activations work — using jdBasic arrays and matrix math.

Autodiff training

Train a small network to solve XOR using TENSOR.BACKWARD and TENSOR.UPDATE.

Transformer demo

See how attention blocks stack into a Transformer that learns next-character prediction.

1) The AI building blocks in jdBasic

Plain arrays: neuron + layer

jdBasic array math is great for understanding the “math inside” a network. You can multiply arrays element-wise and reduce with SUM().

' --- A single neuron (array math) ---
INPUTS  = [0.5, -1.2, 0.8]
WEIGHTS = [0.8,  0.1, -0.4]
BIAS    = 0.5

WEIGHTED_SUM = SUM(INPUTS * WEIGHTS)
OUTPUT = WEIGHTED_SUM + BIAS

PRINT "Output:"; OUTPUT

Next step: do the same for a full layer using matrix multiplication (MATMUL).
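For comparison, here is the same neuron and a two-unit layer in plain Python (not jdBasic). It is a sketch: the layer stores its weights as a 3×2 matrix whose first column is the neuron's WEIGHTS, while the second column and the layer biases are made-up values. A layer is just `inputs · W + b`, so each output unit is one neuron.

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def dense(inputs, weights, biases):
    """One dense layer: each output unit is SUM(inputs * weight_column) + bias."""
    out = matmul([inputs], weights)[0]   # treat the input vector as a 1 x k matrix
    return [o + b for o, b in zip(out, biases)]

# Single neuron from the BASIC example: 0.4 - 0.12 - 0.32 + 0.5
inputs = [0.5, -1.2, 0.8]
neuron = sum(x * w for x, w in zip(inputs, [0.8, 0.1, -0.4])) + 0.5

# A 2-unit layer: column 0 holds the neuron's weights, column 1 is hypothetical.
W = [[0.8, 0.2],
     [0.1, -0.5],
     [-0.4, 0.3]]
b = [0.5, 0.0]
layer_out = dense(inputs, W, b)

print(round(neuron, 6))                   # 0.46
print([round(v, 6) for v in layer_out])   # [0.46, 0.94]
```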

Tensors + autodiff: training-ready

The Tensor engine records a computation graph automatically as you apply operations to tensors. When you call TENSOR.BACKWARD, jdBasic walks that graph backward and calculates the gradients for you.

' --- Convert arrays into tensors ---
X = TENSOR.FROM([[1, 2], [3, 4]])
W = TENSOR.FROM([[5, 6], [7, 8]])

Y = TENSOR.MATMUL(X, W)

' --- Backprop through the graph ---
TENSOR.BACKWARD Y

PRINT "Grad of X:"
PRINT FRMV$(TENSOR.TOARRAY(X.grad))

That’s the core idea: forward pass builds the graph, backward pass computes gradients.
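To make "forward builds the graph, backward computes gradients" concrete, here is a minimal reverse-mode autodiff sketch in plain Python that works on scalars instead of tensors. The `Value` class is hypothetical and only a teaching model; it makes no claim about how jdBasic's engine is implemented.

```python
class Value:
    """A scalar that remembers how it was computed (one node in the graph)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Order nodes so that every node comes after the nodes it depends on.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for p in node._parents:
                    visit(p)
                order.append(node)

        visit(self)
        self.grad = 1.0                  # d(output)/d(output)
        for node in reversed(order):     # walk from the output back to the inputs
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad   # chain rule, summed over all paths


# Forward pass: y = x * w + x builds the graph as a side effect.
x, w = Value(3.0), Value(2.0)
y = x * w + x
y.backward()
print(y.data, x.grad, w.grad)   # 9.0 3.0 3.0
```

Note that x is used twice, so its gradient (w + 1 = 3) is the sum of two paths through the graph; that accumulation is why gradients are added with `+=`.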

2) Train XOR (your first real training loop)

XOR is the “hello world” of training: a tiny dataset, but it forces the network to learn a non-linear pattern. In jdBasic, autodiff makes the loop surprisingly clean.

XOR training (autodiff)

' ==========================================================
' == Autodiff Neural Network: Learn XOR in jdBasic
' ==========================================================

' 1) Training data
TRAINING_INPUT_DATA  = [[0, 0], [0, 1], [1, 0], [1, 1]]
TRAINING_OUTPUT_DATA = [[0], [1], [1], [0]]

INPUTS  = TENSOR.FROM(TRAINING_INPUT_DATA)
TARGETS = TENSOR.FROM(TRAINING_OUTPUT_DATA)

' 2) Model definition (2 → 3 → 1)
MODEL = {}
HIDDEN_LAYER = TENSOR.CREATE_LAYER("DENSE", {"input_size": 2, "units": 3})
OUTPUT_LAYER = TENSOR.CREATE_LAYER("DENSE", {"input_size": 3, "units": 1})
MODEL{"layers"} = [HIDDEN_LAYER, OUTPUT_LAYER]

' 3) Optimizer + training params
OPTIMIZER = TENSOR.CREATE_OPTIMIZER("SGD", {"learning_rate": 0.1})
EPOCHS = 15000

' 4) Forward pass (sigmoid activations)
FUNC MODEL_FORWARD(current_model, input_tensor)
    temp = input_tensor
    layers = current_model{"layers"}
    FOR i = 0 TO LEN(layers) - 1
        layer = layers[i]
        temp = MATMUL(temp, layer{"weights"}) + layer{"bias"}
        temp = TENSOR.SIGMOID(temp)
    NEXT i
    RETURN temp
ENDFUNC

' 5) Loss function (MSE)
FUNC MSE_LOSS(predicted, actual)
    err = actual - predicted
    RETURN SUM(err ^ 2) / LEN(TENSOR.TOARRAY(err))[0]
ENDFUNC

' 6) Training loop
FOR epoch = 1 TO EPOCHS
    PRED = MODEL_FORWARD(MODEL, INPUTS)
    LOSS = MSE_LOSS(PRED, TARGETS)

    TENSOR.BACKWARD LOSS
    MODEL = TENSOR.UPDATE(MODEL, OPTIMIZER)

    IF epoch MOD 1000 = 0 THEN
        PRINT "Epoch:"; epoch; " Loss:"; TENSOR.TOARRAY(LOSS)
    ENDIF
NEXT epoch

What to try next

  • Change the hidden layer size: "units": 3 → "units": 6
  • Swap the loss: MSE → cross-entropy (the usual choice when the output is a class probability, as it is for XOR)
  • Print the final predictions for the 4 XOR inputs
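If you want a reference to compare the jdBasic loop against, here is a plain-Python sketch of the same 2 → 3 → 1 sigmoid network trained with MSE and full-batch gradient descent, with the backward pass written out by hand. The random seed, the uniform initialization, and the learning rate of 0.5 are assumptions (the jdBasic example uses 0.1); small XOR networks sometimes stall on a plateau, so treat the printed numbers as illustrative. The last loop prints the final prediction for each of the four inputs.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [0, 1, 1, 0]
N_IN, N_HID = 2, 3
LR, EPOCHS = 0.5, 15000                  # assumed learning rate; epochs as in the BASIC demo

random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(N_HID)] for _ in range(N_IN)]  # 2 x 3
b1 = [0.0] * N_HID
W2 = [random.uniform(-1, 1) for _ in range(N_HID)]                         # 3 x 1
b2 = 0.0

def forward(x):
    h = [sigmoid(sum(x[i] * W1[i][j] for i in range(N_IN)) + b1[j]) for j in range(N_HID)]
    out = sigmoid(sum(h[j] * W2[j] for j in range(N_HID)) + b2)
    return h, out

def mse():
    return sum((y - forward(x)[1]) ** 2 for x, y in zip(X, Y)) / len(X)

first_loss = mse()
for epoch in range(1, EPOCHS + 1):
    # Backward: accumulate gradients over all four examples.
    gW1 = [[0.0] * N_HID for _ in range(N_IN)]
    gb1 = [0.0] * N_HID
    gW2 = [0.0] * N_HID
    gb2 = 0.0
    for x, y in zip(X, Y):
        h, out = forward(x)
        d_out = 2 * (out - y) / len(X) * out * (1 - out)   # dLoss/dz at the output
        for j in range(N_HID):
            gW2[j] += d_out * h[j]
            d_h = d_out * W2[j] * h[j] * (1 - h[j])        # dLoss/dz at hidden unit j
            gb1[j] += d_h
            for i in range(N_IN):
                gW1[i][j] += d_h * x[i]
        gb2 += d_out
    # Update: move every parameter against its gradient (plain SGD).
    for j in range(N_HID):
        W2[j] -= LR * gW2[j]
        b1[j] -= LR * gb1[j]
        for i in range(N_IN):
            W1[i][j] -= LR * gW1[i][j]
    b2 -= LR * gb2
    if epoch % 1000 == 0:
        print("Epoch:", epoch, "Loss:", round(mse(), 5))

final_loss = mse()
for x in X:
    print(x, "->", round(forward(x)[1], 3))
```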

Why XOR matters

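XOR is not linearly separable: no single straight line splits the inputs that should output 1 from those that should output 0, so no single linear layer can solve it. If your training works here, your network has learned a non-linear function.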
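A quick way to see this: the plain-Python sketch below searches a grid of weights and biases for a single threshold unit (output 1 when w1·x1 + w2·x2 + b > 0). It finds settings that compute AND but none for XOR. The grid is only an illustration; the comment at the end gives the short argument that covers all real-valued weights.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]

def threshold_unit(w1, w2, b, x):
    """A single linear layer followed by a step: fires when w1*x1 + w2*x2 + b > 0."""
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0

def solutions(targets, grid):
    return [(w1, w2, b) for w1, w2, b in product(grid, repeat=3)
            if [threshold_unit(w1, w2, b, x) for x in INPUTS] == targets]

grid = [k / 4 for k in range(-8, 9)]     # -2.0, -1.75, ..., 2.0
and_fits = solutions(AND, grid)
xor_fits = solutions(XOR, grid)

# Why XOR has no solution for ANY real weights: (0,1) and (1,0) need
# w2 + b > 0 and w1 + b > 0, while (0,0) needs b <= 0. Adding the first two
# gives w1 + w2 + b > -b >= 0, so (1,1) would also fire, but XOR needs a 0 there.
print(len(and_fits) > 0, len(xor_fits))   # True 0
```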

3) Debugging: gradients in 10 lines

When training “does nothing”, it’s often a gradient issue. Here’s a minimal gradient check: multiply two tensors, run backward, print the grads.

MATMUL + backward pass

PRINT "--- Testing MATMUL + BACKWARD ---"

A = TENSOR.FROM([[1, 2], [3, 4]])
B = TENSOR.FROM([[5, 6], [7, 8]])

C = TENSOR.MATMUL(A, B)
PRINT "C:"
PRINT FRMV$(TENSOR.TOARRAY(C))

TENSOR.BACKWARD C

PRINT "Grad(A):"
PRINT FRMV$(TENSOR.TOARRAY(A.grad))

PRINT "Grad(B):"
PRINT FRMV$(TENSOR.TOARRAY(B.grad))

Tip: if .grad is empty or all zeros unexpectedly, the graph might be disconnected (or you’re converting back to arrays too early).
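It also helps to know which numbers to expect. The plain-Python sketch below assumes that calling TENSOR.BACKWARD on a non-scalar tensor such as C seeds its gradient with ones, which is equivalent to backpropagating SUM(C); that is an assumption about jdBasic, not documented behaviour. Under it, Grad(A) = ones · Bᵀ and Grad(B) = Aᵀ · ones, and a central finite-difference check confirms both.

```python
def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = matmul(A, B)
ones = [[1.0, 1.0], [1.0, 1.0]]          # assumed upstream gradient dL/dC

grad_A = matmul(ones, transpose(B))      # dL/dA = dL/dC · Bᵀ
grad_B = matmul(transpose(A), ones)      # dL/dB = Aᵀ · dL/dC

def total():
    """L = SUM(C), recomputed from the current contents of A and B."""
    return sum(map(sum, matmul(A, B)))

def numeric_grad(M, loss, eps=1e-6):
    """Central finite differences: nudge one entry at a time and watch the loss."""
    g = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            old = M[i][j]
            M[i][j] = old + eps
            up = loss()
            M[i][j] = old - eps
            down = loss()
            M[i][j] = old                # restore the original entry
            g[i][j] = (up - down) / (2 * eps)
    return g

print(C)        # [[19.0, 22.0], [43.0, 50.0]]
print(grad_A)   # [[11.0, 15.0], [11.0, 15.0]]
print(grad_B)   # [[4.0, 4.0], [6.0, 6.0]]
print(numeric_grad(A, total))
```

If jdBasic prints different gradients for this test, check first whether its backward pass on a non-scalar tensor uses a different seed.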

4) A tiny Transformer LLM

This demo shows a character-level Transformer that learns next-character prediction. It’s not “ChatGPT in BASIC” — but it *is* the real Transformer logic: embeddings, positional encoding, stacked attention blocks, and a cross-entropy loss.

What the demo does

  • Build a character vocabulary from training text
  • Embed tokens into vectors
  • Run stacked self-attention blocks (pre-LN: layer norm is applied before each sub-layer)
  • Train with cross-entropy loss
  • Sample one char at a time to generate text
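The data side of that list is plain bookkeeping. Here is a plain-Python sketch of it, using a made-up training string and a hypothetical context length of 4: build the character vocabulary, map characters to integer ids, and slice the text into (context, target) pairs where the target is the context shifted by one character.

```python
text = "hello world"                     # hypothetical training text
CONTEXT = 4                              # hypothetical context length

vocab = sorted(set(text))                # one entry per distinct character
stoi = {ch: i for i, ch in enumerate(vocab)}   # char -> token id
itos = {i: ch for ch, i in stoi.items()}       # token id -> char

ids = [stoi[ch] for ch in text]

# Each example: CONTEXT token ids in, the same ids shifted by one as targets,
# so position t learns to predict character t + 1.
pairs = [(ids[i:i + CONTEXT], ids[i + 1:i + CONTEXT + 1])
         for i in range(len(ids) - CONTEXT)]

print(vocab)          # [' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w']
print(len(pairs))     # 7
print("".join(itos[i] for i in pairs[0][0]), "->", "".join(itos[i] for i in pairs[0][1]))
# hell -> ello
```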

Why it works in BASIC

The Tensor engine does the heavy lifting: matrix ops, softmax, layer norm, and backprop. You focus on the structure of the model.
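For reference, here is what three of those operations compute, as a plain-Python sketch on single vectors. The layer norm leaves out the learned scale and shift a LAYER_NORM layer normally carries, and its epsilon of 1e-5 is an assumed value.

```python
import math

def softmax(logits):
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_id):
    """Negative log-probability the model assigns to the correct next token."""
    return -math.log(softmax(logits)[target_id])

def layer_norm(x, eps=1e-5):
    """Normalize a vector to mean 0 and variance ~1 (no learned gain/bias here)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])                        # [0.659, 0.242, 0.099]
print(round(cross_entropy([0.0, 0.0, 0.0, 0.0], 2), 4))    # 1.3863 (= ln 4)
print([round(v, 3) for v in layer_norm([1.0, 2.0, 3.0])])  # [-1.225, 0.0, 1.225]
```

A model that has learned nothing gives uniform probabilities, so its cross-entropy is ln(VOCAB_SIZE); that is a handy sanity value for the first loss you print.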

Core pieces (short excerpt)

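' --- Create layers ---
' Excerpt: VOCAB_SIZE, EMBEDDING_DIM, HIDDEN_DIM and NUM_LAYERS are defined earlier in the demo.
' The same vectors flow through the attention, norm and FFN layers,
' so this setup assumes EMBEDDING_DIM = HIDDEN_DIM.
MODEL = {}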
MODEL{"embedding"} = TENSOR.CREATE_LAYER("EMBEDDING", {"vocab_size": VOCAB_SIZE, "embedding_dim": EMBEDDING_DIM})
MODEL{"output_norm"} = TENSOR.CREATE_LAYER("LAYER_NORM", {"dim": HIDDEN_DIM})
MODEL{"output"} = TENSOR.CREATE_LAYER("DENSE", {"input_size": HIDDEN_DIM, "units": VOCAB_SIZE})

' --- Stack Transformer blocks ---
MODEL{"layers"} = []
FOR i = 0 TO NUM_LAYERS - 1
    layer = {}
    layer{"attention"} = TENSOR.CREATE_LAYER("ATTENTION", {"embedding_dim": EMBEDDING_DIM})
    layer{"norm1"} = TENSOR.CREATE_LAYER("LAYER_NORM", {"dim": HIDDEN_DIM})
    layer{"ffn1"}  = TENSOR.CREATE_LAYER("DENSE", {"input_size": HIDDEN_DIM, "units": HIDDEN_DIM * 2})
    layer{"ffn2"}  = TENSOR.CREATE_LAYER("DENSE", {"input_size": HIDDEN_DIM * 2, "units": HIDDEN_DIM})
    layer{"norm2"} = TENSOR.CREATE_LAYER("LAYER_NORM", {"dim": HIDDEN_DIM})
    MODEL{"layers"} = APPEND(MODEL{"layers"}, layer)
NEXT i
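What one of these blocks does in its forward pass can be sketched in plain Python. The sketch assumes a pre-LN layout (normalize, then attention or feed-forward, then add the residual), a single causal attention head with query/key/value projections only, and a ReLU between ffn1 and ffn2; biases and learned norm parameters are omitted, and the tiny matrices are made-up values. None of it claims to mirror the internals of jdBasic's ATTENTION layer.

```python
import math

def matvec(v, M):
    """Row vector v (length k) times matrix M (k x m)."""
    return [sum(v[t] * M[t][j] for t in range(len(v))) for j in range(len(M[0]))]

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def causal_attention(xs, Wq, Wk, Wv):
    """Single-head scaled dot-product attention; position t sees positions 0..t only."""
    q = [matvec(x, Wq) for x in xs]
    k = [matvec(x, Wk) for x in xs]
    v = [matvec(x, Wv) for x in xs]
    d = len(q[0])
    out = []
    for t in range(len(xs)):
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d) for s in range(t + 1)]
        w = softmax(scores)
        out.append([sum(w[s] * v[s][j] for s in range(t + 1)) for j in range(d)])
    return out

def block(xs, p):
    """Pre-LN Transformer block: x + Attn(LN(x)), then x + FFN(LN(x))."""
    attn = causal_attention([layer_norm(x) for x in xs], p["Wq"], p["Wk"], p["Wv"])
    xs = [[a + b for a, b in zip(x, y)] for x, y in zip(xs, attn)]        # residual 1
    out = []
    for x in xs:
        h = [max(0.0, u) for u in matvec(layer_norm(x), p["W1"])]          # ffn1 + ReLU (assumed)
        out.append([a + b for a, b in zip(x, matvec(h, p["W2"]))])         # ffn2 + residual 2
    return out

# Hypothetical sizes: 3 positions, model width 2, feed-forward width 4 (2x, as with ffn1).
I2 = [[1.0, 0.0], [0.0, 1.0]]
params = {"Wq": I2, "Wk": I2, "Wv": I2,
          "W1": [[0.5, -0.5, 0.1, 0.0], [0.2, 0.3, -0.1, 0.4]],
          "W2": [[0.1, 0.0], [0.0, 0.1], [0.2, -0.2], [0.05, 0.05]]}
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(block(seq, params))
```

Because attention is causal and everything else works per position, changing the last input leaves the outputs for earlier positions untouched; that property is what lets the model be trained on next-character prediction at every position at once.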

The full demo files live in your repo/scripts. When you run them in the browser, keep the training text short.

Practical tips (especially for Web/WASM)

Keep it small

In the browser, you’ll get the best experience from small models and short datasets. For example: character-level text, a few thousand steps, tiny embedding sizes.

  • Start with HIDDEN_DIM = 32 or 64
  • Try NUM_LAYERS = 1 or 2 first
  • Print loss every 50–200 steps (not every step)
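To get a feel for how these settings scale, here is a rough parameter count in plain Python for the model in the excerpt above. It assumes EMBEDDING_DIM = HIDDEN_DIM, an ATTENTION layer with four D×D projections plus biases (query, key, value, output), LAYER_NORM layers with one gain and one bias per dimension, and a hypothetical 50-character vocabulary. jdBasic's real layers may be parameterized differently, so read the results as orders of magnitude.

```python
def param_count(vocab_size, d, num_layers):
    """Rough parameter count for the excerpt's model, under the assumptions above."""
    def dense(n_in, n_out):
        return n_in * n_out + n_out            # weight matrix + bias vector

    norm = 2 * d                               # LAYER_NORM gain + bias (assumed)
    attention = 4 * dense(d, d)                # Q, K, V, output projections (assumed)
    block = attention + norm + dense(d, 2 * d) + dense(2 * d, d) + norm
    embedding = vocab_size * d
    head = norm + dense(d, vocab_size)         # output_norm + output layer
    return embedding + num_layers * block + head

for d in (32, 64):
    for layers in (1, 2):
        print(f"HIDDEN_DIM={d} NUM_LAYERS={layers}: ~{param_count(50, d, layers):,} parameters")
```

Each block costs 8·D² + 11·D parameters under these assumptions, so doubling HIDDEN_DIM roughly quadruples the per-block work, which is why starting at 32 pays off in the browser.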

Save and reuse

Training is slow. Save models once you like them, then load for inference.

TENSOR.SAVEMODEL MODEL, "my_model.json"
LOADED = TENSOR.LOADMODEL("my_model.json")
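The idea behind save-and-reload is simply writing every weight matrix to a file and reading it back later without retraining. A plain-Python sketch with the standard json module, using a hypothetical nested-dict layout; this is not jdBasic's actual file format.

```python
import json
import os
import tempfile

# Hypothetical trained weights: a dict of layers, each holding nested lists.
model = {"layers": [{"weights": [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]], "bias": [0.0, 0.1, -0.1]},
                    {"weights": [[0.7], [-0.8], [0.9]], "bias": [0.05]}]}

path = os.path.join(tempfile.mkdtemp(), "my_model.json")
with open(path, "w") as f:
    json.dump(model, f)          # save once the training results look good

with open(path) as f:
    loaded = json.load(f)        # later: load for inference, no retraining needed

print(loaded == model)           # True
```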

A good learning path

Step 1

Do the array-based neuron + layer examples, then add ReLU/Sigmoid.
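Both activations are one-liners. A plain-Python sketch, including the derivatives that backpropagation multiplies by:

```python
import math

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0          # convention: gradient 0 at z == 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)                  # largest value is 0.25, at z == 0

for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), round(sigmoid(z), 4), relu_grad(z), round(sigmoid_grad(z), 4))
```

Because the sigmoid's derivative never exceeds 0.25, gradients shrink as they pass through stacked sigmoid layers; that is one reason ReLU is a common choice for hidden layers.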

Step 2

Next, train XOR (nl05). This teaches you the full loop: forward → loss → backward → update.
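The four-step loop is easiest to see with a single parameter. This plain-Python sketch fits w in y = w·x to three points generated with w = 2. The gradient is written by hand where jdBasic would use TENSOR.BACKWARD, and the update is done by hand where jdBasic would use TENSOR.UPDATE; the data and the learning rate of 0.05 are made up.

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]        # generated with w = 2
w, lr = 0.0, 0.05

for step in range(100):
    preds = [w * x for x in xs]                                              # forward
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)            # loss (MSE)
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)  # backward
    w -= lr * grad                                                           # update

print(round(w, 6))   # 2.0
```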

Step 3

Finally, explore attention blocks in the Transformer demo. Start small, then scale slowly.

Shortcut: open a script and iterate fast with Ctrl + S in the editor.