Alhussein Jamil — AI Systems Engineer

// layer_00 · signal

About

AI systems, robotics, and hardware-shaped software.

Alhussein Jamil

AI systems engineer

Paris, France

These days at Arago, I work on the software layer that gets PyTorch talking to a photonic AI accelerator. In practice, that means backend paths, runtimes, memory movement, command queues, launch serialization, and the firmware-facing plumbing that makes unusual hardware feel runnable.

Before that, I helped exoskeletons learn better walking habits with reinforcement learning at Wandercraft, then helped robotic automation understand what it was looking at through computer vision at EyePick. I picked up the academic toolkit at École Polytechnique and Sorbonne Université.

I am happiest when code has to negotiate with the real world: hardware quirks, physics, humans, latency, bad assumptions, the whole circus. Yes, I vibe code too. I swear I knew how to code before AI was a thing.

Languages: French (bilingual) · Arabic (Levantine, native) · English (bilingual) · Spanish (intermediate)

Interests: MMA · Puzzles · Violin · Climbing

// hidden_layers · experience

Where the signal got sharper

Accelerators, robotics, computer vision, and products that left the notebook.

Arago · Paris

PyTorch/ATen device backend for a photonic AI accelerator: custom kernels, graph execution, and framework integration.
Runtime and dispatch work for graph partitioning across devices, fast memory allocation, peer movement, launch parameter serialization, and concurrent command queues.
Firmware-facing execution where a control core decomposes compiled regions or single nodes into primitive device graphs that run across mono cores and synchronization points.
Low-level hardware/software integration: custom ISA emission, driver interfaces, device setup flows, simulation, and validation.

Problem: Matrix multiply on photonic hardware is one important workload, but Arago is building a full AI accelerator system spanning hardware, runtime, PyTorch integration, memory movement, firmware dispatch, kernel execution, and multi-device orchestration. From the developer's view, it still has to feel like normal PyTorch.
Built: The software layer between PyTorch and custom silicon — compiler lowering, host runtime, driver paths, fast allocation, device-to-device movement, launch parameter serialization, command queues, and firmware-facing dispatch to mono cores.
Hard part: CUDA-like ergonomics while graph regions or single nodes are placed on devices, each launch is decomposed into a primitive device graph, mono cores coordinate through a per-device shared compute fabric, and events signal tensor readiness back to the framework.

PyTorch · Runtime · Firmware · Accelerators


Conceptual diagram (generic public terms only):

01 · PyTorch computation graph

Real graph-shaped ATen dependency DAG
Tensor edges become placement constraints
Runnable regions are selected for acceleration

02 · Compiler + host runtime

Lower graph
Place work
Launch + transfer plan

03 · Per-device execution

Subgraph A runs on Device 0 and subgraph B runs on Device 1, linked by PCIe / device interconnect.
On each device: node -> compute DAG · execution sessions (queue 0, queue 1, queue 2) · device primitive graph · control core · laser compute unit · sync + device events

EyePick · Paris

Computer-vision pipelines for real-time robotic automation in industrial, agricultural, and culinary settings.
Image-based anomaly detection and classification for quality control.
Adapted ResNet-based models as an alternative to YOLO-based detection pipelines under licensing constraints.

Wandercraft · Paris

Trained reinforcement-learning control policies for the Cassie bipedal robot and the Eve exoskeleton.
Used NVIDIA Isaac Gym and Ray RLlib for parallel simulation and distributed policy training.
Worked on sim-to-real transfer using imitation learning, domain adaptation, and visual adaptation methods.

Poppins (formerly Mila) · Paris

Implemented an adaptive-difficulty algorithm for a therapeutic game designed for dyslexic children.
Used clustering and gameplay metrics to adjust difficulty dynamically without explicit child feedback.
Contributed to algorithmic development and game-side integration.

// weights · toolkit

The toolkit

Languages, systems, education, teaching, and the bits I still use.

Programming

Python
C++
C
OCaml
Java
C#
SQL

AI Systems

PyTorch execution
Graph lowering
Runtime design
Custom ISA
Kernels
Multi-core execution

Machine Learning

Computer vision
Reinforcement learning
Imitation learning
Sim-to-real
Model deployment

Robotics

NVIDIA Isaac Gym
Ray RLlib
Robotic control
Industrial automation
VR robot interfaces

Tools

Linux
Git
Docker
Debugging
HW/SW integration

EDUCATION

École Polytechnique

Engineering Degree (MSc equivalent) · 2020–2024

Sorbonne University (UPMC)

Master 2, Intelligent Systems · 2023–2024

Lycées Louis-le-Grand & Buffon

CPGE (MPSI → MP*) · 2018–2020

TEACHING

X-HEC Master

Guest lecturer in machine learning for engineering and business master's students.

Lycée Saint-Louis

Oral examiner in mathematics for CPGE students.

LANGUAGES & OFF-DUTY

French bilingual · Arabic (Levantine) native · English bilingual · Spanish intermediate

MMA
Puzzles
Violin
Climbing

// forward_pass · selected_work

Projects

Personal builds with demos — RL, games, VR, geometry, and things you can actually open.

Problem: Teach a Cassie biped to walk in simulation without rewarding ugly, fragile tricks.
Built: Ray RLlib training pipeline, MuJoCo simulation, and policy export for Cassie-style locomotion.
Hard part: Distributed RL training where the reward curve can keep improving while the gait still looks wrong.

Demos: droprl · cassie (RL debug signals synced to video) and droprl · snake-v0 (AI agent)

Problem: Run a psychophysics experiment where a virtual brush stroke still feels tied to a real robotic setup.
Built: Unity VR scene with Franka Emika arm control over Redis, hand-tracking calibration, and in-headset pleasantness/intensity ratings.
Hard part: Keeping VR rendering, Franka/haptic hardware, and subjective response capture in sync without breaking immersion.

Unity · VR · Franka Emika · Redis · Hand tracking

Touch simulation

Affective touch simulation

Paintbrush stroke rendered on a virtual hand — the core stimulus of the experiment.

Protocol & interface

Calibration

Hand-tracking calibration

Participant confirms poses and follows a countdown before trials begin.

In-VR ratings

Pleasantness & intensity UI

Sliders adjusted entirely inside VR — no headset removal between trials.

Problem: Build a playable prototype to study how children with ADHD move between exploration and exploitation.
Built: Unity hub world with four puzzle mini-games, pixel-art rooms, fog-of-war, grid logic, and physics puzzles.
Hard part: ~200 hours solo: game loop, level design, mechanics, lighting systems, and user-testing build.

Hub & level select

Temple hub with statue landmarks and portals to each mini-game.

Solo development · ~200 hours · designed for ADHD user testing

4 mini-games

Color-coded crate puzzle

Grid-based Sokoban: push crates onto matching colored altars.

Fragile bridge pathing

Numbered crates mark how many tiles you can cross before the floor gives way.

Ice sliding & hazards

Frictionless ice movement with spike traps between start and goal.

Flashlight labyrinth

Circular maze with fog-of-war — only tiles near the player stay visible.

Problem: Implement a complete board game with a competent AI opponent.
Built: Full game rules in Pygame with a Minimax search bot.
Hard part: Move generation and search depth across placing, moving, and flying phases.
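
For readers unfamiliar with the search behind the bot, here is a minimal sketch of depth-limited minimax. It is not the Noine code: the toy take-away game (take 1 to 3 stones, whoever takes the last stone wins) and the helper names `toy_moves`, `toy_apply`, and `toy_evaluate` are assumptions made only to keep the example short. In a Nine Men's Morris bot, the `moves` callback is where the placing, moving, and flying phases would each need their own move generation.

```python
from typing import Callable, List, Optional, Tuple


def minimax(
    state: int,
    depth: int,
    maximizing: bool,
    moves: Callable[[int], List[int]],
    apply_move: Callable[[int, int], int],
    evaluate: Callable[[int, bool], float],
) -> Tuple[float, Optional[int]]:
    """Depth-limited minimax. Returns (score, best_move) for the side to move."""
    legal = moves(state)
    if depth == 0 or not legal:
        # Leaf: either the depth budget ran out or the game is over.
        return evaluate(state, maximizing), None

    best_score = float("-inf") if maximizing else float("inf")
    best_move = None
    for move in legal:
        score, _ = minimax(
            apply_move(state, move), depth - 1, not maximizing,
            moves, apply_move, evaluate,
        )
        improved = score > best_score if maximizing else score < best_score
        if improved:
            best_score, best_move = score, move
    return best_score, best_move


# --- Hypothetical toy game used only for illustration ---
def toy_moves(stones: int) -> List[int]:
    """Legal moves: take 1, 2, or 3 stones, never more than remain."""
    return [k for k in (1, 2, 3) if k <= stones]


def toy_apply(stones: int, take: int) -> int:
    return stones - take


def toy_evaluate(stones: int, maximizing: bool) -> float:
    """An empty pile means the side to move lost (the opponent took the last stone)."""
    if stones == 0:
        return -1.0 if maximizing else 1.0
    return 0.0  # neutral heuristic at a depth cutoff


score, move = minimax(5, 10, True, toy_moves, toy_apply, toy_evaluate)
```

The search is deliberately plain: no alpha-beta pruning and no move ordering, so the cost grows quickly with depth. That is exactly the trade-off the "hard part" above refers to.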

Demo: noine · nine men's morris · minimax (original Pygame build). Click a point to place or move a piece; form a mill to capture.

Screenshots: Noine board mid-game with orange and white pieces; Noine board during a mill capture.

Problem: Compute optimal transport distances efficiently on grids and meshes.
Built: Python implementation of convolutional Wasserstein distances (Solomon et al., SIGGRAPH 2015).
Hard part: Structured-domain distance computation without materializing the full cost matrix.
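
A rough sketch of how the full cost matrix can be avoided on a regular 2D grid. This is not the project's code. It assumes unit pixel spacing, unit area weights, strictly positive histograms that each sum to 1, and a Gaussian kernel `exp(-d^2 / gamma)` standing in for the heat kernel. The function names and the default `gamma` and `iters` values are illustrative choices. Because the Gaussian factorizes along rows and columns, applying the kernel to an `n x n` image takes two small matrix products instead of one `n^2 x n^2` product.

```python
import numpy as np


def gaussian_kernel_1d(n: int, gamma: float) -> np.ndarray:
    """1D kernel matrix K[i, j] = exp(-(i - j)^2 / gamma) on a unit-spaced grid."""
    idx = np.arange(n)
    return np.exp(-((idx[:, None] - idx[None, :]) ** 2) / gamma)


def conv_wasserstein(mu0, mu1, gamma=10.0, iters=300):
    """Entropy-regularized transport value between two positive 2D histograms.

    Sinkhorn scaling where the kernel is applied separably (rows, then columns),
    so the full (n*n) x (n*n) cost matrix is never built.
    """
    k_rows = gaussian_kernel_1d(mu0.shape[0], gamma)
    k_cols = gaussian_kernel_1d(mu0.shape[1], gamma)

    def apply_kernel(x):
        # Both 1D kernels are symmetric, so no transpose is needed.
        return k_rows @ x @ k_cols

    v = np.ones_like(mu0)
    w = np.ones_like(mu1)
    for _ in range(iters):
        v = mu0 / apply_kernel(w)
        w = mu1 / apply_kernel(v)

    # Value of the regularized problem expressed through the scaling factors.
    value = gamma * np.sum(mu0 * np.log(v) + mu1 * np.log(w))
    return value, v, w


def bump(n: int, cx: float, cy: float, sigma: float = 2.0) -> np.ndarray:
    """Normalized Gaussian bump, strictly positive everywhere on the grid."""
    y, x = np.mgrid[0:n, 0:n]
    g = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()


mu_a = bump(16, 5, 8)
mu_b = bump(16, 11, 8)
far, v, w = conv_wasserstein(mu_a, mu_b)
same, _, _ = conv_wasserstein(mu_a, mu_a)
```

Because of the entropy term, the value for two identical histograms is not zero; it is the gap between `far` and `same` that reflects the shift between the two bumps.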

Barycenter demos

2D shape morph

Dots morph into a star via a convolutional Wasserstein barycenter on a 2D grid.

RGB image morph

Per-channel barycenters blend two color photographs while preserving mass structure.

3D voxel morph

Voxelized dinosaur and double-torus shapes interpolate through a lightly smoothed 3D convolutional barycenter.

Surface distribution transport

Heat-kernel Gaussians on a torus blend in 13 discrete steps — endpoints, spread, and merge are all visible.

// latent_space · selected_projects

Side quests with teeth

Smaller repos — ideas I ran until they worked, without dressing them up as products.

Transformers

ViT — Hilbert curve patches

Vision Transformer with Hilbert-curve patch ordering for locality-preserving sequences.
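
To illustrate what Hilbert-curve patch ordering means in practice, here is a small sketch that computes the visiting order for a square patch grid. It is not the repo's code: the function names are assumptions, and it only handles grids whose side is a power of two. The index-to-coordinate conversion is the standard iterative Hilbert curve mapping.

```python
from typing import List, Tuple


def hilbert_d2xy(side: int, d: int) -> Tuple[int, int]:
    """Map a distance d along the Hilbert curve to (x, y) on a side x side grid.

    side must be a power of two and 0 <= d < side * side.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the current sub-square before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_patch_order(side: int) -> List[int]:
    """Row-major patch indices listed in Hilbert-curve visiting order."""
    if side < 1 or side & (side - 1):
        raise ValueError("side must be a power of two")
    order = []
    for d in range(side * side):
        x, y = hilbert_d2xy(side, d)
        order.append(y * side + x)
    return order


order = hilbert_patch_order(8)  # e.g. a 64-patch grid
```

In a ViT, a permutation like `order` would be applied to the sequence of patch embeddings before positional information is added, so that neighbours in the token sequence are also neighbours in the image.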

NMT · PyTorch

Bahdanau attention NMT

From-scratch reimplementation of attention-based neural machine translation.

Audio · U-Net

U-Net audio source separation

Mask-based audio source separation operating on spectrograms.

RL · Env

Chess RL environment

Full chess rules and game-state API for reinforcement-learning self-play.

// render_pass · first_obsession

3D Creations

Early 3ds Max render experiments, ages 10 to 15.

As a kid I was fascinated by Pixar — the worlds, the characters, the lighting. I wanted to make things like that, so around age 10 I started teaching myself 3ds Max as a hobby and kept at it until I was about 15: rooms, characters, little scenes like the ones below. I never turned it into a career, but it was my first long obsession with building something from nothing on a screen.