
// pytorch → photons

Software engineer · AI systems · Paris

I build the software layer where PyTorch meets silicon and light.

Backends, kernels, runtimes, driver paths, firmware hooks, and demos: the machinery that turns exotic AI hardware into something developers can actually run.

Currently at Arago, building software for a photonic AI accelerator, from PyTorch/ATen entry points to runtime dispatch and device execution.


// layer_00 · signal

About

AI systems, robotics, and hardware-shaped software.


Alhussein Jamil

AI systems engineer

Paris, France

These days at Arago, I work on the software layer that gets PyTorch talking to a photonic AI accelerator. In practice, that means backend paths, runtimes, memory movement, command queues, launch serialization, and the firmware-facing plumbing that makes unusual hardware feel runnable.

Before that, I helped exoskeletons learn better walking habits with reinforcement learning at Wandercraft, then helped robotic automation understand what it was looking at through computer vision at EyePick. I picked up the academic toolkit at École Polytechnique and Sorbonne Université.

I am happiest when code has to negotiate with the real world: hardware quirks, physics, humans, latency, bad assumptions, the whole circus. Yes, I vibe code too. I swear I knew how to code before AI was a thing.

Languages: French (bilingual) · Levantine Arabic (native) · English (bilingual) · Spanish (intermediate) · Interests: MMA, puzzles, violin, climbing

// hidden_layers · experience

Where the signal got sharper

Accelerators, robotics, computer vision, and products that left the notebook.

Software Engineer — AI Accelerator Stack

Arago · Paris

  • PyTorch/ATen device backend for a photonic AI accelerator: custom kernels, graph execution, and framework integration.
  • Runtime and dispatch work for graph partitioning across devices, fast memory allocation, peer movement, launch parameter serialization, and concurrent command queues.
  • Firmware-facing execution where a control core decomposes compiled regions or single nodes into primitive device graphs that run across mono cores and synchronization points.
  • Low-level hardware/software integration: custom ISA emission, driver interfaces, device setup flows, simulation, and validation.
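Launch parameter serialization, mentioned above, means packing a kernel's arguments into a byte record that the device side can decode. The sketch below assumes a made-up fixed-slot wire format; the real Arago format is not public, and every name here (`pack_launch`, `unpack_launch`, `DevicePtr`, the tags and the magic number) is hypothetical.

```python
import struct
from dataclasses import dataclass

# Hypothetical wire format (little-endian), NOT a real device ABI:
#   header: magic (u32), kernel_id (u32), num_args (u16), reserved (u16)
#   each argument: 1-byte type tag, 7 pad bytes, 8-byte payload
HEADER = struct.Struct("<IIHH")   # 12 bytes
ARG = struct.Struct("<B7x8s")     # 16 bytes per argument slot
MAGIC = 0x4C41554E                # arbitrary marker for this sketch
TAG_INT, TAG_FLOAT, TAG_PTR = 0, 1, 2


@dataclass(frozen=True)
class DevicePtr:
    """A device address, kept distinct from plain integer scalars."""
    addr: int


def pack_launch(kernel_id: int, args: list) -> bytes:
    """Serialize a kernel launch into a header plus fixed-size argument slots."""
    parts = [HEADER.pack(MAGIC, kernel_id, len(args), 0)]
    for arg in args:
        if isinstance(arg, DevicePtr):
            parts.append(ARG.pack(TAG_PTR, struct.pack("<Q", arg.addr)))
        elif isinstance(arg, bool):  # bool is a subclass of int; reject it explicitly
            raise TypeError("bool arguments are not supported in this sketch")
        elif isinstance(arg, int):
            parts.append(ARG.pack(TAG_INT, struct.pack("<q", arg)))
        elif isinstance(arg, float):
            parts.append(ARG.pack(TAG_FLOAT, struct.pack("<d", arg)))
        else:
            raise TypeError(f"unsupported argument type: {type(arg).__name__}")
    return b"".join(parts)


def unpack_launch(buf: bytes) -> tuple:
    """Inverse of pack_launch, as a device-side consumer might decode it."""
    magic, kernel_id, num_args, _reserved = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("not a launch record")
    args = []
    for i in range(num_args):
        # Fixed slots: argument i always starts at HEADER.size + i * ARG.size.
        tag, payload = ARG.unpack_from(buf, HEADER.size + i * ARG.size)
        if tag == TAG_INT:
            args.append(struct.unpack("<q", payload)[0])
        elif tag == TAG_FLOAT:
            args.append(struct.unpack("<d", payload)[0])
        elif tag == TAG_PTR:
            args.append(DevicePtr(struct.unpack("<Q", payload)[0]))
        else:
            raise ValueError(f"unknown argument tag {tag}")
    return kernel_id, args


if __name__ == "__main__":
    record = pack_launch(7, [DevicePtr(0x1000), 1024, 0.5])
    print(len(record), unpack_launch(record))
```

Fixed 16-byte slots trade a little space for simplicity: a consumer can locate argument `i` at `HEADER.size + 16 * i` without parsing the arguments before it.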
Problem
Matrix multiply on photonic hardware is one important workload, but Arago is building a full AI accelerator system: hardware, runtime, PyTorch integration, memory movement, firmware dispatch, kernel execution, and multi-device orchestration. From the developer's view it still has to feel like normal PyTorch.
Built
The software layer between PyTorch and custom silicon — compiler lowering, host runtime, driver paths, fast allocation, device-to-device movement, launch parameter serialization, command queues, and firmware-facing dispatch to mono cores.
Hard part
Keeping CUDA-like ergonomics while graph regions or single nodes are placed on devices, each launch is decomposed into a primitive device graph, mono cores coordinate through a per-device shared compute fabric, and events signal tensor readiness back to the framework.
PyTorch · Runtime · Firmware · Accelerators

Runtime map

PyTorch graph → compiler split → multi-device execution

01 · PyTorch computation graph

Example nodes: input, aten.mm, norm, gelu, copy, aten.mm, add, output
  • Real graph-shaped ATen dependency DAG
  • Tensor edges become placement constraints
  • Runnable regions are selected for acceleration

02 · Compiler + host runtime

01 Lower graph
02 Place work
03 Launch + transfer plan

03 · Per-device execution

  • Device 0 runs subgraph A and Device 1 runs subgraph B, linked by PCIe / device interconnect
  • On each device: node -> compute DAG, execution sessions on queues 0, 1, and 2, and a device primitive graph spanning the control core, the laser compute unit, and sync + device events

04 · Result

  • A launch may be a compiled subgraph or a single PyTorch node
  • Device firmware decomposes that launch into a primitive DAG for mono cores
  • Intermediate tensors move through PCIe or a device interconnect

Conceptual diagram — generic public terms only.
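The step where firmware decomposes a launch into a primitive DAG for mono cores can be illustrated with a level-by-level topological schedule. This is a hypothetical model, not Arago's firmware: primitives are grouped into waves whose dependencies are all satisfied by earlier waves, each wave is spread round-robin across mono cores, and the boundary between waves stands in for a synchronization point. `schedule_waves` and the primitive names are made up for the example.

```python
from collections import defaultdict


def schedule_waves(deps: dict[str, set[str]], num_cores: int) -> list[dict[int, list[str]]]:
    """Split a primitive DAG into waves separated by synchronization points.

    deps maps every primitive to the primitives it depends on; each primitive
    must appear as a key. All primitives in one wave only depend on earlier
    waves, so they may run in parallel; within a wave they are assigned to
    mono cores round-robin.
    """
    missing = set().union(*deps.values()) - set(deps)
    if missing:
        raise ValueError(f"dependencies without their own entry: {sorted(missing)}")

    # Kahn's algorithm, processed one dependency level at a time.
    indegree = {node: len(ds) for node, ds in deps.items()}
    users = defaultdict(list)
    for node, ds in deps.items():
        for d in ds:
            users[d].append(node)

    ready = sorted(node for node, k in indegree.items() if k == 0)
    waves, scheduled = [], 0
    while ready:
        wave = {core: [] for core in range(num_cores)}
        for i, node in enumerate(ready):
            wave[i % num_cores].append(node)
        waves.append(wave)  # a barrier (sync point) follows each wave
        scheduled += len(ready)
        next_ready = []
        for node in ready:
            for user in users[node]:
                indegree[user] -= 1
                if indegree[user] == 0:
                    next_ready.append(user)
        ready = sorted(next_ready)

    if scheduled != len(deps):
        raise ValueError("primitive graph has a cycle")
    return waves


if __name__ == "__main__":
    primitives = {
        "load_a": set(),
        "load_b": set(),
        "mm": {"load_a", "load_b"},
        "gelu": {"mm"},
        "store": {"gelu"},
    }
    for step, wave in enumerate(schedule_waves(primitives, num_cores=2)):
        print(step, wave)
```

Wave barriers are the simplest correct policy, but they can leave cores idle while one slow primitive finishes its wave; per-dependency events are the usual finer-grained refinement.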

Machine Learning Engineer

EyePick · Paris

  • Computer-vision pipelines for real-time robotic automation in industrial, agricultural, and culinary settings.
  • Image-based anomaly detection and classification for quality control.
  • Adapted ResNet-based models as an alternative to YOLO-based detection pipelines under licensing constraints.

Reinforcement Learning Intern

Wandercraft · Paris

  • Trained reinforcement-learning control policies for the Cassie bipedal robot and the Eve exoskeleton.
  • Used NVIDIA Isaac Gym and Ray RLlib for parallel simulation and distributed policy training.
  • Worked on sim-to-real transfer using imitation learning, domain adaptation, and visual adaptation methods.

AI Algorithms Intern

Poppins (formerly Mila) · Paris

  • Implemented an adaptive-difficulty algorithm for a therapeutic game designed for dyslexic children.
  • Used clustering and gameplay metrics to adjust difficulty dynamically without explicit child feedback.
  • Contributed to algorithmic development and game-side integration.

// weights · toolkit

The toolkit

Languages, systems, education, teaching, and the bits I still use.

Programming

  • Python
  • C++
  • C
  • OCaml
  • Java
  • C#
  • SQL

AI Systems

  • PyTorch execution
  • Graph lowering
  • Runtime design
  • Custom ISA
  • Kernels
  • Multi-core execution

Machine Learning

  • Computer vision
  • Reinforcement learning
  • Imitation learning
  • Sim-to-real
  • Model deployment

Robotics

  • NVIDIA Isaac Gym
  • Ray RLlib
  • Robotic control
  • Industrial automation
  • VR robot interfaces

Tools

  • Linux
  • Git
  • Docker
  • Debugging
  • HW/SW integration

EDUCATION

  • École Polytechnique

    Engineering Degree — MSc equivalent · 2020 — 2024

  • Sorbonne University — UPMC

    Master 2 — Intelligent Systems · 2023 — 2024

  • Lycées Louis-le-Grand & Buffon

    CPGE — MPSI → MP* · 2018 — 2020

TEACHING

  • X-HEC Master

    Guest lecturer in machine learning for engineering and business master's students.

  • Lycée Saint-Louis

    Oral examiner in mathematics for CPGE students.

LANGUAGES & OFF-DUTY

French bilingual · Arabic (Levantine) native · English bilingual · Spanish intermediate

  • MMA
  • Puzzles
  • Violin
  • Climbing

// forward_pass · selected_work

Projects

Personal builds with demos — RL, games, VR, geometry, and things you can actually open.

open source · RL · mujoco

DropRL — bipedal locomotion

Ray RLlib training pipeline, MuJoCo simulation, and policy export for Cassie-style locomotion.

Python · Ray RLlib · MuJoCo
Problem
Teach a Cassie biped to walk in simulation without rewarding ugly, fragile tricks.
Hard part
Distributed RL training where the reward can improve while the gait still looks wrong.
Demos: Cassie RL debug signals synced to video · snake-v0 agent

capstone · VR · robotics

Affective Touch VR

Unity VR scene with Franka Emika arm control over Redis, hand-tracking calibration, and in-headset pleasantness/intensity ratings.

Unity · VR · Franka · Redis
Problem
Run a psychophysics experiment where a virtual brush stroke still feels tied to a real robotic setup.
Hard part
Keeping VR rendering, Franka/haptic hardware, and subjective response capture in sync without breaking immersion.

Unity · VR · Franka Emika · Redis · Hand tracking

Touch simulation

Affective touch simulation

Paintbrush stroke rendered on a virtual hand — the core stimulus of the experiment.

Protocol & interface

Calibration

Hand-tracking calibration

Participant confirms poses and follows a countdown before trials begin.

In-VR ratings

Pleasantness & intensity UI

Sliders adjusted entirely inside VR — no headset removal between trials.

2d game · Unity · ADHD

Therapeutic puzzle game — exploration/exploitation study

Unity hub world with four puzzle mini-games, pixel-art rooms, fog-of-war, grid logic, and physics puzzles.

Unity · C# · Game design
Problem
Build a playable prototype to study how children with ADHD move between exploration and exploitation.
Hard part
~200 hours solo: game loop, level design, mechanics, lighting systems, and user-testing build.

Hub & level select

Temple hub with statue landmarks and portals to each mini-game.

Solo development · ~200 hours · designed for ADHD user testing

4 mini-games

Color-coded crate puzzle

Grid-based Sokoban: push crates onto matching colored altars.

Fragile bridge pathing

Numbered crates mark how many tiles you can cross before the floor gives way.

Ice sliding & hazards

Frictionless ice movement with spike traps between start and goal.

Flashlight labyrinth

Circular maze with fog-of-war — only tiles near the player stay visible.

game AI · pygame

Noine — Nine Men's Morris

Full game rules in Pygame with a Minimax search bot.

Minimax · Pygame · Python
Problem
Implement a complete board game with a competent AI opponent.
Hard part
Move generation and search depth across placing, moving, and flying phases.
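The search itself can be sketched as a generic depth-limited minimax. To keep it short, it is shown on a toy take-away game (take 1 to 3 stones; whoever takes the last stone wins) rather than Nine Men's Morris, so this is an illustration, not Noine's code. `moves`, `apply`, and `evaluate` are hypothetical hooks; for Morris, `moves` would have to branch on the placing, moving, and flying phases and on mill captures.

```python
def minimax(state, depth, maximizing, moves, apply, evaluate):
    """Depth-limited minimax; scores are from the maximizing player's view."""
    legal = list(moves(state))
    if depth == 0 or not legal:
        return evaluate(state, maximizing)
    scores = [
        minimax(apply(state, m), depth - 1, not maximizing, moves, apply, evaluate)
        for m in legal
    ]
    return max(scores) if maximizing else min(scores)


def best_move(state, depth, moves, apply, evaluate):
    """Pick the move whose resulting position scores best for the maximizer."""
    return max(
        moves(state),
        key=lambda m: minimax(apply(state, m), depth - 1, False, moves, apply, evaluate),
    )


# Toy game for the example: the state is the number of stones left.
def nim_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]


def nim_apply(stones, m):
    return stones - m


def nim_eval(stones, maximizing):
    if stones == 0:
        # No stones left: the player who just moved took the last one and won.
        return -1.0 if maximizing else 1.0
    return 0.0  # depth cutoff on an unfinished game: treat as neutral


if __name__ == "__main__":
    for pile in (5, 6, 7):
        print(pile, best_move(pile, 6, nim_moves, nim_apply, nim_eval))
```

Passing the rules in as functions keeps the search independent of the game; alpha-beta pruning is the usual next step when deeper search is needed.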
Interactive demo: click a point, and form a mill to capture.

Screenshots from the original Pygame build:

Mid-game: Noine board with orange and white pieces
Mill phase: Noine board during a mill capture

geometry · optimal transport

Convolutional Wasserstein distances

Python implementation of convolutional Wasserstein distances (Solomon et al., SIGGRAPH 2015).

Python · Geometry
Problem
Compute optimal transport distances efficiently on grids and meshes.
Hard part
Structured-domain distance computation without materializing the full cost matrix.
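The trick for avoiding the full cost matrix can be sketched on a regular 2D grid. This is a minimal sketch under stated assumptions, not the project's code: it runs plain entropic Sinkhorn iterations, uses a Gaussian kernel exp(-d²/γ) in place of the paper's heat kernel, and relies on that kernel factorizing over the x and y axes, so applying it to an n × n histogram takes two n × n matrix products instead of an n² × n² matrix. It returns the transport cost of the entropic plan, which is close to, but not identical to, the paper's distance estimate. `conv_sinkhorn_cost`, `blob`, and the parameter values are illustrative choices.

```python
import numpy as np


def axis_kernels(n: int, gamma: float):
    """1D Gaussian kernel on [0, 1] and its squared-distance-weighted version."""
    x = np.linspace(0.0, 1.0, n)
    d2 = (x[:, None] - x[None, :]) ** 2  # squared distances along one axis
    k = np.exp(-d2 / gamma)
    return k, k * d2


def conv_sinkhorn_cost(a, b, gamma=0.004, iters=500):
    """Entropic OT cost between two n x n grid histograms (each summing to 1).

    The 2D kernel exp(-|p - q|^2 / gamma) equals the product of 1D kernels
    along x and y, so K applied to a grid field is k @ field @ k (k is
    symmetric). The n^2 x n^2 kernel and cost matrices are never built.
    """
    k, kd = axis_kernels(a.shape[0], gamma)

    def apply_k(field):
        return k @ field @ k

    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):  # Sinkhorn scaling updates
        u = a / apply_k(v)
        v = b / apply_k(u)
    # Plan P = diag(u) K diag(v); the cost dx^2 + dy^2 splits into two
    # separable terms, one per axis.
    return float(np.sum(u * (kd @ v @ k + k @ v @ kd)))


def blob(n, cx, cy, sigma=0.06):
    """Normalized Gaussian bump on an n x n grid over [0, 1]^2."""
    x = np.linspace(0.0, 1.0, n)
    g = np.exp(-((x[:, None] - cx) ** 2 + (x[None, :] - cy) ** 2) / (2 * sigma**2))
    return g / g.sum()


if __name__ == "__main__":
    a = blob(40, 0.4, 0.5)
    b = blob(40, 0.6, 0.5)
    print(conv_sinkhorn_cost(a, b))
```

For two blobs 0.2 apart, the value should land near 0.2² = 0.04, with entropic blur adding a small positive term. Shrinking `gamma` tightens the estimate but eventually calls for log-domain iterations to avoid underflow.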

Barycenter demos

2D shape morph

Dots morph into a star via a convolutional Wasserstein barycenter on a 2D grid.

RGB image morph

Per-channel barycenters blend two color photographs while preserving mass structure.

3D voxel morph

Voxelized dinosaur and double-torus shapes interpolate through a lightly smoothed 3D convolutional barycenter.

Surface distribution transport

Heat-kernel Gaussians on a torus blend in 13 discrete steps — endpoints, spread, and merge are all visible.

// latent_space · selected_projects

Side quests with teeth

Smaller repos — ideas I ran until they worked, without dressing them up as products.

// render_pass · first_obsession

3D Creations

Early 3ds Max render experiments, age 10 to 15.

As a kid I was fascinated by Pixar — the worlds, the characters, the lighting. I wanted to make things like that, so around age 10 I started teaching myself 3ds Max as a hobby and kept at it until I was about 15: rooms, characters, little scenes like the ones below. I never turned it into a career, but it was my first long obsession with building something from nothing on a screen.

// output_layer · return

Contact

Projects, systems, weird technical puzzles.

Say hello

Use the form below, or reveal the address and email me directly.
