Stop. Go back. This is the wrong way.
If you're running Python you basically need a full OS.
There are projects that run on an RTOS, and in fact I worked on an ML SoC that ran Linux, but there are two levels here:
- The ML processing itself, i.e. the math. This is simple on the software side and very complex everywhere else: the software just says "copy this block and start running a matrix multiply", and the hard logic is in moving data around efficiently.
- The stack. This is high level, Python or similar, with graph-processing overhead on top. It needs a lot of "overhead" by its nature.
In either case, don't worry about any of this: the overhead won't be noticeable because you'll be hard CPU-bound. The main thing is finding an optimized PyTorch build.
If you have an AMD or somehow an NVIDIA GPU in your laptop, you might be able to use the vendor's PyTorch backend (ROCm or CUDA), which would improve performance by roughly 1.5-2 orders of magnitude.
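Checking whether you actually have an accelerated backend is one line in PyTorch (`torch.cuda.is_available()` covers both CUDA and ROCm builds; the `try/except` just lets the snippet degrade gracefully if `torch` isn't installed at all):

```python
# Pick the best available PyTorch device, falling back to CPU.
try:
    import torch
except ImportError:
    torch = None  # no PyTorch installed; CPU-only fallback

if torch is not None and torch.cuda.is_available():
    device = "cuda"  # CUDA or ROCm build with a working GPU
else:
    device = "cpu"

print(f"running on: {device}")
```

You'd then pass `device` to `.to(device)` on your model and tensors; everything else stays the same.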
Unfortunately there isn't an official PyTorch backend for Intel iGPUs, but there is an OpenCL backend for PyTorch, and apparently this madlad got it working on an Intel iGPU: https://dev-discuss.pytorch.org/t/implementing-opencl-backend-for-pytorch/283/9
But don't worry about overhead; it's a fraction of a percent in these kinds of tasks, and there are ways to bypass it completely.
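You can convince yourself of this with a quick, framework-free measurement (a rough illustration, not a proper benchmark): compare the cost of one empty Python call (standing in for per-op dispatch overhead) against a loop doing actual arithmetic (standing in for the kernel).

```python
import time

def noop():
    # stands in for the per-op dispatch overhead: just a function call
    pass

def work(n):
    # stands in for the heavy kernel: O(n) floating-point arithmetic
    s = 0.0
    for i in range(n):
        s += i * 0.5
    return s

t0 = time.perf_counter()
noop()
overhead = time.perf_counter() - t0  # microseconds at most

t0 = time.perf_counter()
work(100_000)
compute = time.perf_counter() - t0   # milliseconds

print(f"dispatch ~{overhead * 1e6:.2f} us, compute ~{compute * 1e3:.2f} ms")
```

The compute side dwarfs the dispatch side by orders of magnitude, and real ML workloads make the gap even larger since each kernel does far more work than this toy loop.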