Some interesting examples of Monte Carlo simulations performed with CUDA Python/CuPy in Google Colab. The notebooks are authored by Onri Jay Benally, with citations included where relevant.
No need to download anything manually. Just run the notebooks.
- Josephson Junction Quantum Tunneling Prediction
- 3D Ion Beam Etching Simulation
- 2D Heat Equation
- Egg White Resist Electron-Beam Penetration Simulation
- Terabyte-Level L1 Cache Prediction
- Semantic Shift Simulation
The tables below summarize practical differences when using CuPy, CUDA Python (Numba / NVIDIA cuda-python), or Julia CUDA on Colab's three main GPU options (T4, L4, A100).

General comparison (the free-tier T4, 16 GB, is the baseline):

| Aspect | CuPy | CUDA Python (Numba / cuda-python) | Julia CUDA |
|---|---|---|---|
| Pre-installed? | Yes; `cupy-cuda12x` ≥ 13.3 is already in the Colab base image | Yes; Numba is present, but its PTX version may lag the driver | No; the Julia kernel is optional; add with `Pkg.add("CUDA")` |
| One-liner setup | Usually none; `pip install -q --upgrade cupy-cuda12x` only if an upgrade is needed | `pip install -q --upgrade "numba[cuda]"`, plus env vars if NVVM is not found | `using Pkg; Pkg.add("CUDA")` |
| Kernel authoring style | NumPy-like array ops; optional `RawKernel` / `cupyx.jit` for custom GPU code | Full control: `@cuda.jit` on Python, or embed PTX / CUDA C strings | Full control: `@cuda` kernels in Julia |
| Library coverage | cuBLAS, cuFFT, cuSOLVER, cuSPARSE, NCCL, cuDNN | You invoke CUDA libs manually or via `numba.cuda` driver calls | Julia wrappers for BLAS/FFT/DNN; high-level `CuArray` API |
| Typical speed-up vs NumPy | 20–60× for vectorized math | Similar if kernels are tuned; launch overhead dominates on tiny arrays | Comparable; sometimes +5% from LLVM optimizations |
| Common pitfalls | Duplicate wheels (11x and 12x) break the loader; OOM at 16 GB | `CUDA_ERROR_UNSUPPORTED_PTX_VERSION` after a Colab image update | First run pre-compiles packages (30–60 s) |
| Best-fit workloads | Drop-in acceleration for array algebra, FFTs, ML inference | Custom Monte Carlo, stencils, irregular memory access | Native Julia data/ML pipelines needing GPU |
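For the drop-in CuPy path in the table above, here is a minimal sketch (illustrative only, not one of the notebooks listed earlier) of a Monte Carlo estimate of π written entirely with NumPy-style array operations:

```python
import cupy as cp  # drop-in GPU replacement for NumPy arrays

def estimate_pi(n_samples: int = 10_000_000, seed: int = 0) -> float:
    """Monte Carlo estimate of pi from the fraction of random points inside the unit circle."""
    rng = cp.random.default_rng(seed)
    x = rng.random(n_samples, dtype=cp.float32)   # samples are generated directly on the GPU
    y = rng.random(n_samples, dtype=cp.float32)
    inside = cp.count_nonzero(x * x + y * y <= 1.0)
    return 4.0 * int(inside) / n_samples          # only one scalar is copied back to the host

print(estimate_pi())  # ~3.1415 for large n_samples
```

Swapping `import cupy as cp` for `import numpy as cp` runs the identical code on the CPU, which is what makes this the lowest-friction option.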
L4 GPU (Ada Lovelace, SM 89, 24 GB):

| Aspect | CuPy | CUDA Python | Julia CUDA |
|---|---|---|---|
| Wheel / package | `cupy-cuda12x` ≥ 13.3 ships a fatbin for SM 89 | Numba ≥ 0.61 required for SM 89 PTX | CUDA.jl auto-detects the arch |
| Precision extras | FP8 tensor-core matmul via `precision="fp8"` | Needs inline PTX / CUTLASS kernels for FP8 | `allow_fp8!()` (CUDA.jl 5.1+) |
| Memory mgmt | Pool hides `cudaMalloc`; 24 GB ceiling | Manual or managed; same ceiling | Automatic through the Julia runtime |
| Perf vs T4 | ~3× on dense matmul / conv | Similar once tuned; fewer SMs can limit occupancy | Similar to CuPy |
| Limitations | BW ≈ 300 GB/s (PCIe 4.0), not HBM | Same bandwidth cap | Same |
| Colab cost (Pro/Pro+) | ≈ $0.48 hr⁻¹ (4.8 CU hr⁻¹) | Same | Same |
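To confirm which accelerator a session actually received, and how much device memory is still free before CuPy's memory pool (the "Memory mgmt" row) starts caching allocations, a few plain CuPy runtime calls are enough; a rough sketch:

```python
import cupy as cp

dev = cp.cuda.Device(0)
props = cp.cuda.runtime.getDeviceProperties(0)

print("GPU:", props["name"].decode())                  # e.g. "NVIDIA L4" on an L4 session
print("Compute capability:", dev.compute_capability)   # "89" on L4, "80" on A100, "75" on T4
free, total = dev.mem_info                              # free / total device memory in bytes
print(f"Memory: {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")

# The default memory pool caches freed allocations to avoid repeated cudaMalloc calls;
# returning the cached blocks to the driver can help when a session is close to OOM.
cp.get_default_memory_pool().free_all_blocks()
```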
A100 GPU (Ampere, SM 80, 40 GB):

| Aspect | CuPy | CUDA Python | Julia CUDA |
|---|---|---|---|
| Wheel / package | The same `cupy-cuda12x` wheel covers SM 80 | Numba ≥ 0.57 supports SM 80 | CUDA.jl auto-detects the arch |
| Precision extras | Enable TF32: `cp.cuda.set_matmul_precision("tf32")` | `@cuda.jit(fastmath=True)` -> TF32 tensor cores | `allow_tf32!()` |
| Memory / BW | 40 GB HBM2e, 1.6 TB s⁻¹ | Same | Same |
| Perf gain vs T4 | 10–15× on GEMM / conv | Similar after occupancy tuning | Similar |
| Session cost | ≈ $1.18 hr⁻¹ (11.8 CU hr⁻¹) | Same | Same |
| Caveats | Limited availability; CUs burn fast | Kernel launch + compile time higher | Initial package compile time |
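For the "custom Monte Carlo" workloads the tables assign to CUDA Python, a minimal Numba sketch (hypothetical example, not taken from the notebooks) using Numba's per-thread xoroshiro128+ RNG helpers; the kernel name, launch configuration, and sample counts are arbitrary:

```python
import numpy as np
from numba import cuda
from numba.cuda.random import create_xoroshiro128p_states, xoroshiro128p_uniform_float32

@cuda.jit(fastmath=True)
def pi_kernel(rng_states, samples_per_thread, counts):
    """Each thread draws its own points and records how many fall inside the unit circle."""
    tid = cuda.grid(1)
    if tid < counts.size:
        inside = 0
        for _ in range(samples_per_thread):
            x = xoroshiro128p_uniform_float32(rng_states, tid)
            y = xoroshiro128p_uniform_float32(rng_states, tid)
            if x * x + y * y <= 1.0:
                inside += 1
        counts[tid] = inside

blocks, threads, per_thread = 64, 256, 4096
n_threads = blocks * threads
rng_states = create_xoroshiro128p_states(n_threads, seed=1)
counts = cuda.device_array(n_threads, dtype=np.int32)

pi_kernel[blocks, threads](rng_states, per_thread, counts)
print(4.0 * counts.copy_to_host().sum() / (n_threads * per_thread))  # ~3.1415
```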
- Start with CuPy for anything expressible as NumPy/SciPy—lowest friction, high speed.
- Use CUDA Python only for the hotspots that need bespoke parallel patterns; stay on the latest Numba.
- Prefer Julia CUDA if your workflow is already in Julia—performance parity with cleaner syntax.
- Choose GPU by memory and budget: Free T4 for prototyping; L4 for moderate models with FP8; A100 when you need 40 GB or TF32 accuracy.
Yes = works out of the box; No = requires explicit install / setup.