I'll caveat this by saying that I am an AI programmer and engine/physics stuff is not my area of expertise, but...
You typically have several different representations of the world in a game. You've got a game object world that describes where all the entities are and what they consist of, which lives in system memory. You've got a physics world which contains all the collision meshes, also in system memory, and you have the rendering world, which lives in GPU memory (there'll be other stuff like audio etc). You update the game world and physics world in parallel on the CPU (with various scheduling shenanigans to make sure things are calculated in sensible orders), and then once you've calculated where everything should be drawn, you fire a load of render commands at the GPU so it can update its state and then draw the frame.
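To make that concrete, here's a very rough sketch of what a single frame might look like. All the types and function names are made up for illustration, but the overall shape (two CPU-side worlds updated in parallel, then one sync and submit to the GPU-side world) is the bit that matters:

```cpp
#include <future>
#include <cstdio>

// Hypothetical stand-ins for the three representations described above.
struct GameWorld    { /* entity positions, gameplay state (system memory) */ };
struct PhysicsWorld { /* collision meshes, rigid bodies (system memory) */ };
struct RenderWorld  { /* GPU-side copies of meshes, transforms, materials */ };

void updateGameLogic(GameWorld&, float dt)              { /* move entities, run AI, etc. */ }
void stepPhysics(PhysicsWorld&, float dt)               { /* integrate, resolve collisions */ }
void syncToRenderWorld(const GameWorld&, RenderWorld&)  { /* copy final transforms across */ }
void submitRenderCommands(const RenderWorld&)           { std::puts("draw frame"); }

int main() {
    GameWorld game; PhysicsWorld physics; RenderWorld render;
    const float dt = 1.0f / 60.0f;

    for (int frame = 0; frame < 3; ++frame) {
        // Game logic and physics are both CPU-side, so they can run concurrently,
        // with the real scheduling "shenanigans" handling the dependencies between them.
        auto physicsJob = std::async(std::launch::async, stepPhysics, std::ref(physics), dt);
        updateGameLogic(game, dt);
        physicsJob.wait();

        // Once final positions are known, push them to the GPU-side world
        // and fire off the render commands for this frame.
        syncToRenderWorld(game, render);
        submitRenderCommands(render);
    }
}
```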
These days you do have compute access to GPUs, but on PC it's not really that straightforward to take, say, the physics world, farm that data off to the GPU to let it do some of the work, and then pull the results back into system memory so the game logic can do physics queries etc. Not to say it's not do-able...
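The awkward bit is the round trip. Something like the sketch below, where the wrapper functions are hypothetical placeholders for whatever compute API you're actually on (Vulkan, D3D, GL compute, etc.), but the readback step is the painful part regardless:

```cpp
#include <vector>
#include <cstdio>

struct Particle { float x, y, z; float vx, vy, vz; };

// Hypothetical wrappers around a real compute API; stubbed so this compiles,
// but in a real engine each one hides buffer management and sync points.
void uploadToGpu(const std::vector<Particle>&)   { /* copy data into a GPU buffer */ }
void dispatchPhysicsCompute(std::size_t)          { /* run the compute shader over it */ }
void readBackFromGpu(std::vector<Particle>&)      { /* blocking readback: the GPU must
                                                       finish before this can return */ }

int main() {
    std::vector<Particle> particles(10000);

    // Push the physics data to the GPU, let a compute shader step it,
    // then pull it back so CPU-side game logic can query it again.
    uploadToGpu(particles);
    dispatchPhysicsCompute(particles.size());
    readBackFromGpu(particles);   // this stall is why it's "not really straightforward"

    // Now gameplay code is back to working with system-memory data.
    std::printf("first particle at (%f, %f, %f)\n",
                particles[0].x, particles[0].y, particles[0].z);
}
```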
I think it's more a case that no-one (so far) is willing to risk the millions of pounds required to fund a project to create a voxel-based FPS.
The thought of handling multiplayer in a game like Teardown brings me out in a cold sweat though, tbh. Managing the data to represent the world at a good framerate in a local single-player game is challenging enough, without then trying to stream all that state back and forth through a network pipeline that is the digital equivalent of a soggy paper straw.
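Just as a back-of-envelope illustration (all the numbers below are assumptions I've picked for the sketch, not anything Teardown actually does), even a naive uncompressed diff of a modest destructible region gets silly fast:

```cpp
#include <cstdio>

int main() {
    // Illustrative assumptions only, not a real voxel format.
    const double voxels          = 256.0 * 256.0 * 256.0; // one modest destructible region
    const double changedPerFrame = voxels * 0.001;        // say 0.1% of voxels change in a big explosion
    const double bytesPerVoxel   = 5.0;                   // ~4-byte index + 1-byte material, uncompressed
    const double fps             = 60.0;

    const double bytesPerSecond    = changedPerFrame * bytesPerVoxel * fps;
    const double megabitsPerSecond = bytesPerSecond * 8.0 / 1e6;

    // Comes out around 40 Mbit/s per client, uncompressed, which is vastly more
    // than the snapshot traffic a typical FPS actually budgets per player.
    std::printf("~%.1f Mbit/s per client\n", megabitsPerSecond);
}
```

Obviously you'd delta-compress, cluster destruction events, send them as commands rather than raw voxel diffs, and so on, but that's exactly the hard engineering nobody has funded yet.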