Note: this is just how I've understood it for CRTs. I hope it's correct and that it applies to LCDs, too:
Hz = "times per second"
FPS = frames per second
So if a monitor is operating at 60Hz, it refreshes the image 60 times per second. The computer can calculate more than 60 frames per second (or fewer, of course). There's a buffer on the GPU that the computer keeps writing to, updating it as new data becomes available.
Now imagine the computer calculates frames 20% faster than the monitor asks for them. Once it has finished one frame, it starts on the next. It gets through 20% of the new frame, and then the monitor asks for a frame (or, more accurately, just reads the buffer). At that moment the buffer holds 20% of the new frame, and the rest is still left over from the previous frame. So it's 20% new frame, 80% old frame.
In fast motion (like panning the view from left to right in Battlefield) the old frame was drawn from a slightly different viewpoint, so the target's head might be 10cm to the right while the rest of the body is still further to the left. This mismatch is seen as "tearing". (I think it can happen at lower fps, too.)
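To put numbers on the example above, here is a rough Python sketch. It assumes the same simplified picture: one buffer, overwritten from top to bottom at constant speed, and read by the monitor in an instant. The function name and the 72 fps figure (20% above 60Hz) are my own choices for illustration, and real hardware is more complicated than this.

```python
from fractions import Fraction
import math

def buffer_at_refresh(refresh_index, refresh_hz=60, render_fps=72):
    """Return (frame_being_written, fraction_of_it_in_buffer) at a refresh.

    Toy model: one buffer, overwritten top to bottom at constant speed,
    and read by the monitor in an instant.
    """
    t = Fraction(refresh_index, refresh_hz)  # time of this refresh, in seconds
    progress = t * render_fps                # frames' worth of rendering done so far
    frame = math.floor(progress)             # index of the frame currently being written
    return frame, progress - frame           # fractional part = share of the buffer that is new

for k in range(1, 6):
    frame, new_part = buffer_at_refresh(k)
    if new_part == 0:
        print(f"refresh {k}: buffer holds one complete frame, no tear")
    else:
        print(f"refresh {k}: {float(new_part):.0%} frame {frame}, "
              f"{float(1 - new_part):.0%} frame {frame - 1}")
```

In this model the first refresh catches the buffer at 20% new and 80% old, the split then moves a further 20% with each refresh, and the fifth refresh happens to land exactly on a frame boundary, so that one shows no tear.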
V-Sync forces the GPU to only send complete frames for each separate refresh. So no tearing.
PS. I'm not sure whether V-Sync also caps the calculation. It might still calculate in the background and just pick the "complete" frames to display. Someone more knowledgeable should chime in on this. Triple buffering is a similar but slightly more advanced technique, I think.
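As a way to picture the question in the PS, here is a toy Python comparison. It assumes the behaviour as it is commonly described: with plain double-buffered V-Sync the renderer waits for the swap before starting the next frame (so it is effectively capped at the refresh rate), while with triple buffering it keeps rendering and each refresh shows the newest complete frame. The function names are made up and this is only a model, not how any particular driver is implemented.

```python
from fractions import Fraction
import math

def vsync_double_buffered(refreshes, refresh_hz=60, render_fps=72):
    """Frame shown at each refresh when the renderer waits for the swap."""
    render_time = Fraction(1, render_fps)
    shown, last = [], None            # last = frame currently on screen (None = nothing yet)
    frame, start = 0, Fraction(0)     # frame being rendered, and when rendering began
    for k in range(1, refreshes + 1):
        t = Fraction(k, refresh_hz)   # time of this refresh
        if start + render_time <= t:  # back buffer is complete by now
            last = frame              # swap: the whole frame goes on screen
            frame += 1
            start = t                 # renderer was stalled until the swap
        shown.append(last)            # otherwise the previous frame is repeated
    return shown

def vsync_triple_buffered(refreshes, refresh_hz=60, render_fps=72):
    """Frame shown at each refresh when the renderer never waits."""
    shown = []
    for k in range(1, refreshes + 1):
        t = Fraction(k, refresh_hz)
        completed = math.floor(t * render_fps)  # frames fully rendered so far
        shown.append(completed - 1 if completed else None)  # newest complete frame
    return shown

print(vsync_double_buffered(5))   # every frame shown once, renderer held to 60 fps
print(vsync_triple_buffered(5))   # renderer runs at 72 fps, one frame gets skipped
```

Either way only complete frames reach the screen, so neither variant tears in this model. The difference is whether the extra frames are never calculated (double buffering) or calculated and then dropped (triple buffering). Calling `vsync_double_buffered(4, render_fps=50)` also shows the familiar downside: when the renderer is slightly too slow, every other refresh repeats the previous frame.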
PS2. The example above is a simplification of the process. The pipeline will naturally choose which frames to start calculating; otherwise it would choke on the "unlimited" workload (depending on the engine, of course).