They are oversimplifying the underlying principles by lumping everything under the moniker of async compute. You either run something asynchronously or you don't; it just means the work runs out of order and concurrently, instead of synchronously and consecutively with the main flow of the program.
There are no levels of being asynchronous, only the amount of work you perform in parallel. Whether it is many threads or just a few, they are all equally asynchronous.
The comparison to hyperthreading revolves around bubbles in the pipeline, where some part of the hardware has no work to do. You aim to slip work into these bubbles, which is what both hyperthreading and async compute try to do.
This goes down to the nitty-gritty of how superscalar processors perform work: although it is a single core, it processes multiple instructions down multiple pipelines. Usually the instructions of a single thread are spread across these pipelines to improve IPC, but there can be bubbles where no work is being performed, which is where hyperthreading steps in to help fill them.
That is the comparison with async compute: you are filling bubbles in the pipeline with extra work.
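To make the bubble-filling idea concrete, here is a rough toy sketch in Python. It is not how any real CPU or GPU scheduler works; the issue width, the per-cycle numbers for the first thread, and the function names are all made up for illustration.

```python
# Toy model of filling pipeline bubbles. All numbers and names are hypothetical.
ISSUE_WIDTH = 2  # assumed: the core can issue up to 2 instructions per cycle

# Assumed: instructions the primary thread manages to issue in each cycle.
# Anything below ISSUE_WIDTH leaves a bubble.
thread_a = [2, 1, 0, 2, 1, 0]


def utilization(per_cycle):
    """Fraction of issue slots that did useful work."""
    return sum(per_cycle) / (ISSUE_WIDTH * len(per_cycle))


def fill_bubbles(primary, secondary_ready):
    """Slip instructions from a second thread into the primary's idle slots.

    Returns the combined per-cycle issue counts and how many secondary
    instructions are still waiting once the cycles run out.
    """
    filled = []
    for used in primary:
        free = ISSUE_WIDTH - used          # size of the bubble this cycle
        take = min(free, secondary_ready)  # only as much as fits
        secondary_ready -= take
        filled.append(used + take)
    return filled, secondary_ready


combined, leftover = fill_bubbles(thread_a, secondary_ready=4)
print("primary alone:", utilization(thread_a))
print("with second thread:", utilization(combined), "leftover:", leftover)
```

The second thread never takes a slot the primary thread was using, so the primary is not slowed down and the extra work rides along for free. That is the ideal case for both hyperthreading and async compute.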
Also, that threading and queue depth comparison used a very light workload per thread, so the comparison is not as direct. It just shows how many threads the processors can handle concurrently.
The real crux of async compute is the amount of work you do per extra thread you add within those bubbles, which is why they say they are not using many extra threads but are still getting a boost.
It starts becoming a detriment when you queue up too many heavy threads and exceed the processing power left over in the bubbles. You then start impeding the process that should have been running in that time, because you are taking up resources it requires, and performance suffers as that original process takes longer to finish.
So if you are trying to execute compute threads in parallel without managing resources, you end up with an overall detriment.
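A back-of-the-envelope sketch of that tipping point, again in Python. The model and every number in it are assumptions for illustration only; in particular the `contention_penalty` factor is a made-up stand-in for the cost of fighting over resources the main workload needs, not a measured value.

```python
# Toy cost model for oversubscribing the bubbles. Everything here is hypothetical.

def concurrent_time(main_busy, bubbles, async_work, contention_penalty=1.5):
    """Time to finish everything when async work runs alongside the main workload.

    main_busy: time units the main workload keeps the hardware busy
    bubbles: idle time units inside the main workload's span
    async_work: extra work units queued to run concurrently
    contention_penalty: assumed extra cost per unit of work that does not fit
        in the bubbles and ends up competing with the main workload
    """
    baseline = main_busy + bubbles            # main workload on its own
    overflow = max(0, async_work - bubbles)   # work the bubbles cannot absorb
    return baseline + overflow * contention_penalty


def serial_time(main_busy, bubbles, async_work):
    """Time when the extra work simply runs after the main workload."""
    return main_busy + bubbles + async_work


for async_work in (2, 4, 12, 20):
    c = concurrent_time(10, 4, async_work)
    s = serial_time(10, 4, async_work)
    print(f"async_work={async_work:2d}  concurrent={c:5.1f}  serial={s:5.1f}")
```

Under these assumed numbers, work that fits in the bubbles costs nothing extra, while a heavy enough queue makes the concurrent case slower than just running things one after the other. Where that crossover sits on real hardware depends entirely on the workload and how the resources are managed.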