Let's be more precise because I think we're conflating a few things.
Game/engine-specific implementation is necessary for this type of AA because it needs a certain level of integration with the rendering engine to access the information it needs from the rendering pipeline. It's not like, say, FXAA, which is a post-render, shader-based AA method that can simply take the raw output of a finished frame and apply FXAA to it as if it were a flat image. With that kind of post-processing AA you can write truly generic implementations, because all you need access to is the final rendered image of the pipeline, which the driver already has, so it can be enforced by third-party tools outside of the game. The level of integration required for DLSS, and any other non-post-processing type of AA, is at the engine level. Engines like Unreal Engine now have this integration. It's not asset-specific, which means anyone licensing Unreal Engine can enable DLSS 2.0 in their game, and no game-specific driver is required; the drivers can detect the engine in use and do everything driver-side that DLSS requires to function.
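To make the "flat image" point concrete, here's a toy sketch in Python/numpy of what a post-process AA pass conceptually does. This is not real FXAA (the threshold and blend factor are made up for illustration); the point is just that its only input is the finished color buffer, which is why a driver or third-party tool can bolt it onto any game:

```python
import numpy as np

def toy_postprocess_aa(frame: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Toy edge-blur AA. `frame` is an (H, W, 3) float array in [0, 1].

    The crucial point: the ONLY input is the finished frame. No motion
    vectors, no depth, no subpixel samples -- which is why this kind of
    pass can be applied generically, outside the game, to any output image.
    """
    # Per-pixel luminance, used to detect high-contrast (aliased) edges.
    luma = frame @ np.array([0.299, 0.587, 0.114])

    out = frame.copy()
    # Contrast among each interior pixel and its 4 axis neighbours.
    stacked = np.stack([luma[1:-1, 1:-1],
                        luma[:-2, 1:-1], luma[2:, 1:-1],
                        luma[1:-1, :-2], luma[1:-1, 2:]])
    contrast = stacked.max(axis=0) - stacked.min(axis=0)

    # Where contrast is high, blend the pixel with its neighbour average.
    edge = contrast > threshold
    avg = (frame[:-2, 1:-1] + frame[2:, 1:-1] +
           frame[1:-1, :-2] + frame[1:-1, 2:]) / 4.0
    out[1:-1, 1:-1][edge] = 0.5 * frame[1:-1, 1:-1][edge] + 0.5 * avg[edge]
    return out
```

Note that the function signature alone tells the story: one image in, one image out, nothing else required from the renderer.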
On the actual machine learning front, which is the training of the algorithm on Nvidia's supercomputer: this is no longer per game. It's a generic DLSS algorithm that applies to all games, so training isn't required on a per-game or per-asset basis; it now generalizes well enough to give good results on games it's never seen before. That said, the more games it trains on, the better it will generalize, so I expect new games will keep being added into the mix to make future iterations of the ML algorithm better. This is expected from any evolving ML model: it's in a constant state of improvement as you feed it more source data.
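In code terms, the shift is roughly the difference between the two hypothetical lookups below. None of these names are Nvidia's, and the functions are stubs; it's purely a sketch of "one set of weights per title" versus "one generic set applied everywhere":

```python
import numpy as np

def load_weights(path: str) -> np.ndarray:
    # Stub standing in for loading a trained network; returns fake weights.
    return np.zeros(16)

def run_model(weights: np.ndarray, frame: np.ndarray) -> np.ndarray:
    # Stub standing in for network inference; passes the frame through.
    return frame

frame = np.zeros((1080, 1920, 3))

# DLSS 1.x era (roughly): a separate model trained per title.
per_game_models = {"game_a": load_weights("game_a.bin"),
                   "game_b": load_weights("game_b.bin")}
upscaled = run_model(per_game_models["game_a"], frame)

# DLSS 2.0 era: one generic model, trained across many games, applied to
# any title -- including ones it was never trained on.
generic_model = load_weights("generic.bin")
upscaled = run_model(generic_model, frame)
```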
Nvidia already have a pipeline for DLSS which evolves past DLSS 2.0 and whose aim is to have DLSS running on basically any game that currently supports TAA, but as with any emerging technology it requires time to grow and mature, so quality improves over time while technical requirements drop.
Incidentally, this is partly why I think image quality will suffer in a globally applicable upscaler: such an upscaler can only really get access to the finished frame output by the engine and work on that; it can't look at more detailed information from the pipeline such as per-pixel motion vectors, temporal information, or subpixel information. This is why all the original AA types that came before the abysmal post-render shader AA methods actually looked a lot better. Thinking about it from an information-theory perspective, there's simply more raw original information going into something like MSAA, with its extra subpixel samples. And at least with DLSS you have additional information coming from motion vectors and temporal data, plus additional information coming from the trained models. Remove those and you're left with an FXAA-like implementation of upscaling; that is, it's working with just the output image and no additional information, in a process whose whole job is to add information. This is why I tend to think of it, in simple terms, as something like a Photoshop filter.
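To put the information argument in concrete terms, compare what the two kinds of upscaler actually get to see each frame. This is a hypothetical sketch (the field names are mine, not any real API), but the shape matches how temporal upscalers like DLSS 2.0 are typically fed versus a purely spatial, "Photoshop filter" style pass:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SpatialUpscalerInputs:
    # A post-process upscaler sees exactly one thing: the finished frame.
    color: np.ndarray            # (H, W, 3) final rendered image

@dataclass
class TemporalUpscalerInputs:
    # A DLSS-style upscaler is handed far more information per frame:
    color: np.ndarray            # (H, W, 3) current low-res frame
    motion_vectors: np.ndarray   # (H, W, 2) per-pixel motion since last frame
    depth: np.ndarray            # (H, W) scene depth from the pipeline
    jitter: tuple[float, float]  # subpixel camera offset for this frame
    history: np.ndarray          # previously accumulated high-res output

# Over N frames, the jittered samples plus motion vectors effectively give
# the temporal upscaler many subpixel samples per output pixel to
# reconstruct from; the spatial one only ever has one.
```

The trained model sits on top of the second struct; strip everything but `color` away and you're back to the first one, which is the FXAA-like situation described above.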
Of course, what would be ideal is if AMD nailed that process and did something truly unique and innovative that benefited everyone, created more competition in the space, and made games look better while running faster. That'd be amazing, and I'd happily eat my shoe if they pulled it off, but there are some underlying principles here which I don't think can really be violated, and they make me skeptical about how this will turn out.