What prompted the OP was the realisation that this now goes beyond third-party cookies and similar tracking analysis, to the point where private verbal conversations are being recorded and analysed.
I've underlined what I consider to be the key word: private. Is a verbal conversation via an app provider even private any more? Was it ever?
I haven't tested whether a normal conversation using the phone itself is being recorded, or even whether a conversation is being recorded when the phone is in a bag or pocket. I don't have the inclination to go to that level of investigation, but I honestly do wonder where this will end up.
The issue here is that it's not so straightforward. The OP mentions sharing images over WhatsApp, so can he definitely rule out any internet searches or similar? Most of my targeted advertising comes from web searches, YouTube videos I'm watching and online shopping information.
I actually think that real-time image and voice analysis is probably being used a lot by companies such as Facebook. This isn't necessarily in breach of their T&Cs; for example, note the following wording:
Under "Information you and we share":
You share your information as you use and communicate through our Services, and we share your information to help us operate, provide, improve, understand, customize, support, and market our Services.
And later on:
However, your WhatsApp messages will not be shared onto Facebook for others to see. In fact, Facebook will not use your WhatsApp messages for any purpose other than to assist us in operating and providing our Services.
Now, a lot of the 'information we share' section reads as deliberately nebulous, but just taking the above, nothing there prevents them from analysing your voice/photos in real time, picking out keywords and storing those against your marketing ID for third parties to consume. They wouldn't be passing on your messages/photos; they would only be passing on information gleaned from them.
I don't see much difference between that and scanning your search terms and your shopping habits on third-party sites, or analysing any image you upload (I swear Google does this), to generate marketing information stored against you. Most people seem fine with that, so why would real-time voice analysis be radically 'different'?