Copyright and IP concerns disappear with an open dataset.
i don’t think i’d agree with that, it doesn’t matter if the dataset goes open if the content went in there without consideration for the authors
also even things like thispersondoesnotexist were used to mass-create fake identities and such
the problem with that is that training can’t be done “immediately”, it takes tons of compute