Congress Wants Tech Companies to Pay Up for AI Training Data
(www.wired.com)
Face recognition is probably dead as an open endeavor. The surveillance aspect makes it too controversial. I mean not only that we won't see open-source work on this, but that any work will happen behind closed doors.
In general, a major problem is that it is often not clear what reducing bias means. With face recognition, it is clear that we just want it to work for everyone. With genAI it is unclear. E.g., you type "US president" into an image generator. The historical fact is that all US presidents were male, and all but one were white. What's the unbiased output?
One answer is that it should reflect who is eligible for the US presidency. But in the future, one would expect far more people to be of "mixed race". So would that perhaps be biased against "interracial marriage"? In either case, one could accuse the makers of covering up historical injustice. I think in practice, people want image generators that just give them what they want with minimum fuss; wants which are probably biased by social expectations.
In any case, such curated datasets are used to fine-tune models trained on uncurated data. I don't think it is known what exactly such a dataset should look like to yield an unbiased model (however defined).