Technology @lemmy.world Sapphire Velvet @lemmynsfw.com 11 mo. ago

Child sex abuse images found in dataset training image generators, report says

Stable Diffusion 1.5 reportedly “tainted” by more than 1,000 child abuse images.

The report: https://stacks.stanford.edu/file/druid:kh752sm9123/ml_training_data_csam_report-2023-12-20.pdf

Technology @lemmy.zip BrikoX @lemmy.zip 11 mo. ago

https://arstechnica.com/tech-policy/2023/12/child-sex-abuse-images-found-in-dataset-training-image-generators-report-says/

Between 0.00002% and 0.00006%
- Anything > 0 is too many.
  
  While I agree with the sentiment, that's 2 to 6 in 10,000,000 images. Even if someone were personally reviewing all of the images that went into these datasets, which I strongly doubt, that's a pretty easy mistake to make when looking at that many images.
  
  “Known CSAM” suggests the researchers ran the dataset through automated detection tools, which the dataset authors could have used too.
  
  They're not looking at the images, though. They're scraping. And their own legal defenses rely on them not looking too carefully, or else they cede their position to the copyright holders.
  
  Technically they violated the copyright of the CSAM creators!
