Anita Bryant Dead
There's usually going to be a hegemonic style for AI art, since most people making this stuff just put in some vague keywords for the general direction of the style and then stuff the rest of the prompt with quality keywords. Oftentimes hosted inference services will actually do the quality keyword stuffing for you, or will train in a house style. Whatever you don't specify gets filled in with essentially the model average (which is, of course, not a representative average image; it's the average of the "preferred" set from their preference optimization training). Practically nobody asks for mediocre images (because why would you), and the people making models, especially ones hosted on inference services, often effectively won't let you. (There's a rough sketch of what that keyword stuffing looks like at the end of this comment.)
Think of what you'd expect to get from requesting an image of "a beautiful woman". There are certainly a lot of different ideas people have about which women are beautiful and what traits make a woman beautiful, across different individuals and especially across different cultures and time periods. But if you take the set of every picture that someone thought of as containing a beautiful woman and look at the mode of that distribution, it's going to settle on conventionally attractive by the standards of whatever group is labeling the images. The same thing happens with an AI model: training on those images labeled as "a beautiful woman" will shift its output towards conventionally attractive women. If you think of it as a set of traits contributing to conventional attractiveness, it's also fairly likely that every "a beautiful woman" image will end up looking like a flawless supermodel, since the mode will be a woman with all of the most common traits in the "a beautiful woman" dataset. That often won't look natural, because we're not used to seeing flawless supermodels all of the time.
That's more or less what happens when people make these AI images, but with the whole image and its style. The set of images labeled as "high quality" (or whatever quality keyword), or that are in the preference optimization set, have attributes that are more common in those images than in other images. Those attributes end up becoming dominant, and a lot of them will show up in a generated image stuffed with quality keywords or made on a heavily DPO-tuned model, which can look unnatural when a typical good-looking natural image might only have a few of those traits. The problem is exacerbated by each model having its own default flavor and by people heavily reusing the same sets of quality keywords, and I'd honestly fully expect that part of it can be pinned on how some text encoders work (CLIP's embeddings are hard to separate distinct concepts from, and this does manifest in how images are generated, but a lot of recent popular models don't use CLIP, so it doesn't necessarily always apply).
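To make the keyword stuffing concrete, here's roughly what one of those prompts ends up looking like when run locally. This is just an illustrative sketch using the diffusers library with SD 1.5; the model ID, tags, and settings are stand-ins, not any specific service's actual defaults:

```python
# Illustrative sketch of "quality keyword stuffing" with diffusers + SD 1.5.
# The model ID, tags, and settings here are stand-ins, not any service's real defaults.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

base_prompt = "a cabin in the woods, oil painting"   # the part the user actually cares about
quality_tags = "masterpiece, best quality, highly detailed, 8k, sharp focus"
negative = "lowres, blurry, jpeg artifacts, worst quality"

image = pipe(
    prompt=f"{base_prompt}, {quality_tags}",          # everything left unspecified drifts toward
    negative_prompt=negative,                         # the mode of the "high quality" set
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("cabin.png")
```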
Well, it was true for the first big models. The most recent generation of models do not have this problem.
Earlier models like Stable Diffusion 1.5 worked on noise (ϵ) prediction. All diffusion models work by training to predict where the noise is in an image, given images with differing levels of noise in them, and then you can sample from the model using a solver to get a coherent image in a smaller number of steps. So, using ϵ as the prediction target, you're obviously not going to learn anything by trying to predict what part of pure noise is noise, because the entire image is noise. During sampling, the model will (correctly) predict on the first step that the pure noise input is pure noise, and remove the noise, giving you a black image. To prevent this, people trained models with a non-zero SNR at the highest noise timestep. That way, they're telling the model that there is something actually meaningful in the random noise we're giving it. But since the noise we're giving it is always uniform, this ends up biasing the model towards making images with average brightness. The parts of the initial noise that it retains (since remember, we're no longer asking it to remove all of the noise, we're lying to it and telling it some of it is actually signal) usually also end up causing unusual artifacting. An easy test for these issues is to try to prompt "a solid black background" -- early models will usually output neutral gray squares or grayscale geometric patterns.
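You can see how small that "lie" is for the stock SD 1.x schedule with a few lines of PyTorch (the beta values are assumed from the commonly published config; treat this as a sketch, not a reference implementation):

```python
# Terminal SNR of the stock SD 1.x "scaled_linear" schedule -- small, but not zero.
# Beta values are assumed from the commonly published config; this is just a sketch.
import torch

T = 1000
betas = torch.linspace(0.00085 ** 0.5, 0.012 ** 0.5, T) ** 2
alphas_bar = (1.0 - betas).cumprod(dim=0)
terminal_snr = alphas_bar[-1] / (1.0 - alphas_bar[-1])
print(terminal_snr.item())  # roughly 5e-3 -- the model is told pure noise still contains signal
```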
One of the early hacks for solving the average brightness issue was training with a random channelwise offset added to the noise, and models like Stable Diffusion XL used this method. This allowed models to make very dark and light images, but also often made images end up too dark or too light; it's possible you saw some of these about a year into the AI craze, when this was the latest fad. The proper solution came with Bytedance's paper ( https://arxiv.org/pdf/2305.08891 ) showing a method that allows training with an SNR of zero at the highest noise timestep. The main change is that instead of predicting noise (ϵ), the model needs to predict velocity (v), which is a weighted combination of predicting the noise and predicting the original sample x0. With that, at the highest noise timestep the model predicts the dataset mean (which manifests as an incredibly blurry mess in the vague shape of whatever you're trying to make an image of). People didn't actually implement this as-is for any new foundation model; most of what I saw of it was independent researchers running finetune projects, apparently because it was taking too much trial and error for larger companies to make it work well. Actually, that isn't entirely true: people working on video models adopted it more quickly, because the artifacts from residual noise get very bad when you add a time dimension. A couple of groups also made SDXL clones using this method.
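For anyone curious what those fixes actually look like in code, here's a minimal PyTorch sketch of the offset noise hack, the zero-terminal-SNR schedule rescale, and the v-prediction target from that paper. Names and shapes are illustrative assumptions, not anyone's production training code:

```python
# Minimal sketch (PyTorch assumed) of the fixes discussed above.
# Shapes and names are illustrative, not anyone's production training code.
import torch

def offset_noise(x0, strength=0.1):
    # The earlier hack: add a per-channel constant offset to the training noise
    # so the model learns that overall brightness is something it can change.
    noise = torch.randn_like(x0)
    offset = torch.randn(x0.shape[0], x0.shape[1], 1, 1, device=x0.device, dtype=x0.dtype)
    return noise + strength * offset

def enforce_zero_terminal_snr(betas):
    # Rescale a beta schedule so alpha_bar at the final timestep is exactly zero
    # (SNR = 0), following the method described in the Bytedance paper.
    alphas_bar_sqrt = (1.0 - betas).cumprod(dim=0).sqrt()
    a_first, a_last = alphas_bar_sqrt[0].clone(), alphas_bar_sqrt[-1].clone()
    alphas_bar_sqrt = (alphas_bar_sqrt - a_last) * a_first / (a_first - a_last)
    alphas_bar = alphas_bar_sqrt ** 2
    alphas = torch.cat([alphas_bar[:1], alphas_bar[1:] / alphas_bar[:-1]])
    return 1.0 - alphas

def v_target(x0, noise, alphas_bar_t):
    # v-prediction target: v = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0.
    # At SNR = 0 this reduces to predicting -x0, so the pure-noise timestep
    # still carries a real training signal instead of a degenerate one.
    a = alphas_bar_t.sqrt().view(-1, 1, 1, 1)
    s = (1.0 - alphas_bar_t).sqrt().view(-1, 1, 1, 1)
    return a * noise - s * x0
```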
The latest fad is rectified flow, which is a very different process from diffusion. The diffusion process is described by a stochastic differential equation (SDE), which adds some randomness and essentially follows a meandering path from the input noise to the resulting image. The rectified flow process is an ordinary differential equation (ODE), which (ideally) follows a straight-line path from the input noise to the image, and can actually be run either forwards or backwards (since it's an ODE). Flux (the model used for Twitter's AI stuff) and Stable Diffusion 3/3.5 both use rectified flow. They don't have the average brightness issue at all, because it makes zero mathematical or practical sense for the end point to be anything but pure noise. I've also heard people say that rectified flow doesn't typically show the same uniform level of detail that a few people in this thread have mentioned; I haven't really looked into that myself, but I'd be cautious about using uniform detail as a litmus test for that reason.
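To show how simple the rectified flow objective is compared to all of the above, here's a bare-bones sketch assuming the common convention x_t = (1 - t) * x0 + t * noise. The real models (Flux, SD3) differ in details like timestep sampling and conditioning, and the model(x, t) signature is an assumption; this is the core idea only:

```python
# Bare-bones rectified flow training target and Euler sampling, assuming
# x_t = (1 - t) * x0 + t * noise and a model(x, t) signature. Real models
# differ in details like timestep sampling and conditioning.
import torch
import torch.nn.functional as F

def rf_training_loss(model, x0):
    noise = torch.randn_like(x0)
    t = torch.rand(x0.shape[0], device=x0.device)    # t in [0, 1]
    t_ = t.view(-1, 1, 1, 1)
    x_t = (1 - t_) * x0 + t_ * noise                 # straight-line interpolation
    v_target = noise - x0                            # constant velocity along that line
    return F.mse_loss(model(x_t, t), v_target)

@torch.no_grad()
def rf_sample(model, shape, steps=30, device="cuda"):
    x = torch.randn(shape, device=device)            # start from pure noise at t = 1
    ts = torch.linspace(1.0, 0.0, steps + 1, device=device)
    for i in range(steps):
        t = ts[i].expand(shape[0])
        x = x + (ts[i + 1] - ts[i]) * model(x, t)    # Euler step along the ODE, toward t = 0
    return x
```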
Brian Thompson simply had a flare-up of a very unfortunate back condition. Very unfortunate. It may have been preventable.
From what I've seen, it's him playing with the squirrel with some very convenient camera work, while he's clearly wearing some very supportive underwear.
they are... gin by itself just tastes like plant water though
The executive branch could absolutely unilaterally cut off support to Israel. We already have laws that prohibit arms transfers to countries interfering with USAID operations, and we're signatories to treaties that prohibit arms transfers to countries if we reasonably believe they will be used in the commission of war crimes. The easiest one for the president to prove would be the former, since we literally have reports from USAID saying this is happening. It's also worth noting that we have treaties obligating us to provide certain amounts of aid to Israel, but enforcing these laws is the sole responsibility of the executive branch. Biden could choose to cut off arms transfers at any time, and if someone wants to argue that our obligations to provide aid for Israel supersede international treaties they can let the courts sort it out.
Do people not already reply then block to get the last word? (genuine question, I do not use twitter, but I know people do this on Reddit a ton)
The only hoop you have to jump through is using a Nitter instance. And the most dangerous abusers are most likely going to be determined enough that doing this, or creating a new account, is not a deterrent.
False security is worse than no security. If people trust that the block function is reliable at stopping people from seeing your posts, and then those people post things publicly that they wouldn't share otherwise, that is leaving more people vulnerable than having no way to stop people from seeing your posts.
I literally have been using the majority of my spare time to work with AI-generated images for almost two years now. I have a very thorough understanding of what exactly you'd need to pull off a stunt like this.
The background is part of the image, and the clothing they were obviously issued is part of the image; both of those things are fairly consistent across all of the images and look like what would be used for facial recognition, which is something we know most countries do when they have the technological means to do so, China included. If you want that consistent background and clothing, it needs to be part of the training images. Otherwise, your next best option is a lot of tedious manual editing, which would be more effort than it's worth if the images are supposed to look plausible.
I also have looked at the images myself, and I vividly remember GenZedong trying to point out skin lesions as proof that an image was AI generated (definitely not their proudest moment, though they may have thought otherwise). If you'd like to dig yourself into that hole, then show some examples. Most of the ones I've seen pointed out can be more easily explained as skin lesions, markings on the background wall, or something moving when the picture was taken. This is what real NN artifacts look like; I never saw anything like these in those images, and what I see far more of is consistency in details that neural nets struggle a lot with.
that's penalizing it, though, unless I'm misunderstanding something?
Had to review my notes on discord from when I was initially investigating this.
You'd need to specifically train a model to output images that look like these photos. If they had enough real images of prisoners to even try to finetune an existing model trained on a broad range of faces, they would have enough real images to make whatever point they're trying to make. That's a mark against these photos being synthetic on practical grounds: there's no point in using synthetic image generation to inflate the count.
That database has around 2800 images in it. If we're proposing that a substantial portion of them are synthetic, that leaves only a couple hundred that could be used to actually train, which isn't enough; you would severely overfit any model large enough to generate sufficiently high quality images. And the images shown are clearly beyond something like the photos on thispersondoesnotexist. Everything in the background of all the images shown, for example, is coherent, including other people in the background. There are consistent objects across different pictures -- many subjects were having pictures taken against the same background, and many have similar clothing. The alleged reason for these pictures is facial recognition (which is entirely believable, since yeah, China does that, as does everyone else, and it isn't notable), and having dark clothing on hand to ensure contrast makes sense, as does taking pictures in the same spot.

This is all another mark against the photos being synthetic, on the grounds that even current image generation technology can't fully do what is shown in these pictures to the same degree. "But they have special technology that we don't--" no, we have no reason to believe they do, this is unsubstantiated bullshit. Higher quality models are generally larger and require even more data, which would just get you an overfitted model faster with your few hundred photos.
The only thing they really directly claim about these photos is that they were used for facial recognition. They show that at some point, Chinese police took photos of about 2800 people in Xinjiang, which isn't surprising at all and doesn't really prove much. That won't stop them from trying to portray it as proof of an ongoing genocide, though, especially when they know that like 90% of people won't question it at all. The base unit of propaganda is not lies, it's emphasis. The most plausible explanation is that the photos are real, but are being misrepresented as something unusual.
FTL: Faster than Light, 12th anniversary - New General Megathread for the 14th-15th of September
If we're talking about FTL, might as well mention Multiverse: https://subsetgames.com/forum/viewtopic.php?t=35332
I'm pretty sure this outright has more new content than the base game did.
I remember looking over those at the time. The images seemed a bit beyond then-current image generation technology, and there never really seemed to be a compelling explanation for why "some RFA source went through great effort to fabricate images for this story" is more likely than "some RFA source is misrepresenting pictures of what is actually mostly just boring, normal prison stuff".
Permanently Deleted
Just something that I feel like I have to remind people of whenever it comes up: mainstream psychology does not recognize porn addiction as a real thing, based on the lack of evidence and lack of consensus to support a consistent set of diagnostic criteria. The only actually recognized related condition is compulsive sexual behavior disorder, which does not use an addiction model.
I'm quite sure there has to be at least someone who has problematic pornography use habits that aren't just a symptom of another issue, but without anyone being able to pin down a consistent set of diagnostic criteria, there's barely any way to identify those people separately from people who report it but whose distress is coming from something else. One study done on self-reported pornography addiction found that the strongest predictor was moral objection to pornography, not the amount of porn use. Another two studies found that antagonistic narcissism is an even better predictor (might read them when it isn't 3AM). Your analysis is actually touching on this somewhat -- a narcissist's interest in "addressing their pornography addiction" is mostly that they think it will elevate them above the porn addicts, or whatever other target.
There is a third option you forgot.
One of Ukraine's officials claimed that a Patriot missile took it out.
https://apnews.com/article/russia-ukraine-war-f16-crash-2755b4fd1a5dcf1e95ae975d44427b9b
don't forget the absolutely inexplicable platform-specific bugs:
this is a different attack from the ones people are usually talking about with Assad; that report is about a mustard gas attack (which ISIS has/had access to), and the notable attacks people accuse Assad of perpetrating were sarin gas attacks (which ISIS never had access to, afaik)
they'll probably take information representing some aggregation of interactions for a user and build some scoring model that tries to learn from pairs of user data and outcomes (in terms of whether they successfully dated, or whatever you do on these apps). 100% marketing bullshit -- doing LLM inference for something like this would have costs spiral out of control FAST. But a scoring model is cheap, they have the data to make one, and it isn't really all that innovative either.
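For a sense of how cheap that kind of scorer is, here's a hypothetical sketch with made-up feature names and synthetic data; it has nothing to do with any real app's pipeline:

```python
# Hypothetical sketch: a cheap pairwise "compatibility" scorer trained on
# aggregated interaction features and past outcomes. Feature names and data
# are made up; this is not any real app's pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in features per (user_a, user_b) pair: message count, reply latency,
# profile similarity, shared interests, and so on.
X = rng.normal(size=(10_000, 6))
true_w = np.array([0.8, -0.5, 0.3, 0.6, 0.1, -0.2])
y = (rng.random(10_000) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(int)  # 1 = "it worked out"

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict_proba(X[:5])[:, 1])  # per-pair "match likelihood" scores
```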
You wouldn't download an overclock.
Well, I had to look up the lyrics for that song... I guess it's better than I would expect from an average country musician but still