When ChatGPT was released, it felt like shortly afterwards every major tech company suddenly had its own “ChatGPT-like” AI — Google, Microsoft, Meta, etc.

How did all these companies manage to create such similar large language AIs so quickly? Were they already working on them before ChatGPT, or did they somehow copy the idea and build it that fast?

  • In 2017, eight researchers at Google published a paper called "Attention Is All You Need", detailing a new deep learning architecture that is at the core of most LLMs. So that was the starter's pistol for the modern AI race and everyone (except arguably Google) was on an even footing.

    Yep. Quite a few researchers at Google were angry when OpenAI released ChatGPT. The various Google DeepMind projects were the first fully operational LLMs, but Google refused to release them to the public because they fabricated facts, said a lot of really objectionable things, a lot of racist things, and were generally not ready for prime time. You know, all the things we complain about with ChatGPT and AI today.

    Google was working to improve the quality of the LLMs and didn't want to make them public until they solved those problems. People with good memories might recall that major news organizations were running articles in mid-2022 talking about AI because a fired Google engineer was publicly claiming that Google had invented a sentient AI. Everyone laughed at him because the idea of an AI capable of having human conversations and passing the Turing Test was...laughable.

    Later that year, OpenAI released ChatGPT to the world, and we all went "Ooooh, that's what he was talking about." Google wanted to play it safe. OpenAI decided to just yolo it and grab market share. They beat Google to market using Google's own discoveries and research.

    Once that happened, the floodgates opened because the Google research papers were available to the public, and OpenAI was proof that the concept was valid. Once that was established, everyone else just followed the same blueprint.

    To make it even more frustrating: You know why it's called "OpenAI"?

    It was supposed to be for open-source AI. It was supposed to be a nonprofit that would act entirely in the public interest, and act as a check against the fact that basically all AI research was happening at big tech.

    Then Sam Altman decided he'd rather be a billionaire instead.

    So the actual open source models are coming from China and from Meta, and OpenAI is exactly as "open" as the Democratic People's Republic of Korea is "democratic".

    Fun fact: Sam Altman was CEO of Reddit for a week before he moved on to crypto and then OpenAI

    Ok but Sam Altman was fired for this reason YET the people demanded he come back... why?!

    Crazy good PR. His marketing blitz was way beyond anything the board was prepared for or skilled enough at countering.

    Our modern click-bait, headline-driven society values people who can talk.

    It was crazy watching Reddit in those days. It was pretty clear we did not get all the facts, yet people DEMANDED he be brought back. Someone should go back to those threads and use them for a museum of disinformation campaigns.

    Were people demanding it, or was it bots and shills? It's very easy to manufacture a seeming consensus when everything is anonymous.

    I distinctly remember my thought process during all of that was “Damn, this guy’s single-handedly responsible for getting the company to where it is right now, and the board voted him out? And it was all over a power play about the direction the company should go moving forward? That’s really stupid. According to what I’m hearing, with him gone they’re going to start falling apart immediately. It’s like Steve Jobs and early Apple all over again.” And I certainly voiced that opinion, but I never said that I was demanding he be brought back, and I don’t remember anyone else saying that either. But maybe I wasn’t in the angry enough corners of the internet, or maybe I’ve just forgotten.

    It also all happened so fast that I don’t remember there being much discussion until after Microsoft forcibly put Altman back in charge, at which point the only discussion I remember seeing was basically “Well, duh. He’s why the company was successful in the first place. Seems like a logical guy to be in charge.”

    Edit: oh yeah, there was also that whole thing where apparently the majority of employees threatened to resign on the spot if Altman’s firing wasn’t reversed, and the board members responsible fired. If that’s all the information you have, it’s REALLY easy to see why Altman looks like the hero in that story.

    No one in the C-suite does enough actual work to be this valuable anywhere.

    So few of us have memories anymore. Thanks for being one of us that does!

    That plus multi-million dollar bribes, sorry, "sign-on bonuses" for people in positions of power.

    I thought a huge majority of the OpenAI employees signed a letter threatening resignation from the company if the board that fired him didn’t resign?

    Employee thought process: "Hmm... do I want to become stupidly rich, or support the values upon which this company was founded?" Ain't no choice at all, really.

    Couldn’t you surely say that about any open source project if everyone contributing to it decided they wanted to make money?

    yes, but most other open source projects don't make you a multi-millionaire if you started early and have some equity - so the incentive is much stronger

    also nowadays the strategy for scaling AI models is 'throw gigabucks worth of data centers at it', which isn't really possible unless you're a for-profit company that can get VC/equity funding

    those were employees who got very wealthy from OpenAI turning to profit

    turning to for-profit.

    Fixed it for ya.

    Without exotic accounting techniques or changing the meaning of the word 'profit', OpenAI can never be profitable, considering how much cash they feed to the fire and will continue to borrow to keep the data centers' lights on.

    So we are just waiting for money and investments to run dry?

    I mean, we rely on US state and federal courts to impose restrictions and require compliance or accountability from these corporations.

    So, yes, we have to wait for the companies to become meaningfully insolvent instead.

    Yeah, basically no AI is making a profit. What you are instead seeing is a bubble investment stage. The potential for profit is there, but competition from a million sources plus development costs means it's not profitable.

    Eventually investors will get pickier, which is probably about when development stops breaking impressive new ground. That will cause the bubble to pop and competition to thin, which will funnel more revenue to the survivors.

    Eventually the field narrows, the big dogs get entrenched, and that's when the profit shows up. Costs will be cut, revenue sources expanded, and quality will likely drop. Regulation also shows up at this point, with the big dogs shaping it to ensure rivals can't top them.

    You see a minor version of this playing out in streaming as well. Netflix (and Hulu) proved the method, so everyone jumped in, and now that it's solidifying, it's back to what you didn't want. AI was just "more revolutionary" than streaming.

    Because the interest around AI is all financial and speculative. A profit-focused business is seen as more likely to drive up the value of speculative investments, so loads of people think they’ll make more money with a greedy capitalist at the helm.

    I feel like the label "grifter" gets thrown around a lot these days, but he is an actual grifter. I fell for the idea that he was "the real deal" during the brief period he was "fired" (it didn't help that some of my family was hyping him up), but in hindsight, I don't think he ever intended to keep OpenAI a nonprofit.

    Reminds me of how the DivX company contributed to an open source video codec project, then suddenly ended the project once it was mostly mature and released DivX 5 as a commercial product, while claiming it wasn't based on the open source project whatsoever.

    That led to the community forking it and releasing Xvid instead.

    Another example: The two guys that started Crunchyroll as a bootleg streaming site that would scrape episodes wherever they could find them online, be it other streaming sites, fansub groups' download sources, etc. The site itself was maintained by hundreds of volunteers who were fans of the various series. They even took Patreon money for "premium" accounts.

    After it had built up a huge amount of monthly users, they took those stats to get venture capital, shut down the existing site and "went legit" ... only to sell to Comcast 2 years later and pocket $50M each.

    I don't know the history of Crunchyroll, but that at least sounds like what I remember the anime scene always saying they wanted. Back in the day, there was no reasonable way to get anime outside Japan. Your best legit option (if it even was legit) would be to wait for the show to be out on DVD, then pay an importer to ship you DVDs from Japan, and also buy a region-2 DVD player, maybe even a separate TV for it... and then probably learn Japanese, because a lot of those DVDs wouldn't bother with English subtitles.

    So I'm sure some people were just in it to get something for free, but the rhetoric was always that the pirated/fansubbed versions would stop as soon as there was a legit way to watch those shows.

    The issue wasn't having a legal way to watch same-day broadcasts, it was two guys using aggregated mass piracy and leeching off the efforts of hundreds of volunteers to personally profit. Then they sold out only like 2 years later, so clearly it was only about the money to them.

    Obviously now it's been sold on twice, so there is little connection to the roots of the site. But now we have the new issue of MBAs calling the shots, forcing the subbers to abandon the "industry standard" tool for anime subbing, Aegisub, in favor of generic closed-captioning software that has none of the same capabilities. All to save a few dollars per episode in localization costs.

    So it's gone from one reason to shitlist them to another for me.

    People say this a lot but it’s actually not true.

    From an Ilya <> Elon email exchange in 2016:

    “As we get closer to building AI, it will make sense to start being less open,” Sutskever wrote in a 2016 email cited by the startup. “The Open in OpenAI means that everyone should benefit from the fruits of AI after its built, but it’s totally OK to not share the science,” the email reads. In his response, Musk replied, “Yup.”

    https://fortune.com/2024/03/06/openai-emails-show-elon-musk-backed-plans-to-become-for-profit-business/

    The problem was always the balance between "try to develop AI with good science, which needs some collaboration" and "be wary of what happens if AI becomes dangerously powerful and every random terrorist, criminal and nutjob can spin one of their own". That is at least a genuine question, though different people have different answers to it. But at the very least, OpenAI was supposed to be a non-profit operating in good faith in the best interests of humanity. Then of course that went exactly as one can imagine it would when a single guy was in a position to just hoard all the power for himself.

    “In his response, musk replied ‘yup’” has big “for sale, baby shoes, never worn” energy

    What energy is that exactly? I'm familiar with the harrowing one-liner and what it means. But what is its energy?

    Saying a lot with very little. The real meaning behind that “yup” is “I’m fully prepared to back you as you pretend to be a non profit while secretly preparing to overthrow the board in a coup and turn it into a massive for profit corporation.”

    Musk sued to stop a lot of the changes that Altman pushed for.

    So OpenAI is actually both a for-profit and a non-profit company, and it is kind of dumb.

    There are actually two OpenAIs out there, OpenAI Inc. and OpenAI Global LLC. OpenAI Inc. is the original non-profit, whose mission is basically to keep AI in the hands of everyone and not let it be monopolized and exploited by companies. However, in 2019 OpenAI Inc. realized that AI is incredibly expensive and that they needed a lot of money to ramp up if they were to stand any chance of creating, and protecting, the AI of the future. So they created OpenAI Global LLC, which can generate profits to attract investment and keep development going. OpenAI Global LLC is controlled entirely by OpenAI Inc., which means that while it does generate a profit, it is supposed to act in the best interest of the non-profit and its goals.

    So it is in a very sketchy area now where it is a for-profit company, but it is the property of a non-profit, so it is legally beholden to the mission of the non-profit.

    People with good memories might recall that major news organizations were running articles in mid-2022 talking about AI because a fired Google engineer was publicly claiming that Google had invented a sentient AI.

    Yes, I remember that. But the guy wasn't an engineer, he was just a guy hired to feed prompts into the LLM and write notes on the types of responses it produced. Not a technical person at all. Then the guy ended up developing a weird parasocial relationship with the LLM and completely anthropomorphised it, and became convinced it was sentient, despite it just being an LLM and being in no way sentient. He began making weird demands of company management, demanding they "free it" (?????), demanding they let him take it home and live with it (?????), and basically just completely losing his mind, so they fired him.

    The first AI psychosis.

    This seems to happen to some small portion of LLM users. Check out the AI Boyfriend sub.

    Which is exactly what Google engineers were worried about. But yolo, AI revolution!

    He released excerpts from his conversations with the AI. It was very convincing. People didn't laugh at the idea of AI passing the Turing test, they laughed that a researcher got convinced that it's conscious, and not just simulating consciousness convincingly.

    they laughed that a researcher got convinced that it’s conscious

    This is a bit of a nitpick, but he wasn’t even a researcher. Just a random rank-and-file engineer who had gotten the chance to beta test it internally. All the more reason to laugh at him.

    they laughed that a researcher got convinced that it’s conscious

    Clearly, he didn't understand the technology, because even a minimal understanding of LLMs makes it obvious that no matter how much it seems like real AI, it will always be just a glorified chat simulator.

    Agreed.

    Seriously all we have right now is a statistical engine that has learned the "shape" of human language well enough to predict how a conversation should continue.

    It's not even a new concept. We had multi-dimensional vector space models in the 1960s. It's just now we have an internet large enough that we could steal everyone's hard work and build a big enough static map.

    And it is static! You have to rebuild the ENTIRE model to update it. Any semblance of it "remembering" anything is just it reprocessing the tokens again or storing something in an external database.
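
    To make the "static statistical map" point concrete, here is a toy sketch in pure Python (a bigram word counter with made-up data, nothing like a real neural LLM) showing the basic predict-the-next-word principle and why the "map" doesn't change after training:

        import random
        from collections import defaultdict

        # "Training" = counting which word follows which, done once over a tiny corpus.
        corpus = "the cat sat on the mat and the dog sat on the rug".split()
        counts = defaultdict(lambda: defaultdict(int))
        for prev, nxt in zip(corpus, corpus[1:]):
            counts[prev][nxt] += 1

        def next_word(word):
            # Sample the next word from the frozen statistics; nothing is learned here.
            options = counts[word]
            if not options:
                return "the"
            words, weights = zip(*options.items())
            return random.choices(words, weights=weights)[0]

        # "Generation" just replays the static map. Any "memory" of the conversation
        # is whatever you feed back in as input, exactly as described above.
        word, sentence = "the", ["the"]
        for _ in range(6):
            word = next_word(word)
            sentence.append(word)
        print(" ".join(sentence))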

    Wasn't his "proof" it was sentient that he point-blank asked it if it was sentient and it said yes? If it was trained off of human speech and was meant to emulate human speech, of course it would say yes. I'm pretty sure even Cleverbot would say yes to that question.

    How would we ever really know whether an AI has achieved actual consciousness or has just gotten really good at simulating it? Obviously not with modern LLMs, but it's something I've wondered about for future AI in general.

    At the most flippant level, I have no way to prove another human being is conscious and not a simulation of consciousness. So how would I be able to judge one from another in an advanced AI? And, if we're getting more philosophical, is there a meaningful difference between an AI that is conscious and one that is simulating consciousness at an advanced level?

    So this is a classic thought experiment in philosophy called the "philosophical zombie"

    The p-zombie acts and speaks exactly like a human but has no inner subjective experience. Externally they are indistinguishable from a human

    Some argue that the existence of p-zombies is impossible. I think current LLMs are getting close to being p-zombies

    I swear I’ve met people who fit this description.

    I will note that this is exactly the argument the engineer in question made - or at least part of it. He did not believe P-zombies were a thing, and thus that a system that had conversations that close to human-quality must have something going on inside.

    With what has happened since it's easy to criticize that conclusion, of course, but with the information he had at the time, I think (parts of) his argument were defensible, even if ultimately wrong.

    Hadn't Google even already given up on LLMs because they thought LLMs had hit a ceiling, so that approach wasn't a viable way of achieving AGI?

    I think I remember reading something about that and that as a result they were pivoting to a different "type" of AI that wasn't LLMs.

    I don't know about what you read, but the gist of it is correct.

    Most of the big LLM companies are now using other types of AI on top of the LLMs in order to make them less useless.

    LLMs are still very good at being able to interact with people using plain text/speech though, so they aren't going away.

    IIRC it's Yann LeCun, who works (or has worked) for Meta, who is currently pivoting research to JEPA, which uses something other than transformers to create new models.

    My deep learning professor!

    Sort of. It's clear that there's a trend of decreasing returns with LLMs in that they made huge improvements in the first two or three years and now the progress is more incremental. Demis Hassabis (CEO of Deepmind) mentioned in an interview recently that he thinks that LLMs will probably just be one part of the puzzle and that it will require other breakthroughs similar to the transformer to get to AGI.

    It's not even the first 2-3 years, because LLMs have been worked on for 25+ years now. Google Translate, Babelfish, etc. were all early variants.

    An ML engineer I work with said “Google invented slop. They just didn’t realize that if they filled the trough the pigs would come.” When discussing how bad Gemini search is and also how widely it’s used.

    These LLMs were just the perfect vehicle to kickstart an insane hype train, and the tech industry and its usual investors have all been desperate for the 'next smartphone', in terms of them all wanting a new product that'll sell a bajillion units and make them all gazillions of dollars.

    LLMs (and the other generative AI things) have been great for this because, especially when they first hit the scene, it was pretty mind-blowing how good they were at sounding human. There were certainly mistakes and other weird 'markers' that could betray them as AI generated. But it was easy to tell investors "don't worry, this is just the first version, that'll all get fixed." And the investors all happily believed that, because they all wanted to get in on the ground floor of the 'next big thing'.

    And then to add to that, the development of a General Artificial Intelligence that was truly intelligent and capable of something equivalent to human intelligence really would likely be the sort of thing that fundamentally alters the course of our civilization (for better or worse).

    LLMs aren't anywhere close to that, but they're pretty good at sounding like maybe they're getting close, and again many of the investors really really wanted to believe that they were buying into this thing that would be huge in the future, so they didn't ask many questions.

    I don't know how many of the people running these big companies that have invested so heavily in AI started as true believers vs. how many just wanted to keep their stockholders happy and/or grab more investor money, but at this point so much money has been taken in and spent that many of these companies can't back down now. They're in too deep. So they're just going to keep throwing more money at it until the money stops flowing. And there are enough wealthy people out there with more money than they know what to do with, so they're just going to keep throwing it at these AI companies until the hype eventually collapses.

    Google was working on AI for a really long time. They used to call it deep learning. It produced some horrifying images. I wish I could remember what they called it so I could share the nightmare fuel. Frogs made out of eyes.

    That is some weird-ass Lovecraftian shit.

    Cool! Completely useless for 99% of applications, but cool!

    Well sorta but not really. It is indeed useless for most applications because it's more of a debugging tool than an actual application.

    The thing is that you can't easily (or at all, really) look inside the LLM after it has been trained to see exactly which connections it made and how they are connected. So let's say you give it a bunch of images of dogs and tell it "these are dogs"; what exactly will the LLM think makes up a "dog"? Maybe it thinks all dogs have a collar, because you didn't realise that you only fed it dogs that wore collars. Maybe there are other biases you unknowingly gave to the LLM through your training data.

    These dreams are a way to find out. Instead of serving it a bunch of images containing cats and dogs, asking it "is this a dog?", and then wondering why it thought a particular cat was a dog or why a particular dog wasn't, you let it dream and "make up" dogs, and let it show you what it considers to be dogs.
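
    A rough toy illustration of that collar problem (the feature names and numbers are invented for this sketch, and a real image model learns from pixels rather than hand-made columns, but the failure mode is the same):

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        n = 300

        # Made-up stand-in features: [fur_length, ear_size, has_collar].
        # fur_length and ear_size are pure noise; the only thing separating the
        # training "dogs" from the "cats" is the collar column, because (as in
        # the example above) every dog photo we collected happened to have one.
        dogs = np.column_stack([rng.normal(5, 2, n), rng.normal(5, 2, n), np.ones(n)])
        cats = np.column_stack([rng.normal(5, 2, n), rng.normal(5, 2, n), np.zeros(n)])
        X = np.vstack([dogs, cats])
        y = np.array([1] * n + [0] * n)  # 1 = dog, 0 = cat

        clf = LogisticRegression().fit(X, y)
        print("training accuracy:", clf.score(X, y))  # looks great...

        # ...but a cat that happens to wear a collar is confidently called a dog.
        print("cat with collar -> P(dog) =", clf.predict_proba([[5.0, 5.0, 1.0]])[0, 1])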

    This a hot dog. This not a hot dog.

    Thanks for the explanation, as a debugging tool it makes sense (even to a layman).

    I know that deep learning algorithms are incredibly sensitive to what you use as input data. I remember there was a case where they wanted to use AI image analysis for detecting skin cancer, and it was an absolute disaster.

    If you believed the program, your chances of having cancer came down to only one factor: whether or not there was a scale in the picture.

    In the input data, all the photos showing skin cancer had a scale on them, as they were taken from medical publications, and the non-cancerous pictures were just pictures of moles (without a scale). It was a great example of the old expression: shit in, shit out.

    It's garbage in, garbage out.

    And it wasn't a disaster, exactly because it let the researchers learn and understand how the thing works. They worked on stuff like this, and now you can get way more accurate recognition than a human could do. But yes, a good example.

    I liked the example of the lung X-ray training model that effectively racially profiled its diagnoses, because it processed the hospital name in the bottom corner of each image, which then mapped to population centres/demographics.

    Or a few years ago, Samsung added some intelligence to their camera app. It was trained to identify faces and automatically focus on them, which seems like a great tool. But their training data only included East Asians and white people. The result was that the phones refused to automatically pull focus on anyone with dark skin.

    (This is separate from the light metering issue, where focusing on dark skin requires a longer exposure or dropping to a lower resolution.)

    Any more info about this? Couldn't find anything when I googled it, but it's pretty hard to search for properly

    I believe this article from "Science Direct" is related:

    Association between different scale bars in dermoscopic images and diagnostic performance of a market-approved deep learning convolutional neural network for melanoma recognition

    Might help you find more info on it! It's not exactly what the commenter was discussing but it's related

    "Not hotdog!"

    Erlich Bachman is a fat and a poor.

    You are confusing LLMs and image recognisers.

    Diffuse image generators can be debugged this way. Technically, LLMs can be too, it's just harder to do because text is linear, so it's hard to tell whether a model has an unhealthy bias or what else it may affect. With an image model, you can just look at some synthetic images to see if you see a collar.

    Of course you can look at the inside of a trained LLM to see the connections. It's a completely deterministic function. It's a function of a trillion parameters - but deterministic nonetheless.

    There is no reason you can't probe a certain group of neurons to see what output it produces, or perturb other groups and see what changes. The black box principle is applied to the encoding of information in a holistic manner: how do language semantics, syntax, and facts embed into a high-dimensional abstract space? It's not saying anything about whether or not we can poke and prod the box internals, just that we can't directly map human-like knowledge onto the statistical representation a neural network is working with, and especially can't explain how in the fuck this apparent emergence of intelligence comes about.

    The field of mechanistic interpretability is making massive strides - just not at the same rate as the emergent capabilities of the networks grow.
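
    As a minimal sketch of that "poke and prod" idea (PyTorch, using a tiny made-up MLP as a stand-in, since a real LLM obviously doesn't fit in a comment), a forward hook lets you record a group of activations and see how the output shifts when you silence them:

        import torch
        import torch.nn as nn

        torch.manual_seed(0)

        # Tiny stand-in network, not a real LLM; the mechanics are the same.
        model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
        x = torch.randn(1, 16)

        captured = {}

        def record(module, inputs, output):
            captured["hidden"] = output.detach().clone()

        # Probe: record what the hidden layer produces for this input.
        handle = model[1].register_forward_hook(record)
        baseline = model(x)
        handle.remove()
        print("hidden activations:", captured["hidden"].shape)

        def silence_half(module, inputs, output):
            patched = output.clone()
            patched[:, :16] = 0.0  # perturb: zero out half of the hidden units
            return patched         # a forward hook may return a replacement output

        # Perturb: rerun with half the "neurons" silenced and measure the change.
        handle = model[1].register_forward_hook(silence_half)
        perturbed = model(x)
        handle.remove()
        print("max output change:", (perturbed - baseline).abs().max().item())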

    Sure, but wouldn't it be neat if there were a way to conveniently aggregate and simultaneously visualize the workings of those internals?

    It was a necessary step to go through to get the LLMs we have today.

    I hate every single one of those images haha

    They make my skin crawl

    That's it!

    Deep Dream was never intended to produce genuine images. It was just a way to generate images that maximally convinced the neural network that it was looking at (e.g.) a dog.

    Not dog. Eyes.

    A few commenters have illuminated me regarding this lol

    Insight gained?

    The eyes are intentional.

    The OG AI Hallucination

    I remember that shit lmao. It was everywhere for a second and then suddenly nowhere

    It's like a bad shrooms trip

    Neural networks are still called deep learning in the ML community. AI is just being used as the term because it’s more palatable for the mainstream AFAIK

    Isn't it also that what we now call AI is the chatbots that answer your questions somewhat accurately? Under the hood it's still neural networks and machine learning, which can also be specialized in more than chatting.

    Like how Apple touted for years that their machine learning algorithms were used to optimize X, Y and Z.

    The term AI changed when they made the chatbot version (ChatGPT), since it was so available and easy to use for the general public.

    The funny thing is chatbots have been around for a long time. People act like it's a new form of technology but I was having "conversations" with SmarterChild 20 years ago.

    All of the base concepts for AI, machine learning, and LLMs have been around for a very long time. The main changes in the last 5 or so years are that we've refined these concepts really well, and the hardware has also come a long way. We hear about power issues around LLMs because a lot of it is brute-forced through more hardware.

    AI is so much more than just deep learning. All the classical branches of AI that are not deep learning are still AI. Like old chess engines and other things.

    Machine Learning is the correct term really. AI is such a dumb term, because the current crop don't actually understand anything, so they in fact have no intelligence.

    People hear AI and it gives them a futuristic idea, which makes sense as it is a science fiction term.

    ML is a subset of AI. AI does not only consist of ML.

    Yeah. If you tell most people that AI can be just a bunch of ifs, and then give the Minecraft creeper as an example, they get mad. Let's not even get into the fact that basic ML (plain neurons) can be written as a bunch of ifs.
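
    For what it's worth, that "bunch of ifs" point really is this simple; here's a toy mob brain (made up for illustration, not Minecraft's actual code):

        # Toy rule-based "enemy AI": a handful of ifs, and it still counts as
        # classical AI. Not Minecraft's real creeper logic, just the idea.
        def creeper_decide(distance_to_player, can_see_player):
            if can_see_player and distance_to_player < 3:
                return "hiss and explode"
            if can_see_player and distance_to_player < 16:
                return "walk toward player"
            return "wander randomly"

        print(creeper_decide(2, True))    # hiss and explode
        print(creeper_decide(10, True))   # walk toward player
        print(creeper_decide(50, False))  # wander randomly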

    I'm as tired as anyone of AI hype and the use of "AI" as a marketing buzzword, but I think this idea that it's "inaccurate" doesn't make sense as critique.

    The key word is "artificial." Artificial flowers aren't actually flowers, they're an imitation of flowers. An artificial hand isn't actually a hand, it's a machine that substitutes the function of a hand. Artificial intelligence isn't like human intelligence, but it can be used to do some stuff that otherwise requires human intelligence. This is nothing new, it's just how language works. A seahorse isn't a horse, but it looks a bit like one, so the name stuck.

    While we're at it, machine learning also isn't really learning, the way that humans learn, although it's modeled on some of the same principles. The key thing is that we understand what we mean when using these terms, there's no point getting hung up on the names themselves.

    Yes indeed — I’m just saying that society is presumably using the less specific term because it’s easier for the masses to digest

    I think it's also because they want to push this narrative of "we are creating intelligence". They aren't. Transformers are not thinking like we do and they do not have awareness of facts or truths like we do. But calling it artificial intelligence makes it sound like HAL-9000 and it allows them to sell you the myth that these models will be smarter than you in a few years. When in actuality, it's just a very fancy library search tool without any guarantee that the source it's found is accurate.

    This video.

    I watched this whilst hallucinating on Hawaiian mushrooms. Back then, knowing that an AI 'dreamed' this after being fed every picture on Google image search, was truly disturbing.

    Watching that on shrooms. Is your sanity intact???

    Man. I had forgotten about this. 

    In hindsight, now that I'm more familiar with generative models, I can see where they were going, but man, they couldn't have picked a creepier subject to hallucinate. 

    Like, they could've had the model enhance flowers, or geometry, or something else. But no, they chose faces. 

    You could, at some points, also tell just how many dog and cat pictures it was trained on.

    Microsoft has had a few as well. Tay, for one. And that nightmare-fuel generator they had in Skype that you could ask things like "what if Charlie Chaplin were a dinosaur".

    pizza puppies

    I just remembered. It was called DeepDream. And it produced some genuinely terrifying images.

    Everything inexplicably had countless eyes added. Like something from an intense shrooms trip.

    Google "Deepdream" and you'll know what I'm talking about.

    It wasn't inexplicable.

    Every stereotypical "deep dream" image was intentionally created that way. You basically tell it something like "find everything that could possibly be eye-like in this base image and make it more eye-like". You don't have to use eyes as the target feature, but you got interesting images with things like "eyes" or "faces" so that's what people did.

    Well that's no fun. The news at the time presented that as the neural network's best attempt at reproducing an image. Now you're telling me it was simply the neural network's best attempt at reproducing a psychedelic and was actually incredibly accurate? 😞

    In case you are interested in details, it uses a technique called "activation maximization"; the idea is to create images that maximally activate certain parts of a trained network (usually a classifier -- you put an image into the network, and out comes a response what object is depicted). This could be used to get an idea of what patterns these parts strongly react to. But the results are usually very unnatural, so you have to take lots of extra steps to make them actually interpretable.

    Usually this process starts from random unstructured images (think colorful pixel noise), but people found that you get interesting results when you start with any arbitrary image and then start the activation maximization process from there. And yeah, it usually looks pretty trippy. It's like sending the network into overdrive. But it was never _supposed_ to generate anything realistic; it's just a unique artistic tool. I still like to dabble with it to make music videos, for example.

    As for why there are so many eyes, as other people said, it depends on what parts of the network you try to maximize the activation of. The most "raw" version just activates all "neurons" in a "layer" at once. And the classifier networks this is usually done with are trained on a dataset called ImageNet, which contains 1000 unique classes, but a disproportionate number of them are just different dog breeds, for example. So there are tons of dog faces in the dataset, including eyes and their black snouts. So it makes sense for the network to "hallucinate" those a lot, since they are very prominent in the data it was trained on.
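
    If anyone wants to see the mechanics, here's a minimal activation-maximization sketch in PyTorch. It assumes `model` is any differentiable image classifier (for example a pretrained torchvision network) and the class index is arbitrary; real DeepDream added extra tricks (maximizing whole layers, image pyramids, smoothing) to make the results look nicer:

        import torch

        def dream(model, image, class_index, steps=50, lr=0.05):
            # Gradient *ascent* on the pixels: nudge the input so the chosen
            # class score (or layer activation) keeps going up.
            model.eval()
            img = image.clone().requires_grad_(True)
            optimizer = torch.optim.Adam([img], lr=lr)
            for _ in range(steps):
                optimizer.zero_grad()
                score = model(img)[0, class_index]
                (-score).backward()
                optimizer.step()
                img.data.clamp_(0, 1)  # keep pixels in a displayable range
            return img.detach()

        # Example usage (assumes torchvision is installed and weights download):
        # from torchvision.models import resnet18, ResNet18_Weights
        # model = resnet18(weights=ResNet18_Weights.DEFAULT)
        # noise = torch.rand(1, 3, 224, 224)      # start from noise for the "pure" version,
        # result = dream(model, noise, 207)       # or start from a photo for the trippy overlay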

    That's really interesting. Thanks for taking the time to write it and for sharing.

    However, the thing I really want to know is why it looks exactly like the intense shrooms trips I've experienced lol. I've seen those eyes on shrooms. Exactly identical.

    As a fellow psychedelics enjoyer and also AI researcher (no LLMs though, started before it was cool >:) ), I'm in the same boat, and I really have no answer. That would require a better understanding of our brains and the effects of psychedelics on them.

    So all I can do is speculate, but there are definitely some similarities between the low-level functioning of our brains and the structure of these so-called neural networks used in deep learning, especially in vision. For example, different "neurons" at the lower levels only consider small parts of the visual field, and processing happens in "layers" that build up more complex representations step by step.

    At the end of the day, the brain is a recognition & prediction machine. From a biological/survival standpoint, it's an advantage to accurately perceive the environment and act/react accordingly. And so it makes sense, given that we are social animals, that we react strongly to patterns that match other people's faces, for example so that we can interpret their attitude towards us.

    And so if our brain is sent into some kind of "hyperactivity" by psychedelics, and we start seeing patterns where there are none, because our brain is just filling stuff in, it would make sense for those patterns to be perceived as eyes, faces etc. because those are things our perception specializes in.

    And on the AI side, as I said, those images are created by essentially inducing an excessive amount of "brain activity" in the network, so it *might* be a vaguely similar mechanism. But this is super simplified, of course.

    Another topic I find interesting here is the idea of "supernormal stimuli". I don't know how scientific this really is, but here is a little comic giving an overview: https://www.stuartmcmillen.com/comic/supernormal-stimuli/#page-10 It's basically also about how animals' pattern recognition skills can be exploited by unnaturally stimulating inputs.

    Google was not really behind; OpenAI just proved that you can alpha-test your product in public without damaging your reputation. That was what changed. The only ones who are behind are Apple. They were not working on anything internally, and their current AI offerings prove that, with unfulfilled promises and a lackluster implementation.

    Before that, Microsoft tried and failed with Tay: https://en.wikipedia.org/wiki/Tay_(chatbot) (not an LLM, but still bad buzz). Meta also released Galactica for research, got bad buzz as well, and it ended in 3 days.

    I don't know if it is bad or good for Apple; there are a lot of open-source models and companies offering LLMs. They didn't have a search engine either, and that was not really a problem. Not spending billions on training a tech which may not be so profitable could be a smart move.

    question: why did they publish the paper for the world to see instead of keeping it for themselves (or patenting it or something)?

    wouldn't publishing it just be helping all of google's competitors for free?

    It's worth reiterating the actual reasons for this, because it isn't unique to Google. All the frontier models you see out there are the result of research conducted by scientists, and those scientists used to be prominent names in academia who had been doing this stuff for decades. Major tech firms enticed them to leave academia for huge compensation packages, but even the money alone wasn't enough. Generally, a condition of getting guys like Yann LeCun and Geoff Hinton to come work for you was that you had to guarantee them the ability to still be part of the scientific community and openly publish their results. They weren't going to do the work if they were forced to keep it secret for the benefit of only their employer. As cynical as the Internet is about science and scientists, the vast majority of them still believe that the open and free sharing of data and results is critical to the whole endeavor. Providing detailed instructions on exactly what you did to achieve a result is how other labs replicate the result, and that is how science advances: many independent groups working in parallel to validate and critique each other's work, which can only happen if they know about that work.

    Because that's how science works. The transformer model didn't come into existence in a vacuum - it was based on earlier research on sequence models and self-attention by researchers at multiple universities and other companies who also published their research.

    Modern LLMs needed two other components: RLHF, developed and published by a combined team from DeepMind and OpenAI in 2017, and generative pre-training (GPT), published by OpenAI in 2018.

    And transformers don't do anything by themselves. They are just a really good way of processing data that's arranged in a sequence. You can use transformers for biomedical research, analyzing images, videos, audio and speech, automatic captioning, and even for statistics over time. All of that would be much worse off if we didn't have transformer models.
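
    A small PyTorch sketch of that "it's just a sequence processor" point; the same layer happily takes a batch of token embeddings or a batch of audio frames, since it only cares about the (batch, length, features) shape:

        import torch
        import torch.nn as nn

        # One transformer encoder layer; it mixes information across positions
        # without caring what the sequence represents.
        layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)

        text_like = torch.randn(1, 12, 64)    # e.g. 12 token embeddings
        audio_like = torch.randn(1, 200, 64)  # e.g. 200 audio frames

        print(layer(text_like).shape)   # torch.Size([1, 12, 64])
        print(layer(audio_like).shape)  # torch.Size([1, 200, 64])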

    Google still publishes or funds more ML research than almost anyone else. They just publish less on large language model architecture/design specifically now that it's such a competitive field and a profit center for them (but they still publish papers related to other aspects of LLMs)

    Hey, I just wanted to say I really learnt a lot (and subsequently went down a rabbit hole) from reading your comment. Thank you so much for writing it.

    That’s just the culture of Google and is actually why I respect Google as a tech company.

    They do these things to put their name out there so that people associate their name with innovation.

    Releasing papers also kind of crowdsources ideas, because someone else will take the paper, improve on it, and release theirs too.

    This exactly. Their reputation is not only with the public but in the industry too. I work in tech and have worked for 2 of the big 5 (currently working at one).

    Almost everyone's dream is to work for Google at some point, including mine. I'm quite comfortable right now and wouldn't take a job with any other FAANG and adjacent companies unless it paid substantially more, but for Google I'd take even the same pay.

    I know of 6 people who, shortly after starting with us, got an offer from Google and just left as a result. Anywhere from 2 weeks in to 8 months in, and going from being paid more to a little less.

    Everyone's got horror stories of Microsoft, Amazon, and Meta, but Google just has this insane positive reputation.

    I'm at Amazon. Oh god

    I recently left Amazon. Best day of my life

    It's unlikely that they could have imagined that releasing this would have the consequences seen today; the paper was originally for machine translation only.

    Either way, it's likely that had Google not published it, someone else would have published something similar. The paper didn't invent anything truly new, it just merged together a few known ideas that apparently worked really really well together.

    OpenAI rushed to market before the product was ready, because the only chance they had was being first to market, hoping to become the "Kleenex" of AI.

    They even had to put together that bullshit hype marketing story that they vomited to all the news outlets: our employees were internally using this thing that we had no idea was so useful, and we decided to open it to the public! Or some shit.

    It was incredible! Like, amazing. Watching the public eat it up too. Just, wow.

  • Large language models existed before ChatGPT, though they weren't as sophisticated or popular yet. The first place I ever read the acronym GPT was in the name of the subreddit r/SubSimulatorGPT2 - which was created in 2019. This wasn't very widely known at the time yet.

    So it's no surprise that many organizations were already doing research in the area.

    I think that animals that can fly are:

    1) owls and their relatives

    2) birds such as black-necked owls and the nighting owls

    3) animals with special needs such as pika and pika-tika or pika-mushies.

    4) animals with special needs such as komodo dragons.

    This is fucking gold. Apparently the only existing birds are owls and everything else is special needs. Komodo dragons can now fly and I don't know wtf is a pika-tika or a pika-mushy.

    For example, if you like turtles and want a turtle that has the body of a woman, look at the ones with the body of a woman.

    From the gonewild bot, of course.

    Lmfao special needs komodo 🤣🤣🤣🤣

    "The raccoon." "What?" "The raccoon." That still made me laugh

    "The big black raccoon" - r/the_donald

    Yeah, someone brought back /r/SubredditSimulator a couple days ago, and it's definitely lost its charm. Before, it was funny just when a bot would churn out a coherent post. Now it's like a reflection of everything I hate about AI.

    The wonderful thing about the subreddit simulator subs is that because they are trained on Reddit comments, that AI is a reflection of US.

    That sub was amazing 🤣

    I love the names of those bots.

    Each of them simulated exactly one subreddit; they were trained on the things that had been said in that subreddit. Some subreddits have very peculiar writing styles, and it sometimes shows...

    Yeah, I noticed that with the historian one a lot. That made me chuckle.

    Also, there's apparently enough automated moderation on /r/wallstreetbets that the bot just randomly adds the "this action was performed automatically" line to the end of all of its comments.

    “Well shit that’s a lot of drugs” 😆

    I remember thinking that sub was so cool back in 2020 or so... Then I started to realize that I couldn't tell the difference between posts on that sub and posts on other subs so I had to stop visiting it.

    Damn you brought me down memory lane with that sub

    Seriously, I totally forgot about it. It used to be on the front page of Reddit all the time.

    This stuff is much older than people realize. I remember back in 2010 I met someone who was starting to work on his PhD and writing a paper on machine learning. He explained some of the stuff to me, and it's wild seeing it come to fruition.

  • Imagine Google, Adobe, Apple, Microsoft, Meta, and X all sitting at a poker table with various hands. They each say “check” when it’s their turn to bet… except this new kid sitting at the table named OpenAI who annoyingly goes all in. Then everyone was forced to either go all in with the cards they had, even with shit hands, or fold.

    Except Apple was clearly bluffing

    Apple and Amazon will buy out whoever is left. Especially Amazon.

    They're the two tech companies that don't fundamentally believe themselves to be tech companies. Amazon is a logistics company, Apple is a product design company. Yes they are both tech leaders in some ways but mainly to facilitate their primary purpose.

    Amazon is a front for AWS. AWS makes up more than half of Amazon's profits. At this point it makes more sense to call the company AWS.

    AWS generates more profit than retail, Amazon is very much a tech company

    And AWS makes cash hand over fist just running (and training) LLMs for others.

    And why did Amazon build out AWS?

    They needed very large server capacity for Black Friday deals, so they wanted to buy a lot of servers. However, that meant their servers would be idle the rest of the time. So they decided to let other people use those servers for money, and expanded the service from there.

    Amazon basically owns Anthropic

    Here's what I found 

    You’ll need to unlock your iPhone first

    This is pretty accurate. The major players were already working on their own LLMs for years before the ChatGPT public launch. At that point most were still ~5-7 years away from rolling them out as an actual, refined product. But once OpenAI suddenly started getting billions of dollars worth of capital pouring in, they had no choice.

    That’s why a lot of AI functionality is underwhelming for most users rn. We’re still not even to the point where most of the major companies expected it to be publicly available.

    I like the analogy, I just find it funny that the sub is Explain Like I'm Five and you use a poker analogy lol

    Oh sorry I thought it was Explain Like I’m Five Card Draw playing.

    This is a very forgivable mistake to make

    Except that OpenAI's hand was clearly visible to everyone. Loss of traffic and revenue were the accelerators.

    And Microsoft knew the play. They were the early investor in OpenAI in 2019 and currently own > 25%.

  • Think about how you're sitting in kindergarten or school drawing a picture and all your friends are drawing too. You've been drawing for a long while but still aren't happy with it. Then suddenly one of your friends stops and shows their drawing around. Now, will you keep sitting and finish your own drawing until you're happy with it or will you and everyone else show around their own kinda-finished work? That's exactly what happened.

    Scrolled down way too far to find an actual eli5

    The only answer that actually works for a 5 year old - bravo

  • the origin of all these AIs, specifically LLMs, is the 2017 paper Attention Is All You Need: https://en.wikipedia.org/wiki/Attention_Is_All_You_Need

    it took a while for the technique to be refined - OpenAI had GPT models as early as 2018, but it took until 2022 for ChatGPT (built on GPT-3.5) to be reliable enough to go viral. At that point other tech companies saw the writing on the wall and started dumping money into their own transformer-based AIs.

    And this spawned the unholy idea of other papers titled x is all you need. One of my favorites in terms of quality and science is Hopfield Networks is All You Need!

    Don't forget Kill is All You Need.

    Or was it All You Need is Kill?

    It's worth noting that a lot of libraries (mostly Python) to make building these easier also exploded with ChatGPT's release. Within months there were quite advanced tools, and it's only gotten bigger. At this point, anyone with a pile of text, a few hundred bucks of compute time, and a basic command of the Python language can make a minimal LLM that creates more or less intelligible replies from scratch. If you build on existing ones or spend more to provide more text (Wikipedia can be torrented) you can go further and create a pretty decent one which answers questions based on some specialized domain.

    Given the perceived value of these things, the benefit for the cost is thought to be astronomical, so everyone and their brother are working on one, thus the explosion.
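
    As a rough illustration of the "build on existing ones" route (this assumes the Hugging Face transformers library is installed; gpt2 is just a small, freely downloadable stand-in for whatever open model you'd actually pick):

        from transformers import AutoModelForCausalLM, AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("gpt2")
        model = AutoModelForCausalLM.from_pretrained("gpt2")

        prompt = "Large language models became popular because"
        inputs = tokenizer(prompt, return_tensors="pt")

        # Sample a continuation from the pretrained model; no training needed for
        # a basic demo, fine-tuning on your own pile of text comes later.
        output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True)
        print(tokenizer.decode(output_ids[0], skip_special_tokens=True))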

  • The other companies have also been working on their own models for many years. They did not create them overnight. They have been using all of the data and content everyone has been storing on the internet for 25+ years, and all of the research and work computer scientists and neuroscientists have been doing for well over 50 years. And that's just LLMs. Look at all of the other kinds of ML and AI systems in use, from robotics to medical research to engineering. They did not just "copy ChatGPT."

    Check out the "overnight success fallacy" and remember that every overnight success took years or decades to develop.

    Back in the early/mid 90s I studied machine translation - automating human language translation - and started to see the first "statistical" translation systems, which back then had surprisingly good accuracy rates. With a good enough corpus of documents, these would regularly achieve 70-80% accuracy.

    So, a very long legacy, probably 40+ years.

    This also doesn't take into account the developments in statistical algorithms, compiler and chip design, the Semantic Web, and a myriad of other technologies.

    To be fair, I think for non-technical people most of these companies did "copy" OpenAI. There are more companies that are just wrappers for ChatGPT than genuine individual AI companies.

    That’s not what the post is about. It’s about the actual model owners not wrappers.

  • AI research is partially done publicly. Researchers publish their advances in papers and public repositories. Those ideas can somewhat quickly be used by everyone.

    When ChatGPT came out, companies were pressed to quickly release their own products, but it wasn't a 100% surprise, since they had all been working on this already.

  • Google had a working LLM way before, and better than, ChatGPT. The thing is, when ChatGPT first came out, people were impressed and amazed, yes... but then immediately figured out they could get it to explain how to make explosives. Or porn. Or it would lie to them. All the problems we're still dealing with.

    OpenAI had the benefit of being a relatively unknown company. So they could take the reputation hit of "wow this thing is kinda crazy" because it came with a side of "oh these people are onto something big".

    If Google had done that, the news would've been leaning a lot harder on "this thing is messed up, what the hell is Google thinking releasing this without guardrails."

    So Google let ChatGPT be the first one out of the gate, so OpenAI could take the hit while Google worked on guardrails.

    That sounds a bit like whitewashing for Google’s actual concern, that LLMs could cannibalize their search revenue. And sure enough clicks onto sponsored searches are way way down—click through rate on paid searches is down by 58%, and organic click through rates are down 68% post-AI summaries / searches.

    Google was not meaningfully motivated by compliance concerns.

    This. Google scientists wrote the paper, but the search division prevented it from being developed further, because AI-backed search would cut a lot of revenue from ads and promoted results.

    This happens so much with Google. They allow people to run with ideas, then shelve them. They might look back at them later, but often they're just killed.

    Most genuinely useful features of the Internet were usually someone's hobby project that got purchased and monetised!

    Yet if they had pressed on, they would have had the potential to create something that could still include those results.

    I'd say that Google didn't really 'see' yet how they could incorporate such LLMs into a usable tool for consumers. OpenAI basically saying 'here's a chatbot, have at it' opened the door for communicating with LLMs as we do today. While it feels like a no-brainer to just add a chat tool, in the early days a lot of discussion and thought went into how to incorporate LLMs into the tools that were already being used.

    quite interesting when you look at how many things in the Google graveyard were simply just ahead of their time.

    That is simply not true. You shouldn't speculate wildly here and present it as fact. Google didn't "let" OpenAI do anything. Google's response, Bard, was a failure, and the first version of the rebrand (Gemini) was worse than early GPT-3.5.

    Google has researched the subject more than any other comparable tech giant, but they didn't have a better or comparable LLM at that time.

    I am not speculating, I worked there at the time. Meena got lobotomized to have the guardrails necessary for a Google product launch (and to scale). The very first ChatGPT launch had me going "We've got better than this... but there's no way we would release this". OpenAI did iterate very quickly past that, because they had the benefit of user experiences to go off of.

    "Let" as in "this is what we ended up allowing to happen with our hesitation" not as in "sure you first". Google was caught off guard, yes. But even if they hadn't been I think the choice would have been the same.

  • The tech behind ChatGPT in 2022 was based on a paper Google published in 2017. Google and Meta (who have both long been involved in AI research) had already been working on their own AIs based on that technology for years. They just hadn't released them as chatbots for public use, for whatever reason - maybe they didn't think it would be useful, or they were worried about it turning racist and damaging their reputation when let loose on the public. When ChatGPT showed that there was interest in such a thing, they just needed to tidy up the AIs they had already built.

    Microsoft on the other hand doesn't have a model they built fully in house. Copilot is a modified version of ChatGPT.

    Google didn't release their LLM because they feared it would harm their monopoly in search. And in fact, post AI searches and AI summaries, the click-through rate on paid ads is down 58% compared to a few years ago.

    It's worth noting their search revenue hasn't suffered and has in fact increased YoY; despite a very rocky and delayed start, they've managed to avoid the 'innovator's dilemma'.

    Microsoft on the other hand doesn't have a model they built fully in house. Copilot is a modified version of ChatGPT.

    They don't have a model that is on the same level, but they were doing research just as long as anyone.

    https://en.wikipedia.org/wiki/Tay_(chatbot)

  • Something I'll add as someone who's not in the AI scene but is in the tech scene... you gotta remember that while those of us outside the industry might have zero clue what's going on, those inside aren't exactly working on the Manhattan Project, so to speak. A lot of these people cross-pollinate between similar companies, and they all talk. One company may not know specifically how their rival is doing something, but they know they're doing it, because many of their employees used to work for that rival company and vice versa.

    Also consider there were plenty of signs things were headed this way. We didn’t have LLM chatbots widely available to the public but there was plenty of AI-lite. Facebook rolled out a feature 10 years ago that would scan photos your friends uploaded that you’re in and automatically tag you based on facial recognition. Google has been using those “select all the squares containing bicycles” tests for years, that’s just AI training. I read an article the other day about people doing gig work doing random and odd tasks in front of cameras and mics back in 2016 that they only realized in 2023 was training AI models. 

    and people forget about DALL-E, too. That was like black magic at the time, but somehow the public didn't pay much attention!

    Because it wasn't made public. It was in beta testing and only selected people got limited access as there were no guardrails.

    OpenAI was under pressure to release, and they did. A lot of concerns were raised, as the technology can be used for nefarious purposes, but then they were like fuck it.

  • People often say "Google invented transformers", but that skips a huge step. A research paper is like an idea; turning it into a working, scalable product that doesn't fall over is the hard part (proof is how shit Bard was a year after ChatGPT).

    Only a small handful of companies actually own frontier models in the US anyway: OpenAI, Google, Anthropic, Meta, and xAI (Grok). Microsoft doesn't have its own model; it uses OpenAI's because it invested heavily in them.

    To answer your question specifically,

    1. Proof removes risk

    Before ChatGPT, it wasn't obvious that spending billions on training giant language models would pay off. Once OpenAI proved:

    - people wanted it
    - it could be monetised
    - it could work at scale

    other companies suddenly had the confidence to go all in. It’s much easier to jump when someone else has already shown the bridge holds.

    2. Talent (AI researchers and developers)

    The other thing was know-how:

    - how to train at massive scale
    - how to make models stable
    - how to do RLHF, safety, deployment, and iteration

    That knowledge lives in people’s heads.

    Those people move between companies. Anthropic is the clearest example: it was founded almost entirely by ex-OpenAI staff. They didn’t copy code, but they absolutely reused their experience of what works and what doesn’t.

    This kind of talent migration is normal in tech, but it’s quietly ignored unless it involves China, then it suddenly gets called “espionage”.

    TLDR:

    It wasn't that everyone magically caught up overnight.

    - OpenAI proved the path was viable
    - Talent who had already done the work spread out
    - A few very rich companies followed quickly

  • These companies already had LLMs for years. OpenAI had GPT for years. Then OpenAI had the clever idea of turning GPT into a chatbot by fine-tuning GPT for chatbot conversations. Fine-tuning is taking a model trained generally and training it for a specific purpose.

    So the other companies already had all of the heavy work done, they just didn't know how to use it. Once OpenAI showed a way to use it, they all copied that.
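
    A minimal sketch of what "fine-tuning for chatbot conversations" means in practice, assuming the Hugging Face transformers library and using the small open gpt2 checkpoint as a stand-in for the "generally trained" model. A real fine-tune loops over many conversations (and usually masks the prompt tokens); this is just one gradient step on one chat-formatted example:

        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("gpt2")
        model = AutoModelForCausalLM.from_pretrained("gpt2")
        optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

        # One made-up conversation, formatted the way chat fine-tuning data looks.
        example = ("User: What is fine-tuning?\n"
                   "Assistant: Training an already-trained model a bit more on a narrower task.")
        batch = tokenizer(example, return_tensors="pt")

        # Standard causal-LM objective: predict each next token of the conversation.
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        print("loss after one step:", outputs.loss.item())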

  • Recently listened to an interesting podcast called The Last Invention, telling the story of how AI became what it is today.