Search for posts and users related to Will_There_Be_a_Next_Time

Users

Not found

Posts containing Will_There_Be_a_Next_Time

LeeHyun official@LeeHyun_bighit

2021.07.30 11:00

LeeHyun 'Will There Be a Next Time' (이현 '다음이 있을까') [Comment] 🎤

1.4K

155


LeeHyun official@LeeHyun_bighit

2021.07.30 11:00

A live performance of LeeHyun's emotional ballad 'Will There Be a Next Time' (다음이 있을까) to gently soak your summer night 😭 Come watch it now! 🎤 🎵 #이현 #LeeHyun #다음이_있을까 #Will_There_Be_a_Next_Time


139

3.8K

474


Andrej Karpathy@karpathy

2026.02.19 20:35

Very interested in what the coming era of highly bespoke software might look like. Example from this morning - I've become a bit loosey-goosey with my cardio recently so I decided to do a more srs, regimented experiment to try to lower my Resting Heart Rate from 50 -> 45, over an experiment duration of 8 weeks. The primary way to do this is to aim for a certain total-minutes goal in Zone 2 cardio plus 1 HIIT session per week.

1 hour later I vibe coded this super custom dashboard for this very specific experiment that shows me how I'm tracking. Claude had to reverse engineer the Woodway treadmill cloud API to pull raw data, process, filter, debug it and create a web UI frontend to track the experiment. It wasn't a fully smooth experience and I had to notice and ask to fix bugs, e.g. it screwed up metric vs. imperial system units and it screwed up on the calendar matching up days to dates etc.

But I still feel like the overall direction is clear:

1) There will never be (and shouldn't be) a specific app on the app store for this kind of thing. I shouldn't have to look for, download and use some kind of a "Cardio experiment tracker", when this thing is ~300 lines of code that an LLM agent will give you in seconds. The idea of an "app store" of a long tail of discrete apps you choose from feels somehow wrong and outdated when LLM agents can improvise the app on the spot and just for you.

2) Second, the industry has to reconfigure into a set of services of sensors and actuators with agent-native ergonomics. My Woodway treadmill is a sensor - it turns physical state into digital knowledge. It shouldn't maintain some human-readable frontend and my LLM agent shouldn't have to reverse engineer it; it should be an API/CLI easily usable by my agent. I'm a little bit disappointed (and my timelines are correspondingly slower) with how slowly this progression is happening in the industry overall. 99% of products/services still don't have an AI-native CLI yet. 99% of products/services maintain .html/.css docs like I won't immediately look for how to copy paste the whole thing to my agent to get something done. They give you a list of instructions on a webpage to open this or that url and click here or there to do a thing. In 2026. What am I, a computer? You do it. Or have my agent do it.

So anyway today I am impressed that this random thing took 1 hour (it would have been ~10 hours 2 years ago). But what excites me more is thinking through how this really should have been 1 minute tops. What has to be in place so that it would be 1 minute? So that I could simply say "Hi can you help me track my cardio over the next 8 weeks", and after a very brief Q&A the app would be up. The AI would already have a lot of personal context, it would gather the extra needed data, it would reference and search related skill libraries, and maintain all my little apps/automations.

TLDR the "app store" of a set of discrete apps that you choose from is an increasingly outdated concept all by itself. The future is services of AI-native sensors & actuators orchestrated via LLM glue into highly custom, ephemeral apps. It's just not here yet.
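For a sense of what the core of such a tracker might look like, here is a minimal sketch in Python. It is not the dashboard from the post: the `Session` fields, the `unit` flag, and the Zone 2 heart-rate bounds are all assumptions made for illustration. It tallies Zone 2 minutes per week and normalizes distances to one unit, which is the kind of metric vs. imperial mix-up the post mentions.

```python
from dataclasses import dataclass

MILES_TO_KM = 1.609344  # exact definition of the international mile


@dataclass
class Session:
    week: int        # 1-based week of the experiment (assumed field)
    minutes: float   # session duration in minutes
    avg_hr: int      # average heart rate in bpm
    distance: float  # distance as reported by the device
    unit: str        # "km" or "mi" (assumed field; real devices differ)


def distance_km(s: Session) -> float:
    """Normalize distance to kilometres so units are never mixed."""
    if s.unit == "km":
        return s.distance
    if s.unit == "mi":
        return s.distance * MILES_TO_KM
    raise ValueError(f"unknown unit: {s.unit!r}")


def zone2_minutes_by_week(sessions, zone2=(120, 140)):
    """Sum minutes per week for sessions whose average HR is in Zone 2.

    The default bounds are placeholders; Zone 2 depends on the person.
    """
    lo, hi = zone2
    totals = {}
    for s in sessions:
        if lo <= s.avg_hr <= hi:
            totals[s.week] = totals.get(s.week, 0.0) + s.minutes
    return totals


sessions = [
    Session(week=1, minutes=45, avg_hr=130, distance=5.0, unit="km"),
    Session(week=1, minutes=30, avg_hr=165, distance=3.0, unit="mi"),  # HIIT, not Zone 2
    Session(week=2, minutes=60, avg_hr=125, distance=4.0, unit="mi"),
]
weekly = zone2_minutes_by_week(sessions)
```

Comparing `weekly` against a per-week target then gives the "how am I tracking" view; storing an explicit unit next to each distance is one simple way to make unit bugs visible instead of silent.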


913

12K


Andrej Karpathy@karpathy

2025.12.07 18:13

Don't think of LLMs as entities but as simulators. For example, when exploring a topic, don't ask: "What do you think about xyz"? There is no "you". Next time try: "What would be a good group of people to explore xyz? What would they say?" The LLM can channel/simulate many perspectives but it hasn't "thought about" xyz for a while and over time and formed its own opinions in the way we're used to. If you force it via the use of "you", it will give you something by adopting a personality embedding vector implied by the statistics of its finetuning data and then simulate that. It's fine to do, but there is a lot less mystique to it than I find people naively attribute to "asking an AI".


1.1K

27.7K

2.8K


vitalik.eth@VitalikButerin

2026.05.24 16:19

Some of my perspective on where the @ethereumfndn is going. First of all, this is only my own view. The board is not just me, and I have no extra special powers on the board that the other board members do not. @aerugoettinea is the one executing much of this transition. My input has been largely on technical questions. The board is in the process of expanding, and my own power within the org will continue to decrease, which is honestly what I want. The 2025 era brought many important improvements to EF and its ability to execute. Many issues were resolved, and EF continues to benefit from its improved efficiency and greater focus on concrete goals to this day. And so with those problems resolved, early this year, the largest remaining hole that I perceived was something different nagging at me: I would regularly spot people saying things like "vitalik says these beautiful things about ethereum needing to be decentralized, and have privacy, and be a sanctuary technology, but why do the EF's actions not reflect that?" Now, you may have been hearing something different. You may not have been sensing a feeling of crisis at all, and maybe were hearing people saying that finally we were taking execution and BD seriously and the main task for us is to keep going that way and be even better and faster. Then probably there is genuine difference between you and me, in what kinds of criticism I take most seriously, and what kinds of critics through their criticism are most able to make me feel pain. As an analogy, let's briefly switch over to a different domain. One belief you can have about Google is that it is a success story, and has brought a lot of good to humanity in organizing the world's information. Another belief you can have about Google is that they had a beautiful idealistic beginning, but at some point the corruption of mainstream corporate attitudes seeped in, and they slowly bit by bit completely abandoned the "don't be evil" slogan. 
My belief on Google specifically is probably somewhere between the two. BUT, if you had taken me back in time to ~2008, and offered me a button to press to make Google one or two standard deviations more "dogmatic", eg. give Richard Stallman permanent veto power over some key policies, I would immediately press it. Why? Because a choice for one company is not a choice for the world, or even one country. Google existed and exists in the context of a technology industry generally drifting away from early idealistic don't-be-evil roots and toward greed for financial gain, totalizing visions of accelerated superintelligence, infiltration by sociopaths, and craven capitulation to (or worse, active participation in) government pressure for ideological control, surveillance and war. And so *one company* doing something different, positioning itself to be what George Bernard Shaw calls the Unreasonable Man, resisting the trend of the times, would have been better for freedom, balance of power and stability of society as a whole, than *all* large companies bending to dominant trends. This is a part of my version of pluralism. This line of thinking is not just mine; it is also not too far off from what Aya and others had in mind with the Mandate. Now how does this all get to the role of the EF? EF is not a "center of Ethereum", rather EF is "one node, with a defined purpose, alongside other nodes". We've always said that the EF should be the latter, but many in the Ethereum ecosystem (and even within the EF) wanted us to be the former. Now, we are taking action to ensure that we will be the latter. This is particularly important because EF is a limited organization, with limited resources and limited organizational capacity. The EF has only ~0.16% of all ETH (less than many other individual ETH holders), whereas among other blockchains it's common for "the central foundation" to have 10-50%. 
Fiscally, the EF was originally designed to fulfill a limited work scope defined in the token sale docs and other pre-launch materials (building the chain software; getting through Frontier, Homestead, Metropolis, Serenity), which was fully completed in 2022; it was not designed to be an eternal steward. And so today, the EF is choosing to use its remaining resources to pursue longevity over breadth (yes, this means we sell less ETH). The EF focuses *specifically* on those activities critical to the success of ethereum as a censorship/capture-resistant, open, private and secure system, that would not happen otherwise. This means making hard choices, and in some cases even activities that we highly approve of and people that we highly respect becoming outside of the EF. People of great technical talent, public respect and even alignment with the mission and CROPS being outside of the EF is in fact necessary if we want important tasks to be able to attract outside capital. This also means the EF taking opinionated stands culturally. This is all intended in cooperation with all other parts of ethereum. We recognize that many other parts of the ethereum world highly respect CROPS and related values. But highly respecting is not the same as choosing to specialize and totally dedicate to a domain (Compare in a different domain: I think reducing animal cruelty is important, and I like vegan food, but am not full unconditional vegan myself) EF is still in a transition period, and we expect its new long-term form to stabilize over the next few months. What are the guiding principles of this new form? Again, I am only one person, but I can give my answer from a technical perspective (there are also critical non-technical aspects). At the core, *Ethereum must be impressive*. We are living in an age of highly intelligent AI and all kinds of other technological acceleration. 
"Status quo EVM, with a hard fork or two a year to optimize for short-term needs of users" is not interesting. To some, "impressive" means: 250ms latency and 1M TPS. I think Ethereum trying to go that route is a mistake. Being as fast and as scalable as possible, and only a small epsilon more decentralized than the others, is a route to mediocrity, and if we try it we will lose. I think Ethereum should scale. But I think Ethereum should strive the hardest to be deeply impressive in a different dimension: the CROPS dimension. This means things like: * Provably bug-free Ethereum. This is a goal that all cybersecurity researchers would have thought is absurd and impossible, up until roughly 6 months ago. Now, it's on the cusp of being possible, thanks to AI-assisted formal verification. So we should be frontrunners in doing this. * Available chain consensus. Ethereum is, and with lean consensus will continue to be, the ONLY chain that has both (i) traditional-BFT style properties that it's safe under asynchrony up to a high level of fault tolerance, and (ii) the bitcoin PoW-style property that under synchrony it's safe up to 49% attackers. As far as I can tell, literally no other chain has this or is planning for it; bitcoin goes for (ii) only and most other chains go for (i) only. Some will remember I fought hard for this, Unreasonably insisting that it is not OK for ethereum to rely on social consensus and hard forks to rescue ethereum from 34% of nodes going offline. It's OK for chains like hyperledger, bnb, solana, tempo, etc. It's not OK for bitcoin or ethereum or eg. zcash. * Intermediary minimization. The fact that smart contract wallets, protocols like railgun, etc have to send transactions through intermediaries to get included onchain is honestly embarrassing, and it's a constant point of fragility. 
Hence the work on FOCIL and EIP-8141 (and 7701 and years of work before) to make transaction sending intermediary-minimized with public mempool and strong inclusion properties, in a truly general-purpose way, that covers not just eg. secp256r1, but also privacy protocols and much more. Kohaku is pushing intermediary minimization at the user layer, pulling Ethereum away from the dystopian status quo world where our wallets don't even verify the chain, send our private data out to a dozen third-party servers, and toward a brighter CROPS future. Some of these goals are Unreasonable - maybe Ethereum would be "fine" getting only 50% of the way - what if we depend on intermediaries, but make it easy to switch? But going 50% of the way would not make Ethereum Deeply Impressive in the CROPS way. So we push for 100%. Fortunately all these goals are compatible with high TPS, this is a major focus of research (esp. on scaling the state). Well-designed L2s can also help, especially L2s optimized for specific applications (eg. high-volume trading, privacy...). These goals are even compatible with significantly lower slot times, thanks to Raul's work on erasure-coded P2P, and many other optimizations. The most high-value "product" of the ethereum blockchain, financially speaking, is ETH the asset. Ethereum secures $250 billion of ETH. The types of properties of Ethereum that I mentioned above are very good for ETH the asset. Nearly 90% of my net worth is in ETH, and most of the remainder is ~$40m of onchain fiat of which every dollar has already been allocated for some open-source biotech or software or hardware initiative. That said, there are aspects of supporting ETH the asset - *necessary* aspects even - that are outside the scope of the EF. This is where we need other heroes (some of whom hold more ETH than the EF does) to step in and help. EF has been recently thinking more about how it will relate to other such organizations, and give them needed initial support. 
EF will be a smaller ship than in previous years, a more opinionated one - in some cases more opinionated in ways that might be difficult to comprehend - but a longer-lasting one, and one suited to making sure that ethereum brings something meaningful to the world. We are grateful to all those inside and outside the EF who are helping to make this happen.
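To make the 34% figure in the post above concrete: BFT-style finality needs a two-thirds supermajority of validators (by stake) to vote, so once more than a third are offline, finalization stops. The sketch below is a toy model under that single assumption. `chain_status` and its labels are hypothetical names, and real protocol details are ignored. It only illustrates the distinction the post draws between a chain that halts and one that stays available without finalizing.

```python
from fractions import Fraction

# Supermajority needed for BFT-style finality.
FINALITY_THRESHOLD = Fraction(2, 3)


def can_finalize(online: Fraction) -> bool:
    """True if the online share of stake reaches the 2/3 supermajority."""
    return online >= FINALITY_THRESHOLD


def chain_status(offline_percent: int) -> str:
    """Toy model of an 'available' chain: blocks continue while any
    validators are online, but finality needs the supermajority."""
    online = Fraction(100 - offline_percent, 100)
    if can_finalize(online):
        return "finalizing"
    if online > 0:
        return "available, not finalizing"
    return "halted"


for pct in (33, 34, 49):
    print(pct, chain_status(pct))
```

With 33% offline the remaining 67% still clears two thirds; at 34% offline it does not, which is why that number marks the point where a chain with only BFT-style properties would need outside intervention.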


1.5K

7.3K

1.3K


Brian Armstrong@brian_armstrong

2026.05.05 10:55

This is an email I sent earlier today to all employees at Coinbase: Team, Today I’ve made the difficult decision to reduce the size of Coinbase by ~14%. I want to walk you through why we're doing this now, what it means for those affected, and how this positions us for the future. Why now Two forces are converging at the same time. We need to be front footed to respond to both. First, the market. Coinbase is well-capitalized, has diversified revenue streams, and is well-positioned to weather any storm. Crypto is also on the verge of the next wave of adoption, with stablecoins, prediction markets, tokenization, and more taking off. However, our business is still volatile from quarter to quarter. While we've managed through that cyclicality many times before and come out stronger on the other side, we’re currently in a down market and need to adjust our cost structure now so that we emerge from this period leaner, faster, and more efficient for our next phase of growth. Second, AI is changing how we work. Over the past year, I’ve watched engineers use AI to ship in days what used to take a team weeks. Non-technical teams are now shipping production code and many of our workflows are being automated. The pace of what's possible with a small, focused team has changed dramatically, and it's accelerating every day. All of this has led us to an inflection point, not just for Coinbase, but for every company. The biggest risk now is not taking action. We are adjusting early and deliberately to rebuild Coinbase to be lean, fast, and AI-native. We need to return to the speed and focus of our startup founding, with AI at our core. What this means To get there, we are not just reducing headcount and cutting costs, we’re fundamentally changing how we operate: rebuilding Coinbase as an intelligence, with humans around the edge aligning it. What does this mean in practice? - Fewer layers, faster decisions: We are flattening our org structure to 5 layers max below CEO/COO. 
Layers slow things down and create coordination tax. The future is small, high context teams that can move quickly. Leaders will own much more, with as many as 15+ direct reports. Fewer layers also means a leaner cost structure that is built to perform through all market cycles. - No pure managers: Every leader at Coinbase must also be a strong and active individual contributor. Managers should be like player-coaches, getting their hands dirty alongside their teams. - AI-native pods: We’ll be concentrating around AI-native talent who can manage fleets of agents to drive outsized impact. We’ll also be experimenting with reduced pod sizes, including “one person teams” with engineers, designers, and product managers all in one role. In short: AI is bringing a profound shift in how companies operate, and we’re reshaping Coinbase to lead in this new era. This is a new way of working, and we need to leverage AI across every facet of our jobs. To those who are affected I know there are real people behind these decisions — talented colleagues who have poured themselves into this company and our mission. To those of you who will be leaving: thank you. You’ve helped build Coinbase into what it is today, and I am sincerely grateful for everything you've done. All impacted team members will receive an email to their personal account in the next hour with more information, and an invitation to meet with an HRBP and a senior leader in your organization. Coinbase system access has been removed today. I know this feels sudden and harsh, but it is the only responsible choice given our duty to protect customer information. To those affected, we will be providing a comprehensive package to support you through this transition. US employees will receive a minimum of 16 weeks base pay (plus 2 weeks per year worked), their next equity vest, and 6 months of COBRA. Employees on a work visa will get extra transition support. 
Those outside of the US will receive similar support, based on local factors and subject to any consultation requirements. Coinbase prides itself on talent density. Our employees are among the most talented people in the world, and I have no doubt that your skills and experience will be highly sought after as you pursue your next chapters. How we move forward To the team that is staying, I know this is a difficult day. We’re saying goodbye to colleagues and friends you've been in the trenches with. But here’s what I want you to know as we move forward together: Over the past 13 years, we have weathered four crypto winters, gone public, and built the most trusted platform in our industry. We’ve made it this far by making hard decisions and by always staying focused on our mission. This time will be no different – nothing has changed about the long term outlook of our company or industry. And most importantly, our mission has never been more important for the world. Increasing economic freedom requires a new financial system, and we’re building it. The Coinbase that emerges from this will be more capable than ever to achieve our mission. Brian


5.3K

20.1K

2.4K


Bill Ackman@BillAckman

2026.04.04 21:50

I am reaching out to the @X community for advice with the likely risk of sharing TMI. I have been sufficiently upset about the whole matter that I have lost sleep thinking about it and I am hoping that this post will enable me to get this matter off my chest. By way of background, I started a family office called TABLE about 15 years ago and hired a friend who had previously managed a family office, and years earlier, had been my personal accountant. She is someone that I trusted implicitly and consider to be a good person. The office started small, but over the last decade, the number of personnel and the cost of the office grew massively. The growth was entirely on the operational side as the investment team has remained tiny. While my investment portfolio grew substantially, the investments I had made were almost entirely passive and TABLE simply needed to account for them and meet capital calls as they came in. While TABLE purchased additional software and other systems that were supposed to improve productivity, the team kept increasing in size at a rapid rate, and the expenses continued to grow even faster. While I would periodically question the growing expenses and high staff turnover, I stayed uninvolved with the office other than a once-a-year meeting when I briefly reviewed the operations and the financials and determined bonus compensation for the President and the CFO. I spent no time with any of the other employees or the operations. The whole idea behind TABLE was that it would handle everything other than my day job so that I would have more time for my job and my family. Over the last six years, expenses ballooned even further, employee turnover accelerated, and I became concerned that all was not well at TABLE. It was time for me to take a look at what was going on. 
Nearly four years ago, I recruited my nephew who had recently graduated from Harvard and put him to work at Bremont, a British watchmaker, one of my only active personal investments to figure out the issues at the company and ultimately assist in executing a turnaround. He did a superb job. When he returned from the UK late last year after a few years at Bremont, I asked him to help me figure out what was going on with TABLE. When I explained to TABLE’s president what he would be doing, she became incredibly defensive, which naturally made me more concerned. My nephew went to work by first meeting with each employee to understand their roles at the company and to learn from them what ideas they had on how things could be improved. He got an earful. Our first step in helping to turn around TABLE was a reduction in force including the president and about a third of the team, retaining excellent talent that had been desperate for new leadership. Now here is where I need your advice. All but one of the employees who were terminated acted professionally and were gracious on the way out (excluding the president who had a notice period in her contract, is currently still being paid, and with whom I have not yet had a discussion). The highest compensated terminated employee other than the president, an in-house lawyer (let’s call her Ronda), told us that three months of severance was not enough and demanded two years’ severance despite having worked at the company for only two and one half years. When I learned of Ronda's request for severance, I offered to speak with her to understand what she was thinking, but she refused to do so. A few days ago, we received a threatening letter from a Silicon Valley law firm. 
In the letter, Ronda’s counsel suggests that her termination is part of longstanding issues of ‘harassment and gender discrimination’ – an interesting claim in light of the fact that Ronda was in charge of workplace compliance – and that her termination was due to: “unlawful, retaliatory, and harmful conduct directed towards her. Both [Ronda] and I [Ronda’s lawyer] have spoken with you about [Ronda’s] view of what a reasonable resolution would include given the circumstances. Thus far, TABLE has refused to provide any substantive response. This letter provides the last opportunity to reach a satisfactory agreement. If we cannot do so, [Ronda] will seek all appropriate relief in a court of competent jurisdiction.” The letter goes on to explain the basis for the “unsafe work environment” claim at TABLE: “In early 2026, Pershing Square’s founder Bill Ackman installed his nephew in an unidentified role at TABLE, Ackman’s family office. [His nephew]—whose only work experience had been for TABLE where he was seconded abroad for the last four years to a UK watch company held by Ackman—began appearing at TABLE’s offices and conducting interviews of employees without a clear explanation of his role or the purposes of these interviews. During this period, he made a series of inappropriate and genderbased [sic] comments to multiple employees that created an unsafe work environment. Among other things, [his nephew] made remarks about female employees’ ages (“Tell me you are nowhere near 40”), physical appearance (“Your body does not look like you have kids”), as well as intrusive questions about family planning and sexual orientation (“Who carried your son? Who will carry your next child?”). These incidents were reported to senior leadership at TABLE and Pershing Square. 
Rather than being addressed appropriately, the response from senior management reflected, at best, willful blindness to the inappropriateness of [his nephew]’s remarks and, at worst, tacit endorsement.” The above allegations about my nephew had previously been brought to my attention by TABLE’s president when they occurred. When I learned of them, I told the president that I would speak to him directly and encouraged her to arrange for him to get workplace sensitivity training. The president assured me that she would do so. When I spoke to my nephew, he explained what he actually had said and how his actual remarks had been received, not at all as alleged in the legal letter from Ronda’s counsel. I have also spoken to others at the lunch table who confirmed his description of the facts. In any case, he meant no harm, was simply trying to build rapport with other employees, and no one, as far as I understand, was offended. Ironically, Ronda claims in her legal letter that TABLE didn’t take HR compliance seriously, yet Ronda was in charge of HR compliance at TABLE and the person who gave my nephew his workplace sensitivity training after the alleged incidents. In any case, Ronda, as head of compliance, should have kept a record or raised an alarm if indeed there was pervasive harassment or other such problems at the company, and there is no evidence whatsoever that this is true. So why does Ronda believe she can get me to pay her nearly $2 million, i.e., two years of severance, nearly one year of severance for each of her years at the company? Well, here is where some more background would be helpful. Over the last two months, I have been consumed with a major family medical issue – one of my older daughters had a massive brain hemorrhage on February 5th and has since been making progress on her recovery – and I am in the midst of a major transaction for my company which I am executing from a hospital room office next to her . 
While the latter business matter is publicly known, the details of my daughter’s situation are only known to Ronda because of her role at our family office. Now, let’s get back to the subject at hand. Unfortunately, while New York and many other states have employment-at-will, there has emerged an industry of lawyers who make a living from bringing fake gender, race, LGBTQ and other discrimination employment claims in order to extract larger severance payments for terminated employees, and it needs to stop. The fake claim system succeeds because it costs little to have a lawyer send a threatening letter and nearly all of the lawyers in this field work on contingency so there is no or minimal cash cost to bring a claim. And inevitably, nearly 100% of these claims are settled because the public relations and legal costs of defending them exceed the dollar cost of the settlement. The claims are nearly always settled with a confidentiality agreement where the employee who asserts the fake claims remains anonymous and as a result, there is no reputational cost to bringing false claims. The consequences of this sleazy system (let’s call it ‘the System’) are the increased costs of doing business which is a tax on the economy and society. There are other more serious problems due to the System. Unfortunately, the existence of an industry of plaintiff firms and terminated employees willing to make these claims makes it riskier for companies to hire employees from a protected class, i.e., LGBTQ, seniors, women, people of color etc. because it is that much more reputationally damaging and expensive to be accused of racism, sexism, and/or intolerance for sexual diversity than for firing a white male as juries generally have less sympathy for white males. The System therefore increases the risk of discrimination rather than reducing it, and the people bringing these fake claims are thereby causing enormous harm to the other members of these protected classes. 
So what happened here? Ronda was vastly overpaid and overqualified for the job that she did at TABLE. She was paid $1.05 million plus benefits last year for her work which was largely comprised of filling out subscription agreements and overseeing an outside law firm on closing passive investments in funds and in private and venture stage companies, some compliance work, and managing the office move from one office to another. She had a very good gig as she was highly paid, only had to go into the office three days a week, and could work from anywhere during the summer. Once my nephew showed up and started to investigate what was going on, she likely concluded that there was a reasonable possibility she would be terminated, as her job was in the too-easy-and-to-good-to-be-true category. The problem was that she was not in a protected class due to her race, age or sexual identity so she had to construct the basis for a claim. While she is female and could in theory bring a gender-based discrimination claim, she reported to the president who is female and to whom she is very close, which makes it difficult for her to bring a harassment claim against her former boss. When my nephew complimented a TABLE employee at lunch about how young she looked – in response to saying she was going to her 40-year-old sister’s birthday party, he said ‘she must be your older sister’ – Ronda immediately reported it to our external HR lawyer. She thereby began building her case. The other problem for Ronda bringing a claim is that she was terminated alongside 30% of other TABLE employees as part of a restructuring so it is very difficult for her to say that she was targeted in her termination or was retaliated against. TABLE is now hiring an external fractional general counsel as that is all the company needs to process the relatively limited amount of legal work we do internally. In short, Ronda was eminently qualified and capable and did her job. 
She was just too much horsepower for what is largely an administrative legal role so she had to come up with something else to bring a claim. Now Ronda knew I was a good target and it was a good time to bring a claim against me. She also knew that I was under a lot of pressure because on March 4th when Ronda was terminated, my daughter had not yet emerged from consciousness, she was not yet breathing on her own, and my daughter and we were fighting for her life. I was and remain deeply engaged in her recovery while at the same time I was working on finishing the closing for the private placement round for my upcoming IPO. Ronda also knew that publicity about supposed gender discrimination and a “hostile and unsafe work environment” are not things that a CEO of a company about to go public wants to have released into the media. And she may have thought that the nearly $2 million she was asking for would be considered small in the context of the reputational damage a lawsuit could cause, regardless of the fact that two years of severance was an absurd amount for an employee who had only worked at TABLE for 30 months. She also likely considered that I wouldn’t want to embarrass my nephew by dragging him into the klieg lights when her claims emerged publicly. So, in summary, game theory would say that I would certainly settle this case, for why would I risk negative publicity at a time when I was preparing our company to go public and also risk embarrassing my nephew. Notably, she hired a Silicon Valley law firm, rather than a typical NY employment firm. This struck me as interesting as her husband works for one of the most prominent Silicon Valley venture firms whose CEO, I am sure, has no tolerance for these kinds of fake claims that sadly many venture-backed companies also have to deal with. I mention this as I suspect her husband likely has been working with her on the strategy for squeezing me as, in addition to being a computer scientist, he is a game theorist. 
My only advice for him is to understand more about your opponent before you launch your first move. All of the above said, gender, race, LGBTQ and other such discrimination is a real thing. Many people have been harmed and deserve compensation for this discrimination, and these companies and individuals should be punished for engaging in such behavior. Which brings me to the advice I am seeking from the X community. I am not planning to follow the typical path and settle this ‘claim.’ Rather, I am going to fight this nonsense to the end of the earth in the hope that it inspires other CEOs to do the same so we shut down this despicable behavior that is a large tax on society, employment, and the economy and contributes to workplace discrimination rather than reducing it. Do you agree or disagree that this is the right approach?

10.8K

24K

1.4K

Andrej Karpathy@karpathy

2025.02.18 05:25

I was given early access to Grok 3 earlier today, making me, I think, one of the first few who could run a quick vibe check.

Thinking

✅ First, Grok 3 clearly has a thinking model around the state of the art ("Think" button) and did great out of the box on my Settlers of Catan question: "Create a board game webpage showing a hex grid, just like in the game Settlers of Catan. Each hex grid is numbered from 1..N, where N is the total number of hex tiles. Make it generic, so one can change the number of "rings" using a slider. For example in Catan the radius is 3 hexes. Single html page please." Few models get this right reliably. The top OpenAI thinking models (e.g. o1-pro, at $200/month) get it too, but all of DeepSeek-R1, Gemini 2.0 Flash Thinking, and Claude do not.

❌ It did not solve my "Emoji mystery" question where I give a smiling face with an attached message hidden inside Unicode variation selectors, even when I give a strong hint on how to decode it in the form of Rust code. The most progress I've seen is from DeepSeek-R1 which once partially decoded the message.

❓ It solved a few tic tac toe boards I gave it with a pretty nice/clean chain of thought (many SOTA models often fail these!). So I upped the difficulty and asked it to generate 3 "tricky" tic tac toe boards, which it failed on (generating nonsense boards / text), but then so did o1 pro.

✅ I uploaded the GPT-2 paper. I asked a bunch of simple lookup questions, all worked great. Then asked to estimate the number of training flops it took to train GPT-2, with no searching. This is tricky because the number of tokens is not spelled out so it has to be partially estimated and partially calculated, stressing all of lookup, knowledge, and math. One example is 40GB of text ~= 40B characters ~= 40B bytes (assume ASCII) ~= 10B tokens (assume ~4 bytes/tok), at ~10 epochs ~= 100B token training run, at 1.5B params and with 2+4=6 flops/param/token, this is 100e9 × 1.5e9 × 6 ~= 1e21 FLOPs. 
Both Grok 3 and 4o fail this task, but Grok 3 with Thinking solves it great, while o1 pro (GPT thinking model) fails.

I like that the model *will* attempt to solve the Riemann hypothesis when asked to, similar to DeepSeek-R1 but unlike many other models that give up instantly (o1-pro, Claude, Gemini 2.0 Flash Thinking) and simply say that it is a great unsolved problem. I had to stop it eventually because I felt a bit bad for it, but it showed courage and who knows, maybe one day...

The impression overall I got here is that this is somewhere around o1-pro capability, and ahead of DeepSeek-R1, though of course we need actual, real evaluations to look at.

DeepSearch

Very neat offering that seems to combine something along the lines of what OpenAI / Perplexity call "Deep Research", together with thinking. Except instead of "Deep Research" it is "Deep Search" (sigh). Can produce high quality responses to various researchy / lookupy questions you could imagine have answers in articles on the internet, e.g. a few I tried, which I stole from my recent search history on Perplexity, along with how it went:

- ✅ "What's up with the upcoming Apple Launch? Any rumors?"
- ✅ "Why is Palantir stock surging recently?"
- ✅ "White Lotus 3 where was it filmed and is it the same team as Seasons 1 and 2?"
- ✅ "What toothpaste does Bryan Johnson use?"
- ❌ "Singles Inferno Season 4 cast where are they now?"
- ❌ "What speech to text program has Simon Willison mentioned he's using?"

❌ I did find some sharp edges here. E.g. the model doesn't seem to like to reference X as a source by default, though you can explicitly ask it to. A few times I caught it hallucinating URLs that don't exist. A few times it said factual things that I think are incorrect and it didn't provide a citation for it (it probably doesn't exist). E.g. it told me that "Kim Jeong-su is still dating Kim Min-seol" of Singles Inferno Season 4, which surely is totally off, right? 
And when I asked it to create a report on the major LLM labs and their amount of total funding and estimate of employee count, it listed 12 major labs but not itself (xAI).

The impression I get of DeepSearch is that it's approximately around Perplexity's DeepResearch offering (which is great!), but not yet at the level of OpenAI's recently released "Deep Research", which still feels more thorough and reliable (though still nowhere near perfect, e.g. it, too, quite incorrectly excludes xAI as a "major LLM lab" when I tried with it...).

Random LLM "gotcha"s

I tried a few more fun / random LLM gotcha queries I like to try now and then. Gotchas are queries that are specifically on the easy side for humans but on the hard side for LLMs, so I was curious which of them Grok 3 makes progress on.

✅ Grok 3 knows there are 3 "r" in "strawberry", but then it also told me there are only 3 "L" in LOLLAPALOOZA. Turning on Thinking solves this.

✅ Grok 3 told me 9.11 > 9.9 (common with other LLMs too), but again, turning on Thinking solves it.

✅ Few simple puzzles worked ok even without thinking, e.g. *"Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?"*. E.g. GPT4o says 2 (incorrectly).

❌ Sadly the model's sense of humor does not appear to be obviously improved. This is a common LLM issue with humor capability and general mode collapse, famously, e.g. 90% of 1,008 outputs asking ChatGPT for a joke were repetitions of the same 25 jokes. Even when prompted in more detail away from simple pun territory (e.g. give me a standup), I'm not sure that it is state of the art humor. Example generated joke: "*Why did the chicken join a band? Because it had the drumsticks and wanted to be a cluck-star!*". In quick testing, thinking did not help, possibly it made it a bit worse.

❌ Model still appears to be just a bit too overly sensitive to "complex ethical issues", e.g. generated a 1 page essay basically refusing to answer whether it might be ethically justifiable to misgender someone if it meant saving 1 million people from dying.

❌ Simon Willison's "*Generate an SVG of a pelican riding a bicycle*". It stresses the LLM's ability to lay out many elements on a 2D grid, which is very difficult because the LLMs can't "see" like people do, so it's arranging things in the dark, in text. Marking as fail because these pelicans are quite good, but still a bit broken (see image and comparisons). Claude's are best, but imo I suspect they specifically targeted SVG capability during training.

Summary. As far as a quick vibe check over ~2 hours this morning, Grok 3 + Thinking feels somewhere around the state of the art territory of OpenAI's strongest models (o1-pro, $200/month), and slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking. Which is quite incredible considering that the team started from scratch ~1 year ago; this timescale to state of the art territory is unprecedented. Do also keep in mind the caveats - the models are stochastic and may give slightly different answers each time, and it is very early, so we'll have to wait for a lot more evaluations over a period of the next few days/weeks. The early LM arena results look quite encouraging indeed. For now, big congrats to the xAI team, they clearly have huge velocity and momentum and I am excited to add Grok 3 to my "LLM council" and hear what it thinks going forward.
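
The GPT-2 training-compute estimate described in the post can be reproduced as a short calculation. The sketch below only illustrates that back-of-the-envelope arithmetic: the function name `estimate_training_flops` and its parameter names are hypothetical, and the inputs (40 GB of text, ~4 bytes per token, ~10 epochs, 1.5B parameters, 6 FLOPs per parameter per token) are the rough assumptions stated in the post, not measured values.

```python
def estimate_training_flops(text_bytes, bytes_per_token, epochs, params,
                            flops_per_param_per_token=6):
    """Back-of-the-envelope training compute (hypothetical helper).

    All inputs are rough assumptions taken from the post, not measurements.
    """
    tokens_per_epoch = text_bytes / bytes_per_token   # 40e9 bytes -> ~10B tokens
    total_tokens = tokens_per_epoch * epochs          # ~10 epochs -> ~100B tokens
    # 2 (forward) + 4 (backward) = 6 FLOPs per parameter per token
    flops = total_tokens * params * flops_per_param_per_token
    return flops, total_tokens


flops, total_tokens = estimate_training_flops(
    text_bytes=40e9,      # 40 GB of text, assuming ~1 byte per character
    bytes_per_token=4,    # assumed average token length
    epochs=10,            # assumed number of passes over the data
    params=1.5e9,         # GPT-2 parameter count used in the post
)
print(f"tokens: {total_tokens:.1e}, FLOPs: {flops:.1e}")
# prints "tokens: 1.0e+11, FLOPs: 9.0e+20", i.e. roughly 1e21 FLOPs
```

Keeping each assumption as a separate argument makes it easy to see how sensitive the final figure is to any one of them, for example halving the assumed epoch count halves the estimate.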

666

16.8K

2.2K

Andrej Karpathy@karpathy

2024.08.07 20:08

# RLHF is just barely RL

Reinforcement Learning from Human Feedback (RLHF) is the third (and last) major stage of training an LLM, after pretraining and supervised finetuning (SFT). My rant on RLHF is that it is just barely RL, in a way that I think is not too widely appreciated. RL is powerful. RLHF is not.

Let's take a look at the example of AlphaGo. AlphaGo was trained with actual RL. The computer played games of Go and trained on rollouts that maximized the reward function (winning the game), eventually surpassing the best human players at Go. AlphaGo was not trained with RLHF. If it were, it would not have worked nearly as well.

What would it look like to train AlphaGo with RLHF? Well first, you'd give human labelers two board states from Go, and ask them which one they like better. Then you'd collect say 100,000 comparisons like this, and you'd train a "Reward Model" (RM) neural network to imitate this human "vibe check" of the board state. You'd train it to agree with the human judgement on average. Once we have a Reward Model vibe check, you run RL with respect to it, learning to play the moves that lead to good vibes. Clearly, this would not have led anywhere too interesting in Go. There are two fundamental, separate reasons for this:

1. The vibes could be misleading - this is not the actual reward (winning the game). This is a crappy proxy objective. But much worse,
2. You'd find that your RL optimization goes off the rails as it quickly discovers board states that are adversarial examples to the Reward Model. Remember the RM is a massive neural net with billions of parameters imitating the vibe. There are board states that are "out of distribution" to its training data, which are not actually good states, yet by chance they get a very high reward from the RM.

For the exact same reasons, sometimes I'm a bit surprised RLHF works for LLMs at all. The RM we train for LLMs is just a vibe check in the exact same way. 
It gives high scores to the kinds of assistant responses that human raters statistically seem to like. It's not the "actual" objective of correctly solving problems, it's a proxy objective of what looks good to humans. Second, you can't even run RLHF for too long because your model quickly learns to respond in ways that game the reward model. These predictions can look really weird, e.g. you'll see that your LLM Assistant starts to respond with something nonsensical like "The the the the the the" to many prompts. Which looks ridiculous to you but then you look at the RM vibe check and see that for some reason the RM thinks these look excellent. Your LLM found an adversarial example. It's out of domain w.r.t. the RM's training data, in an undefined territory. Yes you can mitigate this by repeatedly adding these specific examples into the training set, but you'll find other adversarial examples next time around. For this reason, you can't even run RLHF for too many steps of optimization. You do a few hundred/thousand steps and then you have to call it because your optimization will start to game the RM. This is not RL like AlphaGo was.

And yet, RLHF is a net helpful step of building an LLM Assistant. I think there's a few subtle reasons but my favorite one to point to is that through it, the LLM Assistant benefits from the generator-discriminator gap. That is, for many problem types, it is a significantly easier task for a human labeler to select the best of a few candidate answers, instead of writing the ideal answer from scratch. A good example is a prompt like "Generate a poem about paperclips" or something like that. An average human labeler will struggle to write a good poem from scratch as an SFT example, but they could select a good looking poem given a few candidates. So RLHF is a kind of way to benefit from this gap of "easiness" of human supervision. There's a few other reasons, e.g. RLHF is also helpful in mitigating hallucinations because if the RM is a strong enough model to catch the LLM making stuff up during training, it can learn to penalize this with a low reward, teaching the model an aversion to risking factual knowledge when it's not sure. But a satisfying treatment of hallucinations and their mitigations is a whole different post so I digress. All to say that RLHF *is* net useful, but it's not RL.

No production-grade *actual* RL on an LLM has so far been convincingly achieved and demonstrated in an open domain, at scale. And intuitively, this is because getting actual rewards (i.e. the equivalent of winning the game) is really difficult in the open-ended problem solving tasks. It's all fun and games in a closed, game-like environment like Go where the dynamics are constrained and the reward function is cheap to evaluate and impossible to game. But how do you give an objective reward for summarizing an article? Or answering a slightly ambiguous question about some pip install issue? Or telling a joke? Or re-writing some Java code to Python? Going towards this is not in principle impossible but it's also not trivial and it requires some creative thinking. But whoever convincingly cracks this problem will be able to run actual RL. The kind of RL that led to AlphaGo beating humans in Go. Except this LLM would have a real shot of beating humans in open-domain problem solving.
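
The reward-model step described in the post (collect pairwise human comparisons, then fit a model that agrees with them on average) can be sketched with a toy example. This is only an illustration under loud assumptions: the post does not name a loss, so the sketch uses the commonly used pairwise logistic (Bradley-Terry style) loss; the reward model is a tiny linear function rather than a neural net; and the feature vectors and all function names are made up for the demonstration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reward(w, features):
    # Toy linear "reward model": a weighted sum of hand-made features.
    return sum(wi * fi for wi, fi in zip(w, features))


def pairwise_loss(w, preferred, rejected):
    # Pairwise logistic loss: small when the preferred item scores higher.
    return -math.log(sigmoid(reward(w, preferred) - reward(w, rejected)))


def train_reward_model(comparisons, dim, lr=0.1, epochs=200):
    w = [0.0] * dim
    for _ in range(epochs):
        for preferred, rejected in comparisons:
            margin = reward(w, preferred) - reward(w, rejected)
            grad_scale = sigmoid(margin) - 1.0  # d(loss) / d(margin)
            for i in range(dim):
                w[i] -= lr * grad_scale * (preferred[i] - rejected[i])
    return w


# Hypothetical comparisons: (features of preferred, features of rejected).
comparisons = [
    ([0.9, 0.2], [0.1, 0.3]),
    ([0.7, 0.8], [0.2, 0.9]),
    ([0.8, 0.5], [0.3, 0.4]),
]
w = train_reward_model(comparisons, dim=2)
avg_loss = sum(pairwise_loss(w, p, r) for p, r in comparisons) / len(comparisons)

# The fitted model agrees with every labeled comparison...
agrees = all(reward(w, p) > reward(w, r) for p, r in comparisons)

# ...but an input far outside the training range gets an even higher score,
# a toy analogue of the out-of-distribution states the post warns about.
out_of_distribution = [50.0, 0.0]
best_seen = max(reward(w, f) for pair in comparisons for f in pair)
print(agrees, reward(w, out_of_distribution) > best_seen)
```

In this toy setting the out-of-distribution input wins simply because the linear model extrapolates; a policy optimized against such a model would be pushed toward inputs like it, which is the "gaming the RM" failure the post describes.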

402

8.8K

1.2K

CZ 🔶 BNB@cz_binance

2023.11.21 20:36

Today, I stepped down as CEO of Binance. Admittedly, it was not easy to let go emotionally. But I know it is the right thing to do. I made mistakes, and I must take responsibility. This is best for our community, for Binance, and for myself. Binance is no longer a baby. It is time for me to let it walk and run. I know Binance will continue to grow and excel with the deep bench it has. I’m pleased to announce that @_RichardTeng, our now former Global Head of Regional Markets, has been named the new CEO of Binance today. Richard is a highly qualified leader and, with over three decades of financial services and regulatory experience, he will navigate the company through its next period of growth. He will ensure Binance delivers on our next phase of security, transparency, compliance, and growth. Prior to joining Binance, Richard was CEO of the Financial Services Regulatory Authority at Abu Dhabi Global Market (ADGM); Chief Regulatory Officer of the Singapore Exchange (SGX); and Director of Corporate Finance in the Monetary Authority of Singapore. With Richard and the entire team, I’m confident that the best days for @Binance and the crypto industry lie ahead. As a shareholder and former CEO with historical knowledge of our company, I will remain available to the team to consult as needed, consistent with the framework set out in our U.S. agency resolutions. What’s next for me? I will take a break first. I have not had a single day of real (phone off) break for the last 6 and a half years. After that, my current thinking is I will probably do some passive investing, being a minority token/shareholder in startups in areas of blockchain/Web3/DeFi, AI and biotech. I am happy that I will finally have more time to spend looking at DeFi. I can’t see myself being a CEO driving a startup again. I am content being a one-shot (lucky) entrepreneur. Should there be listeners, I may be open to being a coach/mentor to a small number of upcoming entrepreneurs, privately. 
If for nothing else, I can at least tell them what not to do. On that note, I am proud to point out that in our resolutions with the U.S. agencies they:

- do not allege that Binance misappropriated any user funds, and
- do not allege that Binance engaged in any market manipulation.

Funds are SAFU! With that, I look forward to seeing the new leadership take the reins. Please join me in congratulating Richard on his well-deserved promotion. Onwards! CZ

36.2K

145.8K

31K

Search results related to "Will_There_Be_a_Next_Time"