Wei Cai 🎮 (@weicaiuw)
css assistant professor @uw; decentralized computing (crypto + DeAI) researcher @SIGCHI @SIGMM @ACM_SIGWEB; indie game dev @HuluCatsGames
767 Following    3K Followers
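Seeing his wife being taken away, a male crab came out of his hiding place and embraced her to protect her. The fisherman who noticed this congratulated the crab, gave him a fish, and left.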
now it starts
Having a midnight stroll in the hospital courtyard alone when a panic attack kicks in. What's next? #indiegame# #gamedev# #horrorgame# #indiehorror#
Benchmark
BenchLocal v0.2.5 is out!
> The big one: repeated test runs with majority voting (1, 3, 5, 7, or 9 runs per test).
> Plus error classification, retry actions, per-scenario timings & more.
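For readers unfamiliar with majority voting over repeated runs, here is a minimal sketch of the idea. It is not BenchLocal's implementation: the function name `majority_vote` and the pass/fail encoding as booleans are assumptions made for illustration. Only the allowed run counts come from the post.

```python
from collections import Counter

# Run counts listed in the post; all odd, so a pass/fail vote cannot tie.
ALLOWED_RUNS = (1, 3, 5, 7, 9)


def majority_vote(outcomes):
    """Return the most common pass/fail outcome across repeated runs.

    `outcomes` is a list of booleans (True = pass). This is a hypothetical
    helper, not part of any BenchLocal API.
    """
    if len(outcomes) not in ALLOWED_RUNS:
        raise ValueError(f"expected one of {ALLOWED_RUNS} runs, got {len(outcomes)}")
    winner, _count = Counter(outcomes).most_common(1)[0]
    return winner
```

With three runs where one fails, `majority_vote([True, False, True])` reports a pass, which is how a single flaky run stops deciding the result.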
time to switch to @UnslothAI ?
2.3x faster. Ran @UnslothAI Qwen3.6 MTP variants on a DGX Spark (UD-Q6_K_XL):
> 27B → 27B MTP: 8.1 → 18.65 t/s (2.3x faster)
> 35B A3B → 35B A3B MTP: 56.91 → 66.52 t/s (+17%)
The 27B dense model more than doubled throughput from MTP alone. Free speed is free speed.
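The quoted speedups follow directly from the tokens-per-second figures. A small check, using only the numbers in the post (the helper name `speedup` is made up for this sketch):

```python
def speedup(baseline_tps, new_tps):
    """Ratio of new throughput to baseline throughput (tokens per second)."""
    return new_tps / baseline_tps


dense = speedup(8.1, 18.65)    # 27B -> 27B MTP
moe = speedup(56.91, 66.52)    # 35B A3B -> 35B A3B MTP

print(f"27B dense: {dense:.2f}x")            # prints "27B dense: 2.30x"
print(f"35B A3B: +{(moe - 1) * 100:.0f}%")   # prints "35B A3B: +17%"
```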
layered llm?
In this paper, a 7B language model trained with reinforcement learning learns to orchestrate larger frontier models like GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro. It does so by writing natural-language subtasks, assigning each to one of the workers, and specifying which previous outputs that worker sees in context. The resulting system outperforms every individual frontier model on benchmarks including GPQA Diamond, LiveCodeBench, and AIME25, while averaging about three model calls per question, fewer than the multi-agent pipelines and self-reflection loops it beats. The work provides evidence that prompt engineering and pipeline design, currently done by hand in commercial AI products, can be learned end-to-end through reward signals alone.
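To make the three parts of a plan concrete (a subtask, the worker it goes to, and which earlier outputs that worker sees), here is a rough sketch. It is not the paper's code: the `Subtask` fields, `run_plan`, and the worker names are hypothetical, and the plan is written by hand here, whereas in the paper the 7B orchestrator produces it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Subtask:
    instruction: str          # natural-language subtask
    worker: str               # which worker model handles it
    visible: List[int] = field(default_factory=list)  # indices of earlier outputs shown in context


# A worker takes (instruction, context) and returns text. Real workers would
# be calls to frontier models; these are stand-ins for illustration.
Worker = Callable[[str, List[str]], str]


def run_plan(plan: List[Subtask], workers: Dict[str, Worker]) -> List[str]:
    """Execute subtasks in order, passing each worker only the outputs it may see."""
    outputs: List[str] = []
    for step in plan:
        context = [outputs[i] for i in step.visible]
        outputs.append(workers[step.worker](step.instruction, context))
    return outputs
```

A three-step plan (solve, verify with step 0 visible, finalize with steps 0 and 1 visible) would match the "about three model calls per question" figure quoted above.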
Is Web the natural way of AI communication?
This works really well btw, at the end of your query ask your LLM to "structure your response as HTML", then view the generated file in your browser. I've also had some success asking the LLM to present its output as slideshows, etc.

More generally, imo audio is the human-preferred input to AIs but vision (images/animations/video) is the preferred output from them. Around a third of our brains are a massively parallel processor dedicated to vision, it is the 10-lane superhighway of information into the brain. As AI improves, I think we'll see a progression that takes advantage:

1) raw text (hard/effortful to read)
2) markdown (bold, italic, headings, tables, a bit easier on the eyes) <-- current default
3) HTML (still procedural with underlying code, but a lot more flexibility on the graphics, layout, even interactivity) <-- early but forming new good default
...4,5,6,...
n) interactive neural videos/simulations

Imo the extrapolation (though the technology doesn't exist just yet) ends in some kind of interactive videos generated directly by a diffusion neural net. Many open questions as to how exact/procedural "Software 1.0" artifacts (e.g. interactive simulations) may be woven together with neural artifacts (diffusion grids), but generally something in the direction of the recently viral

There are also improvements necessary and pending at the input. Neither audio nor text nor video alone is enough, e.g. I feel a need to point/gesture to things on the screen, similar to all the things you would do with a person physically next to you and your computer screen.

TLDR The input/output mind meld between humans and AIs is ongoing and there is a lot of work to do and significant progress to be made, way before jumping all the way into neuralink-esque BCIs and all that. For what's worth exploring at the current stage, hot tip: try asking for HTML.
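A minimal sketch of the HTML tip, assuming you already have some way to send a prompt to a model and get text back (that call is left out on purpose). The helper names are made up for this example; `webbrowser` and `pathlib` are Python standard library.

```python
import tempfile
import webbrowser
from pathlib import Path

# The request from the post, appended at the end of the query.
HTML_SUFFIX = "Structure your response as HTML."


def with_html_request(query: str) -> str:
    """Append the HTML request to the end of a query."""
    return f"{query.rstrip()}\n\n{HTML_SUFFIX}"


def save_html(html: str, directory=None) -> Path:
    """Write the model's HTML reply to response.html and return the path."""
    target = Path(directory) if directory is not None else Path(tempfile.mkdtemp())
    path = target / "response.html"
    path.write_text(html, encoding="utf-8")
    return path


def view_in_browser(path: Path) -> bool:
    """Open the saved file in the default browser."""
    return webbrowser.open(path.resolve().as_uri())


# Usage (the model call itself is whatever client you use):
#   prompt = with_html_request("Explain how a KV cache works")
#   reply = ...send prompt to your LLM...
#   view_in_browser(save_html(reply))
```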
tips for local model beginners
If you love fine-tuning open-source models (like me), then listen.

> Start with 1B, 2B, 4B, and 8B models. (Don't start with a 27B model or bigger at first.)
> Use web-based GPU providers. I use Google Colab Pro for any model smaller than 9B. A single A100 80GB costs around $0.60/hr, which is cheap. Enough for small models.
> Don't buy GPUs until you've fine-tuned 7 to 10 models. You'll understand the nitty-gritty in the process.
> Use Codex 5.5 × DeepSeek v4 Pro to create datasets. Codex to plan, DeepSeek v4 Pro to generate rows.
> Use Unsloth's instruct models as a base from Hugging Face. Yes, there are others too, but Unsloth also provides fast fine-tuning notebooks.
> Use Unsloth's fine-tuning notebooks as a reference. Paste them into Codex, and Codex will write a custom notebook with the configs you need.
> Spend 1 day learning about:
- SFT (supervised fine-tuning)
- RL training (GRPO, DPO, PPO, etc.)
- LoRA / QLoRA training
- Quantization and types
- Local inference engines (llama.cpp)
- KV cache and prompt cache
> Just get started. Claude, Codex, and ChatGPT can design a step-by-step plan for how you can fine-tune your first AI model.

Future tech is moving toward small 5B to 15B ELMs (Expert Language Models) rather than general 1T LLMs. So fine-tuning is an important skill that anyone can acquire today.

Tune models, test them, use them. Then fine-tune for companies and make a career out of it. (Companies pay $50k+ to fine-tune models on their data so they can get personalized AI models.)

Shoot your questions below. I'll be sharing in-depth raw findings about this topic in the coming days.
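One reason LoRA makes small-GPU fine-tuning practical is the parameter count. For a weight matrix of shape d_out × d_in, LoRA trains two low-rank factors with rank r, so r × (d_out + d_in) parameters instead of d_out × d_in. A quick sketch; the 4096 × 4096 layer and rank 16 are illustrative assumptions, not figures from the post:

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when updating the whole weight matrix."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for LoRA factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


# Illustrative layer size and rank (assumptions for this example).
full = full_params(4096, 4096)       # 16,777,216
lora = lora_params(4096, 4096, 16)   # 131,072
print(f"LoRA trains {lora / full:.2%} of this layer's weights")  # prints "LoRA trains 0.78% of this layer's weights"
```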
We're early in the AI boom
Local AI is having its moment! Below is the number of new GGUF models created each month over the past 8 months & insights from our HF internal agent (May is partial):

- 176,000 total public GGUF models on HF
- Two distinct regimes: Oct–Feb averaged ~5.1K new GGUF models/month. Then March–April jumped to ~9.2K/month, nearly double the previous rate.
- March was the inflection point (+55% MoM), likely driven by a wave of new open-weight model releases being quantized to GGUF.
- April sustained the momentum at 9.7K, suggesting this isn't a one-off spike but a new baseline.
- The GGUF ecosystem is accelerating: the community is quantizing models faster than ever, likely thanks to better tooling (llama.cpp improvements, automated quantization pipelines, and more models supporting GGUF natively).

Let's go!
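A quick sanity check on the figures in the post. The only added assumption is that the March–April figure is the simple mean of the two months; everything else is taken from the post (values in thousands of models per month).

```python
oct_feb_avg = 5.1   # average new GGUF models/month, Oct-Feb
mar_apr_avg = 9.2   # average new GGUF models/month, March-April
april = 9.7

# "Nearly double": ratio of the two regimes.
ratio = mar_apr_avg / oct_feb_avg

# Assumption: mar_apr_avg is the plain mean of March and April,
# which implies March's count.
march_implied = 2 * mar_apr_avg - april

# March is quoted as +55% month over month, which implies February's count.
feb_implied = march_implied / 1.55

print(f"regime ratio: {ratio:.2f}x")
print(f"implied March: {march_implied:.1f}K, implied February: {feb_implied:.1f}K")
```

Under that assumption the ratio is about 1.8x, and the implied February count sits a little above the Oct–Feb average, which is consistent with the numbers as posted.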
I just know that vending machine sells Canada Dry.
The resting area in the hospital courtyard🌛 No one's chilling here today❄️ Should we add a bit of dust in the air...? #indiegame# #gamedev# #horrorgame# #indiehorror#
now 1-3 months
llama.cpp at 100k stars

Now that 90% of the code worldwide is being written by AI agents, I predict that within 3-6 months, 90% of all AI agents will be running locally with llama.cpp 😄

Jokes aside, I am going to use this small milestone as an opportunity to reflect a bit on the project and the state of AI from the perspective of local applications. There is a lot to say and discuss and yet it feels less and less important to try to make a point. Opinions about the viability of local LLMs are strongly polarized, details are overlooked, the scientific approach is lacking. Arguments are predominantly based on vibes and hype waves. One thing is clear though - local LLMs are used more and more. I expect this trend to continue and likely 2026 will end up being one of the most important years for the local AI movement.

I admit that I didn't expect the agentic era to come so quickly to the local LLM space. One year ago, the available models were too computationally expensive for doing long-context tasks. There wasn't an obvious path towards meaningful agentic applications. The memory and compute requirements were huge. Last summer, with the release of gpt-oss, things started to change. It was the first time we saw a glimpse of tool calling that actually works well within the resource constraints of our daily devices. Later in the year, even better models were released and by now, useful local agentic workflows are a reality.

Comparing local vs hosted capabilities at a given moment of time is pointless. To try to put things into perspective:

- We don't need frontier intelligence to automate searches and sending emails
- We don't need trillion parameter models to be able to summarize articles or technical documents
- We don't need massive GPU data centers to control our home appliances or turn the lights off in the garage

I believe that there is a certain level of intelligence we as humans can comprehend and meaningfully utilize to improve our working process. Beyond that level, access to more intelligence becomes unnecessary at best and counterproductive at worst. I also believe that that level of useful artificial intelligence is completely within reach locally and it has always been just a matter of implementing the right software stack to bring it to the end user. With llama.cpp, I am confident that we continue to be on the right track of building that software stack!

The llama.cpp project is going stronger than ever. With more than 1500 contributors, the project keeps growing steadily. From a technical point of view, I think that llama.cpp + ggml is the only solution that actually makes sense. That is, the software stack must run efficiently on every possible device, hardware and operating system. The technology is too important to be vendor-locked. It has to be developed in the open, by the community, together with the independent hardware vendors. This is the only right way to build something that will truly make a difference in the long run.

I won't try to convince you about what is currently and will be possible with local AI. We will just continue to build as usual. I am confident that after the smoke clears and we look objectively at what we have built together, the benefits will be obvious to everyone.

Big shoutout to all llama.cpp maintainers. I feel extremely lucky to be able to work together with so many talented contributors. Every day I learn something new and I feel there is so much more cool stuff that we are going to build. Also, I am really thankful that the project continues to have reliable partners to support it!

Cheers!
Something is coming...
Patient 106. A gigantic moon. The night train. Something's coming... - Got the lighting & fog right in the outdoor scene. We'll update the Steam store page of Shifting Lunacy today, wave goodbye to early baby screenshots🔥🩶. #indiegame# #gamedev# #horrorgame# #indiehorror#
Treatment Time! Which patient should we cure today? 💉💗 Shifting Lunacy - Wishlist on Steam for upcoming demo🖤 #indiegame# #gamedev# #horrorgame# #indiehorror#