Search for tweets and users related to mlp

2026.06.11 06:40

Is it possible to make electronics out of food? And why would anyone want to do that? Find out on this week’s “Babbage” podcast

0

1

0

Runpeng Dai@RunpengDai

2026.06.03 04:05

What if we model test-time adaptive sampling as an MDP? In our recent work, RL-Guided Adaptive sampling, we model test-time sampling as an MDP. Then we train a 4-layer MLP on CPU as the controller. This lightweight framework dynamically balances answer correctness, latency, and computation cost, relying only on light statistics! 🚀 @zhengtoong @ruiliu0 @ChengsongH31219 @hongtuzhu1 @HongtuZ20093 📄 Paper: 💻 Code:

0

5

4

かにかまぁん@kanikama3gou

2026.05.24 07:29

So nice, isn't it......

0

39

12K

991

椿りか AV女優@G_Style_rika

2026.05.22 00:03

I came to seize my dream first thing in the morning

0

12

386

4

Nikola 🤍COMMS OPENED@nikkotari

2026.05.02 19:05

Rarity & Applejack #mlp#

0

6

5.2K

561

𝗔𝗟𝗔𝗡 𝗚 𝗗@MMariyaan

2026.04.15 11:25

It all started with Hermione Granger in Harry Potter… Didn’t know it back then, but that was the beginning.

0

1

2

0

木﨑ゆりあ@yuriaaa_peace

2026.04.11 14:14

#あす卒2026# Tomorrow is finally the closing performance!!!! If your schedule is free, please come to すみだパーク倉 ❤️‍🔥 I'll do my best!!!

0

6

862

82

大西桃香@momo_0x0_920

2026.02.09 15:43

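Stage play #リアニ2026#, day 3! Today, once again, I gave it my all as the earnest 木葉-chan! 🍃 Tomorrow is a day off from performances! I'll rest up and keep running through the remaining shows 💨 Thank you to everyone who came today! 🤍💚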

0

17

949

99

John Carmack@ID_AA_Carmack

2026.01.24 02:11

#PaperADay# 10 LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
The comments on #PaperADay# 3 recommended this paper as the state of the art JEPA paper, and it does look much better! They acknowledge that much of the prior JEPA research is ad-hoc and full of heuristics, but here they make strong theoretical claims of optimality and provide proofs (which I did not read).
The first claim is that isotropic gaussian is the unique optimal embedding distribution for both linear and nonlinear probing, minimizing worst-case risk across downstream tasks. I would have taken that on faith with just a “sounds good to me”, but they go into it with details and examples.
Actually getting an isotropic gaussian in high dimensions is easier said than done. They present Sketched Isotropic Gaussian Regularization (SIGReg) as a well behaved loss function to achieve this after analyzing a number of different statistical tests, and they claim it beats the curse of dimensionality with linear scalability.
The final loss is just a blend factor to weight the JEPA prediction loss against the SIGReg isotropy loss. This is the one tunable hyperparameter for LeJEPA.
Despite the P in JEPA, they don’t use predictor networks here, they just directly compare view embeddings for the JEPA loss. Predictor networks could still be useful for video sequences, especially when conditioned with action information for agents / robots.
Each training image is augmented to produce 2 global views and 6 local views with different spatial scales but the same set of color and geometric transformations. The loss is the average MSE between the average of the global view embeddings and each of the local view embeddings.
I don’t have a good feel for the tradeoffs in their view transforms, which still seem very much in the ad-hoc space, but they will determine the nature of what gets filtered out of the representation. Learning what doesn’t matter is critical, but the specification of “matters” is only implicit in the view transformations.
LeJEPA itself is architecture independent – anything that digests a batch of samples from a dataset into vectors can be used. Vision transformers, MLP, ConvNets, etc. The specific augmentations for views would be input modality specific, but the LeJEPA algorithm could work on audio, images, video, or other things.
They show that the LeJEPA loss on a large foundation model is very indicative of downstream task performance, both directly, and with a heuristic to improve the predictive power of the loss farther. They also show that it can be used to train from scratch on small datasets with as few as 1000 samples and achieve better results than probing a conventional general foundation model.
I was pleased to see sample code blocks in the paper instead of greek-laden pseudocode, as well as a github repo.
Appendix D has interesting details on generating good coverage of unit hyperspheres with low discrepancy samples by transforming Sobol sequences, but this is only for their theoretical analysis, and they show you are better off just making new random hypervectors every batch, with even 16 random vectors outperforming a fixed set of thousands.
Some questions:
In the discussion of non-linear probing, only kNN and kernel methods are mentioned, presumably for their theoretical analysis tractability, but would an MLP generally perform better?
A JEPA embedding is not fully reversible like NICE or a RevNet, so how does it react to inputs that are far outside the training set? Will novel inputs map to unique embeddings, or could they be collapsed onto the codes from the training set?
How would the embeddings evolve in a continuous learning environment, as novel inputs are added to the training mix?
Can a JEPA be overtrained – is lower training loss always better, or would there be an optimal early stopping point?
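The loss structure described in the post above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the paper's code: the function names are hypothetical, the `(1 - blend) * pred + blend * iso` form of the blend is an assumption, and `isotropy_penalty` is a simple moment-matching stand-in (zero mean, unit variance along fresh random directions), not the actual SIGReg statistical test.

```python
import math
import random


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def prediction_loss(global_views, local_views):
    """Average MSE between the mean global-view embedding and each local-view embedding."""
    center = mean_vector(global_views)
    return sum(mse(center, v) for v in local_views) / len(local_views)


def random_unit_vector(dim, rng):
    """Draw a fresh random direction on the unit hypersphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def isotropy_penalty(embeddings, num_directions, rng):
    """Stand-in for SIGReg (assumption, not the paper's test): project the batch
    onto fresh random directions and penalize deviation of each 1D projection
    from zero mean and unit variance."""
    dim = len(embeddings[0])
    total = 0.0
    for _ in range(num_directions):
        d = random_unit_vector(dim, rng)  # new random directions on every call
        proj = [sum(e * w for e, w in zip(emb, d)) for emb in embeddings]
        mu = sum(proj) / len(proj)
        var = sum((p - mu) ** 2 for p in proj) / len(proj)
        total += mu ** 2 + (var - 1.0) ** 2
    return total / num_directions


def blended_loss(global_views, local_views, batch_embeddings, blend, rng,
                 num_directions=16):
    """Blend the prediction loss against the isotropy penalty with one
    hyperparameter (the convex-combination form is an assumption)."""
    pred = prediction_loss(global_views, local_views)
    iso = isotropy_penalty(batch_embeddings, num_directions, rng)
    return (1.0 - blend) * pred + blend * iso
```

In this sketch the embeddings are plain lists of floats, so any encoder that maps a view to a vector could feed it, which matches the post's point that the approach is architecture independent. Drawing the directions inside `isotropy_penalty` on every call mirrors the observation that new random vectors each batch work better than a fixed set.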

0

23

311

27

夢彩(ゆあ)@yumeyua0314

2026.01.18 06:44

MLP Rainbow Dash 🌈🎥

0

7

812

79

Search results related to "mlp"