ECE 498/598 Fall 2024, Homeworks 3 and 4
Remarks:
1. HW3&4: You can reduce the context length to ** if you are having trouble with the
training time.
2. HW3&4: During test evaluation, note that positional encodings for unseen/longer
contexts are not trained. You are supposed to evaluate the model as is; it is OK if it
does not work well.
3. HW3&4: Comments are an important component of the HW grade. You are expected
to explain the experimental findings. If you don’t provide technically meaningful
comments, you might receive a lower score even if your code and experiments are
accurate.
4. The deadline for HW3 is November 11th at 11:59 PM, and the deadline for HW4 is
November 18th at 11:59 PM. For each assignment, please submit both your code and a
PDF report that includes your results (figures) for each question. You can generate the
PDF report from a Jupyter Notebook (.ipynb file) by adding comments in markdown
cells.
The objective of this assignment is to compare the transformer architecture and SSM-type
architectures (specifically Mamba [1]) on the associative recall problem. We provide
example code, recall.ipynb, which implements the task with a 2-layer
transformer. You will adapt this code to incorporate different positional encodings, use
Mamba layers, and modify the dataset generation.
Background: As you recall from class, associative recall (AR) assesses two abilities
of a model: the ability to locate relevant information, and the ability to retrieve the
context around that information. The AR task can be understood via the following question:
given the input prompt X = [a 1 b 2 c 3 b], we wish the model to locate where the last
token b occurs earlier and output the associated value Y = 2. This is crucial for
memory-related tasks and bigram retrieval (e.g., ‘Baggins’ should follow ‘Bilbo’).
To proceed, let us formally define the associative recall task we will study in the HW.
Definition 1 (Associative Recall Problem) Let Q be the set of target queries with cardinality |Q| = k. Consider a discrete input sequence X of the form X = [. . . q v . . . q], where the
query q appears exactly twice in the sequence and the value v follows the first appearance
of q. We say the model f solves AR(k) if f(X) = v for all sequences X with q ∈ Q.
The induction head is a special case of the definition above where the query q is fixed (i.e., Q
is a singleton); it is visualized in Figure 1. At the other extreme, we can ask the
model to solve AR for all queries in the vocabulary.
Problem Setting
Vocabulary: Let [K] = {1, . . . , K} be the token vocabulary. Obtain the vocabulary
embedding by randomly generating a K × d matrix V with IID N(0, 1) entries, then
normalizing its rows to unit length. Here d is the embedding dimension, and the embedding
of the i-th token is V[i]. Use numpy.random.seed(0) to ensure reproducibility.
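A minimal sketch of this embedding step (the function name is ours, not from the provided notebook):

```python
import numpy as np

def make_vocab_embedding(K: int, d: int) -> np.ndarray:
    """K x d matrix with IID N(0, 1) entries, rows normalized to unit length."""
    np.random.seed(0)  # seed required by the assignment for reproducibility
    V = np.random.randn(K, d)
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    return V

V = make_vocab_embedding(16, 8)  # e.g. K = 16, d = 8; row i embeds token i
```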
Experimental variables: For the AR task, Q will simply be the first M elements
of the vocabulary. During the experiments, K, d, and M are under our control. Besides these,
we will also vary two other variables:
• Context length: We will train these models up to context length L. However, we
will evaluate on lengths up to 3L, to test the generalization of the model to unseen
lengths.
• Delay: In the basic AR problem, the value v immediately follows q. Instead, we will
introduce a delay variable τ, where v appears τ tokens after q; τ = 1 is the standard setting.
Models: The motivation behind this HW is reproducing the results in the Mamba paper.
However, we will also go beyond their evaluations and identify weaknesses of both the
transformer and Mamba architectures. Specifically, we will consider the following models
in our evaluations:
Figure 1: We will work on the associative recall (AR) problem. The AR problem requires the
model to retrieve the value associated with any query, whereas the induction head requires
the same for one specific query; the latter is therefore an easier problem. The figure is
taken directly from the Mamba paper [1]. The yellow-shaded regions highlight the focus of
this homework.
• Transformer: We will use the transformer architecture with 2 attention layers (no
MLP). We will try the following positional encodings: (i) learned PE (provided code),
(ii) Rotary PE (RoPE; a sketch follows this list), (iii) NoPE (no positional encoding).
• Mamba: We will use the Mamba architecture with 2 layers.
• Hybrid Model: We will use an initial Mamba layer followed by an attention layer.
No positional encoding is used.
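For reference, here is a minimal sketch of one common RoPE formulation; it is applied to the queries and keys before the attention dot product. The helper name and the base 10000 are conventional choices, not taken from the provided code:

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary PE for x of shape (batch, seq_len, d) with d even.

    Channel pair (2i, 2i+1) at position p is rotated by angle p * base**(-2i/d),
    so relative positions enter the attention scores through the rotations."""
    _, L, d = x.shape
    half = d // 2
    theta = base ** (-2.0 * torch.arange(half) / d)                           # (half,)
    angles = torch.arange(L, dtype=torch.float32)[:, None] * theta[None, :]  # (L, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]  # even/odd channels
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```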
Hybrid architectures are inspired by the Mamba paper as well as [2], which observes the
benefit of starting the model with a Mamba layer. You should use public GitHub repos to
find implementations (e.g., RoPE encoding or the Mamba layer). As a suggestion, you can use
this GitHub Repo for the Mamba model.
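As a rough illustration (a sketch, not the required implementation), a hybrid model could look like the following, assuming the mamba-ssm package; the class name, layer sizes, and the omitted causal mask are our placeholder choices:

```python
import torch.nn as nn
from mamba_ssm import Mamba  # assumes the mamba-ssm package (CUDA required)

class HybridModel(nn.Module):
    """Placeholder hybrid: one Mamba layer, then one attention layer, no PE."""
    def __init__(self, d_model: int, vocab_size: int, n_heads: int = 2):
        super().__init__()
        self.mamba = Mamba(d_model=d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x):            # x: (batch, seq_len, d_model) embedded tokens
        h = self.mamba(x)            # SSM layer; position-aware through its recurrence
        h, _ = self.attn(h, h, h)    # attention layer (add a causal mask if desired)
        return self.head(h[:, -1])   # predict the value from the final position
```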
Generating the training dataset: During training, train with minibatch SGD (e.g., with
batch size 64) until satisfactory convergence. Given (K, d, M, L, τ), you can generate the
training sequences for AR as follows (a code sketch implementing these steps follows the list):
1. Training sequence length is equal to L.
2. Sample a query q ∈ Q and a value v ∈ [K] uniformly at random, independently. Recall
that the size of Q is |Q| = M.
3. Place q at the end of the sequence and place another q at an index i chosen uniformly
at random from 1 to L − τ.
4. Place value token at the index i + τ.
5. Sample the other tokens IID from [K] − {q}, i.e., the remaining tokens are drawn
uniformly at random but are never equal to q.
6. Set label token Y = v.
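Here is the promised sketch of steps 1-6 (make_ar_batch is our name; we draw the index i so that the value token never collides with the trailing query):

```python
import numpy as np

def make_ar_batch(batch_size, K, M, L, tau=1, rng=None):
    """Token-index sequences X of shape (batch_size, L) and labels Y of shape
    (batch_size,), following steps 1-6. Embed with V[X] before the model."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.empty((batch_size, L), dtype=np.int64)
    Y = np.empty(batch_size, dtype=np.int64)
    for b in range(batch_size):
        q = rng.integers(0, M)                # step 2: query from the first M tokens
        v = rng.integers(0, K)                # step 2: value from the full vocabulary
        seq = rng.integers(0, K - 1, size=L)  # step 5: fillers from [K] \ {q} ...
        seq[seq >= q] += 1                    # ... by shifting values past q
        i = rng.integers(0, L - tau - 1)      # step 3: i + tau stays clear of the last slot
        seq[i] = q                            # step 3: first occurrence of q
        seq[i + tau] = v                      # step 4: value tau tokens later
        seq[-1] = q                           # step 3: trailing query
        X[b], Y[b] = seq, v                   # step 6: label is the value
    return X, Y
```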
Test evaluation: The test dataset is generated the same way as above; however, we will
evaluate on all sequence lengths from τ + 1 up to 3L. Note that τ + 2 is the shortest
realizable sequence length.
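A hedged evaluation sketch, reusing make_ar_batch above (eval_lengths and the test batch of 256 are our choices):

```python
import numpy as np
import torch

def eval_lengths(model, V, K, M, L, tau=1, n_eval=256):
    """Accuracy at every test length from tau + 2 (shortest realizable) to 3L."""
    accs = {}
    rng = np.random.default_rng(1)  # arbitrary evaluation seed
    model.eval()
    for length in range(tau + 2, 3 * L + 1):
        X, Y = make_ar_batch(n_eval, K, M, length, tau, rng)
        emb = torch.tensor(V[X], dtype=torch.float32)  # (n_eval, length, d)
        with torch.no_grad():
            pred = model(emb).argmax(dim=-1).numpy()
        accs[length] = float((pred == Y).mean())
    return accs
```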
Empirical Evidence from the Mamba Paper: Table 2 of [1] demonstrates that Mamba does
a good job on the induction head problem, i.e., AR with a single query. Additionally, Mamba
is the only model that exhibits length generalization: even if you train it up to context
length L, it can still solve AR for context lengths beyond L. On the other hand, since Mamba
is inherently a recurrent model, it may not solve the AR problem in its full generality. This
motivates the question: what are the tradeoffs between Mamba and the transformer, and can
hybrid models improve performance over both?
Your assignments are as follows. For each problem, make sure to return the associated
code. The code can be in separate cells (clearly commented) of a single Jupyter/Python file.
Grading structure:
• Problem 1 will count as your HW3 grade. This only involves Induction Head
experiments (i.e. M = 1).
• Problems 2 and 3 will count as your HW4 grade.
• You will make a single submission.
Problem 1 (50=25+15+10pts). Set K = 16, d = 8, L = ** or L = 64.
• Train all models on the induction heads problem (M = 1, τ = 1). After training,
evaluate the test performance and plot the accuracy of all models as a function of
the context length (similar to Table 2 of [1]). In total, you will be plotting 5 curves
(3 Transformers, 1 Mamba, 1 Hybrid). Comment on the findings and compare the
performance of the models including length generalization ability.
• Repeat the experiment above with delay τ = 5. Comment on the impact of delay.
• Which models converge faster during training? Provide a plot of the convergence rate
where the x-axis is the number of iterations and the y-axis is the AR accuracy over a
test batch; a plotting sketch follows this list. Make sure to specify the batch size you
are using (ideally use ** or 64).
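For the convergence plot, something like this sketch would do (plot_convergence and the acc_history format are our assumptions, not part of the provided code):

```python
import matplotlib.pyplot as plt

def plot_convergence(acc_history, batch_size=64):
    """acc_history: model name -> list of test-batch accuracies, one per iteration."""
    for name, accs in acc_history.items():
        plt.plot(accs, label=name)
    plt.xlabel("training iteration")
    plt.ylabel(f"AR accuracy (test batch size {batch_size})")
    plt.legend()
    plt.show()
```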
Problem 2 (30pts). Set K = 16, d = 8, L = ** or L = 64. We will train Mamba, Transformer
with RoPE, and Hybrid. Set τ = 1 (standard AR).
• Train Mamba models for M = 4, 8, 16. Note that M = 16 is the full AR (retrieve any
query). Comment on the results.
• Train Transformer models for M = 4, 8, 16. Comment on the results and compare
them against Mamba’s behavior.
• Train the Hybrid model for M = 4, 8, 16. Comment and compare.
Problem 3 (20=15+5pts). Set K = 16, d = 64, L = ** or L = 64. We will only train
Mamba models.
• Set τ = 1 (standard AR). Train Mamba models for M = 4, 8, 16. Compare against the
corresponding results of Problem 2. How does the embedding dimension d impact the results?
• Train a Mamba model with M = 16 and τ = 10. Comment on any differences.



