FAQ
How do I debug a branch predictor?
Use a very short GHR (for example, 3 or 4 bits) and print out all the PHT entries (8 or 16 entries).
Then check how the 2-bit counters and the GHR are updated.
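The debugging approach above can be sketched as a tiny stand-alone predictor whose whole state fits on one line. This is only an illustration, not the assignment's data structure: the name TinyGshare, the initial counter value of 2, indexing with the raw PC XOR the GHR, and the dump format are all assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-alone g-share predictor used only for debugging.
struct TinyGshare {
    unsigned ghr_len;          // history length in bits (3 or 4 while debugging)
    uint64_t ghr = 0;          // global history register, newest outcome in bit 0
    std::vector<uint8_t> pht;  // 2-bit saturating counters, values 0..3

    // Assumption: every counter starts at 2 (weakly taken).
    explicit TinyGshare(unsigned len) : ghr_len(len), pht(1u << len, 2) {
        assert(len >= 1 && len <= 16);  // small sizes only, so the dump stays readable
    }

    uint64_t mask() const { return (uint64_t{1} << ghr_len) - 1; }
    uint64_t index(uint64_t pc) const { return (pc ^ ghr) & mask(); }
    bool predict(uint64_t pc) const { return pht[index(pc)] >= 2; }

    // Update the counter selected by the current history, then shift the outcome into the GHR.
    void update(uint64_t pc, bool taken) {
        uint8_t& ctr = pht[index(pc)];
        if (taken && ctr < 3) ++ctr;
        if (!taken && ctr > 0) --ctr;
        ghr = ((ghr << 1) | (taken ? 1u : 0u)) & mask();
    }

    // One-line snapshot: GHR value followed by every PHT counter in index order.
    std::string dump() const {
        std::ostringstream out;
        out << "GHR=" << ghr << " PHT=";
        for (uint8_t c : pht) out << int(c);
        return out.str();
    }
};
```

Printing dump() after every update() makes it easy to compare each counter and GHR change against a hand-computed trace.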
Where do we put the following message?
if (KNOB_DEBUG_PRINT.Value()) {
  cout << "ID_STAGE OP " << op->inst_id << " is scheduled at cycle " << cycle_count << endl;
}
Is it when ops are inserted into the scheduler, or when instructions are removed from the scheduler?
When instructions are removed from the scheduler.
The description says:
"Add the following code where an op is scheduled. (an instruction will be executed at the following cycle)
(An in order processor should print out these messages in order.)"
Will there be any pipeline modifications in the later assignment?
In the third assignment, you will add a memory system that is independent of your current pipeline code. You will also need to add a thread ID feature for SMT, but that does not require significant changes to your data structures.
I'm not maintaining the valid bit in the register file that you provided for programming assignment #1. Is it OK?
Yes, we will not check any register valid bit for grading.
At a given time, can more than KNOB_ISSUE_WIDTH instructions be inside the EX stage? For example, KNOB_ISSUE_WIDTH is 3. At cycle 1, 3 instructions start executing with a latency of 2. Can the processor start executing 3 more instructions at cycle 2?
Yes, because we assume there are enough functional units.
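The example in the question can be sketched as follows. The struct InFlightOp and the function start_ops are hypothetical names for this illustration; the point is only that the per-cycle limit applies to ops started, not to the number of ops in flight.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical record for an op that is currently executing.
struct InFlightOp {
    int inst_id;
    int finish_cycle;  // cycle in which execution completes
};

// Start up to issue_width ready ops in `cycle`. The size of ex_stage is
// deliberately not capped, because enough functional units are assumed.
// Returns the number of ops started this cycle.
int start_ops(std::vector<InFlightOp>& ex_stage, std::deque<int>& ready_ids,
              int issue_width, int latency, int cycle) {
    assert(issue_width > 0 && latency > 0);
    int started = 0;
    while (started < issue_width && !ready_ids.empty()) {
        ex_stage.push_back({ready_ids.front(), cycle + latency});
        ready_ids.pop_front();
        ++started;
    }
    return started;
}
```

With an issue width of 3 and a latency of 2, calling start_ops at cycle 1 and again at cycle 2 leaves six ops in ex_stage at the same time.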
What is the maximum value of KNOB_GHR_LENGTH?
Since the size of the g-share branch predictor is 2^(KNOB_GHR_LENGTH) * 2 bits, a branch predictor larger than 1MB is very impractical because of its power and space overhead. Hence, you can assume that the maximum KNOB_GHR_LENGTH is 32 bits.
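The size formula above can be evaluated directly. The helper names below are hypothetical; this is a sketch of the arithmetic only, assuming 2-bit counters as in the formula.

```cpp
#include <cassert>
#include <cstdint>

// PHT storage for a g-share predictor with 2-bit counters:
// 2^ghr_length entries, 2 bits per entry.
uint64_t pht_size_bits(unsigned ghr_length) {
    assert(ghr_length < 63);  // keep the shift and the multiply within 64 bits
    return (uint64_t{1} << ghr_length) * 2;
}

// Same size in bytes, rounded up to a whole byte.
uint64_t pht_size_bytes(unsigned ghr_length) {
    return (pht_size_bits(ghr_length) + 7) / 8;
}
```

For example, a 4-bit GHR needs 32 bits (4 bytes) of counters, and a 22-bit GHR needs 1 MiB. The shift uses a 64-bit type on purpose: with a history length of 32, an expression such as `1u << 32` on a 32-bit unsigned value is undefined behavior in C++.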
The FE latch already holds one instruction and KNOB_ISSUE_WIDTH is 4. Can the processor fetch 3 more instructions, or does it stall at that moment?
The processor can fetch 3 more instructions at that cycle.
The ROB is full but the FE latch has space. Can the processor fetch more instructions until the FE latch is full, or should it stop fetching at that moment?
The processor checks ROB space at the decode/issue stage. Hence, the processor can fetch instructions until the FE latch is full.
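The two answers above can be summarized in a small sketch. The function names are hypothetical, and the sketch assumes that the FE latch holds at most KNOB_ISSUE_WIDTH instructions, which is what the first answer implies.

```cpp
#include <algorithm>
#include <cassert>

// Number of instructions the fetch stage may bring in this cycle.
// Only FE latch occupancy limits fetch; ROB occupancy is not consulted here.
int fetch_slots(int issue_width, int fe_latch_occupancy) {
    assert(issue_width > 0 && fe_latch_occupancy >= 0);
    return std::max(0, issue_width - fe_latch_occupancy);
}

// Number of instructions decode/issue may take out of the FE latch this cycle.
// This is the stage where free ROB entries are checked.
int decode_slots(int issue_width, int fe_latch_occupancy, int rob_free_entries) {
    assert(rob_free_entries >= 0);
    return std::min({issue_width, fe_latch_occupancy, rob_free_entries});
}
```

With an issue width of 4 and one instruction already in the FE latch, fetch_slots returns 3. With a full ROB, decode_slots returns 0 while fetch_slots is unaffected, so fetch continues until the FE latch fills up.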
Can we send more than ISSUE_WIDTH instructions into the EX latch or MEM latch?
Yes. For this assignment, you do not really need to maintain the EX latch and MEM latch. However, you need to maintain the FE latch correctly, although we do not check its contents.
What if more than ISSUE_WIDTH instructions finish in the same cycle because of different execution latencies? Can all of them broadcast their results?
Yes, we assume the broadcast (bypass) network has enough capacity.
Do we have to generate the exact IPC values?
There is some flexibility in your design choices. Hence, we will give full credit if your IPC is within a certain range.
When does the processor insert instructions inside the scheduler and when does it remove instructions?
We remove instructions from the scheduler when an instruction finishes its execution.
When an instruction is removed, the processor inserts an instruction into the scheduler in the same cycle.
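A minimal sketch of that remove-then-insert order, assuming a fixed-capacity scheduler and a queue of ops waiting to enter it. SchedEntry, step_scheduler, and the done_cycle field (the cycle in which an op finishes execution, taken as already known here) are hypothetical names for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical scheduler entry.
struct SchedEntry {
    int inst_id;
    int done_cycle;  // cycle in which this op finishes execution
};

// One scheduler step for `cycle`: first remove every op that has finished,
// then refill the freed slots from the waiting queue in the same cycle.
void step_scheduler(std::vector<SchedEntry>& sched, std::deque<SchedEntry>& waiting,
                    std::size_t capacity, int cycle) {
    assert(capacity > 0);
    sched.erase(std::remove_if(sched.begin(), sched.end(),
                               [cycle](const SchedEntry& e) { return e.done_cycle <= cycle; }),
                sched.end());
    while (sched.size() < capacity && !waiting.empty()) {
        sched.push_back(waiting.front());
        waiting.pop_front();
    }
}
```

Doing the removal before the insertion is what lets a slot freed in a cycle be reused in that same cycle.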
Can we fetch instructions after a branch if the branch is correctly predicted?
This is a very good question. To simplify the homework,
we assume that if a branch is correctly predicted, the processor fetches the instructions after the branch in the following cycle.
Of course, the processor should not fetch more than KNOB_ISSUE_WIDTH instructions.
If a branch is mispredicted, the processor should not fetch instructions after the mispredicted branch.
In real hardware, the processor can keep fetching instructions if a branch is not taken.
Typically the processor brings in a whole cache block, so it can fetch instructions from the same cache block.
However, we do not model that behavior in the simulator. A processor with a very aggressive I-cache or trace-cache mechanism can fetch instructions across branches.
How can we know a branch's direction?
Op->actually_taken (1: taken, 0: not taken)
Will you check pipeline latches to grade our homework?
No, this time we do not check pipeline latches. You can modify the pipeline latches, or skip them entirely if you prefer other data structures.
When does a store instruction actually write a value to the memory system? The EX stage or the WB stage?
Store instructions can change architectural state, so a processor must send the store value to the memory system when the instruction is ready to retire. Hence, we model a store instruction as writing its result in the WB stage. However, the store instruction should check for a data cache hit or miss in the EX stage. Hence, from the simulator's point of view, load and store instructions are handled identically in the EX stage. We do not add any timing delay for store instructions in the commit stage. Note that in programming assignment #2 all dcache accesses are cache hits, and in programming assignment #3 you will implement the cache and memory system.
Do we need to check memory dependences?
For the assignments, we assume that we have a perfect memory dependence predictor.
When should the processor update the register file?
It should update the register file at the commit stage. Note that there is data forwarding logic in the pipeline. We do not check the register valid bits for grading.
Do we need to collect control hazard and data hazard statistics for this assignment?
No, we will not use these stats for grading.