FAQ
How do I debug a branch predictor?
Use a very short GHR (3 or 4 bits), print all of the PHT entries (8 or 16 entries) after each branch,
and check that the 2-bit counters and the GHR are updated as expected.
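
The debugging approach above can be sketched as follows. This is a minimal sketch, not the assignment's reference code: the struct TinyGshare and its members are hypothetical names, the counter encoding (0 = strongly not taken ... 3 = strongly taken, so weakly taken = 2) is an assumed convention, and the index is assumed to be the PC XORed with the GHR.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical debug-sized gshare predictor; all names are illustrative.
struct TinyGshare {
    unsigned ghr_bits;          // e.g. 3 or 4 while debugging
    uint32_t ghr = 0;           // global history register
    std::vector<uint8_t> pht;   // one 2-bit counter (0..3) per entry

    explicit TinyGshare(unsigned bits) : ghr_bits(bits) {
        assert(bits >= 1 && bits <= 16);  // keep the table small enough to print
        pht.assign(1u << bits, uint8_t{2});  // 2 = weakly taken (assumed encoding)
    }

    uint32_t mask() const { return (1u << ghr_bits) - 1; }

    // Assumed index function: PC XOR GHR, truncated to ghr_bits.
    uint32_t index(uint32_t pc) const { return (pc ^ ghr) & mask(); }

    bool predict(uint32_t pc) const { return pht[index(pc)] >= 2; }

    // Update the saturating counter first, then shift the outcome into the GHR.
    void update(uint32_t pc, bool taken) {
        uint8_t &c = pht[index(pc)];
        if (taken && c < 3) ++c;
        if (!taken && c > 0) --c;
        ghr = ((ghr << 1) | (taken ? 1u : 0u)) & mask();
    }

    // Print the GHR and every PHT entry on one line.
    void dump(std::ostream &os) const {
        os << "GHR=" << ghr << " PHT:";
        for (unsigned c : pht) os << ' ' << c;
        os << '\n';
    }
};
```

Calling dump(std::cout) after every update() on a short, hand-written branch sequence makes it easy to compare each counter and the GHR against values worked out on paper.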
Where do we put the following message?
if (KNOB_DEBUG_PRINT.Value()) {
  cout << "ID_STAGE OP " << op->inst_id << " is scheduled at cycle " << cycle_count << endl;
}
Should it be printed when ops are inserted into the scheduler, or when instructions are removed from the scheduler?
When instructions are removed from the scheduler.
The assignment description says: "
Add the following code where an op is scheduled. (an instruction will be executed at the following cycle)
(An in order processor should print out these messages in order.)
"
Will there be any pipeline modifications in the later assignments?
In the 3rd assignment, you will add a memory system that is independent of your current pipeline code. You will also need to add a thread ID feature to support SMT, but that does not require significant changes to your data structures.
I'm not maintaining the valid bit in the register file that you provided for programming assignment #1. Is it OK?
Yes, we will not check any register valid bit for grading.
At any given time, can more than KNOB_ISSUE_WIDTH instructions be in the EX stage?
For example, KNOB_ISSUE_WIDTH is 3. At cycle 1, 3 instructions start executing with a latency of 2. Can the processor start executing 3 more instructions at cycle 2?
Yes, because we assume there are enough functional units.
What is the maximum value of KNOB_GHR_LENGTH?
The g-share branch predictor occupies 2^(KNOB_GHR_LENGTH) * 2 bits, and a branch predictor larger than 1 MB is impractical because of its power and space overhead. Hence, you can assume that the maximum GHR_LENGTH is 32 bits.
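
The size formula in the answer above can be checked with a couple of helper functions. This is only a sketch: the function names pht_size_bits and pht_size_bytes are hypothetical and not part of the simulator, and the 32-bit bound in the assertion simply mirrors the limit stated above.

```cpp
#include <cassert>
#include <cstdint>

// Storage, in bits, of a gshare PHT with one 2-bit counter per entry.
// A 64-bit result is used so that a 32-bit history length does not overflow.
uint64_t pht_size_bits(unsigned ghr_length) {
    assert(ghr_length <= 32);  // limit taken from the FAQ answer
    return (uint64_t{1} << ghr_length) * 2;
}

// The same storage in bytes (rounded down for tables smaller than one byte).
uint64_t pht_size_bytes(unsigned ghr_length) {
    return pht_size_bits(ghr_length) / 8;
}
```

With these definitions, pht_size_bits(3) is 16 and pht_size_bytes(22) is 1048576, i.e. 1 MiB of counters.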
The FE latch already holds one instruction and KNOB_ISSUE_WIDTH is 4. Can the processor fetch 3 more instructions, or does it stall at that moment?
The processor can fetch 3 more instructions in that cycle.
The ROB is full but the FE latch has space. Can the processor keep fetching until the FE latch is full, or should it stop fetching at that moment?
The processor checks ROB space at the decode/issue stage. Hence, the processor can fetch instructions until the FE latch is full.
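
The two fetch answers above can be summarized in one small helper. This is a sketch under stated assumptions: fetch_slots is a hypothetical name, and the FE latch capacity is assumed to equal KNOB_ISSUE_WIDTH, which the first answer implies but does not state outright.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of ops the fetch stage may bring in this cycle.
// Assumes the FE latch holds at most issue_width ops.
// ROB occupancy is deliberately not a parameter: ROB space is checked later,
// at the decode/issue stage, so a full ROB does not stop fetch by itself.
std::size_t fetch_slots(std::size_t issue_width, std::size_t fe_latch_occupancy) {
    assert(fe_latch_occupancy <= issue_width);
    return issue_width - fe_latch_occupancy;
}
```

For the case in the question, fetch_slots(4, 1) returns 3; once the FE latch is full, fetch_slots(4, 4) returns 0 and fetch stalls.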
Can we send more than ISSUE_WIDTH instructions into the EX latch or the MEM latch?
Yes. For this assignment, you do not really need to maintain the EX latch and the MEM latch. However, you need to maintain the FE latch correctly, although we do not check its contents.
What if more than ISSUE_WIDTH instructions finish in the same cycle because of different execution latencies? Can all of them broadcast their results?
Yes, we assume the broadcast/bypass network has enough bandwidth.
Do we have to generate the exact IPC values?
There is some flexibility in your design choices. Hence, we will give full credit if your IPC is within a certain range.
When does the processor insert instructions inside the scheduler and when does it remove instructions?
We remove an instruction from the scheduler when it starts executing
(i.e., the cycle after all of its sources are ready).
As soon as an instruction is removed, we assume that new instructions can be inserted into the scheduler.
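
The removal rule above can be sketched as a per-cycle pass over the scheduler. This is an illustration only: SchedEntry, src_ready_cycle, and schedule are hypothetical names, and the comparison encodes one reading of the rule, namely that an op starts executing in the first cycle strictly after its last source becomes ready. No per-cycle limit on the number of removed ops is modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scheduler entry; field names are not the simulator's.
struct SchedEntry {
    uint64_t inst_id;
    uint64_t src_ready_cycle;  // cycle in which the last source becomes ready
};

// Removes every entry whose sources were ready before `cycle` and returns
// the removed ids in scheduler order. The freed slots are available for
// new insertions immediately after this call.
std::vector<uint64_t> schedule(std::vector<SchedEntry> &sched, uint64_t cycle) {
    std::vector<uint64_t> started;
    std::vector<SchedEntry> remaining;
    for (const SchedEntry &e : sched) {
        if (e.src_ready_cycle < cycle) {
            started.push_back(e.inst_id);  // op begins execution this cycle
        } else {
            remaining.push_back(e);        // still waiting on a source
        }
    }
    sched.swap(remaining);
    return started;
}
```

The loop over the returned ids is a natural place for the "is scheduled at cycle" debug message, since it is printed when instructions leave the scheduler.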
Can we fetch instructions after a branch if the branch is correctly predicted?
This is a very good question. To simplify the homework,
we assume that the processor can fetch instructions after a branch if the branch is correctly predicted.
Of course, the processor should not fetch more than KNOB_ISSUE_WIDTH instructions.
If a branch is mispredicted, the processor should not fetch instructions after the mispredicted branch.
In real hardware, the processor can keep fetching past a branch if the branch is not taken.
Typically the processor brings in a whole cache block, so it can fetch instructions from the same cache block.
However, we do not model that behavior in the simulator. A processor with a very aggressive I-cache or trace-cache mechanism can fetch instructions across branches.
What should be the initial value of the 2-bit counters in the gshare predictor?
Please set the value to weakly taken.
How can we know a branch's direction?
Use op->actually_taken (1: taken, 0: not taken).
Do we limit the number of physical registers?
We assume that the number of physical registers is double the number of ROB entries. Hence, the processor can always allocate a physical register.
We use a branch predictor for conditional branches. How about other control flow instructions?
We assume that all other control flow instructions are correctly predicted by other predictors, which we do not model. The simulator simply fetches the next correct instructions for other types of control flow instructions.
When do we know whether an instruction is a branch: at the Decode stage or at the Fetch stage? Do we have to model a BTB?
We assume that there is a BTB (you do not need to model it), so we know whether an instruction is a branch at the FE stage.
Will you check pipeline latches to grade our homework?
No, this time we do not check pipeline latches. You can modify the pipeline latches, or skip them entirely if you prefer other data structures.
When does a store instruction actually write a value to the memory system? The MEM stage or the WB stage?
Store instructions change architectural state, so a processor must send the store value to the memory system only when the instruction is ready to retire. Hence, we model a store instruction as writing its result in the WB stage. However, the store instruction should still check the data cache for a hit or miss in the MEM stage. Hence, from the simulator's point of view, load and store instructions are handled identically in the MEM stage. We do not add any additional timing delay for store instructions in the commit stage. Note that
in programming assignment #2 all dcache accesses are cache hits, and in programming assignment #3 you will implement the cache and memory system.
Do we need to check memory dependences?
For the assignments, we ignore memory dependences. We simply assume that there are no memory dependences.
When should the processor update the register file?
It should update the register file at the commit stage. Note that there is data forwarding logic in the pipeline. We do not check the register valid bits for grading.
Do we need to collect control hazard and data hazard statistics for this assignment?
No, we will not use these stats for grading.