Some tips for implementing G-share
prediction and update function.
useful macros
#define SAT_INC(val, max) ((val) == (max) ? (max) : (val) +1)
#define SAT_DEC(val, max) ((val) == (min) ? (min) : (val) -1)
#define N_BIT_MASK(N) ((0x1ULL << N)) -1
// prediction
pht_index = (old_global_hist ^ pc_addr) & N_BIT_MASK(history_length);
// update the history
new_global_hist = ((old_global_hist) <<1 ) | (op->actually_taken);
// update the counter after the branch is executed
How to implement Out-of-scheduler algorithm
Since this is a timing simulator, you do not really need to all the
hardware in the Tomasulo's data structure. What you need to implement
is enforcing a correct data dependence and make dependent instructions
wait the correct amount of cycle.
Here is one suggestion.
/* map data structure */
class map_entry{
uint64_t inst_id;
bool valid;
uint64_t rdy_cycle;
}
map_entry reg_map[NUM_REG];
/* ID stage */
/* before insertig uops into the scheduler, we have to rename each register */
/* you have to rename source registers first before you do destination register */
/* rename each soure register */
for (ii = 0 ; ii < 2; ii++) {
if (op->src[ii] >=0) {
op->newnames[ii] = reg_map[op->src[ii]].inst_id;
op->src_rdy_cycles[ii] = reg_map[op->src[ii]].rdy_cycle;
}
}
if (op->dst >= 0) {
reg_map[op->dst].inst_id = op->inst_id;
reg_map[op->dst].valid = FALSE;
reg_map[op->dst].rdy_cycle = -1;
}
/* suggested data structures */
/* op structures */
/* you might want to add the following structures inside op structure */
bool done;
int64_t src_rdy_cycles[2];
uint64_t src_newnames[2];
bool in_scheduler;
uint64_t sched_cycle;
uint64_t done_cycle;
/* at execution stage */
int latency = get_op_latency(op);
op->done_cycle = cycle_count + latency;
if (op->exec_done_cycle <= cycle_count) {
if (!op->mem_type) {
op->done = TRUE;
if (op->dst >= 0) broad_cast(op);
}
void borad_cast(Op *op) {
/* set register map */
if(reg_map[op->dst].inst_id == op->inst_id) {
reg_map[op->dst].rdy_cycle = op->done_cycle;
reg_map[op->dst].valid = TRUE;
}
// traverse the scheduler and check all the source ids
// if (source ids are the same as op's inst_id)
then set op->src_rdy_cycles to the current rdy cycles;
}
/* at retire stage */
go through rob structure from the top.
while (retire_count < KNOB_ISSUE_WIDTH.Value()) {
if (op->done) {
// free rob
// free op
retire_count++;
}
else stop;
}
How to implement a scheduler
Scheduler is similar to a reservation station. Instructions are waiting inside the scheduler
while they are waiting for source operands. The by-pass network broadcasts the results into the scheduler.
The processor rename instructions when it inserts instructions inside the scheduler.
You can use "list" structure to implement a scheduler. Scheduler holds a pointer that points rob-entry.
A simplest way of implementing a scheduler is travering a scheduler every cycle.
for example,
while(issue_cout < KNOB_ISSUE_WIDTH.Value()) {
OP *op = scheduler_list[ii++];
if (srcs_rdy(op)) {
issue_count++;
}
else if (!KNOB_OOO_SCHEDULER.Value()) break;
}
// traverse all the scheduler
if (op->done_cycle == cycle_count) {
// pop ops from the scheduler
// reduce the number of schedulers
}
bool srcs_rdy(Op *op) {
}
ROB
ROB is a circulating queue. You can implement a rob data structure
using a queue. You need to check when a ROB is full or not. If
the ROB gets full, the ID stage stalls. You need to have a feature of
push, pop, peek and count the number of free rob entries.