When issue_width is 3 and two threads are running, 
how many instructions can we fetch in one cycle? Is it 3 or 3*2? 
 
It should be the same as issue_width, so 3. The fetch budget is shared by all threads, not multiplied by the number of threads. 
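
As a rough sketch of what "same as issue_width" means with several threads: the function below splits one cycle's fetch budget across threads so that the total never exceeds issue_width. The name split_fetch_budget and the round-robin order are assumptions made for illustration; the FAQ only fixes the total, not how it is divided among threads.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of the assignment framework.
// Hands out one cycle's fetch slots to threads in round-robin order,
// starting at start_thread. The sum of the result equals issue_width,
// no matter how many threads are running.
std::vector<int> split_fetch_budget(int issue_width, int num_threads, int start_thread) {
    assert(issue_width >= 0 && num_threads > 0 && start_thread >= 0);
    std::vector<int> per_thread(num_threads, 0);
    for (int slot = 0; slot < issue_width; ++slot) {
        per_thread[(start_thread + slot) % num_threads] += 1;
    }
    return per_thread;
}
```

With issue_width 3 and two threads, one thread gets two slots and the other gets one in a given cycle, for a total of 3 rather than 6.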
Can we set MAX_THREAD to a fixed number? 
Yes, you can set MAX_THREAD to 5, for example. 5 should be good enough. 
Do we have to worry about the memory disambiguation problem in this assignment?
No. We assume that there is a perfect memory disambiguation predictor.
What should I use to identify memory instructions? Is checking mem_type sufficient, or do I also have to check the opcode? 
If the opcode is OP_ST, mem_type is MEM_ST, and if the opcode is OP_LD, mem_type is MEM_LD, 
so checking mem_type alone is sufficient. 
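
A minimal sketch of that check is below. MEM_LD and MEM_ST are the names used above; the enum layout, the NOT_MEM value, the Op struct, and the is_mem_op helper are stand-ins invented for this example, so use the real definitions from the assignment's headers in your code.

```cpp
// Stand-in definitions for illustration only. The real enum and op
// structure come from the assignment framework and may differ.
enum MemType { NOT_MEM, MEM_LD, MEM_ST };  // NOT_MEM is an assumed name

struct Op {
    MemType mem_type;
};

// Returns true for loads and stores by looking at mem_type only;
// the opcode is never inspected.
bool is_mem_op(const Op& op) {
    return op.mem_type == MEM_LD || op.mem_type == MEM_ST;
}
```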
What is the cache miss penalty? Is it KNOB_DCACHE_HIT_LATENCY + KNOB_MEM_LATENCY or just KNOB_MEM_LATENCY?
 
It is KNOB_DCACHE_HIT_LATENCY + KNOB_MEM_LATENCY. 
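
The arithmetic can be sketched as follows. The function name dcache_access_latency is hypothetical, and the two latencies are passed as plain integers here; in your code they would be read from KNOB_DCACHE_HIT_LATENCY and KNOB_MEM_LATENCY however the framework exposes them.

```cpp
// Hypothetical helper for illustration only.
// A hit costs the D-cache hit latency. A miss still pays for the cache
// lookup and then adds the memory latency on top of it.
int dcache_access_latency(bool hit, int dcache_hit_latency, int mem_latency) {
    if (hit) {
        return dcache_hit_latency;
    }
    return dcache_hit_latency + mem_latency;
}
```

For example, with made-up values of 5 cycles for a hit and 100 cycles for memory, a miss takes 105 cycles, not 100.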
Do we need to implement load-store forwarding?
You could, but you don't have to. 
Do we need to differentiate between loads and stores?
No, for this assignment you can treat them equally.