  Q6) Is there a concept of a port?
    - Kind of. We can use aggregates as function arguments.

  Q5) Is it possible to make multi-cycle operations?
    - Like multi-cycle multiplies and divides? I do have a few designs for those
      built using CHDL (you'd rarely want to use the CHDL built-in divide) but
      they're not in the standard library.

  Q7) How do I read the printed message to find the critical path?
    - We haven't covered what "critpath" means yet. Maybe do a tutorial on
      performance optimization.

  Q8) How to set up a clock cycle?
    - As in a clock domain? (not relevant to this discussion)
    - As in a cycle time? (not currently relevant; we use clock cycles as the
      basic unit of time in the simulator; when we include it with SST, we
      specify the cycle time in the component configuration)

  Q1) How to do memory read and write operations?
    - On SRAM? Covered in today's slides; using Syncmem.
    - On external memory? That depends on the interface of the particular
      external memory.

  Q2) How to configure memory (address bits and data size)?
    - Based on the size of the inputs to the syncmem (template parameters).
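
  To make the link between the template parameters and the memory's shape
  concrete, here is a small behavioral model in plain C++. It is not CHDL and
  does not use Syncmem; SyncMemModel, its A (address bits) and D (data bits)
  parameters, and the tick()/q() interface are all names made up for this
  sketch. It assumes a one-cycle read latency and that a read of an address
  being written in the same cycle returns the old contents; check both
  against the real Syncmem before relying on them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: A address bits give 2^A entries, each D bits wide.
template <unsigned A, unsigned D>
class SyncMemModel {
  static_assert(A >= 1 && A <= 20, "model kept small on purpose");
  static_assert(D >= 1 && D <= 63, "data must fit in a uint64_t");

 public:
  static std::size_t entries() { return std::size_t{1} << A; }

  SyncMemModel() : mem_(entries(), 0) {}

  // One clock edge. The read result becomes visible through q() after the
  // edge (assumed one-cycle latency); the write is applied on the same edge.
  void tick(std::size_t raddr, std::size_t waddr, std::uint64_t wdata,
            bool wen) {
    assert(raddr < entries() && waddr < entries());
    q_ = mem_[raddr];  // assumed read-before-write ordering
    if (wen) mem_[waddr] = wdata & ((std::uint64_t{1} << D) - 1);
  }

  std::uint64_t q() const { return q_; }

 private:
  std::vector<std::uint64_t> mem_;
  std::uint64_t q_ = 0;
};
```

  With A = 4 and D = 8 the model holds 16 one-byte entries, and write data
  wider than 8 bits is cut down to its low 8 bits.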

  Q3) Is there any way to display memory values? 
    - Not yet. Working on it as part of a general memory API refactoring.

  Q9) SRAM and DRAM
    - All internal CHDL RAM is SRAM. All RAM (that we use) is synchronous. DRAM
      is always off-chip, accessed through I/O.

  (1) When to use nodes vs. bit vectors, and how to convert between them
    - Nodes are just single bits. Bit vectors are just vectors of nodes.
    - To get a node from a bvec, you select one using either a Mux (dynamic) or
      [] (static).
    - To get a bvec<1> from a node n, use bvec<1> x(n). Passing a node to a
      bvec's constructor initializes every element in that bvec to the given
      node.
    - To do all-reduce Ands, Ors, and Xors of a bvec x, use AndN(x), OrN(x), or
      XorN(x) respectively.
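
  For readers who want to check their intuition, the operations in the
  answer above can be mimicked in plain C++ with std::bitset. This is only a
  software stand-in, not CHDL: model_node, model_bvec, select_bit, broadcast,
  and the *_reduce helpers are hypothetical names invented for the sketch,
  and they compute values directly rather than building hardware.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Stand-ins for this sketch only: a "node" is one bit, a "bvec" is N bits.
using model_node = bool;
template <std::size_t N>
using model_bvec = std::bitset<N>;

// Pick one bit out of a vector (the role played by [] or a Mux).
template <std::size_t N>
model_node select_bit(const model_bvec<N> &x, std::size_t i) {
  assert(i < N);
  return x[i];
}

// Build a vector whose every element is the same bit, mirroring what
// passing a node to a bvec's constructor does.
template <std::size_t N>
model_bvec<N> broadcast(model_node n) {
  model_bvec<N> x;
  if (n) x.set();
  return x;
}

// All-reduce operations over every bit of the vector.
template <std::size_t N>
model_node and_reduce(const model_bvec<N> &x) { return x.all(); }

template <std::size_t N>
model_node or_reduce(const model_bvec<N> &x) { return x.any(); }

template <std::size_t N>
model_node xor_reduce(const model_bvec<N> &x) { return x.count() % 2 == 1; }
```

  The XOR reduction is just the parity of the vector: it is 1 exactly when
  an odd number of bits are set.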

  (2) How to add a prefix or suffix to TAP variable names?
    - Use the lower-case tap function instead of the macro:
      tap("prefix_x", x);

  (3) The usage of hierarchy
    - Hierarchy is useful in debugging (especially performance debugging).
    - Just call HIERARCHY_ENTER() at the start of every function and
      HIERARCHY_EXIT() at the end (just prior to return). All of the nodes will
      be inserted into an appropriate place in the call graph. This lets you see
      just where each step on your critical path was defined.

  (4) How to increase the data width in the demo code? I faced a memory
      initialization problem.
    - You should just be able to increase the value in "WORD_SZ".
    - Oh. But that makes the LLRom for the instruction memory prohibitively
      large. Let's truncate that down to IMEM_SZ:

    // Instruction memory; we use a Harvard architecture. 
    inst_t InstMem(const word_t &a) {
      return LLRom<IMEM_SZ, INST_SZ>(Zext<IMEM_SZ>(a), "demo.hex");
    }

    - Now this can scale to arbitrary word sizes without crashing. This was
      tried with 27-bit words.
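
  The reason the truncation matters is plain arithmetic: a ROM indexed by a
  full word needs 2^WORD_SZ entries, while one indexed by the truncated
  address needs only 2^IMEM_SZ. The plain C++ sketch below (not CHDL) shows
  the numbers; rom_entries and low_bits are hypothetical helper names, and
  the value 10 used for IMEM_SZ in the usage note is an assumed example, not
  the demo's actual setting.

```cpp
#include <cassert>
#include <cstdint>

// Number of entries a ROM needs when indexed by an addr_bits-wide address.
inline std::uint64_t rom_entries(unsigned addr_bits) {
  assert(addr_bits < 64);
  return std::uint64_t{1} << addr_bits;
}

// Keep only the low `bits` bits of `a`, which is the effect of resizing a
// wide address down to a narrower one.
inline std::uint64_t low_bits(std::uint64_t a, unsigned bits) {
  assert(bits >= 1 && bits < 64);
  return a & ((std::uint64_t{1} << bits) - 1);
}
```

  For 27-bit words, rom_entries(27) is 134217728, whereas an assumed 10-bit
  instruction address gives rom_entries(10) = 1024. Addresses that differ
  only above the kept bits alias to the same ROM entry, which is fine as
  long as the program fits in the instruction memory.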

  (5) How to initialize register values. Is there a concept of class and
      instantiation?
    - Register initialization is pretty shaky currently. You can pass numbers to
      bvec registers and pass booleans to ordinary registers, but that's about
      it.

  Q4) Pipeline latch designs
  (6) Pipeline latches, I'd like to use the same structure/functions but the
      latch sizes are different for different pipeline stages.  How can we do
      that?
    - Using aggregate types. Then your pipeline register becomes either the
      stock Wreg (next_stage = Wreg(!stall, this_stage)) or a user-defined
      template PipelineReg function with support for instruction cancelling,
      etc.:

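      template <typename T>
        T PipelineReg(node clear, node stall, const T &in)
      {
        HIERARCHY_ENTER();
        // Work on the flattened bits so that any aggregate type T fits.
        bvec<T::sz::val> d(Flatten(in)), q, zero(Lit(0));

        // A clear overrides a stall: the register loads zero when clear is
        // set, loads d when neither is set, and holds its value otherwise.
        q = Wreg(!stall || clear, Mux(clear, d, zero));
        HIERARCHY_EXIT();

        return q;
      }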

  Note that these are full edge-triggered registers, not level-triggered
  latches. Latch-based pipelining is a wonderful idea, but CHDL rests on two
  assumptions: that splitting pipeline registers into pairs of pipeline
  latches is a simple retiming problem, and that designers prefer to think in
  terms of clock cycles during the RTL phase.
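
  The clear/stall priority in PipelineReg can be checked with a cycle-level
  model in plain C++. This is a behavioral sketch, not CHDL: PipelineRegModel
  and pipeline_reg_walkthrough are hypothetical names, the payload is reduced
  to a single integer instead of an aggregate, and the register is assumed to
  start at zero.

```cpp
#include <cassert>
#include <cstdint>

// Models one edge-triggered register with a write enable, using the same
// enable and data selection as the PipelineReg function above.
struct PipelineRegModel {
  std::uint64_t q = 0;  // value seen by the next stage (assumed reset to 0)

  // Advance by one clock edge.
  void tick(bool clear, bool stall, std::uint64_t d) {
    const bool write_enable = !stall || clear;
    if (write_enable) q = clear ? 0 : d;
  }
};

// Walks through the four clear/stall combinations.
inline void pipeline_reg_walkthrough() {
  PipelineRegModel r;
  r.tick(false, false, 5);  // normal advance: load d
  assert(r.q == 5);
  r.tick(false, true, 9);   // stall: hold the old value
  assert(r.q == 5);
  r.tick(true, true, 9);    // clear wins over stall: load zero
  assert(r.q == 0);
  r.tick(true, false, 3);   // clear without stall: also zero
  assert(r.q == 0);
}
```

  Because the model only changes state inside tick(), it reflects the
  cycle-based view described above: nothing happens between clock edges.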