Description: Modern clock tree synthesis (CTS) algorithms exploit useful skew to improve setup timing by increasing the clock insertion delay to the capture flop on a setup-critical path. This improves setup total negative slack (TNS) at the cost of additional buffering on the clock network. However, these useful-skew CTS engines do not simultaneously skew the clock to minimize negative hold slack; hold violations are instead fixed in the conventional manner, by adding delay buffers in the last stage of the design flow, when area for these buffers is scarce. This presentation illustrates a novel approach in which an exhaustive path depth analysis is performed on a post-route netlist to identify launching flops that have little functional (non-buffer) logic in their transitive fanout but a large number of hold buffers in that same fanout. Based on the post-signoff clock arrival times at these launch and capture flops, positive skew adjustments are computed for the target launching flops. Re-running the CTS and routing portions of the backend flow with these increased latency targets on the hold-critical flops reduces the hold buffer count. Using a commercial physical implementation flow, IP processor core, and cell library, results show a reduction in hold buffer count of up to 35%.
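
The selection and skew computation described above could be sketched roughly as follows. This is a minimal illustration, not the presenters' implementation: the `LaunchFlop` fields, the depth and buffer-count thresholds, the 10 ps margin, and the capping policy are all assumptions introduced here. The sketch relies only on the standard timing relations: delaying a flop's clock by d adds d of hold slack to every path it launches, while removing d of setup slack from those paths and d of hold slack from the paths it captures.

```python
from dataclasses import dataclass


@dataclass
class LaunchFlop:
    name: str                      # hypothetical instance name
    functional_depth: int          # non-buffer cells in the transitive fanout
    hold_buffers: int              # hold-fix buffers in the same fanout
    hold_buffer_delay_ps: float    # delay those buffers add on the worst hold path
    min_setup_slack_out_ps: float  # worst setup slack on paths launched here
    min_hold_slack_in_ps: float    # worst hold slack on paths captured here


def is_target(flop, max_depth=2, min_hold_buffers=4):
    """Assumed filter: little functional logic, many hold buffers."""
    return (flop.functional_depth <= max_depth
            and flop.hold_buffers >= min_hold_buffers)


def latency_adjustment_ps(flop, margin_ps=10.0):
    """Extra clock latency to request for a launching flop.

    The increase is capped by the setup slack of outgoing paths and the
    hold slack of incoming paths (assumed policy), less a safety margin,
    and never exceeds the delay currently supplied by the hold buffers.
    """
    budget = min(flop.min_setup_slack_out_ps,
                 flop.min_hold_slack_in_ps) - margin_ps
    return max(0.0, min(flop.hold_buffer_delay_ps, budget))


def plan_latency_targets(flops):
    """Map each target flop name to a positive latency increase in ps."""
    plan = {}
    for flop in flops:
        if is_target(flop):
            adjustment = latency_adjustment_ps(flop)
            if adjustment > 0.0:
                plan[flop.name] = adjustment
    return plan


if __name__ == "__main__":
    # Illustrative numbers only; not taken from the presentation.
    example = [
        LaunchFlop("reg_a", 1, 6, 120.0, 200.0, 80.0),
        LaunchFlop("reg_b", 8, 5, 100.0, 300.0, 300.0),
    ]
    print(plan_latency_targets(example))
```

In a real flow, the resulting per-flop values would be fed back as latency targets for the CTS re-run; how they are applied depends on the implementation tool and is not specified here.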