qiqi_hua

3操作1 ^#

职场历史故事：Leetcode

占个位置。

根据我目前的进度，应该每3天一个小note，每星期一个topic summary。

和文史有关，绝对有关。职场历史也是历史，不能歧视！版主四马是拦不住我的！

kengdie

0操作2 ^#

给楼主点赞

qiqi_hua

0操作3 ^#

https://forums.huaren.us/showtopic.html?topicid=2573159&fid=398&page=30

从这里开始。Leetcode读书笔记。

本来是发chats；不过，有点不合适。放这里吧。

tidewater

0操作4 ^#

友情支持

qiqi_hua

0操作5 ^#

友情支持

tidewater 发表于 2020-07-31 00:06

谢潮水。军功章有你的一半

你在原帖的评论我会汇总加过来。现在正在和concurrency的leetcode奋斗，tree的搞定，这个周末写notes。

qiqi_hua

0操作6 ^#

我的第二份工作，是在Linux kernel mode device driver。大系统里面，16 computing blades，每个blades上面4个CPU complex，每个CPU若干个core，还有hyper thread。每个thread齐头并进，有时候需要lock。一个CPU里面的lock，用semephore/mutex可以解决；不同blades之间的lock，必须靠硬件支持的atomic transaction。这个特殊硬件需要的device driver，就是我们小组负责。

今天的leetcode，随便看了concurrency的topic，做了几个题目；refresh memory也学新东西。潮水说的，c++17。那就从这几个lock开始吧。（有的是c++11）。说错了由潮水老师补充（）

lock_handle
unique_lock
scope_lock

lock_handle(mutex): 一直blocking，直到mutex available。不需要unlock。从当前程序/section{} 退出的时候，自动release。工作中，我一般用这个。

unique_lock（）：支持unlock。比lock_handle 灵活。用condition_variable的时候，必须用它。condition_variable（unique_handle, condition）相当于”only when condition is true, do we try to lock"。这样可以直接把checking写入lock。当程序从这里退出，condition is true，plus the lock is obtained. 当然必须记住：需要unlock later。

scope_lock: 属于c++17。对于多个mutex一起lock。如果用以前的stand_lock，需要考虑multi-lock的顺序，极其容易deadlock。这个scope_lock很方便。

常见的用法，当然是multi-thread 情况了。

https://en.cppreference.com/w/cpp/thread/scoped_lock

几个points：

lock_guard wrapped in {}, only effective inside this code section.
scoped_lock works with multiple locks, grabbing them atomically.
threads use emplace_back rather than push_back. This is a performance improvement, reduced from construction & copy into in-place construction.

qiqi_hua

1操作7 ^#

Graph Theory

常见的Tree，属于Graph的一个特殊分类。Tree在leetcode里面有tag，可以专门挑tree类的题目做。

Grid，尤其是2D grid，也是Graph的一个表达方式。扫地机器人的navigation，一般用Grid形式表达map。

Navigation算法里面，A*（和它的变种，Dixxxx）是weighted navigation的普遍算法。无人驾驶的navigation，或者circuit自动连线的算法，都是A*。A* 是 Dijkastra 的加速版，加上 heuristic cost。

Robotics 里很多是用 hybrid A*，因为 quasi continuously differentiable 的要求，law of physics。

Mapping算法里面，RRT是Robotics领域经常用的。RPM（PRM？）用的也多。

我工作的方向是这个，所以我以为我对graph theory和里面的题目应该没有问题；结果，，，我错了！

加上一个Youtube link，里面的解释不错。

[url]https://www.youtube.com/watch?v=DgXR2OWQnLc&list=PLDV1Zeh2NRsDGO4--qE8yH72HFL1Km93P[/url]

qiqi_hua

0操作8 ^#

如果要去control 组，A*只怕是必考；去network组，minmax flow估计是必考；去path planning，那就是TSP，traveling of a salesman.

潮水如果以后想走Robotics，告诉你，Jacobian Matrix那就是必考必考题！还有，空间旋转的4个表达方式。

leetcode pace, 差不多20个easy 一天，10个medium勉强。

qiqi_hua

0操作9 ^#

Summary 算法的用途 scenarios：

DFS 的用处，是finding connection。例如，find center (the nodes with the most levels); find weakest link (same as finding isolated islands).
BFS: is about to find path.

潮水哥说： BFS / DFS / Best-First-Search 其实可以用同一种写法，差别只是 DFS 用 LIFO ，BFS 用 FIFO，Best-First-Search 用 cost-based-heap (priority queue) 。。。其它部分写法一样一招鲜吃遍天。

不过工业界里，大尺寸或者 High performance low latency 的 DFS，一般也避免写成递归函数，因为避免递归的 overhead 以及系统栈的尺寸限制。

Common data structures for Graph representation:

unordered_map<int, vector<int>>: the benefit is the key part can be used directly as index to the map
vector<vector<int>>: representing individual edge, typically requiring to translate this into unordered_map
vector<vector<int>> as a matrix[n.m]: huge waste of memory, but typically used for grid situation.
TreeNode/left/right: most commonly used in leetcode

潮水哥又说： std::push_heap() 是 O(log(N))

std::sort() 是 O(N*log(N))

除非 problem size 很小，或者 implementation 有问题，否则 heap (priority queue 的底层实现) 应该更快。

你如果做工业界的 high-performance / low-latency programming, 看具体应用，有些应用要避免用 std::priority_queue，考虑直接用 std::push_heap() / std::pop_heap() 在 std::vector 上，因为这样可以事前 std::vector::reserve() 以及避免 vector realloc ... 还有就是可以保持 std::vector allocated space persistent 来避免 loading time.

另外 std:: priority_queue 的底层也可以是 std::deque。。。但 std::deque 对于大尺度问题，RandomAccessIterator 会有附加 overhead 。

对于 pre-sorted but algorithm do not know it is pre-sorted，std::sort() 会快一些，但咋都比不过 std::push_heap() 否则香浓的 paper 药丸。。。

当然 heap 的最大价值在于 std::push_heap() 和 std::pop_heap() 对称美。。。如果你的应用都是 batch push 一次给我来一百打然后我一个一个 pop 的这种比较极端情况，另说。

qiqi_hua

0操作10 ^#

Links

1. https://leetcode.com/explore/interview/card/top-interview-questions-medium/

2. https://leetcode.com/discuss/career/456930/must-do-arrays-questions-for-tech-interviews

3. Search key words for topics

a. Dynamic programming 209

b. Graph 44

c. Tree 140

d. Network flow

e. Hash table 129

qiqi_hua

0操作11 ^#

Graph: Topology

Topology algorithms are pretty much all about DFS post-order (reaching to the leaf nodes, processing children before processing current node), aiming to identify structures, depths, connections of graphs.

Typical use scenarios include:

· Finding shortest path

· Finding distances along

· Comparing of graph structure

Isomorphism

AHU algorithm uses “(“ and “)” to represent the structure (NOT the value) of tree nodes into a single string.

Sequence of operation

1. Find “centers” of graph structure. Note that there can be multiple centers of the same graph.

a. “center” means “the shortest path to all leaf nodes”

2. Encode with DFS post order. Note that the child’s returned string shall be sorted lexicographically (with sort(stringV.begin(), stringV.end())

3. Merge children’ string, wrap with “(“ and “)”

4. For two trees, each of them can have multiple “centers”. Pick a center from one tree, but compare it with the other tree’s AHU strings with all centers iterated.

LCA: least common ancestor

There are a number of algorithms, including Tarjan’s offline. Below is an Eulerian tour/RMQ query example.

Eulerian tour

The key part is to build two arrays, size N = 2 * number of nodes + 1. This is because all nodes are visited twice other than the root (ASSUMING we have a left/right tree)

· One array represents the depth of each node, which correlates to ancestor in parent/child relationship

o The index of each element is the visit order to this node

o The value of each array element is the depth

· One array represents the sequence of visit, which represents the order of traversing all nodes

o The index of each element is the visit order to this node, same as the depth queue.

o The value of each element is node unique ID.

· The requirement is to index each node with unique ID.

Sequence of operation

1. Build depth & order arrays

a. DFS pre-order AND post-order. This means the current node’s ID is appended to both DEPTH_ARRAY & VISIT_SEQ_ARRAY before & after EACH child.

2. Cross reference

a. Use unique node ID to find index from visit array. There can be multiple entries for each node since one single node can be visited multiple times

b. Find min/max of the visit index

c. Use visit index region in DEPTH_ARRAY and find min_level

d. cross reference back from min_depth->visit order->node ID, which is the LCA node.

qiqi_hua

0操作12 ^#

Topological sort

This is very useful to determine dependencies for events/programs/courses. Node parent->child edge can be modeled to present logical relationship.

The key note: topological orders are not unique!

Tree (in format of TreeNode) can use “cherry pick leaf nodes” (BFS into queue, and reserve order); graph (in connected edge format) can use DFS with a “visited” array. Note that not all nodes are connected with each other in graph, and thus this is an iterative process until all nodes are visited.

1. Initialize a VISITED_ARRAY (size == N)

2. Initialize a ORDER_ARRAY (size == N) and each element value is node’s ID

3. Pick any node as START_NODE

4. DFS post-order from START_NODE and save to a temp ordering array

5. Copy temp ordering array to ORDER_ARRAY

6. If there are unvisited nodes, repeat from step 3

Leetcode

630/207/210/1462

qiqi_hua

0操作13 ^#

值得注意的地方：

1。考虑到graph的direction。如果是directional，生成edgeMap的时候，from parent to child。如果是non-directional，edgeMap加上parent to child, and child to parent.

2。考虑到graph可能包含cyclic dependency，用visited（vector<bool>， pre mark) 代表“a node is in the path before processing the current node", 用inList(vector<bool>, post mark)代表”a node has been marked as a child & already put to topOrderList"。如果只用visited，会导致stack overflow.

3。有些问题是找到2个nodes之间是否有联系。这时候就不能用topOrder 表示结构关系，用connectionMatrix代表reachability合适。

qiqi_hua

0操作14 ^#

Shortest Path for DAG

The general solution for various SSSP issue is to keep an array MIN_DISTANCE_ARRAY (size == N) and run iteratively from any nodes to “relax” the distance to destination node.

· Might also need to allocate a correspondent array for PREV_NODE

It is almost a “dynamic programming” model, since the distance to a node is “referred” from previously identified shorted path.

1. Push the “start node distance == 0” into the array

2. Loop from index = 0 for all nodes

3. Get the edge cost from the current node to its child nodes

a. “check and relax” the recorded distance in MIN_DISTANCE_ARRAY

4. Find the next node (the first one picked is always the start node)

a. It varies by algorithm which “next node” to choose

5. Mark the node as VISITED once it’s picked by step 3. We don’t come to this node again, and the value of this node in MIN_DISTANCE_ARRAY is considered final.

qiqi_hua

0操作15 ^#

SSSP based on Topological sort

With the ORDER_ARRAY by hand, it is easy to find the shorted path because we always know which node to pick in the next step. Therefore, step 4.a is simply to use the next node in ORDER_ARRAY

Note that this can be used to solve LONGEST path (NP hard) by inverting all weights with -1.

qiqi_hua

0操作16 ^#

Dijkstra

The requirement is NOT to contain a negative edge; or, the algorithm loops forever.

Since there is no already-made ORDER_ARRAY, a priority_queue (based on min distance) is needed on the side in order to determine which node to be picked. Note that priority_queue is based on “largest node in the front”, we need to tweak the comparison function.

This is just like a BFS queue, except that when we take a node out from this queue, it always contains the smallest weight/distance.

Typical/Lazy

· Step 4.a use a priority_queue; loop until this loop size == 0

· Push calculated new values into the queue

· A node is marked “visited” AFTER this is selected as the “from” node & all of its “to” nodes are relaxed

· When “taking a min node”, compare this with the current MIN_DISTANCE_ARRAY. Drop if smaller, since a node can be visited from many paths.

Exit early with destination node

The worst condition is to visit ALL nodes before claiming the MIN_DISTANCE_ARRAY is final. However, in Step 5, if the current “from” node is the destination node, we can exit.

qiqi_hua

0操作17 ^#

Bellman Ford

Not as ideal as Dijkstra; however, if there is a negative edge, Bellman Ford is the choice to find it.

Basic steps is roughly the same; however, this algorithm shall repeat twice (each iteration for all nodes loop) in order to identify nodes involved in a negative weight cycle.

Floyd Warshall

This is a classical DP-based solution, which gives the entire shortest path in two loops.

Similar to Bellman Ford, this runs the logic twice, in order to identify negative cycles.

The input is connection matrix (N*N), rather than the usually connection edges.

qiqi_hua

0操作18 ^#

qiqi_hua

0操作19 ^#

notes都放这里，chats版的leetcode截图，还是放那边。华人网的dedup 功能不知道好不好，我就不重复发截图了。

谢谢大家，希望对大家有帮助。leetcode我买了一年，不过，应该不会继续碰这个了。我的下一段路程开始。

robots

0操作20 ^#

LZ谢谢你，收藏了！