数据结构——基础问题

六、哈夫曼树

1.合并果子

//
// Created by Kahiva on 2023/10/31.
//
//https://www.acwing.com/problem/content/150/
//使用哈夫曼树的知识合并果子堆，计算最小花费体力

#include <iostream>
#include <algorithm>
#include <queue>
#include <vector>

using namespace std;

int main(){
    //定义一个优先队列，第一个参数是队列元素的类型，第二个是使用的容器类型，第三个是排序方式。
    //此处使用升序排序，即小的元素优先出队。
    priority_queue<int, vector<int>, greater<int>> stones_heap;

    int n;
    cin>>n;
    while (n--){
        int stones;
        cin>>stones;
        stones_heap.push(stones);
    }

    int strength_used = 0;

    while (stones_heap.size() > 1){
        //取出最小的两个结点（果子堆）
        int min1 = stones_heap.top();
        stones_heap.pop();
        int min2 = stones_heap.top();
        stones_heap.pop();
        //将这两个果子堆合并
        stones_heap.push(min1 + min2);
        //累计合并果子堆花费的体力
        strength_used += (min1 + min2);
    }

    cout<<strength_used;

}

2.荷马史诗

K叉哈夫曼树的问题类似于二叉哈夫曼树。需要注意的点是，在K叉哈夫曼树中，我们每次合并结点都会取出最小的K个结点，在我们进行最后一次合并的时候，可能存在的结点个数不足K个。

如1，2，3，4，5，6 ——> 6（1，2，3），4，5，6 ——> 6（1，2，3），15（4，5，6）.

这时候我们需要补充零结点进去充数。 6（1，2，3），15（4，5，6），0 ——> 21（6，15）.然后我们会发现，将二层的0和底部的6交换，会让K叉树的结果变好。所以，我们应在开始合并前就补充零结点，使得刚好其元素个数刚好能构建K叉哈夫曼树。

这时候我们可以分析什么时候该补充零结点。我们每次合并结点都会将K个结点合并成一个结点，减少了K-1个结点。刚开始N个元素，全部合并后仅剩1个元素，减少了N-1个结点。每次合并减少K-1个，共需减少N-1个，合并次数为（N-1）/ (K-1)。即不能整除的时候，说明需要补充零结点。

由于荷马史诗的编码要求编码树的高度最小，所以在合并结点的时候，除了选择最小的K个结点，还要在节点值相同的时候，优先选择结点展开的子树高度（深度）最小的结点。

比如，1，3，4，3 ——> 4（1，3），4，3 。根据值最小原则，可以选第一个4，也可以选第二个4，与3合并。很显然，前者合并到最后的高度为3，后者为2。所以在合并的时候，我们应尽量往矮的结点上合并。就像俄罗斯方块，合并的恰到好处才不容易到达顶端。我想大多数人都是优先平铺方块再垂直放的吧。

//
// Created by Kahiva on 2023/11/1.
//
//https://www.acwing.com/problem/content/151/
//荷马史诗

#include "iostream"
#include "algorithm"
#include "queue"
#include "vector"

using namespace std;

//define a pair of data. long long for word_code and int for Huffman coding node depth.
typedef pair<long long, int> PLI;

int main(){
    priority_queue<PLI, vector<PLI>, greater<>> words;

    //k is the system of numeration
    long long words_num, k;
    cin>>words_num>>k;

    while (words_num--){
        long long word;
        cin>>word;
        words.push({word, 0});
    }
    //every loop for merge will delete k nodes and add a combined node,
    //so k-1 nodes decrease in each merge.
    //the loop will continues until there is only one node.
    //loop time = (words_num - 1) / (k-1).
    //if the time is not a integer, just add zero node until it can be a integer.
    while ((words.size() - 1) % (k - 1) != 0)
        words.push({0,0});

    //the with power length is the new coded book length
    long long wpl = 0;

    while (words.size() > 1){
        long long  combined_node_val = 0;
        int max_depth = 0;
        for (int i = 0; i < k; ++i) {
            //get the min node
            auto min_node = words.top();
            //merge
            combined_node_val += min_node.first;
            //save the highest son-tree's depth
            max_depth = max(max_depth, min_node.second);
            //delete the node
            words.pop();
        }
        //add the combined node.
        words.push({combined_node_val,max_depth + 1});
        //the new combined node value is a part of the WPL.
        wpl += combined_node_val;
    }

    cout<<wpl<<endl<<words.top().second;

    return 0;
}

七、拓扑排序（有向图问题）

1.基于广搜（广度优先遍历）的拓扑排序实现方法

广度优先遍历（BFS），实际上就是一层一层地遍历。

将第一个顶点入队
队列出队，将出队的顶点连接的顶点进行入队，我简称为出队展开。
重复步骤2直到队列为空

注意，需要开一个数组保存顶点是否被遍历过。顶点入队的时候，就该标记为被遍历过了，防止重复入队（比如出现回环的时候，A<—>B，A出队会B会入队，B出队A会入队，会出现死循环）。

拓扑排序实现方式

 1. 入队的标准改为：入度为0的顶点才能入队。
 2. 当顶点入队的时候，将该顶点可以通往顶点的入度减1（即在图中隐藏该入队的顶点，代表该工程任务结束）
 3. 简单来说，就是在BFS出队展开时，只让入度为0的顶点入队（即只有前置任务全部完成，才能将新的任务加入执行清单）。

需要开一个数组维护每个顶点的入度。
因为只有入度为0的才能入队，重边，或者回环，都会导致度不为0无法入队，所以不需要BFS那个判断是否被遍历过的数组。

//
// Created by Kahiva on 2023/11/5.
//
//https://www.acwing.com/problem/content/850/
//拓扑排序(采用广度遍历实现)

#include "iostream"
#include "algorithm"
#include "queue"

using namespace std;

//n个顶点m条边
int n, m;
//定义一个常量N，即最大可能用到的顶点数。定义入度数组和模拟队列的数组时可以用到。
const int N = 100010;

struct Node {
    int nodeId;
    Node* next;
    Node(int id):nodeId(id), next(NULL){};
}*link[N];

//保存各顶点的入度
int inDegree[N];
//采用数组模拟队列，出队移动头索引，入队移动尾索引。这样可以在模拟队列的同时，保留原队列中的元素。
int checkQueue[N];

//将某个点和另一个点的边加入邻接表表示的图中。
void addLink(int fromNodeId, int toNodeId){
    Node* node = new Node(toNodeId);
    node->next = link[fromNodeId];
    link[fromNodeId] = node;
}

bool topSort(){
    //head指向队头，rear指向队列最后一个元素
    int head = 0, rear = -1;
    //因为顶点编号从1开始，所以入度数组从索引1开始遍历
    for (int i = 1; i <= n; ++i) {
        //遍历顶点，如果顶点入度为0，就入队
        if (!inDegree[i]){
            checkQueue[++rear] = i;
        }
    }
    while (head <= rear){
        //取出队头元素
        auto top = checkQueue[head++];
        //遍历它的邻边，将邻边那头对应点的入度减一
        for (auto p = link[top]; p ; p = p->next){
            inDegree[p->nodeId]--;
            //如果没有了前置结点，代表可以入队(即被遍历，或者说该工程任务可以执行了)
            if (inDegree[p->nodeId] == 0){
                checkQueue[++rear] = p->nodeId;
            }
        }
    }

    //如果队列中的元素个数刚好等于顶点个数，就代表拓扑排序成立，即不存在闭环。
    //因为如果有闭环，闭环上结点的入度始终不为0，无法入队。
    return n == rear + 1;
}

int main() {

    cin>>n>>m;

    while (m--){
        int from, to;
        cin>>from>>to;
        addLink(from, to);
        inDegree[to]++;
    }

    //如果拓扑排序无效，则输出-1
    if (!topSort())
        cout<<"-1";
    else
        for (int i = 0; i < n; ++i)
            cout<<checkQueue[i]<<' ';

    return 0;

}