1. What I'm doing
One day while slacking off at work, my L-site craving kicked in, so I hurried over to the familiar linux.do,
and the front page was full of topics I'd never seen before.
I assumed it was new activity I'd missed and settled in for a happy read, but they were all ancient threads.
There were ones like this:
and, unbelievably, even ones like this:
The last replies were months old!!!!
Who's out here digging up graves?!!!!
Well then, Mr. Krabs here wants to follow in the big shots' footsteps and play along!
2. Is it allowed
Is grave digging against the rules? I dug around and found that 始皇 (the site founder) once decreed:
By the Mandate of Heaven, the First Emperor proclaims:
So it can be done; the tricky part is finding where to dig.
Hunting for diggable threads by hand is far too tiring, so let's build a little tool first!!!!
3. How to do it
① First, look for somewhere that lists topics. There's an endpoint that looks like this:
https://linux.do/latest.json?no_definitions=true&page=${pageNumber}
Request it and you get JSON back... awoo~~
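In code that's just string concatenation over the page number; a tiny sketch, assuming a helper I'll call pageUrl (my name, not from the original tool):
```cpp
#include <string>

// Build the listing URL for a given page of /latest.json.
std::string pageUrl(int pageNumber) {
    return "https://linux.do/latest.json?no_definitions=true&page=" + std::to_string(pageNumber);
}
```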
② Part of the JSON response looks like this:
ResponseData -> topic_list -> topics
{
"id": 69665,
"title": "linux do是不是被墙了",
"fancy_title": "linux do是不是被墙了",
"slug": "topic",
"posts_count": 31,
"reply_count": 10,
"highest_post_number": 32,
"image_url": null,
"created_at": "2024-04-28T15:27:30.881Z",
"last_posted_at": "2024-04-29T13:43:41.492Z",
"bumped": true,
"bumped_at": "2024-04-29T13:43:41.492Z",
"archetype": "regular",
"unseen": false,
"last_read_post_number": 17,
"unread": 0,
"new_posts": 15,
"unread_posts": 15,
"pinned": false,
"unpinned": null,
"visible": true,
"closed": false,
"archived": false,
"notification_level": 2,
"bookmarked": false,
"liked": false,
"tags": [],
"tags_descriptions": {
},
"views": 891,
"like_count": 21,
"has_summary": false,
"last_poster_username": "yeahow",
"category_id": 2,
"pinned_globally": false,
"featured_link": null,
"has_accepted_answer": false,
"can_have_answer": false,
"can_vote": false,
"posters": [
{
"extras": null,
"description": "原始发帖人",
"user_id": 19906,
"primary_group_id": null,
"flair_group_id": 13
},
{
"extras": null,
"description": "频繁发帖人",
"user_id": 1,
"primary_group_id": null,
"flair_group_id": 1
},
{
"extras": null,
"description": "频繁发帖人",
"user_id": 1414,
"primary_group_id": null,
"flair_group_id": null
},
{
"extras": null,
"description": "频繁发帖人",
"user_id": 3418,
"primary_group_id": null,
"flair_group_id": 13
},
{
"extras": "latest",
"description": "最新发帖人",
"user_id": 21176,
"primary_group_id": null,
"flair_group_id": 13
}
]
}
So this bumped_at field... awoo~~
③ Then just filter for topics bumped several months ago and... awoo~~
4. Plan made, now go all in
But this old crab can only write HelloWorld-grade C++ code, so I gritted my teeth and went for it:
① Which libraries:
First, for sending requests: curl
#include <curl/curl.h>
Then, for parsing the response: jsoncpp
#include "jsoncpp/json/json.h"
For recording results, just create a folder and dump everything into a CSV.
② Do it:
Roughly as follows.
Once a page comes back with no data, the job is done, so keep a flag for that:
bool FinalStatus = false;
Also check the timestamp; this old crab reckons three months counts as a grave:
(this one was written by the honorable GPT)
#include <chrono>
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

bool timeDiff(const std::string &isoTimeStr) {
    std::tm tm = {};
    std::istringstream ss(isoTimeStr);
    // std::get_time has no %f specifier, so parse down to the seconds
    // and drop the fractional part; day-level precision is plenty here.
    ss >> std::get_time(&tm, "%Y-%m-%dT%H:%M:%S");
    if (ss.fail()) return false;
    auto givenTime = std::chrono::system_clock::from_time_t(std::mktime(&tm));
    auto now = std::chrono::system_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::hours>(now - givenTime).count();
    return (duration / 24.) >= (3 * 30.44); // average month length ≈ 30.44 days
}
To write the CSV, first create the file:
#include <sys/stat.h>  // ::mkdir
#include <ctime>
#include <fstream>
#include <string>

std::fstream fs;  // global output stream, also used by topicWrite below

{
    mode_t mode = 0755;
    time_t now = time(0);
    tm *tc = localtime(&now);
    // Note: month/day/hour are not zero-padded, hence names like "...-202481-16-35".
    std::string currtime = "-" + std::to_string(tc->tm_year + 1900) + std::to_string(tc->tm_mon + 1) +
                           std::to_string(tc->tm_mday) + '-' + std::to_string(tc->tm_hour) + '-' +
                           std::to_string(tc->tm_min);
    std::string dir = "../LinuxDoTopicLists";
    ::mkdir(dir.c_str(), mode);  // EEXIST is fine; the directory may already exist
    std::string filename = "LinuxDo-Pages-Topics" + currtime;
    filename = dir + "/" + filename + ".csv";
    fs.open(filename, std::ios::out | std::ios::trunc);  // create or overwrite
    if (fs.is_open()) {
        fs << "url,topic\n";  // CSV header
    } else {
        FinalStatus = true;  // nowhere to write, so stop the page loop early
    }
}
Then write the data rows, recording each topic's id as a URL plus its title:
#include <iostream>

// Append one row: the topic as a clickable URL, then its raw title.
// (Commas inside a title will shift columns; see the escaping sketch below.)
void topicWrite(int topic_id, const std::string &topic) {
    if (fs.is_open()) {
        fs << "https://linux.do/t/topic/" << topic_id << "," << topic << std::endl;
    } else {
        std::cout << "file open error" << std::endl;
    }
}
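One caveat with writing titles raw: fancy_title can itself contain commas or quotes, which breaks the two-column layout. Here is a minimal RFC-4180-style escaper, in case you want the file strictly parseable (csvEscape is my own sketch, not part of the original tool):
```cpp
#include <string>

// Wrap a field in quotes and double any embedded quotes (RFC 4180 style).
std::string csvEscape(const std::string &field) {
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') out += "\"\"";
        else out += c;
    }
    out += '"';
    return out;
}
```
With that, the write line would become `fs << url << "," << csvEscape(topic) << "\n";`.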
Everything's ready; fire off the request:
// libcurl write callback: append each received chunk into a std::string buffer.
static size_t WriteCallback(void *contents, size_t size, size_t nmemb, std::string *userp) {
    size_t totalSize = size * nmemb;
    userp->append((char *)contents, totalSize);
    return totalSize;
}

bool parseJson(const std::string &readBuffer);  // defined below

bool getLinuxdoPage(const std::string &url) {
    std::string readBuffer;
    if (!curl) return false;  // curl is a global CURL* from curl_easy_init()
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer);
    CURLcode CurlNode = curl_easy_perform(curl);
    if (CurlNode != CURLE_OK) return false;
    return parseJson(readBuffer);
}
Here we filter and save the hits:
bool parseJson(const std::string &readBuffer) {
    Json::Value data;
    Json::Reader reader;  // deprecated in newer jsoncpp, but it works here
    if (!reader.parse(readBuffer, data)) return false;
    // An empty topic list means we've run past the last page.
    if (data["topic_list"]["topics"].empty()) {
        FinalStatus = true;
        return false;
    }
    // Path: data["topic_list"]["topics"][i]["bumped_at"]
    for (Json::Value &temp : data["topic_list"]["topics"]) {
        if (timeDiff(temp["bumped_at"].asString())) {
            topicWrite(temp["id"].asInt(), temp["fancy_title"].asString());
        }
    }
    return true;
}
OK, tidy everything up and it's done, awoo~~
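The post never shows the driver that ties these pieces together, so here is a minimal sketch of one possible main(); the page-0 start, the loop shape, and the cleanup order are my assumptions, and pageUrl is the hypothetical helper from earlier:
```cpp
#include <curl/curl.h>
#include <string>

CURL *curl = nullptr;  // the global easy handle used by getLinuxdoPage

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    curl = curl_easy_init();
    // ... create the CSV file here (the mkdir/fs.open block above) ...
    for (int pageNumber = 0; !FinalStatus; ++pageNumber) {
        if (!getLinuxdoPage(pageUrl(pageNumber))) break;  // error or empty page ends the run
    }
    fs.close();
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```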
③ Here's a batch of results; fellow big shots can go digging when you have time~~~
Without a token it seems only the first 1000 pages come back, but that's plenty for playing around!!
Attaching the output: a mere 1, 2, 3, 4, 5... 9270 topics in total:
LinuxDo-Pages-Topics-202481-16-35.csv (690.2 KB)
5. How would you big shots do it
Could anyone give this newbie some pointers on the C++ code? How could it be written better?
Or is there a cleverer approach altogether?
Also, KFCVME50: who's treating me??