Home
Problem
Status
Contest
Workbook
User
Group
Forum
Register
Login
{"managingGroups":{},"author":"wyyweb","updateTime":1575160901000,"title":"Codeforces科学刷题指南","dislikeCnt":0,"content":"简要介绍如何科学地刷算法题,来提高自己解决问题的能力,并利用爬虫抓取Codeforces的题库,来分析题目难度以及算法分类的关系\n\n无论做什么事,多尝试、找套路、然后刻意练习都是至关重要的。对信息科学竞赛(Olympiad in Informatics)爱好者来说,找套路的关键就是多刷题。然而题海茫茫,单以Codeforces来说,截止2017年1月3日,总共有3206道题。换言之,如果一个人足够勤奋,能够一天刷三道题,那也得快三年才能把题目刷完,而且题目数量还在扩充。所以盲目的刷题简直是浪费生命,本人从16年上半年一直按照题目解决人数从高到低排序,不断的刷水题。显然易见,刷水题的后果就是没有长进,熟悉的还是熟悉,不懂的还是不懂,唯一让自己开心的就是刷题数量的累积。所以科学刷题的本质在于不断挑战新高度,在一个平台练习足够久足够熟练之后,就要进入下一个难度平台。为了方便大家,我把Codeforces上截止2017年1月3日的所有题目的基本信息用爬虫收集了下来,并存储到excel里。更进一步,本文试图分析不同算法在不同难度等级上的出现频率分布,以及不同算法在不同难度等级上被解决次数的分布。最后,我会简要介绍的我的刷题观,以及如何爬取Codeforces上的信息。\n\n\n先说结论\n一张图\nCodeforces_Algorithms_Tag_Frequency.jpgCodeforces_Algorithms_Tag_Frequency.jpg\n上面这张图反映了不同算法(第一列)在不同问题难度(第一行)上的频率分布,基于该图,大概就可以知道在什么样的水平下应该掌握什么样的算法。不过这里我没有区分Div1和Div2之间的差别,仅仅是按照题号(A、B、C等等)来推断难度。可以看到对简单的A题而言,大部分都是考察基本的编程功底,诸如implementation(大概就是题目说什么,你做什么就是了),math(四则运算、取模取整等等)以及brute force(暴力枚举)。而随着难度的增加,比如说E题,主要就在于考察对dp(动态规划),data structures(数据结构)。当然了,从图中也可以看出,高难度题目主要在math,geometry(计算几何),shortest \npath(图论)以及games(博弈)上。下面再免费附送领一张图,反映了不同算法在不同问题难度上被解决次数的频率分布。\n\nCodeforces_Algorithms_Tag_Solved.jpgCodeforces_Algorithms_Tag_Solved.jpg\n\n一张表\nCodeforces-ProblemSet.jpgCodeforces-ProblemSet.jpg\n然后祭上刷题目录,也就是这一张表,汇总了截止2017年1月3日Codeforces题目上的所有算法题。基于这张表,一来可以按照解决人数来进行刷题,二来可以按照题目难度进行刷题,三来还可以进行主题刷题。具体的文件下载链接请见文末。\n\n\n我的刷题观\n这里搬来我在知乎上的回答,详见 LeetCode按照怎样的顺序来刷题比较好?\n\n如果想提升自己的思维能力 ,可以按照AC率或者解决人数由低到高二分查找匹配自己当前水平难度的题目,然后适当挑战高难度题(二分时间复杂度是 ,至少比从易到难的 节省时间)\n如果想巩固某一专题 ,那自然应该按照tag来刷题,但是因为所用的方法在求解前已知,不太利于思维能力的提升\n如果什么都不懂 ,那么建议随机刷题,一来可以涨见识,二来进步空间比较大\n如果想提高AC率或者增加自信 ,那么建议刷水题\n混搭以上策略 ,比如针对某一专题,然后用二分查找来选择问题求解\n再有个建议,题目如果太难超过自己当前能力的话,尝试一定时间后还是老老实实看题解吧,人与人之间还是有天赋差别的,但区别在于经验可以慢慢积累。特别是即使做对题之后,还要想尽办法看有没有提高的余地,并参考别人的代码,看如何精简代码以及精简时间空间复杂度。\n\ntourist.jpgtourist.jpg\n据说大神们的刷题量都是上万的,所以正式比赛里可以看到诸多大神不到一分钟就秒了一道题,手速太快。对Competitive Programming而言,把题目做对是基本要求(题目太难则另当别论),用更快的速度求解才是顶尖高手之间的核心区别。如果说真的有天赋存在的话,那我们也无能为力;但希望能像卖油翁一样说出,『无他,但手熟尔』。\n\n\n\n如何用爬虫获取信息\n必要的库\n1: import re\n2: import urllib.request\n3: from bs4 import BeautifulSoup\n4: import os\n5: import csv\n6: import time\n爬取Codeforces的所有算法题\n 1: #%% retrieve the problem set\n 2: def spider(url):\n 3: response \u003d urllib.request.urlopen(url)\n 4: soup \u003d BeautifulSoup(response.read())\n 5: pattern \u003d {\u0027name\u0027: \u0027tr\u0027}\n 6: content \u003d soup.findAll(**pattern)\n 7: for row in content:\n 8: item \u003d row.findAll(\u0027td\u0027)\n 9: try:\n10: # get the problem id\n11: id \u003d item[0].find(\u0027a\u0027).string.strip()\n12: col2 \u003d item[1].findAll(\u0027a\u0027)\n13: # get the problem title\n14: title \u003d col2[0].string.strip()\n15: # get the problem tags\n16: tags \u003d [foo.string.strip() for foo in col2[1:]]\n17: # get the number of AC submissions\n18: solved \u003d re.findall(\u0027x(\\d+)\u0027, str(item[3].find(\u0027a\u0027)))[0]\n19: # update the problem info\n20: codeforces[id] \u003d {\u0027title\u0027:title, \u0027tags\u0027:tags, \u0027solved\u0027:solved, \u0027accepted\u0027:0,}\n21: except:\n22: continue\n23: return soup\n24: \n25: codeforces \u003d {}\n26: wait \u003d 15 # wait time to avoid the blocking of spider\n27: last_page \u003d 33 # the total page number of problem set page\n28: url \u003d [\u0027http://codeforces.com/problemset/page/%d\u0027 % page for page in range(1,last_page+1)]\n29: for foo in url:\n30: print(\u0027Processing URL %s\u0027 % foo)\n31: spider(foo)\n32: print(\u0027Wait %f seconds\u0027 % wait)\n33: time.sleep(wait)\n标记已解决的算法题\n 1: #%% mark the accepted problems\n 2: def accepted(url):\n 3: response \u003d urllib.request.urlopen(url)\n 4: soup \u003d BeautifulSoup(response.read())\n 5: pattern \u003d {\u0027name\u0027:\u0027table\u0027, \u0027class\u0027:\u0027status-frame-datatable\u0027}\n 6: table \u003d soup.findAll(**pattern)[0]\n 7: pattern \u003d {\u0027name\u0027: \u0027tr\u0027}\n 8: content \u003d table.findAll(**pattern)\n 9: for row in content:\n10: try:\n11: item \u003d row.findAll(\u0027td\u0027)\n12: # check whether this problem is solved\n13: if \u0027Accepted\u0027 in str(row):\n14: id \u003d item[3].find(\u0027a\u0027).string.split(\u0027-\u0027)[0].strip()\n15: codeforces[id][\u0027accepted\u0027] \u003d 1\n16: except:\n17: continue\n18: return soup\n19: \n20: wait \u003d 15 # wait time to avoid the blocking of spider\n21: last_page \u003d 10 # the total page number of user submission\n22: handle \u003d \u0027Greenwicher\u0027 # please input your handle\n23: url \u003d [\u0027http://codeforces.com/submissions/%s/page/%d\u0027 % (handle, page) for page in range(1, last_page+1)]\n24: for foo in url:\n25: print(\u0027Processing URL %s\u0027 % foo)\n26: accepted(foo)\n27: print(\u0027Wait %f seconds\u0027 % wait)\n28: time.sleep(wait)\n输出爬取信息到csv文本\n 1: #%% output the problem set to csv files\n 2: root \u003d os.getcwd()\n 3: with open(os.path.join(root,\"CodeForces-ProblemSet.csv\"),\"w\", encoding\u003d\"utf-8\") as f_out:\n 4: f_csv \u003d csv.writer(f_out)\n 5: f_csv.writerow([\u0027ID\u0027, \u0027Title\u0027, \u0027Tags\u0027, \u0027Solved\u0027, \u0027Accepted\u0027])\n 6: for id in codeforces:\n 7: title \u003d codeforces[id][\u0027title\u0027]\n 8: tags \u003d \u0027, \u0027.join(codeforces[id][\u0027tags\u0027])\n 9: solved \u003d codeforces[id][\u0027solved\u0027]\n10: accepted \u003d codeforces[id][\u0027accepted\u0027]\n11: f_csv.writerow([id, title, tags, solved, accepted])\n12: f_out.close()\n分析题目难度以及算法分类的关系\n 1: #%% analyze the problem set\n 2: # initialize the difficult and tag list\n 3: difficult_level \u003d {}\n 4: tags_level \u003d {}\n 5: for id in codeforces:\n 6: difficult \u003d re.findall(\u0027([A-Z])\u0027, id)[0]\n 7: tags \u003d codeforces[id][\u0027tags\u0027]\n 8: difficult_level[difficult] \u003d difficult_level.get(difficult, 0) + 1\n 9: for tag in tags:\n10: tags_level[tag] \u003d tags_level.get(tag, 0) + 1\n11: import operator\n12: tag_level \u003d sorted(tags_level.items(), key\u003doperator.itemgetter(1))[::-1]\n13: tag_list \u003d [foo[0] for foo in tag_level]\n14: difficult_level \u003d sorted(difficult_level.items(), key\u003doperator.itemgetter(0))\n15: difficult_list \u003d [foo[0] for foo in difficult_level]\n16: \n17: # initialize the 2D relationships matrix\n18: # matrix_solved: the number of AC submission for each tag in each difficult level\n19: # matrix_freq: the number of tag frequency for each diffiicult level\n20: matrix_solved, matrix_freq \u003d [[[0] * len(difficult_list) for _ in range(len(tag_list))] for _ in range(2)]\n21: \n22: \n23: # construct the 2D relationships matrix\n24: for id in codeforces:\n25: difficult \u003d re.findall(\u0027([A-Z])\u0027, id)[0]\n26: difficult_id \u003d difficult_list.index(difficult)\n27: tags \u003d codeforces[id][\u0027tags\u0027]\n28: solved \u003d codeforces[id][\u0027solved\u0027]\n29: for tag in tags:\n30: tag_id \u003d tag_list.index(tag)\n31: matrix_solved[tag_id][difficult_id] +\u003d int(solved)\n32: matrix_freq[tag_id][difficult_id] +\u003d 1\n\n下载本文源代码以及分析结果\n本文源代码以及分析结果请见 我的Github ,或者点击链接下载: https://pan.baidu.com/s/1o7P8oT8 密码: 8dcb。\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \u003c/div\u003e\n \u003c/div\u003e\n \u003c/article\u003e\n1\n2\n3\n","threadId":60218,"likeCnt":7,"createTime":1575160901000,"isWorkbook":false,"viewCnt":3722,"openness":2,"fav":false,"id":2111,"trustable":false}