Jakub Narębski, commit 0aefa841be (2 months ago): README.md: Add "CVE data extraction" section

The Secret Life of CVEs - code and notebooks

This repository contains scripts to process and join data from the World of Code dataset (see https://arxiv.org/abs/2010.16196) and the CVE (Common Vulnerabilities and Exposures) dataset (gathered using the cve-search project), which were used in the "The Secret Life of CVEs" paper submission, accepted to the MSR 2023 Challenge: https://conf.researchr.org/track/msr-2023/msr-2023-mining-challenge.

Results were analyzed with the help of Jupyter Notebooks, available in the 'notebooks/' subdirectory.

The final dataset, along with the source code and notebooks used to extract and analyze the data, is accessible on Figshare: https://doi.org/10.6084/m9.figshare.22007003.

Running the code

The code requires Python 3 to run.

Results of each script are saved in the data/ directory. Files in this directory without any extension are pandas dataframes saved in the Parquet file format.
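As a quick way to see which extension-less files in the data/ directory are Parquet dataframes, here is a minimal sketch that uses only the standard library. It relies on the fact that a Parquet file begins and ends with the 4-byte marker "PAR1". The helper names is_parquet() and list_parquet_files() are illustrative and are not part of this repository.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes


def is_parquet(path):
    """Return True if the file at `path` carries the Parquet magic bytes."""
    path = Path(path)
    if not path.is_file() or path.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with path.open("rb") as f:
        head = f.read(4)
        f.seek(-4, 2)  # position 4 bytes before the end of the file
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def list_parquet_files(data_dir="data"):
    """List extension-less files in `data_dir` that look like Parquet files."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.suffix == "" and is_parquet(p))


if __name__ == "__main__":
    for path in list_parquet_files():
        print(path)
```

Each listed file can then be loaded with pandas.read_parquet(path), provided that pandas and a Parquet engine such as pyarrow are installed.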

The easiest way to run all scripts in order is to use the DVC (Data Version Control) command-line tool. To recreate data processing and filtering on your local machine, run "dvc repro" in the main directory. This runs all scripts according to the stages defined in the "dvc.yaml" file, replacing the contents of the data folder when needed.

The data is also available on DagsHub, in the connected repository ncusi/secret_life_of_CVEs, from which you can fetch it with "dvc pull" (after configuring DagsHub as a DVC remote).
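If you prefer to drive DVC from Python, the two options above ("dvc repro" to regenerate the data, "dvc pull" to download it) can be wrapped as in the sketch below. This is an illustration and not part of the repository: the helper names dvc_command() and run_dvc() are hypothetical, and the remote name "origin" in the usage comment is an assumption about how the DagsHub remote is named in your DVC configuration.

```python
import shutil
import subprocess


def dvc_command(action, remote=None):
    """Build the argument list for "dvc repro" or "dvc pull"."""
    if action not in ("repro", "pull"):
        raise ValueError(f"unsupported DVC action: {action!r}")
    cmd = ["dvc", action]
    if remote is not None:
        if action != "pull":
            raise ValueError("a remote can only be given for 'dvc pull'")
        cmd += ["--remote", remote]
    return cmd


def run_dvc(action, remote=None, cwd="."):
    """Run the DVC command in `cwd`; raise if DVC is missing or the command fails."""
    if shutil.which("dvc") is None:
        raise RuntimeError("the 'dvc' executable was not found on PATH")
    subprocess.run(dvc_command(action, remote), cwd=cwd, check=True)


# Example usage (run from the main directory of the repository):
#   run_dvc("pull", remote="origin")  # download data; "origin" is an assumed name
#   run_dvc("repro")                  # or regenerate the data locally
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is executed.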

Replicating paper results

To replicate the results in the paper, after recreating the data files or downloading them from Figshare or DagsHub, use the Jupyter notebooks from the "notebooks/" directory.

World of Code data extraction

The code starts with data already extracted from World of Code. To recreate data extraction from WoC servers:

  • Run "projects_stats/with_CVS_in_commit_message_ignore_case.sh" on WoC servers
  • Run "cat search.CVE_in_commit_message_ignore_case.lstCmt_9.out | cut -d';' -f1 | ~/lookup/getValues c2P 1 >projects_with_CVE_fix.txt" on WoC servers
  • Run "cve_search_parser.py search.CVE_in_commit_message.lstCmt_9.out projects_with_CVE_fix.txt cve_df_filename" on WoC servers
  • Copy the resulting 'cve_df_filename' to the local machine, replacing 'cve_df_filename' in the 'data/' folder.
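The "cut -d';' -f1" step in the second command keeps only the first semicolon-separated field of each line before passing it to the c2P lookup. A rough Python equivalent is sketched below. It assumes, which the README does not state, that each line of the search output starts with a commit identifier followed by ';'. The function name first_fields() and the sample lines are made up for illustration.

```python
def first_fields(lines, delimiter=";"):
    """Yield the first delimiter-separated field of each non-empty line.

    Unlike "cut", this skips empty lines instead of echoing them.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield line.split(delimiter, 1)[0]


# Made-up sample lines; the real output format is an assumption.
sample = [
    "0123abc;fix CVE-2020-0001 in parser\n",
    "4567def;Merge branch; addresses CVE-2019-1234\n",
]
commit_ids = list(first_fields(sample))
```

Because first_fields() is a generator, it can be fed an open file object directly, so a large search output does not have to be loaded into memory at once.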

CVE data extraction

Retrieving CVE information (with the help of the 'cve_information/retrieve_cve_info.py' script) requires a running instance of CVE-Search, as the script makes use of its REST API. Currently the instance URI is hardcoded, and you need to change it to use your local instance or some public instance. To do so, change the URL in the following code in the gather_cve_published_data() function:

    url = 'http://158.75.112.151:5000/api/cve/'
    request_url = url + cve
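One way to avoid editing the script for every instance would be to make the base URL configurable. The sketch below is not how the script currently works. It is an illustration in which the environment variable name CVE_SEARCH_API_URL and the helper names cve_request_url() and fetch_cve() are hypothetical, and only the default address comes from the code shown above.

```python
import json
import os
import urllib.request

# Address taken from the code above; override it to point at your own instance.
DEFAULT_CVE_API_URL = "http://158.75.112.151:5000/api/cve/"


def cve_request_url(cve, base_url=None):
    """Build the request URL for one CVE identifier.

    The base URL is taken from the argument, then from the hypothetical
    CVE_SEARCH_API_URL environment variable, then from the default.
    """
    base = base_url or os.environ.get("CVE_SEARCH_API_URL") or DEFAULT_CVE_API_URL
    if not base.endswith("/"):
        base += "/"
    return base + cve


def fetch_cve(cve, base_url=None, timeout=30):
    """Fetch and decode the JSON record for `cve` (needs a reachable instance)."""
    url = cve_request_url(cve, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With this approach, pointing the retrieval at a local instance would only require setting the environment variable (for example to "http://localhost:5000/api/cve/") before running the script.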

The data file is available on Figshare, and via DagsHub.
