Artifact Evaluation Reviewer Guide

This evaluator guide was adapted from the USENIX Security 2023 AE guide; however, since we use different badges, there are salient differences that this guide accounts for.

If you have general questions, please contact the artifact evaluation chairs. If you have a question about a specific artifact, see below for instructions on asking the authors.

Your Goal as an Artifact Evaluation Reviewer

The goal of artifact evaluation is to help science by ensuring that published papers are accompanied by high-quality artifacts (e.g., software, hardware, datasets, etc.) that can be reused and extended by others.

Your main goal is to read the paper and judge how well the artifact matches the expectations the paper sets. We expect artifacts to be (i) consistent with the paper, (ii) as complete as possible, (iii) documented well, and (iv) easy to reuse for further research.

Keep in mind that artifact evaluation is a cooperative process. Even if artifacts initially do not meet the requirements for badges, it should be your goal to enable the authors to revise the artifact so that it qualifies for badges. Hence, feedback should be actionable and interactive. Artifacts should only “miss” badges if there was not enough time to reasonably address the evaluators’ concerns, or if the authors were unresponsive or unreasonable. Note that authors will be able to update their artifacts with no restrictions in response to your comments (e.g., to fix bugs). Authors can submit artifact/documentation updates through external repositories rather than HotCRP.

We also ask you to actively engage not only with the authors, but also with your fellow reviewers. Peer review heavily relies on you being active in facilitating the exchange of perspectives with other reviewers and the authors.

The papers under evaluation have already been accepted by the technical program committee, or are under minor revision, so you do not need to evaluate their scientific soundness. However, if you believe you have found a technical flaw in a paper anyway, contact the artifact evaluation chairs.

Confidentiality: Keep in mind that all artifacts, reviews, and discussions are confidential. Some artifacts may even contain embargoed material (e.g., exploits) due to vulnerability disclosure. As such, you should access papers and reviews only for the purpose of discussing the submissions within the ACSAC peer review process. You should treat all submissions and reviews as confidential, and not share them with external parties. Hence, we trust you will treat all the AE material as confidential and also delete the artifact after the evaluation has finished (although most authors will publish the artifact upon publication of the paper).

Please also refer to the IEEE CompSoc Reviewer Guidelines and contact the chairs in case of further questions.

Timeline

  • Artifact submission deadline: August 28 - 11:59pm EDT

  • Reviewer bidding deadline: August 31 - 11:59pm EDT

  • Kick-the-tires deadline: September 11 - 11:59pm EDT

  • Mid-evaluation feedback deadline: September 25 - 11:59pm EDT

  • Final review deadline: October 6 - 11:59pm EDT

  • Artifact evaluation period: September 2 - October 10

  • Final reviewer discussion period: October 7 - October 9

  • Artifact evaluation decision: October 10

  • Final papers with AE badge labels due: refer to the camera-ready deadline

The bidding deadline is the first important deadline, as it will allow the chairs to distribute artifacts in a way that maximizes evaluator expertise and interest. Bidding maximizes your chances to evaluate artifacts in domains you know about and are interested in.

Once your artifacts have been assigned, there is one week to ‘kick the tires’: find out whether you can actually obtain and run the artifact as provided by the authors, check whether anything is missing from the documentation, etc. To formalize this process, reviewers are asked to provide the authors with a brief summary of their assessment by that time.

In the subsequent reviewing and author discussion period, evaluators and authors actively interact to evaluate and improve the artifacts. Reviewers should communicate with the authors to request clarifications or improvements to the artifacts and the documentation. Reviewers’ reports are due in the middle of the period. This facilitates engagement and ensures that the authors get a perspective on where their artifact stands. Following two more weeks of engagement and improvement, final reviews are due before badge decisions are made in early October.

Afterward, there is some time to agree on badges before the final deadline. This is to ensure that there is time for reviewers to discuss the artifacts that need it. Keep in mind that the final deadline for agreeing on badges is strict.

Communicating with Authors

Artifact evaluation is single-blind, meaning authors do not and must not know who you are so that you can be honest and unbiased in your assessment. To enable this, all communication between authors and reviewers must be done through HotCRP, not by other means such as email.

Please make sure that in your HotCRP profile, under “Preferences”, the “Send mail” box for “Reviews and comments on authored or reviewed submissions” is checked, so that you are notified of comments on your assigned artifacts from authors and fellow reviewers.

To add a comment on HotCRP, at the bottom of the artifact page, click on the “Add comment” button to show a form, type your comment, and select the right visibility for your comment. Discussion with authors must be “Author discussion”, while discussion with evaluators must be “Reviewer discussion”. For chairs-only comments, you can use “Administrators only”. Leave the second option to “about: submission”.

You can notify a fellow evaluator with an @-mention in a HotCRP comment, as on many other platforms. Type @ and let HotCRP autocomplete the name you want. You can also use the same @-mention mechanism to tag the AEC chairs (@Adwait Nadkarni @Tobias Fiebig) and bring an issue to their attention.

Use “Reviewer discussion” comments to synchronize with your fellow evaluators and to ensure that the same issue has not already been raised in another review.

Authors submit their initial version of the artifact, artifact documentation, and any other information needed to evaluate the artifact. You should carefully read these documents and make recommendations to the authors to improve the documentation or the artifact. Any errors you find or missing information should be documented and communicated as early as possible to the authors. They will then update the artifact or documentation. The badge decision is made based on the last submitted version of the artifact and appendix and should be independent of how many problems you ran into or changes that were needed on the path there.

Evaluation setup

Authors are generally free to choose how they provide their artifact. However, to create common ground, artifacts should generally be runnable on an AMD64 Ubuntu 22.04 system, unless specific requirements (GPU support, a different CPU architecture, special hardware) dictate otherwise.

To enable authors to prepare for that, and to aid reviewers in evaluating artifacts, we provide access to virtual machines that can be started via HotCRP. Reviewers and authors can also assign a VM to a specific paper and give authors and/or reviewers access to the virtual console of the VM. Furthermore, users can log in via SSH. However, when sharing a VM with reviewers or authors, please be careful not to reveal your own IP address, e.g., by using Tor or a jump host not related to your institution. If you do not have access to such a jump host, please contact the chairs, who will give you access to one.
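
For example, a minimal sketch of connecting to an artifact VM through a jump host with OpenSSH; the user names, host name, and IP address below are placeholders, not actual AE infrastructure:

    # Connect to the artifact VM via a jump host, so the VM only ever sees the jump host's address
    ssh -J reviewer@jump.example.org ubuntu@198.51.100.23

    # The same route can be made permanent in ~/.ssh/config:
    #   Host ae-vm
    #       HostName 198.51.100.23
    #       User ubuntu
    #       ProxyJump reviewer@jump.example.org

Either way, the connection to the VM originates from the jump host, keeping your institution’s address out of the VM’s logs.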

The following types of virtual machines are available via HotCRP:

  • Base VM: 4 cores, 8GB of memory, 50GB disk space

  • Docker VM: Same as base VMs, but with Docker pre-installed

  • Compute VM: 16 cores, 64GB of memory, 100GB disk space

  • Scan VM: Same as base VMs, but located in a dedicated network allowing active network measurements; Also pre-configured with a webserver providing information on potential active measurements. Only use this to replicate active measurements for which the authors received clearance from their IRB or similar ethics review board.

If additional hardware (GPUs, other architectures, more resources) or services (interconnecting networks via VPN bridges, etc.) are required, please contact the AE chairs. Please also do so if reviewers are required to access the authors’ systems directly. The chairs will then provide a jump host for reviewers to connect to the authors’ systems anonymously.

Note on anonymization: Even though it is less secure, we recommend using the autogenerated passwords to access the artifact VMs and to share access with authors/reviewers, as SSH keys may leak reviewers’ identities. Furthermore, the aforementioned considerations on IP addresses also hold here.

Bidding phase

Once artifacts are submitted, you need to bid for the artifacts you want to review. You can enter your preferences by the bidding deadline by logging into HotCRP and clicking on “Review preferences”. You can use -20 to 20 as the range to rank the artifacts by preference and -100 to declare a conflict of interest (contact the AE chairs if unsure). When bidding, also pay attention to the hardware/software requirements of the artifact. Bid positively for at least 7 artifacts.

Note: We will try to match artifacts to your preferences, but if you don’t bid for enough (7) artifacts by the deadline, you may be assigned less-than-ideal artifact(s) for your profile.

Reviewing artifacts

The initial “kick the tires” period

Once you have been assigned artifacts, the initial “kick the tires” period begins. The goal of this period is to quickly (i.e., within one week after the assignment) determine whether you have everything you need for a full review: the artifact itself, any necessary hardware or other dependencies, and a plan for how you will evaluate the artifact.

Read the artifact documentation carefully. In particular, check the software and hardware dependencies to make sure you have all you need. You are allowed to use your own judgment when making decisions, for instance, when assessing the reasons why some artifacts may not be able to reproduce everything their paper contains.

Particularly, make sure to do the following in this initial phase:

  1. Check whether you have everything you need to do the evaluation, and if not, what is missing, including:

    • Access to the necessary hardware

    • For artifacts requesting the “Artifact Available” badge, the documentation and full source code as mentioned in the Artifact Available badge checklist, and whether the code compiles

    • For artifacts requesting the “Artifact Reviewed” badge, additionally, the scripts to run the experiments and generate figures as mentioned in the Artifact Reviewed checklist

    • For artifacts requesting the “Reproducible” badge, whether the documentation is detailed and well-structured enough to facilitate easy re-use, as described in the Reproducible badge checklist

  2. Develop a plan for how you will evaluate the artifact during the review period, including the time frames in which experiments will be run in case hardware is shared.

  3. Feel free to share the evaluation plan, and any initial feedback, with the authors via HotCRP comments.

At the end of this phase, reviewers will submit a brief review highlighting the core points of this phase. Even though reviewers can already communicate with authors before that deadline to get points resolved, this milestone ensures that authors are kept in the loop when things do not work as expected at first.

Review Period

For each artifact you are assigned to, you will produce one review explaining which badges you believe should be awarded and why or why not. To facilitate engagement, a draft review showing the current state of the process will be made available to the authors in the middle of the review period.

You will work with the authors to produce your review, as this is a cooperative process. Authors are a resource you can use, exclusively through HotCRP, if you have trouble with an artifact or if you need more details about specific portions of an artifact.

First, read the description of the IEEE Xplore badges. Further, the Artifact Available, Artifact Reviewed, and Reproducible badge checklists below provide a good way to track the requirements. If an artifact does not satisfy the requirements but the authors provide a good reason as to why they should get the badge anyway, use your judgment based on the definitions of the badges. Remember that the Artifact Reviewed and Reproducible badges require not only running the code but also auditing it to ensure that, for Artifact Reviewed, the artifact is documented, consistent, complete, and exercisable, and that, for Reproducible, the artifact does what the paper states it does and reproduces results to support all the main claims of the paper. You are not expected to understand every single line of code, but you should be confident that the artifact overall matches the paper’s description.

Most of your time should be spent auditing artifacts, not debugging them.  It is the authors’ responsibility to make their artifacts work, not yours. You do not need to spend hours trying to debug and fix complex issues; if you encounter a non-trivial error, first ask your fellow evaluators if they encountered it too, or if they know how to fix it, then ask the authors to fix it. If you run into issues such as missing dependencies, try to quickly work around them, such as finding the right package containing the dependency for your operating system and letting the authors know they have to fix their documentation.
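
As an illustration of such a quick workaround: if a build fails because of a missing shared library or header on the Ubuntu VM, something along the following lines usually identifies the right package (the library name is a placeholder) so you can then point the authors at the gap in their documentation:

    # Find which Ubuntu package provides a missing file ('libfoo.so.2' is a placeholder)
    sudo apt-get install -y apt-file
    sudo apt-file update
    apt-file search libfoo.so.2
    # install the package reported above, then note the missing dependency in your review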

It is acceptable to deny badges if artifacts require unreasonable effort, especially if such effort could be avoided through automation. For instance, if reproducing a claim requires 50 data points, and the artifact requires you to manually edit 5 config files and then run 4 commands on 3 machines for each data point, you do not need to actually perform hundreds of manual steps; instead, ask the authors to automate this, or, if you have the time, even write a script yourself that you can then share with the authors.

Once you are finished evaluating an artifact, fill in the review form and submit it at your earliest convenience. Your review must explain in detail why the artifact should or should not get each of the badges that the authors requested. You can also include additional suggestions for the authors to improve their artifacts if you have any. Note that you can edit your review as many times as you like since reviews only become visible to the authors when final decisions are announced.

Remember that the artifact evaluation process is cooperative, not adversarial. Give authors a chance to fix issues by discussing them through HotCRP comments before deciding that their artifact should not get a badge. In other words, help the authors improve their artifacts and reach badge status in the allocated time, whenever possible. However, if authors are being unresponsive or unreasonable, feel free to post a comment stating that a badge cannot be awarded unless the authors take the specified steps by the deadline.

HotCRP allows you to rate your fellow evaluators’ reviews. If you think a review is well done, don’t hesitate to add a positive vote! If you think a review could use improvement, you can leave a negative vote and optionally a reviewer discussion comment explaining your thoughts.

Badge Checklists

Below you can find guidelines for the three major badges that can be awarded during the artifact evaluation process. We make sure that HotCRP provides guidance on all of these categories in the review form as well.

Artifact Available Badge Checklist

  • Documented: The artifact has a “readme” file with high-level documentation for using it, including:

    • A description, such as which folders correspond to code, benchmarks, data, …

    • A list of supported environments, including OS, specific hardware if necessary, …

    • Compilation and running instructions, including dependencies (and their versions) and pre-installation steps, with a reasonable degree of automation such as scripts to download and build exotic dependencies (see the setup-script sketch after this checklist)

    • Configuration instructions, such as selecting IP addresses or disks

    • Usage instructions, such as analyzing a new data set

    • Instructions for a “minimal working example”

  • The artifact has documentation explaining the high-level organization of modules, and code comments explaining non-obvious code, such that other researchers can fully understand it

  • Complete and Consistent: The artifact contains all components the paper describes using the same terminology as the paper, and no obsolete code/data

  • Exercisable: The artifact contains everything necessary to run it. For example, if the artifact includes a container/VM, it must also contain a script to create it from scratch
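
As a rough illustration of the “reasonable degree of automation” item above, a readme might point to a small setup script along the following lines; the package names and build command are placeholders for whatever the artifact actually needs:

    #!/bin/sh
    # setup.sh -- hypothetical bootstrap for an AMD64 Ubuntu 22.04 machine
    set -e
    sudo apt-get update
    sudo apt-get install -y build-essential cmake python3-pip   # system dependencies
    pip3 install -r requirements.txt                            # pinned Python dependencies
    make                                                        # build the artifact itself

A script like this, referenced from the readme, saves every reviewer from rediscovering the dependency list by hand.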

Artifact Reviewed Checklist

  • The artifact has a “readme” file that documents:

    • The exact environment the authors used, including OS version and any special hardware

    • The exact commands to run to reproduce each claim from the paper (see the sketch after this checklist)

    • The approximate resources used per claim, such as “5 minutes, 1 GB of disk space”

  • The scripts to reproduce claims are documented, allowing researchers to ensure they correspond to the claims; merely producing the right output is not enough

  • The artifact’s external dependencies are fetched from well-known sources such as official websites or GitHub repositories

  • Changes to such dependencies should be clearly separated, such as using a patch file or a repository fork with a clear commit history
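
For instance, a hypothetical readme fragment with this level of detail might look as follows; the claim, script name, and resource numbers are invented for illustration only:

    Claim 1 (Table 2 in the paper): the prototype detects 95% of injected faults.
    Command:   ./scripts/reproduce_table2.sh
    Output:    results/table2.csv, to be compared against Table 2 in the paper
    Resources: ~30 minutes and ~2 GB of disk space on the Base VM

Having one such entry per claim makes it straightforward to check that the scripts really correspond to the claims rather than merely producing plausible output.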

Reproducible Badge Checklist

The artifact has all the qualities required for the Artifact Reviewed badge, but, in addition,

  • it is carefully documented and, in its final form, is subjectively easy to reuse for all reviewers,

  • the code is generally well-structured and documented, and

  • the reviewers are able to reproduce the paper’s claims when using the artifact, insofar as this demonstrates the efficacy of the artifact.