Generally, our tasks are scored on the basis of test cases.
- We provide candidates with a task description and ask them to find a solution to the stated problem.
- They are given an example test case (which doesn't count towards their overall score), and upon submission, we grade their solution against test cases of a similar nature.
- Their score is based on how many of those test cases their solution passes.
Every task contains at least 6 test cases, often more. For example, if a candidate's solution passes 6 out of 10 assessed test cases (failing the other 4), the score for this task is 60%.
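To illustrate the arithmetic, here is a minimal sketch in Python. It is not Codility's scoring engine; the function name and inputs are hypothetical and only demonstrate the proportional rule described above.

```python
def task_score(passed: int, total: int) -> float:
    """Hypothetical sketch: the score is the fraction of assessed test cases passed."""
    return 100.0 * passed / total

# A solution passing 6 of 10 assessed test cases scores 60%.
print(task_score(passed=6, total=10))  # 60.0
```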
When reviewing a candidate report, you will see an analysis summary that details which test cases the solution passed, and which returned the wrong value or timed out.
Correctness test cases measure how well the solution satisfies the requirements of the task. All exercises are evaluated by a set of correctness test cases.
Some tasks require the candidate to write an algorithm. For such exercises, performance is measured in addition to correctness.
Performance test cases measure how efficient the solution is, based on the computer science concept of Big O notation. Simply put, performance test cases assess how long the solution takes to run as increasingly large amounts of data are fed into it. These test cases apply only to such algorithmic tasks.
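To make the idea concrete, here is a hedged sketch of two solutions to the same hypothetical problem (computing the prefix sums of an array): one runs in O(n²) time and one in O(n). A performance test case with a very large input would typically time out the quadratic version while the linear one passes. This example is ours, not a task from the Codility library.

```python
def prefix_sums_quadratic(a):
    """O(n^2): recomputes each prefix sum from scratch; times out on large inputs."""
    return [sum(a[:i + 1]) for i in range(len(a))]

def prefix_sums_linear(a):
    """O(n): carries a running total, so each element is visited only once."""
    out, total = [], 0
    for x in a:
        total += x
        out.append(total)
    return out

# Both are correct, so both pass correctness test cases;
# only the linear version also passes performance test cases.
assert prefix_sums_quadratic([1, 2, 3]) == prefix_sums_linear([1, 2, 3]) == [1, 3, 6]
```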
Tasks that are an exception to this rule:
QA TASKS
The nature of QA tasks is different from the rest of the tasks in our library, so we apply a different principle. Instead of presenting the candidate with a problem to solve, we provide an example of a page that works as it should along with acceptance criteria for that page, and ask candidates to write the test cases themselves. Their test cases, assessed together as a suite, are required to cover all acceptance criteria.
If we scored such tasks based on our usual principle, candidates could earn points simply by submitting a test suite that fails everything: it would correctly return a FAIL result on every dataset containing incorrect information. To prevent this, the first test case we score is the most important one: it consists of a perfect page in a perfect environment with all the correct data. The candidate's test suite must pass this test case. If it does, we score it against the rest of the test cases, which include pages where some of the acceptance criteria are missing. If it doesn't, the solution automatically receives a score of 0%.
In summary, every test case consists of an actual page on which the test suite is executed. However, if the candidate's suite fails the first test case (the one with the perfect page), it receives a score of 0% regardless of how it performs on the rest.
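The gating rule can be summarized in a few lines of Python. This is a hedged sketch of the principle described above; the function name and data structures are hypothetical, not Codility's implementation.

```python
def qa_task_score(suite_passes_perfect_page: bool, results_on_broken_pages: list) -> float:
    """Hypothetical sketch of QA-task scoring.

    The first scored test case runs the candidate's suite against a perfect page.
    If the suite fails there, the score is 0% regardless of the other results;
    otherwise the score is the fraction of remaining test cases handled correctly.
    """
    if not suite_passes_perfect_page:
        return 0.0
    passed = sum(results_on_broken_pages)
    return 100.0 * passed / len(results_on_broken_pages)

# A suite that fails everything would ace the broken pages but fail the
# perfect one, so it scores 0% instead of gaming the proportional rule.
print(qa_task_score(False, [True] * 9))                 # 0.0
print(qa_task_score(True, [True, True, False, True]))   # 75.0
```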
If you have any additional questions about Automated Scoring Principles or would like to leave your feedback, please reach out to your Customer Success Manager or contact us at support@codility.com.