|SQL exercises||February 19, 12:46 MSK|
1. Can a correct query be considered wrong, and vice versa?
2. Why does the correct result set for some exercises that require getting ships by their classes contain the ship Bismarck, which is not present in the Ships table?
3. Why are some of the exercises based on the poorly structured Ships database?
4. Does the result of the check depend on the sorting order? And why isn't the result sorted according to the ORDER BY clause?
5. What is my position in the ranking determined by?
5'. Why did I disappear from the learning stage ranking, or turned up at the very bottom of it?
5". How can I reset my score and begin anew?
6. How is the certification testing process conducted?
7. When I resolve a previously completed SELECT exercise, a button labeled "Replace" appears. What does it mean? Does it affect my ranking?
8. Can I see the solutions by other participants?
9. What happens if a solution previously accepted by the system turns out to be wrong and fails the check after extra data has been added to the second database?
10. What is the cause of the message "SYSTEM ERROR..."?
11. What does, say, the following departure (arrival) time in the Trip table mean: "1900-01-01 14:30:00.000"?
12. What is the cause for only half of the page or even just its header being displayed after clicking the "Run" button? At the same time, everything is working as usual when the checkbox "Without checking" is activated.
13. When my query is checked, what do the messages "Data mismatch", "Wrong number of records", etc. mean?
14. What do the words "cost", "efficiency", "execution plan", "optimization" mean and what do we need them for?
15. If you couldn't find answers to your questions related to solving exercises, check the list of exercises discussed in the weekly "Site news" newsletters.
16. If you have fundamental problems with constructing queries, look up the topics discussed in the site newsletters.
17. How will I benefit from getting this certificate? And what distinguishes it from others?
18. What do the "+" and "-" signs on the certificate confirmation webpage mean?
19. Why can't I use CTE for solving some of the exercises on the site?
20. What stage do exercises with negative numbers belong to? Do I need to solve them at all?
21. What does the message "This task has hints, but these are not accessible as yet..." (red hint icon) mean, and how can I get access to this hint?
22. Can I get the second (check) database for testing my queries offline?
23. What exercise categories are there on the site, and what is the difference between them?
24. Can I submit my own exercise for publication on the site?
1. Can a correct query be considered wrong, and vice versa? ↑
If the query is logically right, it will always pass the check. There is a slight possibility that a logically flawed query is considered correct. The accuracy check consists in running the query against two databases with identical structures yet different data. Thus, if the system tells you your query is wrong but the result it displays matches the output of the reference (test) query it means your query returns wrong results when executed against the "hidden" second (or check) database. Still, it's possible a flawed query returns the same results as the reference query for both databases. This possibility can be reduced either by careful test data preparation or by using more than two databases to check the query. The second option is less desirable, since it will slow down query processing, resulting in delayed responses to the users. It is to be noted that a correct query, once submitted, should always pass the check regardless of the data in the databases it is run against. If you have reasonable doubts your correct query has been deemed wrong, please send your code to the website administrators.
2. Why does the correct result set for some exercises that require getting ships by their classes contain the ship Bismarck, which is not present in the Ships table? ↑
According to the database schema, the Outcomes table may contain ships not present in the Ships table. However, if the name of such a ship matches a class in the Classes table, this ship is the lead ship for that class, and we have all information about it except for the year it has been launched. Thus, if we have to display, say, the year of launch for all lead ships, then, the corresponding field of the result set should be NULL for this class, meaning a lack of data. For exercises where this may lead to ambiguous interpretation it is specifically pointed out ships from the Outcomes table should be taken into account. In addition, the phrase "all ships in the database" also means ships in both the Ships and the Outcomes tables.
3. Why are some of the exercises based on the poorly structured Ships database? ↑
The Ships database is indeed an example of bad design. Here, the Outcomes table may contain orphaned records (i.e. ships whose class is not known), which makes it impossible to maintain referential integrity between the Outcomes and Ships tables. Using such a structure for learning purposes can be justified by the fact it often requires more complex queries, and thus allows exploiting the SELECT syntax more fully. To be fair, both tables themselves are normalized, i.e. they are in 3NF and even in BCNF.
4. Does the result of the check depend on the sorting order? And why isn't the result sorted according to the ORDER BY clause? ↑
Whether your query is deemed correct or wrong doesn't depend on the sorting order. The result will be sorted according to the ORDER BY clause of your query if the checkbox "Without checking" is activated. Otherwise, the default sorting by all columns from left to right is applied.
5. What is my position in the ranking determined by? ↑
Your ranking position is determined by the total of points assigned for exercises solved (except for stage three - see FAQ#6). If several participants have the same score, the one who needed less time to solve the exercises will be ranked higher. Thereby for stage one, this time corresponds to the period from starting solving the first completed exercise to finishing the last one. For the second stage, only the total net solving time (from opening the exercise until the query passes the check) is recorded. See FAQ#6 for more information about the second rating stage. Currently, there are 8 exercises belonging to the first rating stage on the site.
5'. Why did I disappear from the learning stage ranking, or turned up at the very bottom of it? ↑
There is only one ranking you can participate in. After solving any rating stage exercises, you leave the educational ranking (although all your solutions will be retained by the system) and move to the main one. If later, you complete learning stage exercises you haven't solved previously, respective points will be assigned, but the score will be reset back to zero during the weekly update of the rankings.
5". How can I reset my score and begin anew? ↑
You can't do that. This is because we want the ranking both to be objective and to reflect the learning efficiency adequately.
6. How is the certification testing process conducted? ↑
1. Only participants who solved all exercises at stage one (up to and including #9) get access to the certification (or second) stage. Their results achieved at stage one will be used only as a tiebreaker criterion in case of a tie.
2. At stage 2, exercises are solved one by one, in the same order for each participant. The number of the next exercise will appear in the list after the previous one is solved. The period of time from the moment the web page with the exercise has been sent to the user until the solution passes the check is recorded.
3. In case scores are tied, the second stage ranking position depends on the total of the recorded net solving times for all exercises (the less it is the higher the rank). Thus, the ranking now doesn't depend on the gross time: the same number of exercises can be done, say, in a single hour at one go, or within a year with interruptions. Besides, it makes it easier for the website administration to add new exercises as they become available.
4. For each participant, one worst net solution time isn't taken into account in order to compensate for time losses caused by unforeseen circumstances.
5. The website administration may replace some exercises with new ones; participants who have already solved the replaced exercise will have to solve the new one before they can advance further.
Since participants are awarded certificates based on their results achieved at stage 2, we kindly ask you to adhere to the following simple rules:
a) please don't publish the exercises and ask for help elsewhere;
b) in case of difficulties, you can email our help desk.
We reserve the right to impose repressive measures including a permanent ban on violators of rule (a).
6. The third stage begins with exercise 139. It's called optimization stage, since here, it's not enough to just get the correct answer. In addition, the participant's query execution time should not exceed the one of the test query more than twice. To get access to the third stage, one has to solve exercise # 138.
Stage three rules:
- Time spent on solving stage two exercises doesn't affect the ranking, i.e. the participant's position is based on the results of stage three only.
- Stage three participants are always ranked higher than any of participants at stage two, independent of their total score.
- Exercises may be solved in any order, and several ones may be solved simultaneously. Naturally, in the last case the solving times for each opened exercise will be added up.
- A stage three participant's ranking position won't decrease even if he stops solving new exercises at previous stages completely. Scores earned at previous stages are taken into account only in case of a tie.
7. When I resolve a previously completed SELECT exercise, a button labeled "Replace" appears. What does it mean? Does it affect my ranking? ↑
At stage two, each participant's first and last solutions that passed the check are stored in the database for each exercise. Solving the exercise repeatedly doesn't change the initially recorded solution time, and thus doesn't affect the main ranking position.
However, the last solution modification time plays an important role in the efficiency ranking (see FAQ#14); therefore, when an exercise is solved anew, the system asks whether the old solution has to be replaced with the new one. Hence, replacing the old solution makes sense only if it's no longer valid (doesn't pass the check any more) or if the new query turns out to be more efficient.
8. Can I see the solutions by other participants? ↑
You may discuss your own solution and get acquainted with queries by other participants on the forum. Please note the following:
1. You can join the discussion of an exercise only after you solved it (or at least after your query passed the check).
2. Only registered users have access to the forum.
3. A solution can be published exclusively by its author. Even if we deem a solution worth to be shared with our users we won't do that without the author's approval.
9. What happens if a solution previously accepted by the system turns out to be wrong and fails the check after extra data has been added to the second database? ↑
This won't affect your ranking position, nor are you obligated to rewrite your query for the exercise in question. Still, it's recommended to keep your solutions "up to date" to be able to participate in the efficiency contest whose implementation is underway.
10. What is the cause of the message "SYSTEM ERROR..."? ↑
This is an unexpected error the system can't process. Please inform the website administration if this error persists.
11. What does, say, the following departure (arrival) time in the Trip table mean: "1900-01-01 14:30:00.000"? ↑
The plane departure and arrival times specified in the timetable are stored in fields of DATETIME type. Therefore, when data in a time-only format is entered into these fields, the system automatically supplements it by a zero date. Since there is no guarantee this date will be the same everywhere it's safe not to make any assumptions about this date in your queries. E.g., the following construct could be used to find out the exact departure date and time for a trip:
12. What is the cause for only half of the page or even just its header being displayed after clicking the "Run" button?
At the same time, everything is working as usual when the checkbox "Without checking" is activated. ↑
This situation occurs when your (possibly correct) query exceeds the timeout set on the server. When the "Without checking" check box is activated, the query is executed against the main database only. Since it is significantly smaller than the second database, processing the query and page generation fit into the time limit.
What should you do about that? Rewrite your query to improve its performance. Note that the reference queries used for accuracy checking take no longer than 2-3 seconds to execute.
13. When my query is checked, what do the messages "Data mismatch", "Wrong number of records", etc. mean? ↑
Wrong number of columns (A,B) - your query retrieved records consisting of A fields, while the correct result set contains B columns.
Wrong number of records (more|less by X) - your query returned X records more or less than the reference query.
For SELECT exercises, the "less" case occurs, for example, if too strict search conditions are specified or not all necessary data sources are used. The "more" case may be encountered because of insufficiently strict search conditions or inclusion of the same data multiple times.
For DML exercises, the same line of thought can be followed, yet the type of the performed data manipulation operation should be kept in mind. (In the checking process, the system compares the data in the tables themselves after the DML statement has been applied).
Data mismatch (X) - your query returned a set of records having X fields whose values differ from the reference result set. For the main database, such differing values are highlighted red.
14. . What do the words "cost", "efficiency", "execution plan", "optimization" mean and what do we need them for? ↑
As is known, for each exercise a number of solutions can be constructed that will return the same result but differ qualitatively. A query is more efficient than another one if it is executed faster and requires fewer resources (CPU time, memory, disk operations) to be allocated by the DBMS. For each query written in SQL, there is a large number of execution plans. The purpose of the DBMS component called query optimizer is the choice of the best (that is, the most efficient) plan from this set. Used as the criterion of efficiency is the query cost - a numeric ratio calculated based on several parameters and relying on the available statistics. The optimizer chooses the plan with the smallest cost for query execution.
It should be noted that the query cost isn't equivalent to its execution time - it just helps to choose a single data retrieval plan of a variety for a given query. Thus, it's not entirely correct to compare the plan costs of different queries. Moreover, quite often a query with a greater cost works faster than its counterpart with a smaller plan cost. Let alone the fact that the optimizer can be "fooled" easily into underestimating the cost.
It is fair to say that the main purpose of the tool provided by the site is to study the plan; the figures are just motivators for learning how to read the plan.
The ability to understand query plans is crucially important for improving query performance, since it's what enables you to locate and avoid bottlenecks in your code.
The theoretical basis of query optimization can be found in
15. If you couldn't find answers to your questions related to solving exercises, check the list of exercises discussed in the weekly "Site news" newsletters. ↑
16. If you have fundamental problems with constructing queries, look up the topics discussed in the site newsletters. ↑
17. How will I benefit from getting this certificate? And what distinguishes it from others? ↑
1. Other things being equal, our certificate gives you advantage when applying for a job.
2. It allows you to publish your resume on the site, accessible for potential employers and HR agencies.
3. It lets you place a link to your personal or your company's website.
4. It helps promoting this website, which, in turn, increases your popularity in Internet.
5. A distinguishing feature of our certificate is that it confirms practical skills rather than the successful completion of a test. As we see it, knowledge of the language syntax is by far not the same as the ability to solve a problem using it.
18.What do the "+" and "-" signs on the certificate confirmation webpage mean? ↑
On the certificate confirmation page you can see one of the two symbols "+" or "-" next to the certificate holder's photo.
The plus sign (+) means the certificate is up-to-date, and the minus sign (-) that it is outdated. The certificate validity will be confirmed in any case. A certificate may become outdated if exercises have been replaced at the certification stages or if the qualification requirements have been increased.
Our view is that maintaining one's skills at a high level requires constant practice. To confirm her/his qualification, a specialist needs to solve the replaced or newly added exercises. This doesn't take too much time (new certification stage exercises are introduced not too often, and in small numbers), and is absolutely free.
We always inform about new exercises in newsletters and on the site forum.
19. Why I can't use CTE for solving some of the exercises on the site? ↑
CTE or common table expressions were introduced in Microsoft SQL Server 2005. Currently, the restriction of CTE use (for recursive CTE, in particular) is stipulated by the fact that for some exercises, set-based solutions turn out to be more efficient than solutions using the procedural approach characteristic for recursive CTE. Bearing in mind the main purpose of the site is learning, we sometimes "force" the users to use a set-based approach. Still, even for such exercises, CTE can be used with the "Without checking" option activated, or after an alternative solution has been found.
20. What stage do exercises with negative numbers belong to? Do I need to solve them at all? ↑
Exercises with negative numbers formally belong to the first rating stage, but they are optional (and even not recommended for beginners). These exercises don't affect either the certification or advancing to stage two and further, although ranking points are awarded for solving them.
This so-called negative section contains exercises that aren't very suitable for other stages for various reasons (e.g., because of a not quite unambiguous wording).
21. What does the message "This task has hints, but these are not accessible as yet..." (red hint icon) mean, and how can I get access to this hint? ↑
For the hint to become accessible (and the hint icon to turn green), your query should return the correct result when executed against the open database in the first place.
Besides, except for the first rating stage (at which the solution time isn't recorded for each individual exercise), a certain amount of time has to elapse after the user started solving the exercise. This time period is set separately for each exercise. Its purpose is to "avoid spoilers" and give the user an additional chance to solve the exercise independently.
Note that after the aforementioned period is over, the hint doesn't become accessible automatically; i.e., to read it, you need to run your query an extra time.
22. Can I get the second (check) database for testing my queries offline? ↑
The purpose of the check database is to prevent cheating; therefore, we're not going to put it in the public domain.
Besides, we believe that the ability to simulate data for testing your queries (regardless of whether you do it mentally or on a local computer) is a useful skill that also can be trained by solving exercises on this website.
23. What exercise categories are there on the site, and what is the difference between them? ↑
Learning stage. Learning stage ranking points are assigned that don't affect the main ranking list. The solution time (the period from the opening of the exercise till the moment the system accepts the user's solution as correct) is recorded for each exercise. Note that, since this stage allows opening several exercises simultaneously, the total solution time may exceed the actual time the user spent solving them.
First stage (exercises 1-8). Solution time is not recorded, main ranking points are assigned (see Score column). This stage will finally be eliminated. One has to solve all exercises of this stage to move on to the second stage. The exercises can be solved in any order.
Exercises with negative numbers. Those are optional exercises users don't need to solve to get certificates and to move on to more advanced ranking stages. However, points assigned for solving them are added to the user's score.
DML. Those are data manipulation exercises. Solution time is recorded but doesn't affect the user's rank (just the points do).
Second stage (exercises 8-138). Solution time is recorded; therefore, the user has to open the exercises one at a time, with the next exercise becoming available after the previous one has been solved.
Third stage (optimization training, exercises 139-250). Solution time is recorded, and the exercise is deemed to be solved if the solution passes all checks, and its CPU execution time doesn't exceed the execution time of the test query more than twice. Points are added up in the Score_3 column, and the solution time is calculated into the Days_3 column.
Puzzles (exercises 251-300). Time control is established similarly to stage 3, with the difference that only the assigned points affect the user's rank, not the solution time.
Exercises with 301+ numbers. This category is intended for discussion of new exercises and their subsequent transfer to other stages (except for the third one).
24. Can I submit my own exercise for publication on the site? ↑
Any registered user may submit an exercise. Please note, however, that, since similar exercises are to be avoided, it is advisable for the future author to solve most of the exercises already existing. Guide to Exercise Authors.