pencilmark techniques

What is the best way to do the pencilmarks? Do you usually go by row or column or box, or does it depend on the puzzle?
I usually go by the box, saying the 1 can go here and here, the 2 can go there, there and there. A friend of mine looks at each cell and says this can be a 1, 3, 5 or 7.

> What is the best way to do the pencil marks?
> Do you usually go by row or column or box,
> or does it depend on the puzzle?

There are two questions here.
a) What format should they have.
b) How to compile them

I suspect that the questioner intends the latter, but
both are worthy of comment.

By convention the "Candidate Profile" marks consist of
a "string" of digits in the top left corner of each cell - in
such a way that digits can be removed from the string
as logic determines until only ONE candidate remains
and this is then promoted to "big number" status as
the resolved value for that cell.
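A minimal Python sketch of that convention, assuming a cell's pencil marks are kept as a plain string of digits. The names `eliminate` and `promote` are hypothetical helpers for illustration, not part of any Sudoku program.

```python
from typing import Optional


def eliminate(profile: str, digits: str) -> str:
    """Strike every digit in `digits` out of a cell's candidate string."""
    return "".join(d for d in profile if d not in digits)


def promote(profile: str) -> Optional[int]:
    """Return the resolved "big number" once exactly one candidate remains."""
    return int(profile) if len(profile) == 1 else None


profile = "1357"                    # the cell "can be a 1, 3, 5 or 7"
profile = eliminate(profile, "15")  # logic rules out 1 and 5
print(profile, promote(profile))    # 37 None
profile = eliminate(profile, "3")   # logic rules out 3
print(profile, promote(profile))    # 7 7
```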

There are variations on this technique. Some use small
dots placed strategically in the cell. Candidate Profiles
are the basis for "computer" and "advanced" methods
of solution, but they are not the ONLY "pencil marks".

The "Mandatory Pairs" method uses marks made in the
bottom left of the cells and it has been suggested that
the bottom right be reserved for a "sequence" number
which increments as each cell is resolved - although I
find that this is prone to human error during the actual
solving process when attention is on the logic rather
than on record keeping!

There is also a technique of recording "Missing" profiles
for each row and column. This is, basically, a string of
those digits which are YET to be resolved in the row or
column concerned. They appear OUTSIDE the grid. I
usually put them above and to the right.
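The "Counting" behind a "Missing" profile can be sketched in Python as below. The sample puzzle is assumed purely for illustration (0 marks an unresolved cell), rows and columns are numbered from 0 in the code, and `missing_profile`, `row_profiles` and `column_profiles` are hypothetical names.

```python
from typing import List, Sequence

# Sample puzzle assumed for illustration; 0 marks an unresolved cell.
PUZZLE = [
    "530070000", "600195000", "098000060",
    "800060003", "400803001", "700020006",
    "060000280", "000419005", "000080079",
]
GRID: List[List[int]] = [[int(ch) for ch in row] for row in PUZZLE]


def missing_profile(values: Sequence[int]) -> str:
    """Digits 1-9 not yet resolved in one row or column, in ascending order."""
    present = set(values)
    return "".join(str(d) for d in range(1, 10) if d not in present)


def row_profiles(grid: List[List[int]]) -> List[str]:
    """One profile per row (written to the right of the grid)."""
    return [missing_profile(row) for row in grid]


def column_profiles(grid: List[List[int]]) -> List[str]:
    """One profile per column (written above the grid)."""
    return [missing_profile(col) for col in zip(*grid)]


print(row_profiles(GRID)[0])     # 124689
print(column_profiles(GRID)[0])  # 1239
```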

A refinement to this (and one of the benefits of recording
them at all) is to separate the text strings into "substrings"
which record any pairs, triples etc. identified. For example,
a column may not yet have 24578 resolved but two of
the cells may form a 25 pair. I would record this as
(25)(478) - using parentheses as separators. Extending
this convention, I would use the parentheses even where
no substrings have been identified - eg (24578).
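The bracketed notation can be produced mechanically. A hypothetical Python sketch, assuming the identified pairs or triples are supplied as a list of digit strings; `format_profile` is an illustrative name only.

```python
from typing import List


def format_profile(missing: str, groups: List[str]) -> str:
    """Write a "Missing" profile with identified pairs/triples as substrings.

    `groups` holds the subsets already identified (e.g. a 25 pair); any
    remaining digits go into one final substring.
    """
    grouped = set("".join(groups))
    if not grouped <= set(missing):
        raise ValueError("a group contains a digit that is not missing")
    rest = "".join(d for d in missing if d not in grouped)
    parts = list(groups) + ([rest] if rest else [])
    return "".join(f"({part})" for part in parts)


print(format_profile("24578", ["25"]))  # (25)(478)
print(format_profile("24578", []))      # (24578)
```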


Mandatory Pairs marks do NOT need to be compiled or
derived. They are a record of situations which come to
light during the application of logic to the puzzle.

Missing Profiles are relatively easy to identify. It is just
a question of recording what one does in the "Counting"
process applied to each row or column. This is very much
a mechanical process (but not as complex as deriving the
full candidate profiles as discussed below). Their value
is threefold:

a) Quick reference (avoid having to repeat the counting)
b) Comparison of row/column intersects without having
    to derive full candidate profiles.
c) Easier derivation of Candidate Profiles.

Under (b) a quick comparison of the "Missing" profiles for
the row and column intersecting a particular cell can be
used to produce a "substring" of digits that appear in BOTH
the profile for the row and that for the column. If this gives
a "common" substring of just ONE digit, there is a resolution!
If it gives more than one digit in the common substring, one
needs to check the region in which the cell is situated to see
if any of the digits in the substring can be eliminated. The
substring (pruned in the light of the region if appropriate)
is then the candidate list for that cell.
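A minimal Python sketch of (b), assuming the same illustrative sample puzzle and 0-based row/column numbers. `cell_candidates` is a hypothetical name; it takes the common substring of the row and column profiles and then prunes it against the region.

```python
from typing import List, Sequence

# Sample puzzle assumed for illustration; 0 marks an unresolved cell.
PUZZLE = [
    "530070000", "600195000", "098000060",
    "800060003", "400803001", "700020006",
    "060000280", "000419005", "000080079",
]
GRID: List[List[int]] = [[int(ch) for ch in row] for row in PUZZLE]


def missing_profile(values: Sequence[int]) -> str:
    present = set(values)
    return "".join(str(d) for d in range(1, 10) if d not in present)


def cell_candidates(grid: List[List[int]], r: int, c: int) -> str:
    """Common substring of row and column "Missing" profiles, pruned by region."""
    if grid[r][c] != 0:
        return str(grid[r][c])  # already resolved
    row_missing = missing_profile(grid[r])
    col_missing = missing_profile([grid[i][c] for i in range(9)])
    common = "".join(d for d in row_missing if d in col_missing)
    top, left = 3 * (r // 3), 3 * (c // 3)
    region = {grid[i][j] for i in range(top, top + 3) for j in range(left, left + 3)}
    return "".join(d for d in common if int(d) not in region)


print(cell_candidates(GRID, 0, 2))  # 124 (common substring 12469, region removes 6 and 9)
print(cell_candidates(GRID, 4, 4))  # 5   (a single common digit: a resolution)
```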

My own practice is to undertake (b) only if there is a fair
chance of only one or two digits in the common substring. A
quick scan of the "Missing" profiles generally enables one
to concentrate on the more likely intersects during the
"earlier" part of the solution process (ie before derivation
of the candidate profiles - assuming that one does not
rush in and do that first!!).

However, applying the technique in (b) for ALL unresolved
cells leads one to the FULL set of candidate profiles - (c)!

I prefer to identify the candidate profile for each cell and
then to move on to another cell - rather than attempting
to build up the profiles for all cells together by testing
each digit in turn across the whole grid. I find the latter
to be messy - and to take up more space as one has to
return to a cell several times and alignment of the "tiny
writing" is not always precise (whereas derivation of a
complete string enables the string for any cell to be
written in a single process).
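To make the contrast concrete, here is a hypothetical Python sketch of both orders of work on the same illustrative sample grid. The helper `seen` collects the digits in a cell's row, column and region, which is equivalent to the row/column intersect pruned by the region. For a program the two orders give identical profiles; the difference described above lies in the human bookkeeping, not the result.

```python
from typing import Dict, List, Set, Tuple

# Sample puzzle assumed for illustration; 0 marks an unresolved cell.
PUZZLE = [
    "530070000", "600195000", "098000060",
    "800060003", "400803001", "700020006",
    "060000280", "000419005", "000080079",
]
GRID: List[List[int]] = [[int(ch) for ch in row] for row in PUZZLE]
Cell = Tuple[int, int]


def seen(grid: List[List[int]], r: int, c: int) -> Set[int]:
    """Digits already resolved in the cell's row, column and region."""
    top, left = 3 * (r // 3), 3 * (c // 3)
    digits = set(grid[r]) | {grid[i][c] for i in range(9)}
    digits |= {grid[i][j] for i in range(top, top + 3) for j in range(left, left + 3)}
    return digits


def cell_by_cell(grid: List[List[int]]) -> Dict[Cell, str]:
    """Complete one cell's whole string, then move on to the next cell."""
    profiles: Dict[Cell, str] = {}
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                taken = seen(grid, r, c)
                profiles[(r, c)] = "".join(str(d) for d in range(1, 10) if d not in taken)
    return profiles


def digit_by_digit(grid: List[List[int]]) -> Dict[Cell, str]:
    """Sweep the whole grid once per digit, returning to every cell each time."""
    profiles = {(r, c): "" for r in range(9) for c in range(9) if grid[r][c] == 0}
    for d in range(1, 10):
        for (r, c) in profiles:
            if d not in seen(grid, r, c):
                profiles[(r, c)] += str(d)
    return profiles


full = cell_by_cell(GRID)
print(full[(0, 2)], full[(4, 4)])    # 124 5
print(full == digit_by_digit(GRID))  # True
```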

By making use of the 'Missing Profiles', I generally work
first on the rows/columns with the shortest profiles
(ie working on all rows with seven resolved cells, then
columns with seven, rows with six, columns with six
etc). This avoids working with a 'long' string when
using the shorter intersect string would be quicker.
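That working order can be sketched in Python, assuming the same illustrative sample puzzle with 0-based indices. `work_order` is a hypothetical name; it sorts by profile length and, on ties, puts rows before columns, as in "rows with seven, then columns with seven".

```python
from typing import List, Sequence, Tuple

# Sample puzzle assumed for illustration; 0 marks an unresolved cell.
PUZZLE = [
    "530070000", "600195000", "098000060",
    "800060003", "400803001", "700020006",
    "060000280", "000419005", "000080079",
]
GRID: List[List[int]] = [[int(ch) for ch in row] for row in PUZZLE]


def missing_profile(values: Sequence[int]) -> str:
    present = set(values)
    return "".join(str(d) for d in range(1, 10) if d not in present)


def work_order(grid: List[List[int]]) -> List[Tuple[str, int, int]]:
    """(kind, index, profile length) for each unit, shortest profiles first.

    Ties put rows (kind 0) before columns (kind 1); fully resolved units are skipped.
    """
    units = []
    for i, row in enumerate(grid):
        units.append((len(missing_profile(row)), 0, i))
    for j, col in enumerate(zip(*grid)):
        units.append((len(missing_profile(col)), 1, j))
    units.sort()
    return [("row" if kind == 0 else "column", index, n)
            for n, kind, index in units if n > 0]


for kind, index, n in work_order(GRID)[:4]:
    print(kind, index, n)
# column 4 3
# column 0 4
# column 8 4
# row 1 5
```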

Example: 234789 intersects with 258. Here one would
have six questions (Is 2 in 258? Is 3 in 258? Is 4 in
258? etc.) using the first string but only three (Is 2 in
234789? Is 5 in 234789? Is 8 in 234789?) using the second.
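The same shortcut in a small Python sketch; `intersect` is a hypothetical name, and it simply asks its "Is x in ...?" questions about the digits of the shorter string.

```python
from typing import Tuple


def intersect(a: str, b: str) -> Tuple[str, int]:
    """Common substring of two "Missing" profiles, working from the shorter one.

    Returns the common digits and the number of questions asked.
    """
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    common = "".join(d for d in short if d in long_)
    return common, len(short)


print(intersect("234789", "258"))  # ('28', 3)
```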

However, many would no doubt prefer to use the more
logical route of working systematically row by row or
column by column. It is personal preference!

I would admit to disliking the chore of deriving the
candidates and so I seek to reduce this to a minimum
(if possible by solving the puzzle without resort to them!).
I note that some people get a computer program to
derive the candidates before they start working on the
grid and then they manage the eliminations thereafter.
It has been suggested that this be an OPTIONAL feature
of the "Draw" facility on this site (cf the "sweep" facility
on one or more of the other Sudoku sites) but SamGJ
clearly has other priorities.


Returning to the original question:

There is no right or wrong WAY to compile the candidate
profiles - but there can often be wrong RESULTS when
the process is undertaken by humans. There can be no
reliable statistics on this but I would hazard a guess that
at least ten percent of candidate profile derivations contain
an error on the first "go" at them by humans. I know
that I need to cross-check mine meticulously.

Regarding checking: the essential principle is the one
implied in "Congruity". Basically, this states that the number of
distinct digits referenced in the candidate profiles for a
row, column or region MUST be the same as the number
of unresolved cells in the same row/column/region.
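A minimal Python sketch of a congruity check for one row, column or region, assuming the unit's candidate strings are passed as a list; `congruity` is a hypothetical name. Since every candidate should be one of the unit's missing digits, more distinct digits than cells means a resolved digit was not excluded, and fewer means a genuine candidate was dropped. Passing the check is necessary but does not by itself prove every profile correct.

```python
from typing import List


def congruity(unit_profiles: List[str]) -> str:
    """Check the candidate strings of one unit's unresolved cells.

    Returns "ok", "too many" (a digit already resolved in the unit was not
    excluded) or "too few" (a genuine candidate was omitted somewhere).
    """
    distinct = len(set("".join(unit_profiles)))
    cells = len(unit_profiles)
    if distinct == cells:
        return "ok"
    return "too many" if distinct > cells else "too few"


print(congruity(["13", "39", "19"]))   # ok
print(congruity(["13", "39", "169"]))  # too many
print(congruity(["13", "13", "13"]))   # too few
```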

If a congruity check fails one MUST investigate. Usually, I
find that I failed to exclude a digit present in the region
but not in either of the row and column intersections. This
is easily correctable - but MUST be corrected, as otherwise a
logical pattern that truly exists may not be observable.

The danger is in setting a candidate profile that excludes
a digit that really is a possibility for the cell. If this is not
corrected, one is likely to be led into false conclusions and
to resolve cells without proper justification. In my view it
is more likely that one will omit a value if one is deriving
the cell profiles on a digit by digit basis rather than on a
cell-by-cell basis - but I am open to being convinced by any
others with greater experience of compiling the profiles
than I have (as stated, I attempt to avoid the exercise!).


So, a big thank you for raising this topic. There is no
right or wrong way - each has its own pitfalls - but in
this forum we can, perhaps, learn of tips and tricks
that can lead to efficient and accurate compilations.

Alan Rayner  BS23 2QT

