<chapter id="troubleshoottasks-33506"><title>Troubleshooting Solaris Volume Manager (Tasks)</title><highlights><para>This chapter describes how to troubleshoot problems that are related
to Solaris Volume Manager. This chapter provides both general troubleshooting guidelines
and specific procedures for resolving some known problems.</para><para>This chapter includes the following information:</para><itemizedlist><listitem><para><olink targetptr="troubleshoottasks-83" remap="internal">Troubleshooting Solaris
Volume Manager (Task Map)</olink></para>
</listitem><listitem><para><olink targetptr="troubleshoottasks-1" remap="internal">Overview of Troubleshooting
the System</olink></para>
</listitem><listitem><para><olink targetptr="troubleshoottasks-95" remap="internal">Replacing Disks</olink></para>
</listitem><listitem><para><olink targetptr="tasks-basics-7" remap="internal">Recovering From Disk Movement
Problems</olink></para>
</listitem><listitem><para><olink targetptr="extkh" remap="internal">Device ID Discrepancies After Upgrading
to the Solaris 10 Release</olink></para>
</listitem><listitem><para><olink targetptr="troubleshoottasks-29" remap="internal">Recovering From Boot
Problems</olink></para>
</listitem><listitem><para><olink targetptr="tasks-state-db-replicas-11" remap="internal">Recovering From
State Database Replica Failures</olink></para>
</listitem><listitem><para><olink targetptr="tasks-softpart-26" remap="internal">Recovering From Soft
Partition Problems</olink></para>
</listitem><listitem><para><olink targetptr="troubleshoottasks-84" remap="internal">Recovering Storage
From a Different System</olink></para>
</listitem><listitem><para><olink targetptr="egcpz" remap="internal">Recovering From Disk Set Problems</olink></para>
</listitem><listitem><para><olink targetptr="exlvv" remap="internal">Performing Mounted Filesystem Backups
Using the ufsdump Command</olink></para>
</listitem><listitem><para><olink targetptr="fsujr" remap="internal">Performing System Recovery</olink></para>
</listitem>
</itemizedlist><para>This chapter describes some Solaris Volume Manager problems and their appropriate
solutions. This chapter is not intended to be all-inclusive, but rather to
present common scenarios and recovery procedures.</para>
</highlights><sect1 id="troubleshoottasks-83"><title>Troubleshooting Solaris Volume Manager (Task
Map)</title><para>The following task map identifies some procedures that are needed to
troubleshoot Solaris Volume Manager. </para><informaltable frame="all"><tgroup cols="3" colsep="1" rowsep="1"><colspec colname="colspec0" colwidth="110.00*"/><colspec colname="colspec1" colwidth="167.00*"/><colspec colname="colspec2" colwidth="119.00*"/><thead><row><entry><para>Task</para>
</entry><entry><para>Description</para>
</entry><entry><para>For Instructions</para>
</entry>
</row>
</thead><tbody><row><entry><para>Replace a failed disk</para>
</entry><entry><para>Replace a disk, then update state database replicas and logical volumes
on the new disk.</para>
</entry><entry><para><olink targetptr="troubleshoottasks-96" remap="internal">How to Replace a Failed Disk</olink></para>
</entry>
</row><row><entry><para>Recover from disk movement problems</para>
</entry><entry><para>Restore disks to original locations or contact product support.</para>
</entry><entry><para><olink targetptr="tasks-basics-7" remap="internal">Recovering From Disk Movement Problems</olink></para>
</entry>
</row><row><entry><para>Recover from improper <literal>/etc/vfstab</literal> entries</para>
</entry><entry><para>Use the <command>fsck</command> command on the mirror, then edit the <filename>/etc/vfstab</filename> file so that the system boots correctly.</para>
</entry><entry><para><olink targetptr="troubleshoottasks-35369" remap="internal">How to Recover From Improper
/etc/vfstab Entries</olink></para>
</entry>
</row><row><entry><para>Recover from a boot device failure</para>
</entry><entry><para>Boot from a different submirror. </para>
</entry><entry><para><olink targetptr="troubleshoottasks-21051" remap="internal">How to Recover From a Boot
Device Failure</olink></para>
</entry>
</row><row><entry><para>Recover from insufficient state database replicas</para>
</entry><entry><para>Delete unavailable replicas by using the <command>metadb</command> command.</para>
</entry><entry><para><olink targetptr="troubleshoottasks-31036" remap="internal">How to Recover From Insufficient
State Database Replicas</olink></para>
</entry>
</row><row><entry><para>Recover configuration data for a lost soft partition</para>
</entry><entry><para>Use the <command>metarecover</command> command to recover configuration
data for a soft partition.</para>
</entry><entry><para><olink targetptr="tasks-softpart-27" remap="internal">How to Recover Configuration Data
for a Soft Partition</olink></para>
</entry>
</row><row><entry><para>Recover a Solaris Volume Manager configuration from salvaged disks</para>
</entry><entry><para>Attach disks to a new system and have Solaris Volume Manager rebuild the configuration
from the existing state database replicas.</para>
</entry><entry><para><olink targetptr="eqqcz" remap="internal">How to Recover Storage From a Local Disk Set</olink></para>
</entry>
</row><row><entry><para>Recover storage from a different system</para>
</entry><entry><para>Import storage from known disk sets to a different system. </para>
</entry><entry><para><olink targetptr="troubleshoottasks-84" remap="internal">Recovering Storage From a Different
System</olink></para>
</entry>
</row><row><entry><para>Purge an inaccessible disk set</para>
</entry><entry><para>Use the <command>metaset</command> command to purge knowledge of a disk
set that you cannot take or use. </para>
</entry><entry><para><olink targetptr="egcpz" remap="internal">Recovering From Disk Set Problems</olink></para>
</entry>
</row><row><entry><para>Recover a system configuration stored on Solaris Volume Manager volumes</para>
</entry><entry><para>Use Solaris OS installation media to recover a system configuration stored
on Solaris Volume Manager volumes.</para>
</entry><entry><para><olink targetptr="fsujr" remap="internal">Performing System Recovery</olink></para>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
</sect1><sect1 id="troubleshoottasks-1"><title>Overview of Troubleshooting the System</title><sect2 id="troubleshoottasks-28376"><title>Prerequisites for Troubleshooting
the System</title><para>To troubleshoot storage management problems that are related to Solaris Volume Manager,
you need to do the following:</para><itemizedlist><listitem><para>Have root privilege</para>
</listitem><listitem><para>Have a current backup of all data</para>
</listitem>
</itemizedlist>
</sect2><sect2 id="troubleshoottasks-2"><title>General Guidelines for Troubleshooting Solaris Volume Manager</title><para>You should have the following information on hand when you troubleshoot Solaris Volume Manager problems:</para><itemizedlist><listitem><para>Output from the <command>metadb</command> command</para>
</listitem><listitem><para>Output from the <command>metastat</command> command</para>
</listitem><listitem><para>Output from the <command>metastat -p</command> command</para>
</listitem><listitem><para>Backup copy of the <filename>/etc/vfstab</filename> file</para>
</listitem><listitem><para>Backup copy of the <filename>/etc/lvm/mddb.cf</filename> file</para>
</listitem><listitem><para>Disk partition information from the <command>prtvtoc</command> command
(<trademark class="registered">SPARC</trademark> systems) or the <command>fdisk</command> command (x86 based systems)</para>
</listitem><listitem><para>The Solaris version on your system</para>
</listitem><listitem><para>A list of the Solaris patches that have been installed</para>
</listitem><listitem><para>A list of the Solaris Volume Manager patches that have been installed</para>
</listitem>
</itemizedlist><tip><para>Any time you update your Solaris Volume Manager configuration, or make other
storage or operating system-related changes to your system, generate fresh
copies of this configuration information. You could also generate this information
automatically with a <command>cron</command> job. </para>
</tip>
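<para>The following is a minimal sketch of a script that a <command>cron</command> job could call to capture this information. The <literal>svm_snapshot</literal> function name, the destination directory, and the list of disks are assumptions made for illustration and are not part of Solaris Volume Manager. The sketch assumes that slice 2 represents the whole disk, and it does not collect <command>fdisk</command> output for x86 based systems.</para><programlisting>
```shell
# Sketch: save the troubleshooting information listed above in one directory.
# Assumptions (not from the product documentation): the function name, the
# destination directory, and the disk names are chosen by the caller.
svm_snapshot() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1

    # State database replica and volume status
    metadb > "$dest/metadb.out"
    metastat > "$dest/metastat.out"
    metastat -p > "$dest/metastat-p.out"

    # Backup copies of the configuration files
    cp /etc/vfstab "$dest/vfstab"
    cp /etc/lvm/mddb.cf "$dest/mddb.cf"

    # Operating system version and installed patches
    uname -a > "$dest/uname.out"
    showrev -p > "$dest/patches.out"

    # Remaining arguments are disk names such as c0t0d0.
    # Slice 2 is assumed to cover the whole disk.
    for disk in "$@"; do
        prtvtoc "/dev/rdsk/${disk}s2" > "$dest/prtvtoc-$disk.out"
    done
}

# Example call (hypothetical directory and disks):
# svm_snapshot "/var/tmp/svm-config-$(date +%Y%m%d)" c0t0d0 c0t1d0
```
</programlisting><para>Keeping each snapshot in a dated directory makes it possible to compare the configuration before and after a change.</para>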
</sect2><sect2 id="troubleshoottasks-116"><title>General Troubleshooting Approach</title><para>Although no single procedure enables you to evaluate all problems with Solaris Volume Manager,
the following process provides one general approach that might help.</para><orderedlist><listitem><para>Gather information about the current configuration.</para>
</listitem><listitem><para>Review the current status indicators, including the output
from the <command>metastat</command> and <command>metadb</command> commands.
This information should indicate which component is faulty.</para>
</listitem><listitem><para>Check the hardware for obvious points of failure:</para><itemizedlist><listitem><para>Is everything connected properly?</para>
</listitem><listitem><para>Was there a recent electrical outage?</para>
</listitem><listitem><para>Have there been equipment changes or additions?</para>
</listitem>
</itemizedlist>
</listitem>
</orderedlist>
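<para>As a rough aid for the status review step, the following sketch filters <command>metastat</command> output for state lines that report anything other than <literal>Okay</literal>. The <literal>svm_find_bad_states</literal> function name is hypothetical, and the sketch assumes that volume headers start in the first column and that states appear on lines containing <literal>State:</literal>. Treat the result as a pointer to where to look, not as a complete diagnosis.</para><programlisting>
```shell
# Sketch: read metastat output on standard input and print every state that
# is not "Okay", prefixed by the most recent top-level volume header.
# Assumption: header lines start in column one and end their name with ":".
svm_find_bad_states() {
    awk '
        /^[^ \t].*:/ { volume = $1 }
        /State:/ {
            state = $0
            sub(/^.*State:[ \t]*/, "", state)   # keep only the state text
            sub(/[ \t]+$/, "", state)           # trim trailing blanks
            if (state != "Okay") print volume, state
        }'
}

# Example call:
# metastat | svm_find_bad_states
```
</programlisting>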
</sect2>
</sect1><sect1 id="troubleshoottasks-95"><title>Replacing Disks</title><para>This section describes how to replace disks in a Solaris Volume Manager environment. </para><caution><para>If you have soft partitions on a failed disk or on volumes
that are built on a failed disk, you must put the new disk in the same physical
location. Also, use the same <literal>c</literal><replaceable>n</replaceable><literal>t</literal><replaceable>n</replaceable><literal>d</literal><replaceable>n</replaceable><literal></literal> number
as the disk being replaced.</para>
</caution><task id="troubleshoottasks-96"><title>How to Replace a Failed Disk</title><procedure><step id="troubleshoottasks-step-98"><para>Identify the failed disk to be replaced by examining the <filename>/var/adm/messages</filename> file and the <command>metastat</command> command output. </para>
</step><step id="troubleshoottasks-step-99"><para>Locate any state database replicas
that might have been placed on the failed disk. </para><para>Use the <command>metadb</command> command to find the replicas.</para><para>The <command>metadb</command> command
might report errors for the state database replicas that are located on the
failed disk. In this example, <filename>c0t1d0</filename> is the problem device.</para><screen># <userinput>metadb</userinput>
   flags       first blk        block count
  a m     u        16               1034            /dev/dsk/c0t0d0s4
  a       u        1050             1034            /dev/dsk/c0t0d0s4
  a       u        2084             1034            /dev/dsk/c0t0d0s4
  W   pc luo       16               1034            /dev/dsk/c0t1d0s4
  W   pc luo       1050             1034            /dev/dsk/c0t1d0s4
  W   pc luo       2084             1034            /dev/dsk/c0t1d0s4</screen><para>The output shows three state database replicas on each slice 4 of the
local disks, <filename>c0t0d0</filename> and <filename>c0t1d0</filename>.
The <literal>W</literal> in the flags field of the <filename>c0t1d0s4</filename> slice
indicates that the device has write errors. Three replicas on the <filename>c0t0d0s4</filename> slice are still good.</para>
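<para>On a system with many replicas, a small filter can pick out the slices whose flags field reports a problem. The following sketch assumes that an uppercase letter in the flags field, such as the <literal>W</literal> shown above, marks a replica with errors. The <literal>svm_flagged_replicas</literal> function name is hypothetical.</para><programlisting>
```shell
# Sketch: read metadb output on standard input and print each slice that has
# at least one replica with an uppercase (error) flag.
# Assumption: the last three columns are first blk, block count, and device.
svm_flagged_replicas() {
    LC_ALL=C awk '
        $NF ~ /^\// {
            flags = $0
            # Remove the three trailing columns so only the flags remain.
            sub(/[ \t]+[0-9]+[ \t]+[0-9]+[ \t]+[^ \t]+[ \t]*$/, "", flags)
            if (flags ~ /[A-Z]/) print $NF
        }' | sort -u
}

# Example call:
# metadb | svm_flagged_replicas
```
</programlisting><para>For the output in this example, the filter prints only <filename>/dev/dsk/c0t1d0s4</filename>.</para>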
</step><step id="troubleshoottasks-step-100"><para>Record the slice name where the
state database replicas reside and the number of state database replicas.
Then, delete the state database replicas.</para><para>The number of state
database replicas is obtained by counting the number of appearances of a slice
in the <command>metadb</command> command output. In this example, the three
state database replicas that exist on <filename>c0t1d0s4</filename> are deleted. </para><screen># <userinput>metadb -d c0t1d0s4</userinput></screen><caution><para>If, after deleting the bad state database replicas, you are
left with three or fewer, <olink type="custom-text" targetptr="tasks-state-db-replicas-9" remap="internal">add more state database replicas</olink> before
continuing. Doing so helps to ensure that configuration information remains
intact.</para>
</caution>
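<para>Counting the appearances of each slice by hand is error prone when many replicas exist. The following sketch tallies the replicas per slice from <command>metadb</command> output. The <literal>svm_replica_counts</literal> function name is hypothetical, and the sketch assumes that the device path is the last column of each replica line.</para><programlisting>
```shell
# Sketch: read metadb output on standard input and print "slice count" pairs.
# Lines whose last column is not a device path (such as the header) are skipped.
svm_replica_counts() {
    awk '
        $NF ~ /^\// { n[$NF]++ }
        END { for (d in n) print d, n[d] }' | sort
}

# Example call, run before deleting any replicas:
# metadb | svm_replica_counts
```
</programlisting><para>Record the count for the failed slice so that you can add the same number of replicas back later in this procedure.</para>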
</step><step id="troubleshoottasks-step-102"><para>Locate and delete any hot spares
on the failed disk.</para><para>Use the <command>metastat</command> command
to find hot spares. In this example, hot spare pool <literal>hsp000</literal> included <filename>c0t1d0s6</filename>, which is then deleted from the pool.</para><screen># <userinput>metahs -d hsp000 c0t1d0s6</userinput>
hsp000: Hotspare is deleted</screen>
</step><step><para>Replace the failed disk.</para><para>This step might entail using
the <command>cfgadm</command> command, the <command>luxadm</command> command,
or other commands as appropriate for your hardware and environment. When performing
this step, make sure to follow your hardware's documented procedures to properly
manipulate the Solaris state of this disk.</para>
</step><step id="troubleshoottasks-step-105"><para>Repartition the new disk.</para><para>Use the <command>format</command> command or the <command>fmthard</command> command
to partition the disk with the same slice information as the failed disk.
If you have the <command>prtvtoc</command> output from the failed disk, you
can format the replacement disk with the <command>fmthard -s <replaceable>/tmp/failed-disk-prtvtoc-output</replaceable></command> command.</para>
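<para>The following sketch pairs the two halves of this approach: saving the label while the disk is still healthy, and applying the saved label to the replacement. The <literal>save_vtoc</literal> and <literal>restore_vtoc</literal> function names and the file location are hypothetical, and the sketch assumes that slice 2 represents the whole disk.</para><programlisting>
```shell
# Sketch: save and reapply a disk label.
# Assumptions: $1 is a disk name such as c0t1d0, $2 is a file chosen by the
# caller, and slice 2 of the raw device covers the whole disk.
save_vtoc() {
    prtvtoc "/dev/rdsk/${1}s2" > "$2"
}

restore_vtoc() {
    fmthard -s "$2" "/dev/rdsk/${1}s2"
}

# Example calls (hypothetical file name):
# save_vtoc c0t1d0 /var/tmp/c0t1d0.vtoc      # while the disk is healthy
# restore_vtoc c0t1d0 /var/tmp/c0t1d0.vtoc   # after the disk is replaced
```
</programlisting>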
</step><step id="troubleshoottasks-step-106"><para>If you deleted state database
replicas, add the same number back to the appropriate slice.</para><para>In
this example, <filename>/dev/dsk/c0t1d0s4</filename> is used.</para><screen># <userinput>metadb -a -c 3 c0t1d0s4</userinput></screen>
</step><step id="troubleshoottasks-step-106a"><para>If any slices on the disk are
components of RAID-5 volumes or are components of RAID-0 volumes that are
in turn submirrors of RAID-1 volumes, run the <command>metareplace -e</command> command
for each slice. </para><para>In this example, <filename>/dev/dsk/c0t1d0s4</filename> and
mirror <filename>d10</filename> are used.</para><screen># <userinput>metareplace -e d10 c0t1d0s4</userinput></screen>
</step><step id="troubleshoottasks-step-113"><para>If any soft partitions are built
directly on slices on the replaced disk, run the <command>metarecover -m -p</command> command
on each slice that contains soft partitions. This command regenerates the
extent headers on disk.</para><para>In this example, <filename>/dev/dsk/c0t1d0s4</filename> needs
to have the soft partition markings on disk regenerated. The slice is scanned
and the markings are reapplied, based on the information in the state database
replicas. </para><screen># <userinput>metarecover c0t1d0s4 -m -p</userinput></screen>
</step><step><para>If any soft partitions on the disk are components of RAID-5 volumes
or are components of RAID-0 volumes that are submirrors of RAID-1 volumes,
run the <command>metareplace -e</command> command for each slice. </para><para>In
this example, <filename>/dev/dsk/c0t1d0s4</filename> and mirror <filename>d10</filename> are
used.</para><screen># <userinput>metareplace -e d10 c0t1d0s4</userinput></screen>
</step><step id="troubleshoottasks-step-114b"><para>If any RAID-0 volumes have soft
partitions built on them, run the <command>metarecover</command> command for
each RAID-0 volume.</para><para>In this example, RAID-0 volume <filename>d17</filename>
has soft partitions built on it. </para><screen># <userinput>metarecover d17 -m -p</userinput></screen>
</step><step id="troubleshoottasks-step-109"><para>Replace hot spares that were deleted,
and add them to the appropriate hot spare pool or pools.</para><para>In this
example, hot spare pool <literal>hsp000</literal> included <literal>c0t1d0s6</literal>.
This slice is added back to the hot spare pool.</para><screen># <userinput>metahs -a hsp000 c0t1d0s6</userinput>
hsp000: Hotspare is added</screen>
</step><step id="troubleshoottasks-step-110"><para>If soft partitions or nonredundant
volumes were affected by the failure, restore data from backups. If only redundant
volumes were affected, then validate your data.</para><para>Check the user and application data on all volumes. You might
have to run an application-level consistency checker, or use some other method
to check the data. </para>
</step>
</procedure>
</task>
</sect1><sect1 id="tasks-basics-7"><title>Recovering From Disk Movement Problems</title><para>This section describes how to recover from unexpected problems
after moving disks in the Solaris Volume Manager environment. </para><sect2 id="troubleshoottasks-28943"><title>Disk Movement and Device ID Overview</title><para>Solaris Volume Manager uses device IDs, which are associated with a specific
disk, to track all disks that are used in a Solaris Volume Manager configuration. When
disks are moved to a different controller or when the SCSI target numbers
change, Solaris Volume Manager usually correctly identifies the movement and updates
all related Solaris Volume Manager records accordingly. No system administrator intervention
is required. In isolated cases, Solaris Volume Manager cannot completely update the
records and reports an error on boot. </para>
</sect2><sect2 id="troubleshoottasks-112"><title>Resolving Unnamed Devices Error Message</title><para>If you add new hardware or move hardware (for example, you move a string
of disks from one controller to another controller), Solaris Volume Manager checks
the device IDs that are associated with the disks that moved, and updates
the <literal>c</literal><replaceable>n</replaceable><literal>t</literal><replaceable>n</replaceable><literal>d</literal><replaceable>n</replaceable><literal></literal> names
in internal Solaris Volume Manager records accordingly. If the records cannot be updated,
the boot processes that are spawned by the <filename>svc:/system/mdmonitor</filename> service
report an error to the console at boot time:</para><screen>Unable to resolve unnamed devices for volume management.
Please refer to the Solaris Volume Manager documentation,
Troubleshooting section, at http://docs.sun.com or from
your local copy.</screen><para>No data loss has occurred, and none will occur as a direct result of
this problem. This error message indicates that the Solaris Volume Manager name records
have been only partially updated. Output from the <command>metastat</command> command
shows some of the <literal>c</literal><replaceable>n</replaceable><literal>t</literal><replaceable>n</replaceable><literal>d</literal><replaceable>n</replaceable><literal></literal> names
that were previously used. The output also shows some of the <literal>c</literal><replaceable>n</replaceable><literal>t</literal><replaceable>n</replaceable><literal>d</literal><replaceable>n</replaceable><literal></literal> names that reflect the state after the
move. </para><para>If you need to update your Solaris Volume Manager configuration while this condition
exists, you must use the <literal>c</literal><replaceable>n</replaceable><literal>t</literal><replaceable>n</replaceable><literal>d</literal><replaceable>n</replaceable><literal></literal> names
that are reported by the <command>metastat</command> command when you issue
any <literal>meta*</literal> commands. </para><para>If this error condition occurs, you can do one of the following to resolve
the condition:</para><itemizedlist><listitem><para>Restore all disks to their original locations. Next, do a
reconfiguration reboot, or run (as a single command):</para><screen>/usr/sbin/devfsadm &amp;&amp; /usr/sbin/metadevadm -r</screen><para>After these commands complete, the error condition is resolved.</para>
</listitem><listitem><para>Contact your support representative for guidance.</para><note><para>This error condition is quite unlikely to occur. If it does occur,
it is most likely to affect Fibre Channel-attached storage. </para>
</note>
</listitem>
</itemizedlist>
</sect2>
</sect1><sect1 id="extkh"><title>Device ID Discrepancies After Upgrading to the Solaris
10 Release</title><para>Beginning with the Solaris 10 release, device ID output is displayed
in a new format. Solaris Volume Manager may display the device ID output in a new or
old format, depending on when the device ID information was added to the state
database replica.</para><para>Previously, the device ID was displayed as a hexadecimal value. The
new format displays the device ID as an ASCII string. In many cases, the change
is negligible, as in the following example:</para><variablelist><varlistentry><term>old format:</term><listitem><para><literal>id1,ssd@</literal><firstterm><emphasis role="strong">w</emphasis></firstterm><literal>600c0ff00000000007ecd255a9336d00</literal></para>
</listitem>
</varlistentry><varlistentry><term>new format:</term><listitem><para><literal>id1,ssd@</literal><firstterm><emphasis role="strong">n</emphasis></firstterm><literal>600c0ff00000000007ecd255a9336d00</literal></para>
</listitem>
</varlistentry>
</variablelist><para>In other cases, the change is more noticeable, as in the following example:</para><variablelist><varlistentry><term>old format:</term><listitem><para><literal>id1,sd@w4849544143484920444b3332454a2d33364<?SolBook linebreak?>e4320202020203433334239383939</literal></para>
</listitem>
</varlistentry><varlistentry><term>new format:</term><listitem><para><literal>id1,@THITACHI_DK32EJ-36NC_____433B9899</literal></para>
</listitem>
</varlistentry>
</variablelist><para>When you upgrade to the Solaris 10 release, the format of the device
IDs that are associated with existing disk sets that were created in a previous
Solaris release is not updated in the Solaris Volume Manager configuration. If you
need to revert to a previous Solaris release, configuration changes made
to disk sets after the upgrade might not be available to that release. These
configuration changes include:</para><itemizedlist><listitem><para>Adding a new disk to a disk set that existed before the upgrade</para>
</listitem><listitem><para>Creating a new disk set</para>
</listitem><listitem><para>Creating state database replicas</para>
</listitem>
</itemizedlist><para>These configuration changes can affect all disk sets that you are able
to create in Solaris Volume Manager, including the local set. For example, if you implement
any of these changes to a disk set created in the Solaris 10 release, you
cannot import the disk set to a previous Solaris release. As another example,
you might upgrade one side of a mirrored root to the Solaris 10 release and
then make configuration changes to the local set. These changes would not
be recognized if you then incorporated the submirror back into the previous
Solaris release.</para><para>The Solaris 10 OS configuration always displays the new format of the
device ID, even in the case of an upgrade. You can display this information
using the <command>prtconf</command> <option>v</option> command. Conversely, Solaris Volume Manager displays
either the old or the new format. Which format is displayed in Solaris Volume Manager depends
on which version of the Solaris OS you were running when you began using the
disk. To determine if Solaris Volume Manager is displaying a different, but equivalent,
form of the device ID from that of the Solaris OS configuration, compare the
output from the <command>metastat</command> command with the output from the <command>prtconf</command> <option>v</option> command.</para><para>In the following example, the <command>metastat</command> command output
displays a different, but equivalent, form of the device ID for <literal>c1t6d0</literal> from
the <command>prtconf</command> <option>v</option> command output for the same
disk.</para><screen># <userinput>metastat</userinput>
d127: Concat/Stripe
    Size: 17629184 blocks (8.4 GB)
    Stripe 0:
        Device     Start Block  Dbase   Reloc
        c1t6d0s2      32768     Yes     Yes

Device Relocation Information:
Device   Reloc  Device ID
<emphasis role="strong">c1t6d0   Yes    id1,sd@w4849544143484920444b3332454a2d3336<?SolBook linebreak?>4e4320202020203433334239383939</emphasis></screen><screen># <userinput>prtconf -v</userinput>
.
.
.
(output truncated)
sd, <emphasis role="strong">instance #6</emphasis>
         System properties:
              name='lun' type=int items=1
                 value=00000000
              name='target' type=int items=1
                 value=00000006
              name='class' type=string items=1
                 value='scsi'
         Driver properties:
              name='pm-components' type=string items=3 dev=none
                 value='NAME=spindle-motor' + '0=off' + '1=on'
              name='pm-hardware-state' type=string items=1 dev=none
                 value='needs-suspend-resume'
              name='ddi-failfast-supported' type=boolean dev=none
              name='ddi-kernel-ioctl' type=boolean dev=none
         Hardware properties:
              name='devid' type=string items=1
                 value='<emphasis role="strong">id1,@THITACHI_DK32EJ-36NC_____433B9899</emphasis>'
.
.
.
(output truncated)</screen><para>The line containing &ldquo;instance #6&rdquo; in the output from the <command>prtconf</command> <option>v</option> command correlates to the disk <literal>c1t6d0</literal> in the output from the <command>metastat</command> command. The
device ID, <literal>id1,@THITACHI_DK32EJ-36NC_____433B9899</literal>, in the
output from the <command>prtconf</command> <option>v</option> command correlates
to the device ID, <literal>id1,sd@w4849544143484920444b3332454a2d33364e4320202020203433334239383939</literal>,
in the output from the <command>metastat</command> command. This difference
in output indicates that Solaris Volume Manager is displaying the hexadecimal form
of the device ID in the output from the <command>metastat</command> command,
while the Solaris 10 OS configuration is displaying an ASCII string in the
output from the <command>prtconf</command> command.</para>
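<para>To confirm that the two forms are equivalent, you can decode the hexadecimal digits that follow <literal>id1,sd@w</literal> into ASCII characters. The following sketch does this with standard shell utilities. The <literal>devid_hex_to_ascii</literal> function name is hypothetical, and the mapping of spaces to underscores is inferred from this one example rather than from a documented rule.</para><programlisting>
```shell
# Sketch: decode a string of hexadecimal digit pairs into ASCII text.
# Assumption: the argument has an even number of hexadecimal digits and
# decodes to printable characters.
devid_hex_to_ascii() {
    hex=$1
    out=""
    while [ -n "$hex" ]; do
        pair=$(printf '%s' "$hex" | cut -c1-2)    # next two hex digits
        hex=$(printf '%s' "$hex" | cut -c3-)      # remainder of the string
        octal=$(printf '%03o' "0x$pair")          # hex value as three octal digits
        char=$(printf "\\$octal")                 # octal escape to one character
        out="$out$char"
    done
    printf '%s\n' "$out"
}

# Example call with the digits from the metastat output above:
# devid_hex_to_ascii 4849544143484920444b3332454a2d33364e4320202020203433334239383939 | tr ' ' '_'
```
</programlisting><para>For the device ID in this example, the decoded text with spaces shown as underscores is <literal>HITACHI_DK32EJ-36NC_____433B9899</literal>, which matches the string that follows <literal>id1,@T</literal> in the <command>prtconf</command> output.</para>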
</sect1><sect1 id="troubleshoottasks-29"><title>Recovering From Boot Problems</title><para>Because Solaris Volume Manager enables you to mirror the root (<filename>/</filename>), <filename>swap</filename>, and <filename>/usr</filename> directories, special problems
can arise when you boot the system. These problems can arise either through
hardware failures or operator error. The procedures in this section provide
solutions to such potential problems.</para><para>The following table describes these problems and points you to the appropriate
solution.</para><table frame="topbot" id="troubleshoottasks-12449"><title>Common Boot Problems
With Solaris Volume Manager</title><tgroup cols="2" colsep="0" rowsep="0"><colspec colname="column1" colwidth="165*"/><colspec colname="column2" colwidth="231*"/><thead><row rowsep="1"><entry><para>Reason for the Boot Problem</para>
</entry><entry><para>For Instructions</para>
</entry>
</row>
</thead><tbody><row><entry><para>The <literal>/etc/vfstab</literal> file contains incorrect information.</para>
</entry><entry><para><olink targetptr="troubleshoottasks-35369" remap="internal">How to Recover From Improper
/etc/vfstab Entries</olink></para>
</entry>
</row><row><entry><para>Not enough state database replicas have been defined.</para>
</entry><entry><para><olink targetptr="troubleshoottasks-31036" remap="internal">How to Recover From Insufficient
State Database Replicas</olink></para>
</entry>
</row><row><entry><para>A boot device (disk) has failed.</para>
</entry><entry><para><olink targetptr="troubleshoottasks-21051" remap="internal">How to Recover From a Boot
Device Failure</olink></para>
</entry>
</row>
</tbody>
</tgroup>
</table><sect2 id="troubleshoottasks-30"><title>Background Information for Boot Problems</title><itemizedlist><listitem><para>If Solaris Volume Manager takes a volume offline due to errors, unmount
all file systems on the disk where the failure occurred.</para><para>Because
each disk slice is independent, multiple file systems can be mounted on a
single disk. If the software has encountered a failure, other slices on the
same disk will likely experience failures soon. File systems that are mounted
directly on disk slices do not have the protection of Solaris Volume Manager error
handling. Leaving such file systems mounted can leave you vulnerable to crashing
the system and losing data.</para>
</listitem><listitem><para>Minimize the amount of time you run with submirrors that are
disabled or offline. During resynchronization and online backup intervals,
the full protection of mirroring is gone.</para>
</listitem>
</itemizedlist>
</sect2><sect2 id="troubleshoottasks-35369"><title>How to Recover From Improper <filename>/etc/vfstab</filename> Entries</title><para>If you have made an incorrect entry in the <filename>/etc/vfstab</filename> file,
for example, when mirroring the root (<filename>/</filename>) file system,
the system appears at first to be booting properly. Then, the system fails.
To remedy this situation, you need to edit the <filename>/etc/vfstab</filename> file
while in single-user mode. </para><para>The high-level steps to recover from improper <filename>/etc/vfstab</filename> file
entries are as follows:</para><orderedlist><listitem><para>Booting the system to single-user mode</para>
</listitem><listitem><para>Running the <command>fsck</command> command on the mirror
volume</para>
</listitem><listitem><para>Remounting the file system with read-write options enabled</para>
</listitem><listitem><para>Optional: running the <command>metaroot</command> command
for a root (<filename>/</filename>) mirror</para>
</listitem><listitem><para>Verifying that the <filename>/etc/vfstab</filename> file correctly
references the volume for the file system entry</para>
</listitem><listitem><para>Rebooting the system</para>
</listitem>
</orderedlist>
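<para>The verification step can be partly scripted. The following sketch compares the root (<filename>/</filename>) entry in a <filename>vfstab</filename> file with the root device setting in a <filename>system</filename> file. The function names are hypothetical. The sketch assumes that the <command>metaroot</command> command records the volume in <filename>/etc/system</filename> as a line that begins with <literal>rootdev:/pseudo/md@</literal>, so confirm that format on your system before relying on the result.</para><programlisting>
```shell
# Sketch: check that /etc/vfstab and /etc/system agree about the root device.
# Assumption: a root volume appears in the system file as a line that starts
# with "rootdev:/pseudo/md@".

# Print the "device to mount" column of the / entry in the given vfstab file.
vfstab_root_device() {
    awk '$1 !~ /^#/ { if ($3 == "/") print $1 }' "$1"
}

# $1: path to a vfstab file, $2: path to a system file
check_root_entries() {
    vfstab=$1
    system=$2
    rootdev=$(vfstab_root_device "$vfstab")
    if grep '^rootdev:/pseudo/md@' "$system" > /dev/null; then
        case $rootdev in
            /dev/md/dsk/*)
                echo "consistent: / is mounted from $rootdev" ;;
            *)
                echo "mismatch: $system boots from a volume, but $vfstab mounts / from $rootdev"
                return 1 ;;
        esac
    else
        echo "$system has no volume entry for the root device"
    fi
}

# Example call:
# check_root_entries /etc/vfstab /etc/system
```
</programlisting>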
</sect2><task id="troubleshoottasks-82"><title>Recovering the root (<filename>/</filename>)
RAID-1 (Mirror) Volume</title><tasksummary><para>In the following example, the root (<filename>/</filename>) file system
is mirrored with a two-way mirror, <filename>d0</filename>. The root (<filename>/</filename>) entry in the <filename>/etc/vfstab</filename> file has somehow
reverted to the original slice of the file system. However, the information
in the <filename>/etc/system</filename> file still shows booting to be from
the mirror <filename>d0</filename>. The most likely reason is that the <command>metaroot</command> command was not used to maintain the <filename>/etc/system</filename> and <filename>/etc/vfstab</filename> files. Another possible reason is that an old copy
of the <filename>/etc/vfstab</filename> file was copied back into the current <filename>/etc/vfstab</filename> file.</para><para>The incorrect <filename>/etc/vfstab</filename> file looks similar to
the following:</para><screen width="100">#device        device          mount          FS      fsck   mount    mount
#to mount      to fsck         point          type    pass   at boot  options
#
/dev/dsk/c0t3d0s0 /dev/rdsk/c0t3d0s0  /       ufs      1     no       -
/dev/dsk/c0t3d0s1 -                   -       swap     -     no       -
/dev/dsk/c0t3d0s6 /dev/rdsk/c0t3d0s6  /usr    ufs      2     no       -
#
/proc             -                  /proc    proc     -     no       -
swap              -                  /tmp     tmpfs    -     yes      -</screen><para>Because of the errors, you automatically go into single-user mode when
the system is booted: </para><screen>ok <userinput>boot</userinput>
...
configuring network interfaces: hme0.
Hostname: host1
mount: /dev/dsk/c0t3d0s0 is not this fstype.
setmnt: Cannot open /etc/mnttab for writing

INIT: Cannot create /var/adm/utmp or /var/adm/utmpx

INIT: failed write of utmpx entry:"  "

INIT: failed write of utmpx entry:"  "

INIT: SINGLE USER MODE

Type Ctrl-d to proceed with normal startup,
(or give root password for system maintenance): &lt;<replaceable>root-password</replaceable>></screen><para>At this point, the root (<filename>/</filename>) and <filename>/usr</filename> file
systems are mounted read-only. Follow these steps:</para>
</tasksummary><procedure><step id="troubleshoottasks-step-33"><para>Run the <command>fsck</command> command
on the root (<filename>/</filename>) mirror.</para><note><para>Be careful to use the correct volume for the root (<filename>/</filename>)
mirror.</para>
</note><screen># <userinput>fsck /dev/md/rdsk/d0</userinput>
** /dev/md/rdsk/d0
** Currently Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
2274 files, 11815 used, 10302 free (158 frags, 1268 blocks,
0.7% fragmentation)</screen>
</step><step id="troubleshoottasks-step-34"><para>Remount the root (<filename>/</filename>)
file system as a read/write file system so that you can edit the <filename>/etc/vfstab</filename> file.</para><screen># <userinput>mount -o rw,remount /dev/md/dsk/d0 /</userinput>
mount: warning: cannot lock temp file &lt;/etc/.mnt.lock></screen>
</step><step id="troubleshoottasks-step-35"><para>Run the <command>metaroot</command> command.</para><screen># <userinput>metaroot d0</userinput></screen><para>This command edits the <filename>/etc/system</filename> and <filename>/etc/vfstab</filename> files to specify that the root (<filename>/</filename>) file system
is now on volume <filename>d0</filename>.</para>
</step><step id="troubleshoottasks-step-36"><para>Verify that the <filename>/etc/vfstab</filename> file
contains the correct volume entries.</para><para>The root (<filename>/</filename>)
entry in the <filename>/etc/vfstab</filename> file should appear as follows
so that the entry for the file system correctly references the RAID-1 volume:</para><programlisting width="100" role="complete">#device           device              mount    FS      fsck   mount   mount
#to mount         to fsck             point    type    pass   at boot options
#
/dev/md/dsk/d0    /dev/md/rdsk/d0     /        ufs     1      no      -
/dev/dsk/c0t3d0s1 -                   -        swap    -      no      -
/dev/dsk/c0t3d0s6 /dev/rdsk/c0t3d0s6  /usr     ufs     2      no      -
#
/proc             -                  /proc     proc    -      no      -
swap              -                  /tmp      tmpfs   -      yes     -</programlisting>
</step><step id="troubleshoottasks-step-37"><para>Reboot the system.</para><para>The
system returns to normal operation.</para>
</step>
</procedure>
</task><task id="troubleshoottasks-21051"><title>How to Recover From a Boot Device
Failure</title><tasksummary><para>If you have a root (<filename>/</filename>) mirror and your boot device
fails, you need to set up an alternate boot device.</para><para>The high-level steps in this task are as follows:</para><itemizedlist><listitem><para>Booting from the alternate root (<filename>/</filename>) submirror</para>
</listitem><listitem><para>Determining the erred state database replicas and volumes</para>
</listitem><listitem><para>Repairing the failed disk</para>
</listitem><listitem><para>Restoring state database replicas and volumes to their original
state</para>
</listitem>
</itemizedlist><para>Initially, when the boot device fails, you will see a message similar
to the following. This message might differ among various architectures.</para><screen width="100">Rebooting with command:
Boot device: /iommu/sbus/dma@f,81000/esp@f,80000/sd@3,0   
The selected SCSI device is not responding
Can't open boot device
...</screen><para>When you see this message, note the device. Then, follow these steps:</para>
</tasksummary><procedure><step id="troubleshoottasks-step-49"><para>Boot from another root (<filename>/</filename>)
submirror.</para><para>Since only two of the six state database replicas in
this example are in error, you can still boot. If this were not the case,
you would need to delete the inaccessible state database replicas in single-user
mode. This procedure is described in <olink targetptr="troubleshoottasks-31036" remap="internal">How
to Recover From Insufficient State Database Replicas</olink>.</para><para>When
you created the mirror for the root (<filename>/</filename>) file system,
you should have recorded the alternate boot device as part of that procedure.
In this example, <literal>disk2</literal> is that alternate boot device.</para><screen>ok <userinput>boot disk2</userinput>
SunOS Release 5.9 Version s81_51 64-bit
Copyright 1983-2001 Sun Microsystems, Inc.  All rights reserved.
Hostname: demo
...
demo console login: <userinput>root</userinput>
Password: &lt;<replaceable>root-password</replaceable>>
Dec 16 12:22:09 demo login: ROOT LOGIN /dev/console
Last login: Wed Dec 12 10:55:16 on console
Sun Microsystems Inc.   SunOS 5.9       s81_51  May 2002
...</screen>
</step><step id="troubleshoottasks-step-50"><para>Determine how many state database
replicas have failed by using the <command>metadb</command> command.</para><screen># <userinput>metadb</userinput>
       flags         first blk    block count
    M     p          unknown      unknown      /dev/dsk/c0t3d0s3
    M     p          unknown      unknown      /dev/dsk/c0t3d0s3
    a m  p  luo      16           1034         /dev/dsk/c0t2d0s3
    a    p  luo      1050         1034         /dev/dsk/c0t2d0s3
    a    p  luo      16           1034         /dev/dsk/c0t1d0s3
    a    p  luo      1050         1034         /dev/dsk/c0t1d0s3</screen><para>In this example, the system can no longer detect state database replicas
on slice <filename>/dev/dsk/c0t3d0s3</filename>, which is part of the failed
disk.</para>
</step><step id="troubleshoottasks-step-51"><para>Determine that half of the root
(<filename>/</filename>), <filename>swap</filename>, and <filename>/usr</filename> mirrors
have failed by using the <command>metastat</command> command.</para><screen># <userinput>metastat</userinput>
d0: Mirror
    Submirror 0: d10
      State: Needs maintenance
    Submirror 1: d20
      State: Okay
...
 
d10: Submirror of d0
    State: Needs maintenance
    Invoke: "metareplace d0 /dev/dsk/c0t3d0s0 &lt;new device>"
    Size: 47628 blocks
    Stripe 0:
	Device              Start Block  Dbase State        Hot Spare
	/dev/dsk/c0t3d0s0          0     No    Maintenance 
 
d20: Submirror of d0
    State: Okay
    Size: 47628 blocks
    Stripe 0:
	Device              Start Block  Dbase State        Hot Spare
	/dev/dsk/c0t2d0s0          0     No    Okay  
 
d1: Mirror
    Submirror 0: d11
      State: Needs maintenance
    Submirror 1: d21
      State: Okay
...
 
d11: Submirror of d1
    State: Needs maintenance
    Invoke: "metareplace d1 /dev/dsk/c0t3d0s1 &lt;new device>"
    Size: 69660 blocks
    Stripe 0:
	Device              Start Block  Dbase State        Hot Spare
	/dev/dsk/c0t3d0s1          0     No    Maintenance 
 
d21: Submirror of d1
    State: Okay
    Size: 69660 blocks
    Stripe 0:
	Device              Start Block  Dbase State        Hot Spare
	/dev/dsk/c0t2d0s1          0     No    Okay        
 
d2: Mirror
    Submirror 0: d12
      State: Needs maintenance
    Submirror 1: d22
      State: Okay
...
 
d12: Submirror of d2
    State: Needs maintenance
    Invoke: "metareplace d2 /dev/dsk/c0t3d0s6 &lt;new device>"
    Size: 286740 blocks
    Stripe 0:
	Device              Start Block  Dbase State        Hot Spare
	/dev/dsk/c0t3d0s6          0     No    Maintenance 
 
 
d22: Submirror of d2
    State: Okay
    Size: 286740 blocks
    Stripe 0:
	Device              Start Block  Dbase State        Hot Spare
	/dev/dsk/c0t2d0s6          0     No    Okay  </screen><para>In this example, the <command>metastat</command> command shows that
the following submirrors need maintenance:</para><itemizedlist><listitem><para>Submirror <filename>d10</filename>, device <filename>c0t3d0s0</filename></para>
</listitem><listitem><para>Submirror <filename>d11</filename>, device <filename>c0t3d0s1</filename></para>
</listitem><listitem><para>Submirror <filename>d12</filename>, device <filename>c0t3d0s6</filename></para>
</listitem>
</itemizedlist>
</step><step id="troubleshoottasks-step-52"><para>Halt the system and replace the disk. Then use the <command>format</command> command
or the <command>fmthard</command> command to partition the disk as it was
before the failure.</para><tip><para>If the new disk is identical to the existing disk (the intact side
of the mirror, in this example), you can quickly partition the new disk by copying the label from the intact disk. To do so, use
the <command>prtvtoc /dev/rdsk/c0t2d0s2 | fmthard -s - /dev/rdsk/c0t3d0s2</command> command
(the new disk is <literal>c0t3d0</literal>, in this example).</para>
</tip><screen># <userinput>halt</userinput>
...
Halted
...
ok <userinput>boot</userinput>
...
# <userinput>format /dev/rdsk/c0t3d0s0</userinput></screen>
</step><step id="troubleshoottasks-step-53"><para>Reboot the system.</para><para>Note
that you must reboot from the other half of the root (<filename>/</filename>)
mirror. You should have recorded the alternate boot device when you created
the mirror.</para><screen># <userinput>halt</userinput>
...
ok <userinput>boot disk2</userinput></screen>
</step><step id="troubleshoottasks-step-54"><para>To delete the failed state database
replicas and then add them back, use the <command>metadb</command> command. </para><screen># <userinput>metadb</userinput>
       flags         first blk    block count
    M     p          unknown      unknown      /dev/dsk/c0t3d0s3
    M     p          unknown      unknown      /dev/dsk/c0t3d0s3
    a m  p  luo      16           1034         /dev/dsk/c0t2d0s3
    a    p  luo      1050         1034         /dev/dsk/c0t2d0s3
    a    p  luo      16           1034         /dev/dsk/c0t1d0s3
    a    p  luo      1050         1034         /dev/dsk/c0t1d0s3
# <userinput>metadb -d c0t3d0s3</userinput>
# <userinput>metadb -c 2 -a c0t3d0s3</userinput>
# <userinput>metadb</userinput>
       flags         first blk    block count
     a m  p  luo     16           1034         /dev/dsk/c0t2d0s3
     a    p  luo     1050         1034         /dev/dsk/c0t2d0s3
     a    p  luo     16           1034         /dev/dsk/c0t1d0s3
     a    p  luo     1050         1034         /dev/dsk/c0t1d0s3
     a        u      16           1034         /dev/dsk/c0t3d0s3
     a        u      1050         1034         /dev/dsk/c0t3d0s3</screen>
</step><step id="troubleshoottasks-step-55"><para>Re-enable the submirrors by using the <command>metareplace</command> command.</para><screen># <userinput>metareplace -e d0 c0t3d0s0</userinput>
Device /dev/dsk/c0t3d0s0 is enabled
 
# <userinput>metareplace -e d1 c0t3d0s1</userinput>
Device /dev/dsk/c0t3d0s1 is enabled
 
# <userinput>metareplace -e d2 c0t3d0s6</userinput>
Device /dev/dsk/c0t3d0s6 is enabled</screen><para>After some time, the resynchronization will complete. You can now return
to booting from the original device.</para>
</step>
</procedure>
</task>
</sect1><sect1 id="tasks-state-db-replicas-11"><title>Recovering From State Database
Replica Failures</title><para>If the state database replica quorum is not met, for example, due to
a drive failure, the system cannot be rebooted into multiuser mode. This situation
could follow a panic when Solaris Volume Manager discovers that fewer than half of
the state database replicas are available. This situation could also occur
if the system is rebooted with exactly half or fewer functional state database
replicas. In Solaris Volume Manager terminology, the state database has gone &ldquo;stale.&rdquo;
This procedure explains how to recover from this problem.</para><task id="troubleshoottasks-31036"><title>How to Recover From Insufficient
State Database Replicas</title><procedure><step id="troubleshoottasks-step-40"><para>Boot the system.</para>
</step><step id="tasks-state-db-replicas-step-23"><para>Determine which state database
replicas are unavailable.</para><screen># <userinput>metadb -i</userinput></screen>
</step><step id="troubleshoottasks-step-42"><para>If one or more disks are known
to be unavailable, delete the state database replicas on those disks. Otherwise,
delete enough erred state database replicas (W, M, D, F, or R status flags
reported by <command>metadb</command>) to ensure that a majority of the existing
state database replicas are not erred.</para><screen># <userinput>metadb -d <replaceable>disk-slice</replaceable></userinput></screen><tip><para>State database replicas with a capitalized status flag are in error.
State database replicas with a lowercase status flag are functioning normally.</para>
</tip>
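<para>The flag check described in the tip can be sketched as a small shell filter. This is an illustration only, not part of Solaris Volume Manager: the function name <literal>erred_replicas</literal> is hypothetical, and the filter assumes the column layout shown in the <command>metadb</command> examples in this chapter, where the last three columns are the first block, the block count, and the slice path.</para><programlisting>
```shell
# Hypothetical helper: list slices that hold erred state database replicas.
# Reads metadb output on standard input.
# Assumption: every field before the last three columns is a status flag,
# and any capital letter among those flags marks the replica as erred.
erred_replicas() {
    awk '$NF ~ /^\/dev\// {
        for (i = NF - 3; i >= 1; i--) {
            if ($i ~ /[A-Z]/) { print $NF; next }
        }
    }' | sort -u
}
```
</programlisting><para>On a live system, the filter would read from the command directly, for example <userinput>metadb | erred_replicas</userinput>. Each slice is printed once, even if the slice holds several erred replicas.</para>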
</step><step id="troubleshoottasks-step-43"><para>Verify that the replicas have been
deleted.</para><screen># <userinput>metadb</userinput></screen>
</step><step id="troubleshoottasks-step-44"><para>Reboot the system.</para><screen># <userinput>reboot</userinput></screen>
</step><step id="troubleshoottasks-step-45"><para>If necessary, replace the disk,
format it appropriately, then add any state database replicas that are needed
to the disk.</para><para>Follow the instructions in <olink targetptr="tasks-state-db-replicas-9" remap="internal">Creating State Database Replicas</olink>.</para><para>Once you have a replacement disk, halt the system, replace the
failed disk, and once again, reboot the system. Use the <command>format</command> command
or the <command>fmthard</command> command to partition the disk as it was
configured before the failure.  </para>
</step>
</procedure><example id="egjyh"><title>Recovering From Stale State Database Replicas</title><para>In the following example, a disk that contains seven replicas has gone
bad. As a result, the system has only three good replicas. The system panics,
then cannot reboot into multiuser mode.</para><screen width="100">panic[cpu0]/thread=70a41e00: md: state database problem

403238a8 md:mddb_commitrec_wrapper+6c (2, 1, 70a66ca0, 40323964, 70a66ca0, 3c)
  %l0-7: 0000000a 00000000 00000001 70bbcce0 70bbcd04 70995400 00000002 00000000
40323908 md:alloc_entry+c4 (70b00844, 1, 9, 0, 403239e4, ff00)
  %l0-7: 70b796a4 00000001 00000000 705064cc 70a66ca0 00000002 00000024 00000000
40323968 md:md_setdevname+2d4 (7003b988, 6, 0, 63, 70a71618, 10)
  %l0-7: 70a71620 00000000 705064cc 70b00844 00000010 00000000 00000000 00000000
403239f8 md:setnm_ioctl+134 (7003b968, 100003, 64, 0, 0, ffbffc00)
  %l0-7: 7003b988 00000000 70a71618 00000000 00000000 000225f0 00000000 00000000
40323a58 md:md_base_ioctl+9b4 (157ffff, 5605, ffbffa3c, 100003, 40323ba8, ff1b5470)
  %l0-7: ff3f2208 ff3f2138 ff3f26a0 00000000 00000000 00000064 ff1396e9 00000000
40323ad0 md:md_admin_ioctl+24 (157ffff, 5605, ffbffa3c, 100003, 40323ba8, 0)
  %l0-7: 00005605 ffbffa3c 00100003 0157ffff 0aa64245 00000000 7efefeff 81010100
40323b48 md:mdioctl+e4 (157ffff, 5605, ffbffa3c, 100003, 7016db60, 40323c7c)
  %l0-7: 0157ffff 00005605 ffbffa3c 00100003 0003ffff 70995598 70995570 0147c800
40323bb0 genunix:ioctl+1dc (3, 5605, ffbffa3c, fffffff8, ffffffe0, ffbffa65)
  %l0-7: 0114c57c 70937428 ff3f26a0 00000000 00000001 ff3b10d4 0aa64245 00000000

panic: 
stopped at      edd000d8:       ta      %icc,%g0 + 125
Type  'go' to resume

ok <userinput>boot -s</userinput>
Resetting ... 

Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 270MHz), No Keyboard
OpenBoot 3.11, 128 MB memory installed, Serial #9841776.
Ethernet address 8:0:20:96:2c:70, Host ID: 80962c70.



Rebooting with command: boot -s                                       
Boot device: /pci@1f,0/pci@1,1/ide@3/disk@0,0:a  File and args: -s
SunOS Release 5.9 Version s81_39 64-bit

Copyright 1983-2001 Sun Microsystems, Inc.  All rights reserved.
configuring IPv4 interfaces: hme0.
Hostname: dodo

metainit: dodo: stale databases

Insufficient metadevice database replicas located.

Use metadb to delete databases which are broken.
Ignore any "Read-only file system" error messages.
Reboot the system when finished to reload the metadevice database.
After reboot, repair any broken database replicas which were deleted.

Type control-d to proceed with normal startup,
(or give root password for system maintenance): <userinput><replaceable>root-password</replaceable></userinput>
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode

Jun  7 08:57:25 su: 'su root' succeeded for root on /dev/console
Sun Microsystems Inc.   SunOS 5.9       s81_39  May 2002
# <userinput>metadb -i</userinput>
        flags           first blk       block count
     a m  p  lu         16              8192            /dev/dsk/c0t0d0s7
     a    p  l          8208            8192            /dev/dsk/c0t0d0s7
     a    p  l          16400           8192            /dev/dsk/c0t0d0s7
    M     p             16              unknown         /dev/dsk/c1t1d0s0
    M     p             8208            unknown         /dev/dsk/c1t1d0s0
    M     p             16400           unknown         /dev/dsk/c1t1d0s0
    M     p             24592           unknown         /dev/dsk/c1t1d0s0
    M     p             32784           unknown         /dev/dsk/c1t1d0s0
    M     p             40976           unknown         /dev/dsk/c1t1d0s0
    M     p             49168           unknown         /dev/dsk/c1t1d0s0
# <userinput>metadb -d c1t1d0s0</userinput>
# <userinput>metadb</userinput>
        flags           first blk       block count
     a m  p  lu         16              8192            /dev/dsk/c0t0d0s7
     a    p  l          8208            8192            /dev/dsk/c0t0d0s7
     a    p  l          16400           8192            /dev/dsk/c0t0d0s7
#  </screen><para>The system panicked because it could no longer detect state database
replicas on slice <filename>/dev/dsk/c1t1d0s0</filename>. This slice is part
of the failed disk or is attached to a failed controller. The first <command>metadb <option>i</option></command> command identifies the replicas on this slice as having
a problem with the master blocks.</para><para>When you delete the stale state database replicas, the root (<filename>/</filename>)
file system is read-only. You can ignore the <filename>mddb.cf</filename> error
messages that are displayed.</para><para>At this point, the system is again functional, although it probably
has fewer state database replicas than it should. Any volumes that used part
of the failed storage are also either failed, erred, or hot-spared. Those
issues should be addressed promptly. </para>
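<para>The replica counts in this example can be checked against the thresholds described at the start of this section. The following sketch is illustrative only. The function name <literal>replica_quorum</literal> and its output labels are hypothetical, and the sketch assumes that the two thresholds stated in this section (fewer than half, and exactly half or fewer) are the only rules that apply.</para><programlisting>
```shell
# Hypothetical helper: classify replica health from two counts.
#   $1 = total number of state database replicas
#   $2 = number of replicas that are still good
# Thresholds are taken from the text of this section (assumed exact):
#   fewer than half good   -> a running system panics
#   exactly half or fewer  -> the system cannot reboot into multiuser mode
replica_quorum() {
    total=$1
    good=$2
    if [ $((good * 2)) -lt "$total" ]; then
        echo "panic"
    elif [ $((good * 2)) -le "$total" ]; then
        echo "stale-on-reboot"
    else
        echo "ok"
    fi
}
```
</programlisting><para>With the counts from this example, <userinput>replica_quorum 10 3</userinput> reports <literal>panic</literal>, because three good replicas are fewer than half of ten.</para>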
</example>
</task>
</sect1><sect1 id="tasks-softpart-26"><title>Recovering From Soft Partition Problems</title><para>This section shows how to recover configuration information for soft
partitions. You should only use the following procedure if all of your state
database replicas have been lost and you do not have one of the following:</para><itemizedlist><listitem><para>A current or accurate copy of <command>metastat -p</command> output</para>
</listitem><listitem><para>A current or accurate copy of the <filename>md.cf</filename> file</para>
</listitem><listitem><para>An up-to-date <filename>md.tab</filename> file</para>
</listitem>
</itemizedlist><task id="tasks-softpart-27"><title>How to Recover Configuration Data for
a Soft Partition</title><tasksummary><para>At the beginning of each soft partition extent, a sector is used
to mark the beginning of the soft partition extent. These  hidden sectors
are called <emphasis>extent headers</emphasis>. These headers do not appear
to the user of the soft partition. If all Solaris Volume Manager configuration data
is lost, the disk can be scanned in an attempt to generate the configuration
data.  </para><para>This procedure is a last option to recover lost soft partition configuration
information. The <command>metarecover</command> command should only be used
when you have lost both your <filename>metadb</filename> and <filename>md.cf</filename> files,
and your <filename>md.tab</filename> file is lost or out of date. </para><note><para>This procedure only works to recover soft partition information.
This procedure does not assist in recovering from other lost configurations
or for recovering configuration information for other Solaris Volume Manager volumes. </para>
</note><note><para>If your configuration included other Solaris Volume Manager volumes that
were built on top of soft partitions, you should recover the soft partitions
before attempting to recover the other volumes. </para>
</note><para>Configuration information about your soft partitions is stored on your
devices and in your state database. Since either source could be corrupt,
you must indicate to the <command>metarecover</command> command which source
is reliable. </para><para>First, use the <command>metarecover</command> command to determine whether
the two sources agree. If they do agree, the <command>metarecover</command> command
cannot be used to make any changes. However, if the <command>metarecover</command> command
reports an inconsistency, you must examine its output carefully to determine
whether the disk or the state database is corrupt. Then, you should use the <command>metarecover</command> command to rebuild the configuration based on the appropriate
source.</para>
</tasksummary><procedure><step id="tasks-softpart-step-46"><para>Read the <olink targetptr="about-softpart-8" remap="internal">Configuration Guidelines for Soft Partitions</olink>.</para>
</step><step id="tasks-softpart-step-47"><para>Review the soft partition recovery
information by using the <command>metarecover</command> command.</para><screen># <userinput>metarecover <replaceable>component</replaceable> <option>p</option> <option>d</option></userinput></screen><variablelist><varlistentry><term><replaceable>component</replaceable></term><listitem><para>Specifies the <filename>c<replaceable>n</replaceable>t<replaceable>n</replaceable>d<replaceable>n</replaceable>s<replaceable>n</replaceable></filename> name
of the raw component</para>
</listitem>
</varlistentry><varlistentry><term><option>p</option></term><listitem><para>Specifies to regenerate soft partitions</para>
</listitem>
</varlistentry><varlistentry><term><option>d</option></term><listitem><para>Specifies to scan the physical slice for extent headers of
soft partitions</para>
</listitem>
</varlistentry>
</variablelist>
</step>
</procedure><example id="metarecover-example-01"><title>Recovering Soft Partitions from On-Disk Extent Headers</title><screen># <userinput>metarecover c1t1d0s1 -p -d</userinput>
The following soft partitions were found and will be added to
your metadevice configuration.
 Name            Size     No. of Extents
    d10           10240         1
    d11           10240         1
    d12           10240         1
# <userinput>metarecover c1t1d0s1 -p -d</userinput>
The following soft partitions were found and will be added to
your metadevice configuration.
 Name            Size     No. of Extents
    d10           10240         1
    d11           10240         1
    d12           10240         1
WARNING: You are about to add one or more soft partition
metadevices to your metadevice configuration.  If there
appears to be an error in the soft partition(s) displayed
above, do NOT proceed with this recovery operation.
Are you sure you want to do this (yes/no)?<userinput>yes</userinput>
c1t1d0s1: Soft Partitions recovered from device.
# <userinput>metastat</userinput>
d10: Soft Partition
    Device: c1t1d0s1
    State: Okay
    Size: 10240 blocks
        Device              Start Block  Dbase Reloc
        c1t1d0s1                   0     No    Yes

        Extent              Start Block              Block count
             0                        1                    10240

d11: Soft Partition
    Device: c1t1d0s1
    State: Okay
    Size: 10240 blocks
        Device              Start Block  Dbase Reloc
        c1t1d0s1                   0     No    Yes

        Extent              Start Block              Block count
             0                    10242                    10240

d12: Soft Partition
    Device: c1t1d0s1
    State: Okay
    Size: 10240 blocks
        Device              Start Block  Dbase Reloc
        c1t1d0s1                   0     No    Yes

        Extent              Start Block              Block count
             0                    20483                    10240</screen><para>In this example, three soft partitions are recovered from disk, after
the state database replicas were accidentally deleted.</para>
</example>
</task>
</sect1><sect1 id="troubleshoottasks-84"><title>Recovering Storage From a Different
System</title><para>You can recover a Solaris Volume Manager configuration, even onto a different
system from the original system. </para><task id="eqqcz"><title>How to Recover Storage From a Local Disk Set</title><tasksummary><para>If you experience a system failure, you can attach the storage to a
different system and recover the complete configuration from the local disk
set. For example, assume you have a system with an external disk pack of six
disks in it and a Solaris Volume Manager configuration, including at least one state
database replica, on some of those disks. If you have a system failure, you
can physically move the disk pack to a new system and enable the new system
to recognize the configuration. This procedure describes how to move the disks
to another system and recover the configuration from a local disk set.</para><note><para>This recovery procedure works only with Solaris Volume Manager volumes
from the Solaris 9 release or later.</para>
</note>
</tasksummary><procedure><step id="troubleshoottasks-step-111"><para>Attach the disk or disks that
contain the Solaris Volume Manager configuration to a system with no preexisting Solaris Volume Manager configuration. </para>
</step><step id="troubleshoottasks-step-87"><para>Do a reconfiguration reboot to
ensure that the system recognizes the newly added disks.</para><screen># <userinput>reboot -- -r</userinput></screen>
</step><step id="troubleshoottasks-step-88"><para>Determine the major/minor number
for a slice containing a state database replica on the newly added disks. </para><para>Use <command>ls -lL</command>, and note the two numbers between the
group name and the date. These numbers are the major/minor numbers for this
slice. </para><screen># <userinput>ls -Ll /dev/dsk/c1t9d0s7</userinput>
brw-r-----   1 root     sys       32, 71 Dec  5 10:05 /dev/dsk/c1t9d0s7</screen>
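<para>Reading the two numbers from the listing can also be scripted. The following filter is a sketch only. The function name <literal>major_minor</literal> is hypothetical, and the filter assumes the field layout shown in the listing above, where the fifth field is the major number followed by a comma and the sixth field is the minor number.</para><programlisting>
```shell
# Hypothetical helper: print "major minor" from one line of ls -lL output
# for a block device. Reads the listing on standard input.
# Assumption: field 5 is "major," and field 6 is the minor number,
# separated by white space as in the example listing.
major_minor() {
    awk '{ sub(/,$/, "", $5); print $5, $6 }'
}
```
</programlisting><para>For example, <userinput>ls -lL /dev/dsk/c1t9d0s7 | major_minor</userinput> would print the major number and the minor number on one line, separated by a space.</para>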
</step><step id="troubleshoottasks-step-94"><para>If necessary, determine the major
name that corresponds to the major number by looking up the major number in <filename>/etc/name_to_major</filename>.</para><screen># <userinput>grep " 32" /etc/name_to_major</userinput>
sd 32</screen>
</step><step id="troubleshoottasks-step-89"><para>Update the <filename>/kernel/drv/md.conf</filename> file with the information that instructs Solaris Volume Manager where
to find a valid state database replica on the new disks.</para><para>For example,
in the line that begins with <literal>mddb_bootlist1</literal>, replace <literal>sd</literal> with the major name you found in <olink targetptr="troubleshoottasks-step-94" remap="internal">Step&nbsp;4</olink>. Replace <replaceable>71</replaceable> in
the example with the minor number you identified in <olink targetptr="troubleshoottasks-step-88" remap="internal">Step&nbsp;3</olink>.</para><screen>#pragma ident   "@(#)md.conf    2.2     04/04/02 SMI"
#
# Copyright 2004 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#
# The parameters nmd and md_nsets are obsolete.  The values for these
# parameters no longer have any meaning.
name="md" parent="pseudo" nmd=128 md_nsets=4;

# Begin MDD database info (do not edit)
<userinput>mddb_bootlist1="</userinput><systemitem>sd</systemitem><userinput>:</userinput><systemitem>71</systemitem><userinput>:16:id0";</userinput>
# End MDD database info (do not edit)</screen>
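<para>To reduce typing errors when you edit the file, the entry can be generated from the two values found in the earlier steps. The following is a minimal sketch. The function name <literal>bootlist_line</literal> is hypothetical, and the <literal>:16:id0</literal> suffix is copied unchanged from the example above rather than derived, so confirm that it matches your replica before you use the result.</para><programlisting>
```shell
# Hypothetical helper: build an mddb_bootlist1 entry for md.conf.
#   $1 = major (driver) name, for example sd
#   $2 = minor number of the slice that holds a state database replica
# Assumption: the ":16:id0" suffix from the example file applies unchanged.
bootlist_line() {
    printf 'mddb_bootlist1="%s:%s:16:id0";\n' "$1" "$2"
}
```
</programlisting><para>For the values in this procedure, <userinput>bootlist_line sd 71</userinput> prints the same entry that is shown in the example file.</para>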
</step><step id="troubleshoottasks-step-90"><para>Reboot to force Solaris Volume Manager to
reload your configuration. </para><para>You will see messages similar to the
following displayed on the console. </para><screen>volume management starting.
Dec  5 10:11:53 host1 metadevadm: Disk movement detected
Dec  5 10:11:53 host1 metadevadm: Updating device names in 
Solaris Volume Manager
The system is ready.</screen>
</step><step id="troubleshoottasks-step-91"><para>Verify your configuration. Use
the <command>metadb</command> command to verify the status of the state database
replicas, and use the <command>metastat</command> command to view the status of each
volume.</para><screen># <userinput>metadb</userinput>
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c1t9d0s7
     a       luo        16              8192            /dev/dsk/c1t10d0s7
     a       luo        16              8192            /dev/dsk/c1t11d0s7
     a       luo        16              8192            /dev/dsk/c1t12d0s7
     a       luo        16              8192            /dev/dsk/c1t13d0s7
# <userinput>metastat</userinput>
d12: RAID
    State: Okay         
    Interlace: 32 blocks
    Size: 125685 blocks
Original device:
    Size: 128576 blocks
        Device              Start Block  Dbase State        Reloc  Hot Spare
        c1t11d0s3                330     No    Okay         Yes    
        c1t12d0s3                330     No    Okay         Yes    
        c1t13d0s3                330     No    Okay         Yes    

d20: Soft Partition
    Device: d10
    State: Okay
    Size: 8192 blocks
        Extent              Start Block              Block count
             0                     3592                     8192

d21: Soft Partition
    Device: d10
    State: Okay
    Size: 8192 blocks
        Extent              Start Block              Block count
             0                    11785                     8192

d22: Soft Partition
    Device: d10
    State: Okay
    Size: 8192 blocks
        Extent              Start Block              Block count
             0                    19978                     8192

d10: Mirror
    Submirror 0: d0
      State: Okay         
    Submirror 1: d1
      State: Okay         
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 82593 blocks

d0: Submirror of d10
    State: Okay         
    Size: 118503 blocks
    Stripe 0: (interlace: 32 blocks)
        Device              Start Block  Dbase State        Reloc  Hot Spare
        c1t9d0s0                   0     No    Okay         Yes    
        c1t10d0s0               3591     No    Okay         Yes    


d1: Submirror of d10
    State: Okay         
    Size: 82593 blocks
    Stripe 0: (interlace: 32 blocks)
        Device              Start Block  Dbase State        Reloc  Hot Spare
        c1t9d0s1                   0     No    Okay         Yes    
        c1t10d0s1                  0     No    Okay         Yes    


Device Relocation Information:
Device       Reloc    Device ID
c1t9d0       Yes      id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3487980000U00907AZ
c1t10d0      Yes      id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3397070000W0090A8Q
c1t11d0      Yes      id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3449660000U00904NZ
c1t12d0      Yes      id1,sd@SSEAGATE_ST39103LCSUN9.0GLS32655400007010H04J
c1t13d0      Yes      id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3461190000701001T0
# 
# metadb         
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c1t9d0s7
     a       luo        16              8192            /dev/dsk/c1t10d0s7
     a       luo        16              8192            /dev/dsk/c1t11d0s7
     a       luo        16              8192            /dev/dsk/c1t12d0s7
     a       luo        16              8192            /dev/dsk/c1t13d0s7
# metastat 
d12: RAID
    State: Okay         
    Interlace: 32 blocks
    Size: 125685 blocks
Original device:
    Size: 128576 blocks
        Device              Start Block  Dbase State        Reloc  Hot Spare
        c1t11d0s3                330     No    Okay         Yes    
        c1t12d0s3                330     No    Okay         Yes    
        c1t13d0s3                330     No    Okay         Yes    

d20: Soft Partition
    Device: d10
    State: Okay
    Size: 8192 blocks
        Extent              Start Block              Block count
             0                     3592                     8192

d21: Soft Partition
    Device: d10
    State: Okay
    Size: 8192 blocks
        Extent              Start Block              Block count
             0                    11785                     8192

d22: Soft Partition
    Device: d10
    State: Okay
    Size: 8192 blocks
        Extent              Start Block              Block count
             0                    19978                     8192

d10: Mirror
    Submirror 0: d0
      State: Okay         
    Submirror 1: d1
      State: Okay         
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 82593 blocks

d0: Submirror of d10
    State: Okay         
    Size: 118503 blocks
    Stripe 0: (interlace: 32 blocks)
        Device              Start Block  Dbase State        Reloc  Hot Spare
        c1t9d0s0                   0     No    Okay         Yes    
        c1t10d0s0               3591     No    Okay         Yes    


d1: Submirror of d10
    State: Okay         
    Size: 82593 blocks
    Stripe 0: (interlace: 32 blocks)
        Device              Start Block  Dbase State        Reloc  Hot Spare
        c1t9d0s1                   0     No    Okay         Yes    
        c1t10d0s1                  0     No    Okay         Yes    


Device Relocation Information:
Device         Reloc    Device ID
c1t9d0         Yes     id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3487980000U00907AZ
c1t10d0        Yes     id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3397070000W0090A8Q
c1t11d0        Yes     id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3449660000U00904NZ
c1t12d0        Yes     id1,sd@SSEAGATE_ST39103LCSUN9.0GLS32655400007010H04J
c1t13d0        Yes     id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3461190000701001T0
# <userinput>metastat -p</userinput>
d12 -r c1t11d0s3 c1t12d0s3 c1t13d0s3 -k -i 32b
d20 -p d10 -o 3592 -b 8192 
d21 -p d10 -o 11785 -b 8192 
d22 -p d10 -o 19978 -b 8192 
d10 -m d0 d1 1
d0 1 2 c1t9d0s0 c1t10d0s0 -i 32b
d1 1 2 c1t9d0s1 c1t10d0s1 -i 32b
#</screen>
</step>
</procedure>
</task><sect2 id="egkaa"><title>Recovering Storage From a Known Disk Set</title><para>The introduction of device ID support for disk sets in Solaris Volume
Manager allows you to recover storage from known disk sets and to import the
disk set to a different system. The <command>metaimport</command> command
allows you to import known disk sets from one system to another system.
Both systems must contain existing Solaris Volume Manager configurations that
include device ID support. For more information on device ID support, see <olink targetptr="egjyq" remap="internal">Asynchronous Shared Storage in Disk Sets</olink>. For more
information on the <command>metaimport</command> command, see the <olink targetdoc="refman1m" targetptr="metaimport-1m" remap="external"><citerefentry><refentrytitle>metaimport</refentrytitle><manvolnum>1M</manvolnum></citerefentry></olink> man page.</para><task id="eoqsb"><title>How to Print a Report on Disk Sets Available for Import</title><procedure><step><para>Become superuser.</para>
</step><step><para>Obtain a report on disk sets available for import.</para><screen># <command>metaimport <option>r</option> <option>v</option></command></screen><variablelist><varlistentry><term><option>r</option></term><listitem><para>Provides a report of the unconfigured disk sets available
for import on the system.</para>
</listitem>
</varlistentry><varlistentry><term><option>v</option></term><listitem><para>Provides detailed information about the state database replica
location and status on the disks of unconfigured disk sets available for import
on the system.</para>
</listitem>
</varlistentry>
</variablelist>
</step>
</procedure><example id="eoqrx"><title>Reporting on Disk Sets Available for Import</title><para>The following examples show how to print a report on disk sets available
for import.</para><screen># <userinput>metaimport -r</userinput>
 Drives in regular diskset including disk c1t2d0:
   c1t2d0
   c1t3d0
 More info:
   metaimport -r -v c1t2d0 
Import:   metaimport -s &lt;newsetname> c1t2d0 
 Drives in replicated diskset including disk c1t4d0:
   c1t4d0
   c1t5d0
 More info:
   metaimport -r -v c1t4d0 
Import:   metaimport -s &lt;newsetname> c1t4d0

# <userinput>metaimport -r -v c1t2d0</userinput>
Import: metaimport -s &lt;newsetname> c1t2d0
Last update: Mon Dec 29 14:13:35 2003
Device       offset       length replica flags
c1t2d0           16         8192      a        u     
c1t3d0           16         8192      a        u     
c1t8d0           16         8192      a        u     </screen>
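<para>The report can also be processed by a script. The following sketch is an illustration only and is not part of Solaris Volume Manager: the function name <literal>importable_sets</literal> is hypothetical, and the sketch assumes that the header lines of the report have the form shown in the preceding output. For each disk set in the report, the function prints the disk set type and the disk that the report names.</para><screen># importable_sets: hypothetical helper, not a Solaris Volume Manager command.
# Reads a "metaimport -r" style report on standard input and prints one
# line per disk set: the disk set type and the disk named in the header.
importable_sets() {
    awk '/Drives in .*diskset including disk/ {
        disk = $NF
        sub(/:$/, "", disk)                  # strip the trailing colon
        kind = ($3 == "diskset") ? "unknown" : $3
        print kind, disk
    }'
}

# Sample input copied from the header and drive lines of the report above.
report=' Drives in regular diskset including disk c1t2d0:
   c1t2d0
   c1t3d0
 Drives in replicated diskset including disk c1t4d0:
   c1t4d0
   c1t5d0'

printf '%s\n' "$report" | importable_sets</screen><para>On a live system, the function could read the report directly, for example <literal>metaimport -r | importable_sets</literal>.</para>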
</example>
</task><task id="eoqry"><title>How to Import a Disk Set From One System to Another
System</title><procedure><step><para>Become superuser.</para>
</step><step><para>Verify that a disk set is available for import.</para><screen># <command>metaimport <option>r</option> <option>v</option></command></screen>
</step><step><para>Import an available disk set.</para><screen># <command>metaimport <option>s</option> <replaceable>diskset-name</replaceable> <replaceable>drive-name</replaceable></command></screen><variablelist><varlistentry><term><option>s</option> <replaceable>diskset-name</replaceable></term><listitem><para>Specifies the name of the disk set being created.</para>
</listitem>
</varlistentry><varlistentry><term><replaceable>drive-name</replaceable></term><listitem><para>Identifies a disk (<literal>c#t#d#</literal>) containing a
state database replica from the disk set being imported.</para>
</listitem>
</varlistentry>
</variablelist>
</step><step><para>Verify that the disk set has been imported.</para><screen># <command>metaset</command> <option>s</option> <replaceable>diskset-name</replaceable></screen>
</step>
</procedure><example id="eoqrz"><title>Importing a Disk Set</title><para>The following example shows how to import a disk set.</para><screen># <userinput>metaimport -s red c1t2d0</userinput>
Drives in diskset including disk c1t2d0:
  c1t2d0
  c1t3d0
  c1t8d0
# <userinput>metaset -s red</userinput>


Set name = red, Set number = 1

Host                Owner
  host1            Yes

Drive    Dbase

c1t2d0   Yes  

c1t3d0   Yes  

c1t8d0   Yes  </screen>
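<para>The verification step can be scripted by checking the <literal>Dbase</literal> column of the <command>metaset</command> output. The following sketch is an illustration only: the function name <literal>drives_without_replica</literal> is hypothetical, and the sketch assumes the <literal>Drive</literal> and <literal>Dbase</literal> table layout shown in the preceding output. The function prints each drive whose <literal>Dbase</literal> column does not contain <literal>Yes</literal>, and prints nothing if every drive holds a replica.</para><screen># drives_without_replica: hypothetical helper, not a Solaris Volume
# Manager command. Reads "metaset -s setname" style output on standard
# input and prints the drives whose Dbase column is not "Yes".
drives_without_replica() {
    awk '
        $1 == "Drive" { in_drives = 1; next }   # header of the drive table
        in_drives == 1 { if (NF > 0) { if ($2 != "Yes") print $1 } }
    '
}

# Sample input: c1t3d0 deliberately has an empty Dbase column.
sample='Set name = red, Set number = 1

Host                Owner
  host1            Yes

Drive    Dbase

c1t2d0   Yes

c1t3d0'

printf '%s\n' "$sample" | drives_without_replica</screen><para>On a live system, the function could read the command output directly, for example <literal>metaset -s red | drives_without_replica</literal>.</para>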
</example>
</task>
</sect2>
</sect1><sect1 id="egcpz"><title>Recovering From Disk Set Problems</title><para>The following sections detail how to recover from specific disk set-related
problems.</para><sect2 id="egcqb"><title>What to Do When You Cannot Take Ownership of a Disk
Set</title><para>If you cannot take ownership of a disk set from any node
(perhaps as a result of a system failure, disk failure, or communication link
failure), and therefore cannot delete the disk set record, you can
purge the disk set from the Solaris Volume Manager state database replica records
on the current host.</para><para>Purging the disk set records does not affect the state database information
contained in the disk set, so the disk set can later be imported (with the <command>metaimport</command> command, described in <olink targetptr="efnri" remap="internal">Importing
Disk Sets</olink>).</para><para>If you need to purge a disk set from a Sun Cluster configuration, use
the following procedure, but use the <option>C</option> option instead of
the <option>P</option> option you use when no Sun Cluster configuration is
present.</para><task id="egjzf"><title>How to Purge a Disk Set</title><procedure><step><para>Attempt to take the disk set with the <command>metaset</command> command.</para><screen># <userinput>metaset -s <replaceable>setname</replaceable> -t -f</userinput></screen><para>This command will attempt to take (<option>t</option>) the disk set
named <replaceable>setname</replaceable> forcibly (<option>f</option>). If
the set can be taken, this command will succeed. If the set is owned by another
host when this command runs, the other host will panic to avoid data corruption
or loss. If this command succeeds, you can delete the disk set cleanly, without
the need to purge the set. </para><para>If it is not possible to take the
set, you may purge ownership records.</para>
</step><step><para>Use the <command>metaset</command> command with the <option>P</option> option to
purge the disk set from the current host.</para><screen># <userinput>metaset -s <replaceable>setname</replaceable> -P</userinput></screen><para>This command will purge (<option>P</option>) the disk set named <replaceable>setname</replaceable> from the host on which the command is run. </para>
</step><step><para>Use the <command>metaset</command> command to verify that the
set has been purged.</para><screen># <userinput>metaset</userinput></screen>
</step>
</procedure><example id="egkdu"><title>Purging a Disk Set</title><screen>host1# metaset -s red -t -f
metaset: host1: setname "red": no such set</screen><screen>host2# metaset

Set name = red, Set number = 1

Host                Owner
  host2        

Drive    Dbase

c1t2d0   Yes  

c1t3d0   Yes  

c1t8d0   Yes  

host2# metaset -s red -P
host2# metaset</screen>
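<para>The first two steps of the procedure can be combined in a small shell function. The following sketch is an illustration only: the function name <literal>take_or_purge</literal> and its messages are hypothetical, and the sketch assumes that <command>metaset</command> returns a nonzero exit status when the disk set cannot be taken. The sketch does not cover the Sun Cluster case. As noted in the procedure, a forced take can cause another host that owns the set to panic, so run such a function only when that risk is acceptable.</para><screen># take_or_purge SETNAME: hypothetical helper, not a Solaris Volume
# Manager command. Tries to take the disk set forcibly and purges it from
# the current host only if the take fails.
take_or_purge() {
    set_name=$1
    if metaset -s "$set_name" -t -f; then
        # The set is now owned by this host, so no purge is needed.
        echo "took $set_name; delete it normally instead of purging"
    elif metaset -s "$set_name" -P; then
        echo "purged $set_name from this host"
    else
        echo "could not take or purge $set_name"
        return 1
    fi
}</screen><para>For example, <literal>take_or_purge red</literal> would run on the host from which the disk set is to be purged. Running <command>metaset</command> afterward confirms the result, as in the last step of the procedure.</para>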
</example><taskrelated role="see-also"><itemizedlist><listitem><para><olink targetptr="about-disksets-31856" remap="internal">Chapter&nbsp;18, Disk
Sets (Overview)</olink>, for conceptual information about disk sets.</para>
</listitem><listitem><para><olink targetptr="tasks-disksets-1" remap="internal">Chapter&nbsp;19, Disk
Sets (Tasks)</olink>, for information about tasks associated with disk sets.</para>
</listitem>
</itemizedlist>
</taskrelated>
</task>
</sect2>
</sect1><sect1 id="exlvv"><title>Performing Mounted Filesystem Backups Using the <command>ufsdump</command> Command</title><para>The following procedure describes how to increase the performance of
the <command>ufsdump</command> command when you use it to back up a mounted
filesystem located on a RAID-1 volume.</para><task id="exlvw"><title>How to Perform a Backup of a Mounted Filesystem Located
on a RAID-1 Volume</title><tasksummary><para>You can use the <command>ufsdump</command> command to back up the files
of a mounted filesystem residing on a RAID-1 volume. Set the read policy on
the volume to &ldquo;first&rdquo; when the backup utility is <command>ufsdump</command>.
Setting this policy improves the rate at which the backup is performed.</para>
</tasksummary><procedure><step><para>Become superuser.</para>
</step><step><para>Run the <command>metastat</command> command to make sure the mirror
is in the &ldquo;Okay&rdquo; state.</para><screen># <userinput>metastat d40</userinput>
d40: Mirror
    Submirror 0: d41
      State: Okay
    Submirror 1: d42
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 20484288 blocks (9.8 GB)</screen><para>A mirror that is in the &ldquo;Maintenance&rdquo; state should be repaired
first.</para>
</step><step><para>Set the read policy on the mirror to &ldquo;first.&rdquo;</para><screen># <userinput>metaparam -r first d40</userinput>
# <userinput>metastat d40</userinput>
d40: Mirror
    Submirror 0: d41
      State: Okay
    Submirror 1: d42
      State: Okay
    Pass: 1
    Read option: first
    Write option: parallel (default)
    Size: 20484288 blocks (9.8 GB)</screen>
</step><step><para>Perform a backup of the filesystem.</para><screen># <userinput>ufsdump 0f /dev/backup /opt/test</userinput></screen>
</step><step><para>After the <command>ufsdump</command> command is done, set the
read policy on the mirror to &ldquo;roundrobin.&rdquo;</para><screen># <userinput>metaparam -r roundrobin d40</userinput>
# <userinput>metastat d40</userinput>
d40: Mirror
    Submirror 0: d41
      State: Okay
    Submirror 1: d42
      State: Okay
    Pass: 1
    Read option: roundrobin
    Write option: parallel (default)
    Size: 20484288 blocks (9.8 GB)</screen>
</step>
</procedure>
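<example><title>Restoring the Read Policy After a Backup</title><para>The procedure above changes the read policy and then changes it back, so a script should restore the policy even if the backup fails. The following sketch is an illustration only: the function name <literal>backup_mirror</literal> is hypothetical, and the sketch assumes that the mirror used the default <literal>roundrobin</literal> read policy before the backup, as in the preceding steps.</para><screen># backup_mirror MIRROR DUMPFILE FILESYSTEM: hypothetical helper.
# Sets the read policy to "first", runs a level 0 ufsdump, and then
# restores the "roundrobin" policy whether or not the dump succeeded.
backup_mirror() {
    mirror=$1
    dumpfile=$2
    fs=$3
    # Read from the first submirror only while the dump runs.
    metaparam -r first "$mirror" || return 1
    ufsdump 0f "$dumpfile" "$fs"
    status=$?
    # Put the read policy back even when ufsdump fails.
    metaparam -r roundrobin "$mirror"
    # Report the exit status of ufsdump to the caller.
    return "$status"
}</screen><para>With the values used in this procedure, the call would be <literal>backup_mirror d40 /dev/backup /opt/test</literal>.</para>
</example>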
</task>
</sect1><sect1 id="fsujr"><title>Performing System Recovery</title><para>Sometimes it is useful to boot from a Solaris OS install image on DVD
or CD media to perform a system recovery. Resetting the <literal>root</literal> password
is one example of when using the install image is useful.</para><para>If you are using a Solaris Volume Manager configuration, you should mount
the Solaris Volume Manager volumes instead of the underlying disks. This step is especially
important if the root (<filename>/</filename>) file system is mirrored. Because Solaris Volume Manager is
part of the Solaris OS, mounting the Solaris Volume Manager volumes ensures that any
changes are reflected on both sides of the mirror.</para><para>Use the following procedure to make the Solaris Volume Manager volumes accessible
from a Solaris OS DVD or CD-ROM install image.</para><task id="fsujq"><title>How to Recover a System Using a Solaris Volume Manager Configuration</title><tasksummary><para>Boot your system from the Solaris OS installation DVD or CD media. Perform
this procedure from the <literal>root</literal> prompt of the Solaris miniroot.</para>
</tasksummary><procedure><step><para>Mount the underlying disk that contains the Solaris Volume Manager configuration as read-only.</para><screen># <userinput>mount -o ro /dev/dsk/c0t0d0s0 /a</userinput></screen>
</step><step><para>Copy the <filename>md.conf</filename> file into the <filename>/kernel/drv</filename> directory of the miniroot.</para><screen># <userinput>cp /a/kernel/drv/md.conf /kernel/drv/md.conf</userinput></screen>
</step><step><para>Unmount the file system from the miniroot.</para><screen># <userinput>umount /a</userinput></screen>
</step><step><para>Update the Solaris Volume Manager driver to load the configuration. Ignore
any warning messages printed by the <command>update_drv</command> command.</para><screen># <userinput>update_drv -f md</userinput></screen>
</step><step><para>Configure the system volumes.</para><screen># <userinput>metainit -r</userinput></screen>
</step><step><para>If you have RAID-1 volumes in the Solaris Volume Manager configuration,
resynchronize them.</para><screen># <userinput>metasync <replaceable>mirror-name</replaceable></userinput></screen>
</step><step><para>Mount the Solaris Volume Manager volumes, which should now be accessible using the <command>mount</command> command.</para><screen># <userinput>mount /dev/md/dsk/<replaceable>volume-name</replaceable> /a</userinput></screen>
</step>
</procedure><example id="fsujv"><title>Recovering a System Using a Solaris Volume Manager Configuration</title><screen># <userinput>mount -o ro /dev/dsk/c0t0d0s0 /a</userinput>
# <userinput>cp /a/kernel/drv/md.conf /kernel/drv/md.conf</userinput>
# <userinput>umount /a</userinput>
# <userinput>update_drv -f md</userinput>
Cannot unload module: md
Will be unloaded upon reboot.
Forcing update of md.conf.
devfsadm: mkdir failed for /dev 0x1ed: Read-only file system
devfsadm: inst_sync failed for /etc/path_to_inst.1359: Read-only file system
devfsadm: WARNING: failed to update /etc/path_to_inst
# <userinput>metainit -r</userinput>
# <userinput>metasync d0</userinput>
# <userinput>mount /dev/md/dsk/d0 /a</userinput></screen>
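<para>The same sequence can be collected in a script so that it can be reviewed before anything is changed. The following sketch is an illustration only: the names <literal>run</literal>, <literal>recover_svm</literal>, and <literal>DRY_RUN</literal> are hypothetical. While <literal>DRY_RUN</literal> is set to 1, the commands are printed instead of executed. The slice and volume names are the ones used in the preceding example.</para><screen># run: print the command when DRY_RUN is 1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# recover_svm ROOT_SLICE VOLUME: hypothetical helper that follows the
# steps of the procedure above in order.
recover_svm() {
    root_slice=$1
    volume=$2
    run mount -o ro "/dev/dsk/$root_slice" /a || return 1
    run cp /a/kernel/drv/md.conf /kernel/drv/md.conf || return 1
    run umount /a || return 1
    run update_drv -f md        # warnings from update_drv can be ignored
    run metainit -r || return 1
    run metasync "$volume"      # needed only for RAID-1 volumes
    run mount "/dev/md/dsk/$volume" /a
}

# Print the commands without running them.
DRY_RUN=1
recover_svm c0t0d0s0 d0</screen><para>Setting <literal>DRY_RUN</literal> to 0 would run the commands, which should be done only from the miniroot as described in the procedure.</para>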
</example>
</task>
</sect1>
</chapter>