Fix Sanoid Pruning Issues: A Comprehensive Guide
Having trouble with Sanoid pruning? You're not alone! Many users encounter issues where Sanoid, the popular ZFS snapshot management tool, fails to properly prune snapshots, leading to disk space issues and frustration. This guide dives deep into the common causes of Sanoid pruning problems and provides step-by-step solutions to get your snapshot management back on track.
Understanding Sanoid Pruning
Before we jump into troubleshooting, let's quickly recap what Sanoid pruning is supposed to do. Sanoid automates the creation and deletion of ZFS snapshots based on configurable retention policies. These policies define how many snapshots to keep on an hourly, daily, weekly, monthly, or yearly basis. Pruning is the process of deleting older snapshots that fall outside these retention parameters. A properly functioning Sanoid setup ensures that you have a consistent and manageable snapshot history without consuming excessive storage space. Think of it like regularly decluttering your digital attic – getting rid of the old stuff to make room for the new!
Sanoid pruning relies on a chain of parts working together. The configuration file, usually located at /etc/sanoid/sanoid.conf, dictates the retention policies. Sanoid combines these policies with its own snapshot naming convention to identify snapshots eligible for deletion, and the sanoid command (run with --prune-snapshots, or with --cron to both take and prune snapshots) performs the actual deletion. A break anywhere in this chain can cause pruning to fail, and a retention policy that no longer matches your needs can quietly consume more storage than you planned for, so review your configuration whenever your datasets change. Sanoid can also simulate a run with the --readonly option, which lets you check a configuration without risking accidental data loss.
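To make the idea of "falling outside the retention parameters" concrete, here is a small Python sketch of a count-plus-age retention rule. It is a simplified model for illustration, not Sanoid's actual code: the function name, the period lengths, and the sample dates are all assumptions.

```python
from datetime import datetime, timedelta

# Approximate period lengths; assumed values for illustration only.
PERIOD_LENGTH = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=31),
    "yearly": timedelta(days=365),
}


def snapshots_to_prune(created, period, keep, now):
    """Return the creation times that the modelled policy would delete.

    created: creation times of one dataset's snapshots for one period.
    A snapshot is pruned only if it is older than keep * period length
    AND more than `keep` snapshots of that period would still remain.
    """
    cutoff = now - keep * PERIOD_LENGTH[period]
    remaining = len(created)
    doomed = []
    for ts in sorted(created):  # oldest first
        if ts < cutoff and remaining > keep:
            doomed.append(ts)
            remaining -= 1
    return doomed


# Hypothetical data: one daily snapshot at midnight, May 1 through May 20.
now = datetime(2024, 5, 20, 12, 0, 0)
dailies = [datetime(2024, 5, day) for day in range(1, 21)]
for ts in snapshots_to_prune(dailies, "daily", keep=7, now=now):
    print("would prune:", ts.date())
```

Under this model, a dataset with fewer snapshots than the retention count is never pruned, which is worth keeping in mind when a short snapshot history appears "stuck".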
Common Causes of Sanoid Pruning Issues
Several factors can contribute to Sanoid pruning malfunctions. Here are some of the most frequent culprits:
1. Incorrect Configuration
This is the most common cause. A misconfigured sanoid.conf file can lead to Sanoid either deleting too many snapshots or not deleting any at all. Check the following:
- Retention settings: Ensure that your hourly, daily, weekly, monthly, and yearly values match your desired snapshot strategy. For example, monthly = 12 means you want to keep 12 monthly snapshots. A value of 0 generally means that no snapshots of that type are kept. Also confirm that autoprune = yes is in effect; with autoprune = no, Sanoid does not delete snapshots for that dataset at all.
- Dataset inclusion/exclusion: Verify that every dataset you want Sanoid to manage has its own section in sanoid.conf, such as [tank/data], or is covered by a parent section with recursive = yes. Datasets are matched by name, not by regular expression, so a typo in a section name silently leaves that dataset unmanaged. To exclude a dataset from pruning, leave it out of the file or assign it a template with autoprune = no.
- Template assignments: Make sure that your datasets are assigned to the correct templates in sanoid.conf. Templates are sections named [template_<name>] that define retention policies, and a dataset picks one up with a line such as use_template = production. If a dataset isn't assigned to a template, or if it's assigned to the wrong one, pruning won't work as expected.
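The checks above can be partly automated. The following Python sketch parses a sample configuration with the standard configparser module and flags datasets whose use_template value does not match any [template_...] section. The sample configuration, the dataset names, and the check_templates helper are all hypothetical, and configparser only approximates how Sanoid itself reads the file.

```python
import configparser

# Hypothetical sanoid.conf contents; note the misspelled template name.
SAMPLE_CONF = """
[tank/data]
use_template = production
recursive = yes

[tank/scratch]
use_template = prodution

[template_production]
hourly = 36
daily = 7
monthly = 12
yearly = 0
autosnap = yes
autoprune = yes
"""


def check_templates(conf_text):
    """Return a list of warnings about template assignments in conf_text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    prefix = "template_"
    templates = {
        name[len(prefix):] for name in parser.sections() if name.startswith(prefix)
    }
    warnings = []
    for section in parser.sections():
        if section.startswith(prefix):
            continue  # template definitions are not datasets
        assigned = parser[section].get("use_template")
        if assigned is None:
            warnings.append(f"{section}: no use_template set")
            continue
        # Treat the value as a comma-separated list of template names.
        for name in (part.strip() for part in assigned.split(",")):
            if name not in templates:
                warnings.append(f"{section}: unknown template '{name}'")
    return warnings


for warning in check_templates(SAMPLE_CONF):
    print(warning)
```

A dataset without use_template is not necessarily broken, since settings can also be written directly in the dataset's section, but it is worth a second look when pruning misbehaves.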
2. Snapshot Naming Conventions
Sanoid relies on a specific naming convention for ZFS snapshots. The snapshots it creates are named autosnap_<timestamp>_<period>, where <timestamp> is the creation date and time (for example, 2024-05-01_00:00:02) and <period> is the snapshot frequency (e.g., hourly, daily). A typical name looks like autosnap_2024-05-01_00:00:02_daily. If your snapshots don't adhere to this naming scheme, Sanoid won't be able to identify them for pruning.
Snapshots that Sanoid did not create are the usual source of confusion here. If you create snapshots manually or with other tools, their names won't match the Sanoid convention, so Sanoid skips them and they accumulate indefinitely. For example, a backup script that takes its own snapshot before each rsync run will leave behind snapshots that Sanoid never prunes. Keep snapshot management consistent: either let Sanoid create the snapshots or give the other tool its own cleanup routine.
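As a rough illustration of how a name either matches the convention or does not, the Python sketch below parses the snapshot part of a name (the text after the @). The regular expression is an assumption based on the autosnap_<timestamp>_<period> format, and parse_snapshot_name is a hypothetical helper; compare it against names from your own system before relying on it.

```python
import re
from datetime import datetime

# Assumed pattern for Sanoid-created snapshot names, for example
# autosnap_2024-05-01_00:00:02_daily
AUTOSNAP_RE = re.compile(
    r"^autosnap_(\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2})_"
    r"(frequently|hourly|daily|weekly|monthly|yearly)$"
)


def parse_snapshot_name(name):
    """Return (created, period) for a Sanoid-style name, otherwise None."""
    match = AUTOSNAP_RE.match(name)
    if match is None:
        return None
    try:
        created = datetime.strptime(match.group(1), "%Y-%m-%d_%H:%M:%S")
    except ValueError:
        return None  # digits in the right places but not a real date
    return created, match.group(2)


for name in (
    "autosnap_2024-05-01_00:00:02_daily",
    "before-upgrade",
    "autosnap_2024-05-01_daily",
):
    print(name, "->", parse_snapshot_name(name))
```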
3. ZFS Issues
Underlying ZFS problems can also interfere with Sanoid's ability to prune snapshots. These issues are uncommon but can still happen.
- Pool errors: A corrupted or degraded ZFS pool can prevent Sanoid from accessing or deleting snapshots. Check your pool's health using the zpool status command, which reports each pool's state, any device errors, and ongoing scrub or resilver activity.
- Dataset permissions: Ensure that the user Sanoid runs under has the necessary permissions to delete snapshots on the relevant datasets. Sanoid is commonly run as root, in which case this is not an issue. If it runs as an unprivileged user, that user needs delegated ZFS permissions (granted with zfs allow) that include destroy; without them, Sanoid will be unable to delete any snapshots.
4. Sanoid Bugs or Version Incompatibilities
While rare, bugs in Sanoid itself or incompatibilities between Sanoid versions and your ZFS version can sometimes cause pruning issues. Keep Sanoid updated to the latest stable version to minimize the risk of encountering such problems.
If you have ruled out everything else and the problem started after an upgrade, try downgrading to the last version that worked to see if the issue is resolved. If it is, report the bug to the Sanoid community so that other users know about it.
Troubleshooting Steps
Now that we've covered the common causes, let's walk through the troubleshooting process:
1. Verify Your Configuration
Carefully review your sanoid.conf file. Pay close attention to retention settings, dataset inclusion/exclusion rules, and template assignments. Use a text editor or IDE with syntax highlighting to help you spot any errors. Double-check the configuration settings to make sure they align with your desired snapshot retention strategy.
- Use the --readonly option: Running sanoid --prune-snapshots --readonly --verbose simulates a Sanoid run without actually deleting any snapshots. With --verbose, the output should show which snapshots Sanoid would delete based on your current configuration. This is an invaluable tool for identifying configuration errors before they cause any real damage. Review the output carefully to ensure that Sanoid is targeting the correct snapshots for deletion. If you are unsure what the output means, ask the community for help.
2. Check Snapshot Naming
Use the zfs list -t snapshot command to list all snapshots on your system. Examine the snapshot names to ensure they follow the autosnap_<timestamp>_<period> convention that Sanoid uses. If you find snapshots with different names, consider renaming them or cleaning them up outside of Sanoid.
- Rename snapshots (with caution): If you have a small number of non-Sanoid snapshots, you can rename them using the zfs rename command. However, be extremely careful when doing this, as renaming snapshots can have unintended consequences, especially if other tools or scripts rely on those snapshots. Before renaming, make sure you understand the impact on your system and back up any critical data.
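On a system with many datasets, reading the full snapshot list by eye is tedious. The Python sketch below summarizes a list of snapshot names per dataset and period, and counts everything that does not look Sanoid-managed as "unmanaged". It assumes input with one dataset@snapshot name per line, such as the output of zfs list -H -t snapshot -o name; the inventory function, the pattern, and the sample names are hypothetical.

```python
import re
from collections import Counter

# Assumed shape of Sanoid-created names: autosnap_<timestamp>_<period>
PERIOD_RE = re.compile(
    r"^autosnap_.+_(frequently|hourly|daily|weekly|monthly|yearly)$"
)


def inventory(zfs_list_output):
    """Count snapshots per (dataset, period).

    Names that do not match the assumed pattern are counted under
    the period label "unmanaged".
    """
    counts = Counter()
    for line in zfs_list_output.splitlines():
        line = line.strip()
        if "@" not in line:
            continue  # not a snapshot name
        dataset, snapshot = line.split("@", 1)
        match = PERIOD_RE.match(snapshot)
        period = match.group(1) if match else "unmanaged"
        counts[(dataset, period)] += 1
    return counts


# Hypothetical sample input.
SAMPLE = """\
tank/data@autosnap_2024-05-01_00:00:02_daily
tank/data@autosnap_2024-05-02_00:00:01_daily
tank/data@autosnap_2024-05-02_13:00:01_hourly
tank/data@before-upgrade
"""

for (dataset, period), count in sorted(inventory(SAMPLE).items()):
    print(f"{dataset} {period}: {count}")
```

Comparing these counts with the retention values in sanoid.conf shows quickly whether a dataset is holding more snapshots of a given period than the policy allows.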
3. Examine Sanoid Logs
Sanoid does not maintain a log file of its own by default; its output goes wherever your scheduler sends it. If Sanoid runs from a systemd timer, look in the journal (for example, journalctl -u sanoid.service). If it runs from cron, check cron's mail or the file you redirect output to, such as /var/log/sanoid.log. Look for error messages or warnings that might indicate the cause of the pruning failure, paying attention to timestamps and to messages related to snapshot deletion. This will help you pinpoint why the pruning failed.
- Increase verbosity: If the default output doesn't provide enough information, add the --verbose option to the sanoid command, or --debug for even more detail. This will produce more detailed output, which can be helpful for debugging. Remember to remove the option once you've resolved the issue, as excessive logging can consume disk space.
4. Verify ZFS Pool Health and Permissions
Run the zpool status command to check the health of your ZFS pool. Look for any errors or warnings that might indicate a problem. Additionally, confirm which user Sanoid runs as. If it is not root, run zfs allow on the dataset to see which permissions have been delegated to that user, and make sure destroy is among them; otherwise Sanoid will not be able to delete snapshots.
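If you manage several pools, a quick scripted check can tell you which ones need a closer look. The Python sketch below assumes its input is the output of zpool list -H -o name,health (one pool per line, name and health separated by whitespace) and returns every pool whose health is not ONLINE. The unhealthy_pools function and the sample pool names are hypothetical.

```python
def unhealthy_pools(zpool_list_output):
    """Return (pool, health) pairs for pools whose health is not ONLINE."""
    problems = []
    for line in zpool_list_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        pool, health = fields
        if health != "ONLINE":
            problems.append((pool, health))
    return problems


# Hypothetical sample input.
SAMPLE = "tank\tONLINE\nbackup\tDEGRADED\n"
print(unhealthy_pools(SAMPLE))
```

A pool reported as ONLINE can still have read, write, or checksum errors on individual devices, so this is only a first filter; zpool status remains the place to see the details.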
5. Update or Downgrade Sanoid
Make sure you're running the latest stable version of Sanoid. If you suspect a bug or incompatibility issue, try downgrading to a previous version that you know was working correctly. Refer to the Sanoid documentation for instructions on how to update or downgrade.
6. Test Sanoid Manually
To isolate the issue, try running Sanoid manually with specific options. For example, you can use the sanoid --prune-snapshots --verbose command to start a pruning run and see if any errors occur. The --verbose option will provide more detailed output to help you diagnose the problem. By running Sanoid manually, you can eliminate any potential interference from cron jobs or other automation.
Example Scenario and Solution
Let's say you have a dataset named tank/data and you expect Sanoid to keep 7 daily snapshots. However, you notice that old snapshots are not being deleted. After checking your sanoid.conf file, you find that the template assigned to tank/data has autoprune = no, so Sanoid never prunes that dataset regardless of its daily value. Setting autoprune = yes, confirming that daily = 7, and running sanoid --prune-snapshots resolves the issue.
Conclusion
Troubleshooting Sanoid pruning issues can be a bit tricky, but by systematically checking your configuration, snapshot naming conventions, ZFS health, and Sanoid's output, you can usually pinpoint the cause of the problem. Remember to use the --readonly option to simulate pruning runs and avoid any accidental data loss. With a little patience and attention to detail, you can get your Sanoid setup working smoothly and ensure that your ZFS snapshots are properly managed. If you are still running into issues, the Sanoid community can help; include your logs and the troubleshooting steps you have already taken when you ask.