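A backup is something held in support or in reserve. In information engineering, backing up data or a system means making a duplicate (copy) of it so that the data can be recovered even if a problem occurs.
This article describes techniques for duplicating data and systems, and the purposes they serve.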
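Contents
- 1 System preservation
- 2 Data preservation
- 2.1 Risk classification and countermeasures
- 2.2 Backup targets
- 2.3 Types of backup
- 2.3.1 Characteristics of full backups
- 2.3.2 Characteristics of differential backups
- 2.3.3 Characteristics of incremental backups
- 2.4 Backup operation
- 2.4.1 Scope
- 2.4.2 Frequency
- 2.4.3 Retention period
- 2.4.4 Storage location
- 2.5 Types of backup media
- 3 Footnotes
- 4 See also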
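System preservation
To keep a system usable (that is, to ensure its availability) even when part of it stops working because of a failure, a backup configuration is adopted according to the importance of the system. Risk management is used to identify how often (how likely) outages are and how much damage they cause, and the configuration is designed and operated with the affordable cost in mind.
Risk classification and countermeasures
- Equipment failure - deterioration of hardware components, failures caused by power disturbances (such as momentary outages), damage from operator error, and so on
- Use a cluster (failover cluster) configuration, redundant components within the server, or a standby (hot standby or cold standby) configuration
- Insufficient power supply - inability to operate because of a power outage
- Site failure - large-scale disasters, absence of operators because of a pandemic, armed conflict, DoS attacks aimed at a specific site, and so on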
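Data preservation
The risk of data loss or corruption is always present, and backing up is indispensable for preserving data. The more valuable the data handled by a computer, the more necessary it becomes to take adequate measures even at high cost.
Risk classification and countermeasures
The risks of data loss fall broadly into the following categories.
- Logical destruction of data - user error, errors in the data source itself, software bugs, and deliberate tampering by third parties, such as virus infection or cracking
- Physical destruction or loss of media - hardware failure (including power outages) and natural disasters (fire, lightning strikes, earthquakes, and so on)
It is important to analyze these risks and to take data preservation measures that are necessary and sufficient for each cause.
- Against logical destruction, backups should be kept for several generations so that earlier states can be recovered. Ideally the backups resist deliberate tampering as far as possible and allow a return to the state just before the destruction. The period over which backups are taken must be decided according to the importance of the data and how often it is updated.
- Against physical destruction, backups should be stored in a different location or on different media. To keep the backup from becoming unusable for the same reason as the original, it is preferable to use a location as far away as possible and media that work on as different a principle as possible.
Backups generally use methods that can be expected to protect against both logical and physical destruction, but some methods address only one of the two and are not sufficient on their own. Examples include the following.
- Backing up within the same drive. This is effective only against logical destruction such as operator error. Because operator error is the largest cause of failures, it is very useful, but it offers no protection at all if the drive itself fails.
- Reflecting changes immediately to separate storage. This is the best protection against physical destruction and is essential for services that cannot be stopped, but erroneous data from operator error and damage caused by computer viruses are also reflected immediately. Mirroring mechanisms such as RAID fall into this category.
Methods such as these are chosen for specific purposes, but they need to be combined with other methods.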
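Typical countermeasures, chosen according to cost, include the following.
- Moving offline media to a remote location
- Backups are written to removable media and stored elsewhere, which reduces location-related risks (natural disasters, crime, and so on) and the risks of being online (such as logical destruction of data). To cope with large-scale disasters such as the Great East Japan Earthquake, storage at a remote site in a different region tends to be regarded as important.
- Multiple backups
- Depending on importance and cost, backups are duplicated or triplicated (a backup of the backup). For example, the primary backup uses fast media such as a hard disk drive, and the secondary backup uses comparatively slow, lower-capacity media such as DVD or magnetic tape. This allows the strengths of each kind of media to be put to use in preserving data.
- Remote backup over a network
- Backups are written to online storage or a data center (or the data center itself is used as the everyday storage, with further backups taken from there to somewhere else).
- Generational backup
- Data from the last several backups is retained so that the original, correct data is not lost when erroneous data is carried into a backup. The required backup capacity grows accordingly.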
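The generational approach can be illustrated with a short Python sketch. It assumes each backup is identified only by the time it was taken; the function name `prune_generations` and the sample dates are hypothetical and chosen purely for illustration.

```python
from datetime import datetime


def prune_generations(backup_times, keep):
    """Return (kept, discarded) timestamps, keeping the newest `keep` generations."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(backup_times, reverse=True)  # newest first
    return ordered[:keep], ordered[keep:]


# Hypothetical daily backups taken on 1-5 March.
backups = [datetime(2024, 3, day) for day in range(1, 6)]
kept, discarded = prune_generations(backups, keep=3)
print("kept:", [t.date().isoformat() for t in kept])
print("discarded:", [t.date().isoformat() for t in discarded])
```

A larger `keep` lets a restore reach further back in time at the price of more storage, which is the trade-off noted above.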
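RAID records data in a distributed manner across multiple hard disk drives to improve physical fault tolerance (RAID 0 is the one level that provides no fault tolerance). Because the data is distributed immediately, there is no time lag of the kind found in most other backup configurations. Data is protected against the physical destruction of a single hard disk, and there is no downtime as long as the damage stays within the range that the configuration covers. For these reasons RAID is used frequently in enterprise servers and similar systems. However, RAID does not protect against logical destruction of data, and the disks are housed in the same server (for example, they share the same electrical system, so a power fault exposes all of them to the same cause of failure). RAID alone is therefore insufficient as a backup, and in almost all cases it is combined with other backup configurations.
A backup can normally restore data only as it was at the moment the backup was taken, so backups need to be taken regularly.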
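Backup targets
The target of a backup may be individual files and folders, or an image of a whole disk or partition. Their characteristics are as follows.
- File backup
For the following reasons, file backup tends to be "fast to back up but slow to recover" compared with the image backup described next.
- The backup goes through the file system, so the backup targets are files and folders.
- The timestamps kept by the file system can be used, which makes incremental and differential backups easy to implement.
- Recovery requires a file system to have been built (or restored first) at the destination, so it often takes several steps to complete.
- Image backup
For the following reasons, image backup tends to be "slow to back up but fast to recover" compared with the file backup described above.
- A hard disk partition is backed up as raw recorded data, without going through the file system. In general, empty (null) regions cannot be skipped, so the entire area is backed up.
- The whole of the previous backup data has to be compared, so incremental and differential backups are hard to implement (or take a long time to process, which lengthens the backup).
- Recovery restores the entire partition, so it takes fewer steps than recovery from a file backup.
Besides backing up regularly, another approach is to keep a copy of the default settings.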
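Types of backup
Backups can be broadly divided into the following three types.
- Full backup
- Copies all required data together in a single operation
- Differential backup
- Copies only the data changed or added since the last full backup
- Incremental backup
- Copies only the data changed or added since the last full, differential, or incremental backup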
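The difference between the three types comes down to which reference time is used when deciding what to copy. The following Python sketch shows this under simplifying assumptions: files are represented by a dictionary of modification times, times are plain numbers, and the function name `select_files` and its parameters are hypothetical.

```python
def select_files(mtimes, kind, last_full=None, last_backup=None):
    """Pick the paths to copy for one backup run.

    mtimes      -- dict mapping path -> modification time (any comparable number)
    kind        -- "full", "differential", or "incremental"
    last_full   -- time of the most recent full backup
    last_backup -- time of the most recent backup of any kind
    """
    if kind == "full":
        return sorted(mtimes)
    if kind == "differential":
        reference = last_full
    elif kind == "incremental":
        reference = last_backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    if reference is None:
        raise ValueError("a full backup must exist first")
    # Only files changed or added after the reference time are copied.
    return sorted(path for path, t in mtimes.items() if t > reference)


# Hypothetical state: full backup at t=100, an incremental at t=200.
files = {"a.txt": 50, "b.txt": 150, "c.txt": 250}
print(select_files(files, "full"))
print(select_files(files, "differential", last_full=100))
print(select_files(files, "incremental", last_backup=200))
```

The sketch tracks only changed and added files, as in the definitions above; recording deletions would need additional bookkeeping.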
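In addition, databases and similar systems offer transaction backups, which use transaction files.
There is also a technique called mirroring, in which every change to the required data (update, addition, deletion, or erasure) is automatically repeated in full on an auxiliary storage device. Because the contents are synchronized continuously and in real time, the result is equivalent to a full backup. At the moment recovery is needed, all the required data has already been stored, so recovery can begin immediately, and routine backup work and backup run time are effectively unnecessary. In these respects mirroring is superior to a full backup. This applies to physical incidents only. In a logical incident, where the data content itself is faulty, the mirror holds the same faulty content, so mirroring cannot protect data against logical destruction caused by operator error (the largest cause of failures), viruses, or cracking.
By a different criterion, backups can be divided into data backups, which copy the data of an individual or organization, and image backups, which copy data for the purpose of restoring a system.
The characteristics of the first three types are listed below.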
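Characteristics of full backups
- All data must be copied every time, so the backup takes a long time.
- All the copied data is in one place, so there is no need to search for data during recovery.
- It cannot be performed unless the backup destination has enough free space.
Characteristics of differential backups
For details, see "Differential backup".
- A full backup must have been taken at least once before a differential can be taken.
- Only what has changed or been added since the last full backup is copied, so the backup takes little time.
- Without a tool, you must keep track of the changed and added data yourself.
- Recovery requires the last full backup and the last differential backup.
Characteristics of incremental backups
For details, see "Incremental backup".
- A full backup must have been taken at least once before an incremental can be taken.
- Only the data changed or added since the last backup is copied, so the backup takes very little time.
- Without a tool, you must keep track of the changed and added data yourself.
- Recovery requires the last full backup, the last differential backup (if any), and every incremental backup taken after that.
- Once a full backup has been taken, only the data changed or added since the previous backup needs to be copied, so small data sets can be stored in very little space.
- Some backup methods, when combined with a fully journaling file system, can obtain a full backup image at the cost of a differential backup (for example, NetApp's snapshot technology).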
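The recovery requirements listed above can be expressed as a small Python sketch that works out which backup sets are needed for a restore. The representation of the history as `(label, kind)` tuples and the function name `restore_chain` are assumptions made for illustration only.

```python
def restore_chain(history):
    """Return the labels of the backup sets needed to restore the latest state.

    history -- list of (label, kind) tuples in chronological order, where
               kind is "full", "differential", or "incremental".
    Raises ValueError if the history contains no full backup.
    """
    # Start from the most recent full backup.
    full_index = max(i for i, (_, kind) in enumerate(history) if kind == "full")
    after_full = history[full_index + 1:]
    chain = [history[full_index]]

    # Use the latest differential taken after that full backup, if any.
    diff_positions = [i for i, (_, kind) in enumerate(after_full)
                      if kind == "differential"]
    start = 0
    if diff_positions:
        chain.append(after_full[diff_positions[-1]])
        start = diff_positions[-1] + 1

    # Then every incremental taken after that point, in order.
    chain.extend(item for item in after_full[start:] if item[1] == "incremental")
    return [label for label, _ in chain]


# Hypothetical week of backups.
week = [("sun", "full"), ("mon", "incremental"), ("tue", "differential"),
        ("wed", "incremental"), ("thu", "incremental")]
print(restore_chain(week))
```

In this example Monday's incremental is not needed, because Tuesday's differential already contains everything changed since Sunday's full backup.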
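Backup operation
In general, backups are operated with a scope and frequency suited to the scale and purpose of the system. A backup plan is drawn up and carried out by weighing the importance of the system and data, the cost of operation and maintenance, and other factors to decide which data to back up, how often, and how long to keep it before discarding it. The plan also covers the type of backup (full, differential, incremental, and so on). Depending on the scope and type of backup, the system may have to be stopped, and such constraints are also part of the plan.
Scope
Which data is backed up: for example, the data in a database, or each user's home directory.
Frequency
Whether backups run daily (at a fixed time each day), weekly (on a fixed day each week), monthly (on a fixed day each month), or yearly (on a fixed day each year).
Retention period
How long a backup is kept before it is discarded. For some kinds of data, a minimum retention period is set by law.
Storage location
A backup is pointless if the media holding it are easily lost, and the information on lost media may also be leaked or misused. Media holding backups are therefore normally stored and managed in a designated place. The more important the data, the more important it becomes to apply security such as locks and authentication to the storage location and to restrict who may handle backup media. A storage location appropriate to the importance of the backup and to the cost should be provided, for example by setting aside a room for backup media, storing media in a geographically separate building, or entrusting storage to a reliable outside contractor. These measures concern not only backup operation but also security policy.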
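Types of backup media
- Floppy disk - capacity: a little over 1 MB
- Formerly the main recording medium. It is inexpensive, but by today's standards its capacity is very small and its read speed is slow, so it is used only for backing up small individual files. It is also vulnerable to magnetism, dust, and dirt.
- High-capacity magnetic disk - capacity: about 100 MB to several GB
- High-capacity removable media holding 100 MB or more, such as Zip and Jaz. Because their write speed was far higher than that of optical discs, they became quite popular in the United States for a time, but they have since been replaced by optical discs, which are superior in capacity and economy. Like floppy disks, they are vulnerable to magnetism, dust, and dirt.
- Magnetic tape (for computers) - capacity: about several tens of GB to several TB
- The medium traditionally used with large servers and mainframes. It is also known as a streamer, and large installations may have a device that changes tape media automatically (an autoloader). Tape does not allow random access, so it is unsuited to backing up small pieces of data, but its large capacity suits backups of entire systems. On the other hand, maintenance is laborious: tension adjustment (removing slack), magnetization, demagnetization, periodic cleaning, and so on. The media are inexpensive for their capacity, but the recording device (tape drive) is very expensive, so tape can hardly be called suitable for individuals.
- Cassette tape
- Before floppy disks became standard, individuals saved data to audio cassette tapes using a data recorder (or tape recorder), which modulated the data into audio tones.
- Optical disc - capacity: about 640 MB to 128 GB
- The media in common use today are those based on the CD, DVD, and BD standards, or on standards compatible with them. Recordable media come in two kinds, write-once (can be written once and not erased) and rewritable, and the choice depends on the situation. Some kinds of optical disc are vulnerable to heat, humidity, or ultraviolet light. For business use there are also maintenance-free devices with an automatic cleaning function.
- Flash memory - capacity: about several tens of MB to several GB
- Small and convenient to carry. USB-connected devices are currently the mainstream. Flash SSDs have also been spreading in recent years, but because of their characteristics they are rarely used for long-term backup.
- Magneto-optical disc (MO) - capacity: about 100 MB to several GB
- A recording medium that was widespread in Japan for a time. It has now been almost entirely replaced by optical discs and flash memory, but it is still sometimes used because its reliability and long-term storage life are far better than theirs.
- Hard disk drive - capacity: about several tens of GB to several TB
- Strictly speaking, this is not a recording medium but a recording unit (device) integrated with its medium. It is used as the ordinary auxiliary storage of a computer and allows large, fast backups. On the other hand, it was originally designed as an internal component and is vulnerable to magnetism and shock.[1] As noted, it consists of a media part and a recording-device part, and a failure of either can cause data loss, so it can be regarded as a recording medium that is vulnerable to hardware failure. In addition, because of the structure of a hard disk,[2] repair requires advanced equipment and skill and is very costly.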
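Footnotes
- ^ External hard disks compensate for this weakness with measures such as a sturdy enclosure or a mechanism that recalibrates the magnetic head on impact, and are usually designed with portability in mind.
- ^ The interior of a hard disk is extremely intolerant of dust.
See also
- Redundancy
- Replication
- Mirroring, file synchronization
- Archive (computing)
- rsync
- Save (computing)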
This article is about backup in computer systems. For other uses, see Backup (disambiguation).
In information technology, a backup, or the process of backing up, refers to the copying and archiving of computer data so it may be used to restore the original after a data loss event. The verb form is to back up in two words, whereas the noun is backup.[1]
Backups have two distinct purposes. The primary purpose is to recover data after its loss, be it by data deletion or corruption. Data loss can be a common experience of computer users; a 2008 survey found that 66% of respondents had lost files on their home PC.[2] The secondary purpose of backups is to recover data from an earlier time, according to a user-defined data retention policy, typically configured within a backup application for how long copies of data are required. Though backups represent a simple form of disaster recovery, and should be part of any disaster recovery plan, backups by themselves should not be considered a complete disaster recovery plan. One reason for this is that not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, active directory server, or database server by simply restoring data from a backup.
Since a backup system contains at least one copy of all data considered worth saving, the data storage requirements can be significant. Organizing this storage space and managing the backup process can be a complicated undertaking. A data repository model may be used to provide structure to the storage. Nowadays, there are many different types of data storage devices that are useful for making backups. There are also many different ways in which these devices can be arranged to provide geographic redundancy, data security, and portability.
Before data are sent to their storage locations, they are selected, extracted, and manipulated. Many different techniques have been developed to optimize the backup procedure. These include optimizations for dealing with open files and live data sources as well as compression, encryption, and de-duplication, among others. Every backup scheme should include dry runs that validate the reliability of the data being backed up. It is important to recognize the limitations and human factors involved in any backup scheme.
Contents
- 1 Storage, the base of a backup system
- 1.1 Data repository models
- 1.2 Storage media
- 1.3 Managing the data repository
- 2 Selection and extraction of data
- 2.1 Files
- 2.2 Filesystems
- 2.3 Live data
- 2.4 Metadata
- 3 Manipulation of data and dataset optimization
- 4 Managing the backup process
- 4.1 Objectives
- 4.2 Limitations
- 4.3 Implementation
- 4.4 Measuring the process
- 5 See also
- 6 References
- 7 External links
Storage, the base of a backup system
Data repository models
Any backup strategy starts with a concept of a data repository. The backup data needs to be stored, and probably should be organized to a degree. The organization could be as simple as a sheet of paper with a list of all backup media (CDs, etc.) and the dates they were produced. A more sophisticated setup could include a computerized index, catalog, or relational database. Different approaches have different advantages. Part of the model is the backup rotation scheme.
- Unstructured
- An unstructured repository may simply be a stack of CD-Rs or DVD-Rs with minimal information about what was backed up and when. This is the easiest to implement, but probably the least likely to achieve a high level of recoverability as it lacks automation.
- Full only / System imaging
- A repository of this type contains complete system images taken at one or more specific points in time. This technology is frequently used by computer technicians to record known good configurations. Imaging[3] is generally more useful for deploying a standard configuration to many systems rather than as a tool for making ongoing backups of diverse systems.
- Incremental
- An incremental style repository aims to make it more feasible to store backups from more points in time by organizing the data into increments of change between points in time. This eliminates the need to store duplicate copies of unchanged data: with full backups a lot of the data will be unchanged from what has been backed up previously. Typically, a full backup (of all files) is made on one occasion (or at infrequent intervals) and serves as the reference point for an incremental backup set. After that, a number of incremental backups are made after successive time periods. Restoring the whole system to the date of the last incremental backup would require starting from the last full backup taken before the data loss, and then applying in turn each of the incremental backups since then.[4] Additionally, some backup systems can reorganize the repository to synthesize full backups from a series of incrementals.
- Differential
- Each differential backup saves the data that has changed since the last full backup. It has the advantage that only a maximum of two data sets are needed to restore the data. One disadvantage, compared to the incremental backup method, is that as time from the last full backup (and thus the accumulated changes in data) increases, so does the time to perform the differential backup. Restoring an entire system would require starting from the most recent full backup and then applying just the last differential backup since the last full backup.
- Note: Vendors have standardized on the meaning of the terms "incremental backup" and "differential backup". However, there have been cases where conflicting definitions of these terms have been used. The most relevant characteristic of an incremental backup is which reference point it uses to check for changes. By standard definition, a differential backup copies files that have been created or changed since the last full backup, regardless of whether any other differential backups have been made since then, whereas an incremental backup copies files that have been created or changed since the most recent backup of any type (full or incremental). Other variations of incremental backup include multi-level incrementals and incremental backups that compare parts of files instead of just the whole file.
- Reverse delta
- A reverse delta type repository stores a recent "mirror" of the source data and a series of differences between the mirror in its current state and its previous states. A reverse delta backup will start with a normal full backup. After the full backup is performed, the system will periodically synchronize the full backup with the live copy, while storing the data necessary to reconstruct older versions. This can either be done using hard links, or using binary diffs. This system works particularly well for large, slowly changing, data sets. Examples of programs that use this method are rdiff-backup and Time Machine.
- Continuous data protection
- Instead of scheduling periodic backups, the system immediately logs every change on the host system. This is generally done by saving byte or block-level differences rather than file-level differences.[5] It differs from simple disk mirroring in that it enables a roll-back of the log and thus restoration of old images of data.
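The roll-back property that distinguishes continuous data protection from mirroring can be sketched in Python. This is a toy model under stated assumptions: it logs changes to a key/value dictionary rather than byte-level or block-level differences, and the class name `ChangeJournal` and its methods are hypothetical.

```python
class ChangeJournal:
    """Toy continuous-data-protection log for a key/value store."""

    def __init__(self):
        self.data = {}
        self.log = []  # entries of (sequence_number, key, previous_value)
        self._missing = object()  # marks "key did not exist before"

    def write(self, key, value):
        """Apply a change and log what it replaced; return its sequence number."""
        previous = self.data.get(key, self._missing)
        self.log.append((len(self.log) + 1, key, previous))
        self.data[key] = value
        return len(self.log)

    def roll_back_to(self, sequence_number):
        """Undo every change made after `sequence_number`."""
        while self.log and self.log[-1][0] > sequence_number:
            _, key, previous = self.log.pop()
            if previous is self._missing:
                del self.data[key]
            else:
                self.data[key] = previous


journal = ChangeJournal()
journal.write("report", "draft 1")
good = journal.write("report", "draft 2")
journal.write("report", "corrupted")  # an unwanted change
journal.write("notes", "oops")        # another unwanted change
journal.roll_back_to(good)
print(journal.data)
```

A plain mirror would hold only the latest (corrupted) state; the log is what makes the earlier state reachable.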
Storage media
From left to right, a DVD disc in plastic cover, a USB flash drive and an external hard drive
Regardless of the repository model that is used, the data has to be stored on some data storage medium.
- Magnetic tape
- Magnetic tape has long been the most commonly used medium for bulk data storage, backup, archiving, and interchange. Tape has typically had an order of magnitude better capacity-to-price ratio when compared to hard disk, but recently the ratios for tape and hard disk have become a lot closer.[6] There are many formats, many of which are proprietary or specific to certain markets like mainframes or a particular brand of personal computer. Tape is a sequential access medium, so even though access times may be poor, the rate of continuously writing or reading data can actually be very fast. Some new tape drives are even faster than modern hard disks.
- Hard disk
- The capacity-to-price ratio of hard disk has been rapidly improving for many years. This is making it more competitive with magnetic tape as a bulk storage medium. The main advantages of hard disk storage are low access times, availability, capacity and ease of use.[7] External disks can be connected via local interfaces like SCSI, USB, FireWire, or eSATA, or via longer distance technologies like Ethernet, iSCSI, or Fibre Channel. Some disk-based backup systems, such as Virtual Tape Libraries, support data deduplication which can dramatically reduce the amount of disk storage capacity consumed by daily and weekly backup data. The main disadvantages of hard disk backups are that they are easily damaged, especially while being transported (e.g., for off-site backups), and that their stability over periods of years is a relative unknown.
- Optical storage
- Recordable CDs, DVDs, and Blu-ray Discs are commonly used with personal computers and generally have low media unit costs. However, the capacities and speeds of these and other optical discs are typically an order of magnitude lower than hard disk or tape. Many optical disk formats are WORM type, which makes them useful for archival purposes since the data cannot be changed. The use of an auto-changer or jukebox can make optical discs a feasible option for larger-scale backup systems. Some optical storage systems allow for cataloged data backups without human contact with the discs, allowing for longer data integrity.
- Solid state storage
- Solid-state storage includes flash memory, thumb drives, USB flash drives, CompactFlash, SmartMedia, Memory Stick, Secure Digital cards, and similar devices. These devices are relatively expensive for their low capacity in comparison to hard disk drives, but are very convenient for backing up relatively low data volumes. Unlike its magnetic counterpart, a solid-state drive contains no moving parts, which makes it less susceptible to physical damage, and it can reach throughput on the order of 500 Mbit/s to 6 Gbit/s. The capacity offered by SSDs continues to grow, and prices are gradually decreasing as they become more common.
- Remote backup service
- As broadband Internet access becomes more widespread, remote backup services are gaining in popularity. Backing up via the Internet to a remote location can protect against some worst-case scenarios such as fires, floods, or earthquakes which would destroy any backups in the immediate vicinity along with everything else. There are, however, a number of drawbacks to remote backup services. First, Internet connections are usually slower than local data storage devices. Residential broadband is especially problematic as routine backups must use an upstream link that's usually much slower than the downstream link used only occasionally to retrieve a file from backup. This tends to limit the use of such services to relatively small amounts of high value data. Secondly, users must trust a third party service provider to maintain the privacy and integrity of their data, although confidentiality can be assured by encrypting the data before transmission to the backup service with an encryption key known only to the user. Ultimately the backup service must itself use one of the above methods so this could be seen as a more complex way of doing traditional backups.
- Floppy disk
- During the 1980s and early 1990s, many personal/home computer users associated backing up mostly with copying to floppy disks. However, the data capacity of floppy disks failed to catch up with growing demands, rendering them effectively obsolete.
Managing the data repository
Regardless of the data repository model, or data storage media used for backups, a balance needs to be struck between accessibility, security and cost. These media management methods are not mutually exclusive and are frequently combined to meet the user's needs. Using on-line disks for staging data before it is sent to a near-line tape library is a common example.
- On-line
- On-line backup storage is typically the most accessible type of data storage, and a restore can begin within milliseconds. A good example is an internal hard disk or a disk array (possibly connected to a SAN). This type of storage is very convenient and speedy, but is relatively expensive. On-line storage is quite vulnerable to being deleted or overwritten, either by accident, by intentional malevolent action, or in the wake of a data-deleting virus payload.
- Near-line
- Near-line storage is typically less accessible and less expensive than on-line storage, but still useful for backup data storage. A good example would be a tape library with restore times ranging from seconds to a few minutes. A mechanical device is usually used to move media units from storage into a drive where the data can be read or written. Generally it has safety properties similar to on-line storage.
- Off-line
- Off-line storage requires some direct human action to provide access to the storage media: for example inserting a tape into a tape drive or plugging in a cable. Because the data are not accessible via any computer except during limited periods in which they are written or read back, they are largely immune to a whole class of on-line backup failure modes. Access time will vary depending on whether the media are on-site or off-site.
- Off-site data protection
- To protect against a disaster or other site-specific problem, many people choose to send backup media to an off-site vault. The vault can be as simple as a system administrator's home office or as sophisticated as a disaster-hardened, temperature-controlled, high-security bunker with facilities for backup media storage. Importantly a data replica can be off-site but also on-line (e.g., an off-site RAID mirror). Such a replica has fairly limited value as a backup, and should not be confused with an off-line backup.
- Backup site or disaster recovery center (DR center)
- In the event of a disaster, the data on backup media will not be sufficient to recover. Computer systems onto which the data can be restored and properly configured networks are necessary too. Some organizations have their own data recovery centers that are equipped for this scenario. Other organizations contract this out to a third-party recovery center. Because a DR site is itself a huge investment, backing up is very rarely considered the preferred method of moving data to a DR site. A more typical way would be remote disk mirroring, which keeps the DR data as up to date as possible.
Selection and extraction of data
A successful backup job starts with selecting and extracting coherent units of data. Most data on modern computer systems is stored in discrete units, known as files. These files are organized into filesystems. Files that are actively being updated can be thought of as "live" and present a challenge to back up. It is also useful to save metadata that describes the computer or the filesystem being backed up.
Deciding what to back up at any given time is a harder process than it seems. By backing up too much redundant data, the data repository will fill up too quickly. Backing up an insufficient amount of data can eventually lead to the loss of critical information.
Files
- Copying files
- With a file-level approach, making copies of files is the simplest and most common way to perform a backup. A means to perform this basic function is included in all backup software and all operating systems.
- Partial file copying
- Instead of copying whole files, one can limit the backup to only the blocks or bytes within a file that have changed in a given period of time. This technique can use substantially less storage space on the backup medium, but requires a high level of sophistication to reconstruct files in a restore situation. Some implementations require integration with the source file system.
- Deleted files
- To prevent the unintentional restoration of files that have been intentionally deleted, a record of the deletion must be kept.
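Partial file copying, as described in the list above, can be sketched in Python by comparing fixed-size blocks of the old and new contents. The block size, the use of SHA-256 to compare blocks, and the function names are assumptions made for this illustration; real products differ in how they detect and store changed regions.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose so the example is easy to follow


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of `data`."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return {block_index: new_bytes} for blocks that differ from `old`."""
    old_hashes = block_hashes(old, block_size)
    delta = {}
    for index, digest in enumerate(block_hashes(new, block_size)):
        if index >= len(old_hashes) or old_hashes[index] != digest:
            start = index * block_size
            delta[index] = new[start:start + block_size]
    return delta


def apply_delta(old, delta, new_length, block_size=BLOCK_SIZE):
    """Rebuild the new contents from the old contents plus the changed blocks."""
    buffer = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for index, block in delta.items():
        start = index * block_size
        buffer[start:start + len(block)] = block
    return bytes(buffer)


old = b"AAAABBBBCCCC"
new = b"AAAAXXXXCCCCDD"
delta = changed_blocks(old, new)
print(delta)
print(apply_delta(old, delta, len(new)) == new)
```

Only two small blocks are stored instead of the whole file, but a restore needs both the earlier copy and the delta, which is the added sophistication mentioned above.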
Filesystems
- Filesystem dump
- Instead of copying files within a file system, a copy of the whole filesystem itself can be made at the block level. This is also known as a raw partition backup and is related to disk imaging. The process usually involves unmounting the filesystem and running a program like dd (Unix). Because the disk is read sequentially and with large buffers, this type of backup can be much faster than reading every file normally, especially when the filesystem contains many small files, is highly fragmented, or is nearly full. But because this method also reads the free disk blocks that contain no useful data, this method can also be slower than conventional reading, especially when the filesystem is nearly empty. Some filesystems, such as XFS, provide a "dump" utility that reads the disk sequentially for high performance while skipping unused sections. The corresponding restore utility can selectively restore individual files or the entire volume at the operator's choice.
- Identification of changes
- Some filesystems have an archive bit for each file that says it was recently changed. Some backup software looks at the date of the file and compares it with the last backup to determine whether the file was changed.
- Versioning file system
- A versioning filesystem keeps track of all changes to a file and makes those changes accessible to the user. Generally this gives access to any previous version, all the way back to the file's creation time. An example of this is the Wayback versioning filesystem for Linux.[8]
Live data
If a computer system is in use while it is being backed up, the possibility of files being open for reading or writing is real. If a file is open, the contents on disk may not correctly represent what the owner of the file intends. This is especially true for database files of all kinds. The term fuzzy backup can be used to describe a backup of live data that looks like it ran correctly, but does not represent the state of the data at any single point in time. This is because the data being backed up changed in the period of time between when the backup started and when it finished. For databases in particular, fuzzy backups are worthless.
- Snapshot backup
- A snapshot is an instantaneous function of some storage systems that presents a copy of the file system as if it were frozen at a specific point in time, often by a copy-on-write mechanism. An effective way to back up live data is to temporarily quiesce them (e.g. close all files), take a snapshot, and then resume live operations. At this point the snapshot can be backed up through normal methods.[9] While a snapshot is very handy for viewing a filesystem as it was at a different point in time, it is hardly an effective backup mechanism by itself.
- Open file backup
- Many backup software packages feature the ability to handle open files in backup operations. Some simply check for openness and try again later. File locking is useful for regulating access to open files.
- When attempting to understand the logistics of backing up open files, one must consider that the backup process could take several minutes to back up a large file such as a database. In order to back up a file that is in use, it is vital that the entire backup represent a single-moment snapshot of the file, rather than a simple copy of a read-through. This represents a challenge when backing up a file that is constantly changing. Either the database file must be locked to prevent changes, or a method must be implemented to ensure that the original snapshot is preserved long enough to be copied, all while changes are being preserved. If a file is backed up while it is being changed, the first part of the backup may reflect the data before a change and later parts the data after it. The result is a corrupted, unusable file, because most large files contain internal references between their various parts that must remain consistent throughout the file.
- Cold database backup
- During a cold backup, the database is closed or locked and not available to users. The datafiles do not change during the backup process so the database is in a consistent state when it is returned to normal operation.[10]
- Hot database backup
- Some database management systems offer a means to generate a backup image of the database while it is online and usable ("hot"). This usually includes an inconsistent image of the data files plus a log of changes made while the procedure is running. Upon a restore, the changes in the log files are reapplied to bring the copy of the database up-to-date (the point in time at which the initial hot backup ended).[11]
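As one concrete instance of backing up a database while it remains open, Python's standard `sqlite3` module exposes SQLite's online backup facility through `Connection.backup`. The sketch below is illustrative only: it uses in-memory databases and a made-up `orders` table, and SQLite's mechanism produces a consistent copy directly rather than the image-plus-log procedure described above for other database management systems.

```python
import sqlite3

# An in-memory database stands in for a live production database.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
live.executemany("INSERT INTO orders (item) VALUES (?)", [("disk",), ("tape",)])
live.commit()

# Copy the database while the source connection stays open and usable.
copy = sqlite3.connect(":memory:")
live.backup(copy)

# Changes made after the backup do not appear in the copy.
live.execute("INSERT INTO orders (item) VALUES ('cable')")
live.commit()

live_count = live.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
copy_count = copy.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(live_count, copy_count)
```

In practice the target would be a file on separate storage rather than a second in-memory database.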
Metadata
Not all information stored on the computer is stored in files. Accurately recovering a complete system from scratch requires keeping track of this non-file data too.[12]
- System description
- System specifications are needed to procure an exact replacement after a disaster.
- Boot sector
- It is sometimes easier to recreate the boot sector than to save it. Still, it usually isn't a normal file and the system won't boot without it.
- Partition layout
- The layout of the original disk, as well as partition tables and filesystem settings, is needed to properly recreate the original system.
- File metadata
- Each file's permissions, owner, group, ACLs, and any other metadata need to be backed up for a restore to properly recreate the original environment.
- System metadata
- Different operating systems have different ways of storing configuration information. Microsoft Windows keeps a registry of system information that is more difficult to restore than a typical file.
Manipulation of data and dataset optimization
It is frequently useful or required to manipulate the data being backed up to optimize the backup process. These manipulations can provide many benefits including improved backup speed, restore speed, data security, media usage and/or reduced bandwidth requirements.
- Compression
- Various schemes can be employed to shrink the size of the source data to be stored so that it uses less storage space. Compression is frequently a built-in feature of tape drive hardware.
- Deduplication
- When multiple similar systems are backed up to the same destination storage device, there exists the potential for much redundancy within the backed up data. For example, if 20 Windows workstations were backed up to the same data repository, they might share a common set of system files. The data repository only needs to store one copy of those files to be able to restore any one of those workstations. This technique can be applied at the file level or even on raw blocks of data, potentially resulting in a massive reduction in required storage space. Deduplication can occur on a server before any data moves to backup media, sometimes referred to as source/client side deduplication. This approach also reduces bandwidth required to send backup data to its target media. The process can also occur at the target storage device, sometimes referred to as inline or back-end deduplication.
- Duplication
- Sometimes backup jobs are duplicated to a second set of storage media. This can be done to rearrange the backup images to optimize restore speed or to have a second copy at a different location or on a different storage medium.
- Encryption
- High capacity removable storage media such as backup tapes present a data security risk if they are lost or stolen.[13] Encrypting the data on these media can mitigate this problem, but presents new problems. Encryption is a CPU intensive process that can slow down backup speeds, and the security of the encrypted backups is only as effective as the security of the key management policy.
- Multiplexing
- When there are many more computers to be backed up than there are destination storage devices, the ability to use a single storage device with several simultaneous backups can be useful.
- Refactoring
- The process of rearranging the backup sets in a data repository is known as refactoring. For example, if a backup system uses a single tape each day to store the incremental backups for all the protected computers, restoring one of the computers could potentially require many tapes. Refactoring could be used to consolidate all the backups for a single computer onto a single tape. This is especially useful for backup systems that do "incrementals forever" style backups.
- Staging
- Sometimes backup jobs are copied to a staging disk before being copied to tape. This process is sometimes referred to as D2D2T, an acronym for Disk to Disk to Tape. This can be useful when the speed of the final destination device is hard to match with that of the source device, a problem frequently faced in network-based backup systems. The staging disk can also serve as a centralized location for applying other data manipulation techniques.
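The block-level deduplication described above can be illustrated with a minimal sketch. It assumes fixed-size blocks and an in-memory dictionary as the repository; the function names `dedup_store` and `restore` and the sample workstation images are hypothetical, and real backup products typically use more elaborate chunking and persistent storage.

```python
import hashlib

def dedup_store(images, block_size=4):
    """Split each image into fixed-size blocks and store each distinct block once.

    Returns (store, recipes): store maps a block digest to the block bytes,
    recipes maps an image name to the ordered list of digests needed to rebuild it.
    """
    store = {}
    recipes = {}
    for name, data in images.items():
        digests = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # keep only the first copy of a block
            digests.append(digest)
        recipes[name] = digests
    return store, recipes

def restore(store, recipe):
    """Rebuild an image by concatenating its blocks in recipe order."""
    return b"".join(store[d] for d in recipe)

# Two hypothetical workstations that share the same "system files" prefix.
images = {
    "ws1": b"SYSTEMFILES-user1-data",
    "ws2": b"SYSTEMFILES-user2-data",
}
store, recipes = dedup_store(images, block_size=4)
raw_bytes = sum(len(d) for d in images.values())
stored_bytes = sum(len(b) for b in store.values())
print(raw_bytes, stored_bytes)
```

Because the shared blocks are stored only once, the repository holds fewer bytes than the raw images, yet either workstation can still be restored from its recipe.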
Managing the backup process
As long as new data are being created and changes are being made, backups will need to be performed at frequent intervals. Individuals and organizations with anything from one computer to thousands of computer systems all require protection of data. The scales may be very different, but the objectives and limitations are essentially the same. Those who perform backups need to know how successful the backups are, regardless of scale.
Objectives
- Recovery point objective (RPO)
- The point in time that the restarted infrastructure will reflect. Essentially, this is the roll-back that will be experienced as a result of the recovery. The most desirable RPO would be the point just prior to the data loss event. Making a more recent recovery point achievable requires increasing the frequency of synchronization between the source data and the backup repository.[14][15]
- Recovery time objective (RTO)
- The amount of time elapsed between disaster and restoration of business functions.[16]
- Data security
- In addition to preserving access to data for its owners, data must be protected from unauthorized access. Backups must be performed in a manner that does not compromise the data owner's security commitments. This can be achieved with data encryption and proper media handling policies.
- Data retention period
- Regulations and policy can lead to situations where backups are expected to be retained for a particular period, but not any further. Retaining backups after this period can lead to unwanted liability and sub-optimal use of storage media.
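As a rough illustration of the recovery point and retention objectives above, the following sketch computes which backup a recovery would roll back to and which backups have outlived a retention period. The function names, the daily 02:00 schedule, and the three-day retention period are assumptions chosen for the example, not values from any standard.

```python
from datetime import datetime, timedelta

def achieved_recovery_point(backup_times, failure_time):
    """Return the most recent backup taken at or before the failure, or None."""
    candidates = [t for t in backup_times if t <= failure_time]
    return max(candidates) if candidates else None

def expired_backups(backup_times, now, retention):
    """Return the backups that are older than the retention period."""
    return [t for t in backup_times if now - t > retention]

# Hypothetical schedule: one backup per day at 02:00, 1 to 5 January 2024.
backups = [datetime(2024, 1, day, 2, 0) for day in range(1, 6)]
failure = datetime(2024, 1, 5, 14, 30)

point = achieved_recovery_point(backups, failure)
data_loss = failure - point  # work done after the recovery point is lost
old = expired_backups(backups, failure, timedelta(days=3))
print(point, data_loss, old)
```

With daily backups, the data loss can approach a full day in the worst case; backing up more frequently shrinks that window, which is the trade-off the RPO describes.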
Limitations
An effective backup scheme will take into consideration the limitations of the situation.
- Backup window
- The period of time when backups are permitted to run on a system is called the backup window. This is typically the time when the system sees the least usage and the backup process will have the least amount of interference with normal operations. The backup window is usually planned with users' convenience in mind. If a backup extends past the defined backup window, a decision is made whether it is more beneficial to abort the backup or to lengthen the backup window.
- Performance impact
- All backup schemes have some performance impact on the system being backed up. For example, for the period of time that a computer system is being backed up, the hard drive is busy reading files for the purpose of backing up, and its full bandwidth is no longer available for other tasks. Such impacts should be analyzed.
- Costs of hardware, software, labor
- All types of storage media have a finite capacity with a real cost. Matching the correct amount of storage capacity (over time) with the backup needs is an important part of the design of a backup scheme. Any backup scheme has some labor requirement, but complicated schemes have considerably higher labor requirements. The cost of commercial backup software can also be considerable.
- Network bandwidth
- Distributed backup systems can be affected by limited network bandwidth.
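The interaction between the backup window, source performance and network bandwidth can be estimated with simple arithmetic. The sketch below assumes the slower of the disk read rate and the network rate is the bottleneck and ignores compression, deduplication and protocol overhead; the function names and the figures used are illustrative only.

```python
def estimate_backup_hours(data_bytes, disk_bytes_per_s, network_bytes_per_s):
    """Estimate backup duration in hours, limited by the slower of disk and network."""
    bottleneck = min(disk_bytes_per_s, network_bytes_per_s)
    return data_bytes / bottleneck / 3600

def fits_window(data_bytes, disk_bytes_per_s, network_bytes_per_s, window_hours):
    """Return True if the estimated backup duration fits inside the backup window."""
    hours = estimate_backup_hours(data_bytes, disk_bytes_per_s, network_bytes_per_s)
    return hours <= window_hours

# Illustrative figures: 2 TB of data, 200 MB/s disk reads, 100 MB/s network.
data = 2_000_000_000_000
hours = estimate_backup_hours(data, 200_000_000, 100_000_000)
print(round(hours, 2))
print(fits_window(data, 200_000_000, 100_000_000, window_hours=6))
print(fits_window(data, 200_000_000, 100_000_000, window_hours=4))
```

An estimate like this shows early whether a full backup can complete within the window or whether incremental backups, more bandwidth, or a longer window would be needed.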
Implementation
Meeting the defined objectives in the face of the above limitations can be a difficult task. The tools and concepts below can make that task more achievable.
- Scheduling
- Using a job scheduler can greatly improve the reliability and consistency of backups by removing part of the human element. Many backup software packages include this functionality.
- Authentication
- Over the course of regular operations, the user accounts and/or system agents that perform the backups need to be authenticated at some level. Copying all data off or onto a system requires unrestricted access. Using an authentication mechanism is a good way to prevent the backup scheme from being used for unauthorized activity.
- Chain of trust
- Removable storage media are physical items and must only be handled by trusted individuals. Establishing a chain of trusted individuals (and vendors) is critical to defining the security of the data.
Measuring the process
To ensure that the backup scheme is working as expected, key factors should be monitored and historical data maintained.
- Backup validation
- (also known as "backup success validation") Provides information about the backup, and proves compliance to regulatory bodies outside the organization: for example, an insurance company in the USA might be required under HIPAA to demonstrate that its client data meet records retention requirements.[17] Disaster, data complexity, data value and increasing dependence upon ever-growing volumes of data all contribute to the anxiety around and dependence upon successful backups to ensure business continuity. Thus many organizations rely on third-party or "independent" solutions to test, validate, and optimize their backup operations (backup reporting).
- Reporting
- In larger configurations, reports are useful for monitoring media usage, device status, errors, vault coordination and other information about the backup process.
- Logging
- In addition to the history of computer-generated reports, activity and change logs are useful for monitoring backup system events.
- Validation
- Many backup programs use checksums or hashes to validate that the data was accurately copied. These offer several advantages. First, they allow data integrity to be verified without reference to the original file: if the file as stored on the backup medium has the same checksum as the saved value, then it is very probably correct. Second, some backup programs can use checksums to avoid making redundant copies of files, and thus improve backup speed. This is particularly useful for the de-duplication process.
- Monitored backup
- Backup processes are monitored by a third-party monitoring center, which alerts users to any errors that occur during automated backups. Monitored backup requires software capable of pinging the monitoring center's servers in the case of errors. Some monitoring services also allow collection of historical metadata that can be used for storage resource management purposes such as projecting data growth and locating redundant primary storage capacity and reclaimable backup capacity.
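The checksum-based validation described above can be sketched with Python's standard `hashlib` module. This is a minimal example, not the method of any particular backup program: the helper names `sha256_of` and `verify`, the file names, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected_digest):
    """Return True if the file's current digest matches the saved one."""
    return sha256_of(path) == expected_digest

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.dat"
    backup = Path(tmp) / "backup.dat"
    source.write_bytes(b"important data" * 1000)

    saved_digest = sha256_of(source)   # recorded at backup time
    shutil.copyfile(source, backup)
    source.unlink()                    # the original is no longer available

    ok_before = verify(backup, saved_digest)  # intact copy matches
    backup.write_bytes(b"corrupted")          # simulate damage to the medium
    ok_after = verify(backup, saved_digest)   # damaged copy does not match

print(ok_before, ok_after)
```

The check succeeds without access to the original file, which is the first advantage noted above; the same digests could also serve as keys for detecting redundant copies.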
See also
- About backup
- Backup software
- Glossary of backup terms
- Remote backup service
- Virtual backup appliance
- Related topics
- Data consistency
- Data degradation
- Data proliferation
- Database dump
- Digital preservation
- Disaster recovery and business continuity auditing
- File synchronization
- Information repository
References
- American Heritage Dictionary entry for backup, American Heritage Dictionary entry for back up
- Global Backup Survey. Retrieved on 15 February 2009
- "Five key questions to ask about your backup solution". sysgen.ca. Retrieved 2015-09-23.
- Incremental Backup. Retrieved on 10 March 2006
- Continuous Protection white paper. (1 October 2005). Retrieved on 10 March 2007
- Disk to Disk Backup versus Tape - War or Truce? (9 December 2004). Retrieved on 10 March 2007
- "Bye Bye Tape, Hello 5.3TB eSATA". Retrieved 22 April 2007.
- Wayback: A User-level V File System for Linux (2004). Retrieved on 10 March 2007
- What is a Snapshot backup?. Retrieved on 10 March 2007
- Oracle Tips (10 December 1997). Retrieved on 10 March 2007
- Oracle Tips (10 December 1997). Retrieved on 10 March 2007
- Grešovnik, Igor (April 2016). "Preparation of Bootable Media and Images". Archived from the original on 2016-04-25. Retrieved 2016-04-21.
- Backups tapes a backdoor for identity thieves (28 April 2004). Retrieved on 10 March 2007
- Definition of recovery point objective. Retrieved on 10 March 2007
- "Top four things to consider in business continuity planning". sysgen.ca. Retrieved 2015-09-23.
- Definition of recovery time objective. Retrieved on 7 March 2007
- HIPAA Advisory. Retrieved on 10 March 2007
External links