SEPATON CTO Miklos Sandorfi Outlines Key Considerations for Meeting the Deduplication Needs of the Enterprise
Business Wire, Nov 6, 2007
MARLBOROUGH, Mass. -- The volume of data generated by most companies has grown at such an explosive rate that many data centers are running out of space, power, cooling, and storage capacity. Issues of insufficient capacity are being compounded by increasingly stringent regulatory requirements and business initiatives demanding higher service levels, longer online retention times, and higher levels of data protection. Data deduplication technology is rapidly emerging as an effective solution to significantly offset data growth and meet regulatory and business requirements.
SEPATON, Inc.'s Chief Technology Officer, Miklos Sandorfi, cautions enterprises that data deduplication approaches vary and outlines key considerations for choosing the approach that best meets the needs of large enterprises.
Know the Basic Approaches to Data Deduplication. There are two basic categories of data deduplication technology: hash-based deduplication and byte-level comparison deduplication. The hash-based approach runs incoming data through a hashing algorithm to create a hash, a small representation of the data that serves as a unique identifier for that piece of data. It then compares the hash to previous hashes stored in a lookup table. If a match is found, the duplicate data is replaced with a pointer to the existing data. If a match is not found, the new data is stored and its hash is added to the lookup table.
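The hash-based lookup described above can be illustrated with a minimal Python sketch. It is not any vendor's implementation: the fixed 4,096-byte chunk size, the use of SHA-256, the in-memory dictionary standing in for the lookup table, and the names `dedupe_chunks` and `restore` are all assumptions made for illustration.

```python
import hashlib


def dedupe_chunks(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep each unique chunk once.

    Returns (store, recipe): `store` is the lookup table mapping a hash to
    the chunk it identifies, and `recipe` is the ordered list of hashes
    (the "pointers") needed to rebuild the original data.
    """
    store = {}   # hypothetical lookup table: hash -> chunk bytes
    recipe = []  # one hash per chunk, in original order
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            # No match: store the new chunk and record its hash.
            store[digest] = chunk
        # Match or not, the stream only keeps a pointer to the stored chunk.
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data by following the pointers in order."""
    return b"".join(store[digest] for digest in recipe)


# Four chunks of input, but only two distinct chunks need to be stored.
data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
store, recipe = dedupe_chunks(data)
restored = restore(store, recipe)
```

In this toy input, three of the four chunks are identical, so the table holds two chunks while the recipe holds four pointers. Real products differ in how they chunk data and where they keep the table.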
An alternative approach uses byte-level comparison technology. Here, pattern matching is used to find duplicate data; since actual data comparisons are made, there is no data integrity risk. Some solutions take this a step further by using built-in intelligence about the actual file content to compare data as objects (e.g., Word document to Word document or Oracle database to Oracle database) and identify potential redundancies. Unlike other technologies that use the first instance of a file as the reference copy, enterprise-class implementations use the most recent copy and replace older duplicate data with pointers. As a result, this technology eliminates the need to reconstitute new data from multiple reference points, enabling instantaneous data restoration.
Distinguish Between Inline and Post-Processing. A key distinction between deduplication technologies is whether the deduplication process is done inline as part of the backup process or as a post-process. Deduplication performed inline requires slightly less capacity and is adequate for relatively small backup requirements. However, this method has a significant negative impact on performance and cannot complete the large backups required by enterprise organizations within typical backup windows. An alternative method completes backups at full, unimpeded performance. The deduplication process is started as soon as the backup process begins and continues in parallel with the backup in a fully integrated operation. The main benefit of this post-process method is that it can handle much larger backup volumes within a typical eight-hour backup window. In addition, because it backs up a full set of data, the post-process method enables more rigorous data integrity checking.
Choose a Solution That Can Back Up and Restore Petabytes of Data. A primary consideration in choosing a backup technology for an enterprise or large enterprise is the solution's ability to handle terabytes or petabytes of data while staying within the backup window. The objective is to avoid creating dozens of separately managed "silos" of storage.
Ensure High Performance Over Time. Many solutions see a marked degradation in performance over time as the volume of deduplicated data grows and data becomes more fragmented across the disk and the database. Choose a solution that sustains its performance no matter how long it has been in service.
Set Realistic Expectations for Capacity Reduction. Deduplication approaches and results vary widely among solutions, as does the time required to achieve maximum deduplication. The effectiveness of deduplication technology also depends heavily on the specific backup policies, the application, and the mix of data types being backed up.
Check Restore Performance. Backing up data quickly is only half the challenge. To be successful, data needs to be restored quickly and efficiently. In fact, one of the key drivers for adopting deduplication technology is the ability to keep data on disk longer in order to simplify restores and shorten restore times. Before adopting a new deduplication technology, be sure to test restore times and efficiency. Most restore requests are for data that is less than two weeks old. Solutions that use the first backup as the reference copy must recreate the most recent backup from weeks or months of pointers. In contrast, solutions that use the most recent backup as the reference copy can restore that data nearly instantaneously.
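The difference between the two reference-copy strategies can be seen in a deliberately simplified Python model. Everything in it is an assumption for illustration: each backup is a dictionary of block number to contents, each delta lists only the blocks that changed between consecutive backups, and the function names `restore_forward` and `restore_reverse` are hypothetical. It counts how many deltas must be replayed to reach a given backup and does not measure real restore time.

```python
def restore_forward(first_full, forward_deltas):
    """First backup is the reference copy.

    To reach the newest backup, every later delta is replayed in order.
    Returns (blocks, number_of_deltas_applied).
    """
    blocks = dict(first_full)
    steps = 0
    for delta in forward_deltas:
        blocks.update(delta)
        steps += 1
    return blocks, steps


def restore_reverse(latest_full, reverse_deltas, versions_back=0):
    """Most recent backup is the reference copy.

    The newest backup needs no deltas; older ones are reached by replaying
    reverse deltas, newest first. Returns (blocks, number_of_deltas_applied).
    """
    blocks = dict(latest_full)
    steps = 0
    for delta in reverse_deltas[:versions_back]:
        blocks.update(delta)
        steps += 1
    return blocks, steps


# Three nightly backups of a three-block data set (toy data).
night1 = {0: b"a", 1: b"b", 2: b"c"}
night3 = {0: b"a", 1: b"B", 2: b"C"}

forward_deltas = [{1: b"B"}, {2: b"C"}]  # night1 -> night2 -> night3
reverse_deltas = [{2: b"c"}, {1: b"b"}]  # night3 -> night2 -> night1

newest_fwd, fwd_steps = restore_forward(night1, forward_deltas)
newest_rev, rev_steps = restore_reverse(night3, reverse_deltas)
oldest_rev, old_steps = restore_reverse(night3, reverse_deltas, versions_back=2)
```

Both strategies rebuild the same newest backup, but in this model the forward chain replays one delta per intervening backup while the reverse layout replays none. The cost moves to older backups, which are requested less often.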
Ensure Data Integrity. Enterprise deduplication requires guaranteed data integrity. Some deduplication algorithms can result in data integrity issues. Look for solutions that guarantee data integrity. Enterprise-class solutions perform a data integrity check that compares the deduplicated data to the original data set at the byte level before any duplicate data is deleted or disk space is redeployed. This comparison needs to ensure that when deduplicated data is reconstructed, it is byte-for-byte identical to the original backup.
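A byte-level check of this kind can be sketched in a few lines of Python. The sketch assumes the deduplicated form is a table of stored segments plus an ordered list of pointers into it. That layout, and the names `reconstruct` and `safe_to_release`, are hypothetical and stand in for whatever structures a real product uses.

```python
def reconstruct(store, recipe) -> bytes:
    """Rebuild the backup by concatenating the segments the pointers name."""
    return b"".join(store[key] for key in recipe)


def safe_to_release(original: bytes, store, recipe) -> bool:
    """Return True only if the rebuilt data is byte-for-byte identical.

    The original copy should be deleted, or its space reused, only when
    this check passes.
    """
    return reconstruct(store, recipe) == original


# Toy backup in which the "payload" segment appears twice.
original = b"header" + b"payload" + b"payload"
store = {0: b"header", 1: b"payload"}
recipe = [0, 1, 1]

ok = safe_to_release(original, store, recipe)

# Simulate a damaged stored segment: the check must now fail.
corrupted = dict(store)
corrupted[1] = b"payl0ad"
bad = safe_to_release(original, corrupted, recipe)
```

Comparing the bytes themselves, rather than comparing hashes of them, is what makes the check independent of the hashing step.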


