Segmented downloading
From Wikipedia, the free encyclopedia
Segmented downloading (also known as multisource downloading or swarming download) is a way of downloading a file from many peers at once, which can be more efficient than downloading it from a single source. A single file is downloaded in parallel from several distinct sources (uploaders) of that file. This can help a group of users with asymmetric connections, such as ADSL, to provide a high total bandwidth to one downloader, and to handle peaks in download demand.
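The following Python sketch illustrates the basic idea of fetching one file from several sources in parallel and reassembling it by offset. It does not reproduce the Swarmcast or BitTorrent protocols. The sources are in-memory byte strings standing in for remote uploaders, and the names `plan_segments`, `fetch_segment` and `segmented_download`, the fixed segment size and the round-robin assignment of segments to sources are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_segments(file_size, segment_size):
    """Split [0, file_size) into half-open (start, end) byte ranges."""
    return [(start, min(start + segment_size, file_size))
            for start in range(0, file_size, segment_size)]


def fetch_segment(source, start, end):
    # Hypothetical stand-in for a network request to one uploader:
    # here each "source" is simply an in-memory copy of the same file.
    return source[start:end]


def segmented_download(sources, file_size, segment_size):
    segments = plan_segments(file_size, segment_size)
    buffer = bytearray(file_size)
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # Assign segments to sources round-robin (an assumption) and fetch them in parallel.
        futures = {
            pool.submit(fetch_segment, sources[i % len(sources)], start, end): start
            for i, (start, end) in enumerate(segments)
        }
        for future, start in futures.items():
            data = future.result()
            # Each segment is written at its own offset, so arrival order does not matter.
            buffer[start:start + len(data)] = data
    return bytes(buffer)


original = bytes(range(256)) * 40       # a 10,240-byte example file
peers = [original, original, original]  # three uploaders of the same file
result = segmented_download(peers, len(original), segment_size=1000)
print(result == original)  # True
```

A real client would also track which source holds which segment, retry failed segments elsewhere and verify each segment before accepting it.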
History
Segmented downloading may have originated with NASA and the magnetic-tape-based file systems used on spacecraft served by the Deep Space Network, such as those of the Voyager program.
NASA missions that have used some kind of segmented downloading include:
- Mars Rovers (for ICER image files)
- New Horizons (for Jupiter flyby data)
- Voyager Program (historical)
Swarmcast was the first significant peer-to-peer (P2P) content delivery system that implemented a kind of segmented downloading technology.
- The program and protocol were invented and developed in 1999 by Justin Chapweske and sold to OpenCola, which released the software under the GPL.
- Many of the terms used in segmented downloading technology originated with Swarmcast; BitTorrent is the only other significant contributor to the terminology in use.
Network implications
Most IP networks are designed for users to download more than they upload, usually with an expected download-to-upload ratio of 3:1 or more.
When used by as little as 20% of an ISP's user base, segmented downloading can disrupt the ISP's network to the point of requiring substantial reconfiguration of routers and a rethink of network design.
- Traditional web object caching technology (such as the Squid proxy) is of no use here.
- Universal adoption of IPv6 cannot help either, since it would mainly give every user a fixed IP address, and fixed IP addresses do not fully solve the routing table problems associated with segmented downloading.
- Typical downloading configurations can put a single user in contact with 10 to 30 ephemeral users per file, scattered across the global Internet.
- IP router tables can become bloated with routes to these ephemeral users, slowing down table lookups.
Network advantages
Segmented downloading networks do have some advantages:
- Routes to the more obscure parts of the Internet can assert themselves across most of the Internet; this is especially true for dial-up users.
- Segmented downloading saves some transmission capacity, as the amount of data lost or duplicated when a transfer fails is small compared with losing a prolonged HTTP or FTP download.
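A rough sketch of the capacity argument in the last point, assuming the interrupted single-stream transfer cannot be resumed and that a segmented client keeps every segment it has already completed. The function names, the 256 KiB segment size and the failure point are hypothetical values chosen for illustration.

```python
MiB = 1024 * 1024


def refetch_cost_single_stream(bytes_received: int) -> int:
    # Assumption: the interrupted HTTP/FTP transfer cannot be resumed,
    # so everything received so far has to be downloaded again.
    return bytes_received


def refetch_cost_segmented(bytes_received: int, segment_size: int) -> int:
    # Completed segments are kept; only the partially received segment
    # on the failed connection is discarded and requested again.
    return bytes_received % segment_size


received = 700 * MiB + 100 * 1024  # connection drops 100 KiB into a segment
print(refetch_cost_single_stream(received) // MiB)           # 700 (MiB, rounded down)
print(refetch_cost_segmented(received, 256 * 1024) // 1024)  # 100 (KiB)
```

Under these assumptions the worst-case loss for a segmented transfer is bounded by one segment per failed connection, whatever the file size.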
Most ISPs have learned to cope with segmented downloading technology, but coping has made the deployment of TCP/IP traffic shaping technology mandatory.
Limitations
Segmented downloading technology cannot solve every downloading problem. There are mathematical constraints on its effectiveness: within a group of users, the total download rate cannot exceed the group's total upload capacity.
For example, it cannot help a group of users with insufficient upload bandwidth, where demand is higher than supply. Segmented downloading can, however, handle traffic peaks very well, and to some degree it lets uploaders upload "more often", making better use of their connections.
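The constraint can be written as a simple upper bound. The sketch below assumes idealized links with no protocol overhead and no sources outside the group; the function names and the 8 Mbit/s down, 1 Mbit/s up ADSL figures are hypothetical example values, not measurements.

```python
def max_download_rate(downlink: float, peer_uplinks: list[float]) -> float:
    """Best case for one downloader: every peer devotes its whole uplink to it."""
    return min(downlink, sum(peer_uplinks))


def aggregate_download_bound(downlinks: list[float], uplinks: list[float]) -> float:
    """Upper bound on the group's total download rate.

    With no sources outside the group, every byte downloaded was uploaded by a
    member, so total download can never exceed total upload capacity.
    """
    return min(sum(downlinks), sum(uplinks))


# Hypothetical ADSL-style links, in Mbit/s.
DOWN, UP = 8.0, 1.0

# One downloader, five otherwise idle uploaders.
print(max_download_rate(DOWN, [UP] * 5))  # 5.0

# All six users download at the same time: demand (48) exceeds supply (6).
users = 6
total = aggregate_download_bound([DOWN] * users, [UP] * users)
print(total / users)  # 1.0 (Mbit/s per user on average)
```

The first case is the situation where segmented downloading shines, with several asymmetric uplinks combined for one user; the second shows that when everyone downloads at once, no segmenting scheme can lift the average above the per-user upload capacity.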
Data integrity issues
- Very simple implementations of segmented downloading can result in varying levels of file corruption, as there is often no way of knowing whether all sources are actually uploading segments of the same file.
- Data corruption problems have led most programs that use segmented downloading to adopt some form of checksum or hash algorithm to ensure file integrity (that the file is received intact) and uniqueness (that no pieces of other, similar files are received).
- MD5 and SHA-1 hashes are the most common choices in segmented download protocols, although CRC-64-ECMA would suffice in most cases for detecting accidental corruption (a CRC offers no protection against deliberate tampering). Where only MPEG files are being sent, CRC-32-MPEG would also be acceptable.
- In the future, most segmented downloading technologies will probably use layered hashes and checksums, such as WHIRLPOOL, SHA-256, SHA-512 and CRC-64-ECMA (for individual segments), to give much stronger guarantees of data integrity. MD5 and SHA-1 have been shown to be cryptographically weak with respect to protecting data integrity.
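Layered checking, with one digest per segment and one for the whole file, can be sketched with Python's `hashlib`. SHA-256 is used for both layers here simply because it ships with the standard library. The assumption that trusted digests are published alongside the file, the tiny 8-byte segment size, the helper names and the two in-memory "sources" are all illustrative choices, not the scheme of any particular protocol.

```python
import hashlib


def segment_hashes(data: bytes, segment_size: int) -> list[str]:
    """SHA-256 digest of each fixed-size segment (the last one may be shorter)."""
    return [hashlib.sha256(data[i:i + segment_size]).hexdigest()
            for i in range(0, len(data), segment_size)]


def accept_segment(index: int, payload: bytes, expected: list[str]) -> bool:
    """Check one received segment against its published digest."""
    return hashlib.sha256(payload).hexdigest() == expected[index]


# Assumption: these digests come from a trusted description of the file,
# not from the peers themselves.
wanted = b"the quick brown fox jumps over the lazy dog"
similar = b"the quick brown cat jumps over the lazy dog"  # a "similar file"
SEGMENT = 8
expected = segment_hashes(wanted, SEGMENT)
expected_file = hashlib.sha256(wanted).hexdigest()

received = bytearray()
for index in range(len(expected)):
    start = index * SEGMENT
    # The first source serves segments of a similar but different file;
    # a segment that fails its check is requested from the next source.
    for source in (similar, wanted):
        payload = source[start:start + SEGMENT]
        if accept_segment(index, payload, expected):
            received += payload
            break
    else:
        raise ValueError(f"no source supplied a valid segment {index}")

# Second layer: check the reassembled file as a whole.
print(hashlib.sha256(received).hexdigest() == expected_file)  # True
```

Segments that happen to be identical in both files are accepted from either source, while the segment containing "cat" is rejected and fetched again, which is the uniqueness property described above.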
Examples
Note
In the Direct Connect protocol used by DC++, multisource downloading is not considered the same as segmented downloading. The DC++ client can download a file from multiple sources, but connects to only one source at a time; older clients could only resume a file from the same source. This differs from segmented downloading, which connects to multiple sources at the same time. However, new protocol extensions, as well as the alternative ADC protocol, have turned segmented downloading into an implementation issue rather than a protocol one, and it is used in some clients, such as RevConnect.
See also
Segmented downloading should not be confused with the workings of download managers, which also segment files for faster downloads but do not use several peers; instead, they open several connections to the same single server.
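Download managers typically implement this with HTTP range requests against one server. The hypothetical helper `range_headers` below only computes the `Range` header values for a given number of connections (HTTP byte ranges include both end offsets, so `bytes=0-333` covers 334 bytes); it performs no network I/O and is a sketch, not the method of any particular download manager.

```python
def range_headers(file_size: int, connections: int) -> list[str]:
    """Split a file into up to `connections` contiguous, inclusive byte ranges."""
    base, extra = divmod(file_size, connections)
    headers = []
    start = 0
    for i in range(connections):
        # Spread the remainder over the first `extra` ranges.
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue  # more connections than bytes: skip empty ranges
        end = start + length - 1  # inclusive end offset
        headers.append(f"bytes={start}-{end}")
        start = end + 1
    return headers


print(range_headers(1000, 3))  # ['bytes=0-333', 'bytes=334-666', 'bytes=667-999']
```

Each value would be sent as the `Range` header of a separate request to the same URL; a server that supports range requests replies with `206 Partial Content`.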

