Understanding Yocto Source Code MIRROR and AWS Storage Service

chanbae 2024. 8. 14. 05:13

In the Yocto Project, a MIRROR refers to an alternative repository that can be used to download source code. This allows the build system to download necessary sources from a specified mirror server or local storage first, without relying on external internet resources. It helps to improve build efficiency, conserve network bandwidth, and reduce the possibility of download failures.

Key Features and Benefits of MIRROR (Priority order)

1. Speed Improvement: When using a mirror server located on an internal network, sources can be downloaded much faster than from the internet.

2. Bandwidth Conservation: Reduces external internet traffic and saves bandwidth by utilizing the internal network.

3. Build Stability: Ensures stable builds despite issues with external server availability or network connectivity.

4. Consistency: By using the same source mirror, multiple developers can ensure that all builds access the same source code, maintaining consistency.

5. Offline Builds: Allows builds to proceed even in environments with limited or no internet access.

Ref: https://subscription.packtpub.com/book/iot-and-hardware/9781788399210/1/ch01lvl1sec22/sharing-downloads

When BitBake fetches a source file, it tries the following locations in order:

  1. DL_DIR: Yocto first looks for the required files in DL_DIR, the local directory where previously downloaded files are cached. If the necessary files are found here, they are not downloaded again over the network.
  2. PREMIRRORS:
    - If the files are not in DL_DIR, Yocto checks the mirror sites defined in PREMIRRORS. These are tried before the original site specified in the source URI.
    - Typically, internal cache servers or local network mirrors are set up here for faster access.
    - SOURCE_MIRROR_URL is not a separate stage. When the own-mirrors class is enabled (INHERIT += "own-mirrors"), the URL set in SOURCE_MIRROR_URL is prepended to PREMIRRORS, so the internal server used for central management of source files is tried at this stage.
  3. SRC_URI:
    - If the files are not found in any of the above stages, Yocto tries to download them from the original source URL defined in SRC_URI.
    - SRC_URI usually indicates the original source location, such as the official site or repository of an open-source project.
  4. MIRRORS: If the download from SRC_URI fails, Yocto makes a final attempt using the mirror sites defined in MIRRORS. This provides an alternative download path when the original site is unstable or inaccessible.

In this way, Yocto optimizes network usage and tries several download paths to increase the availability of source files. SRC_URI defines the original location of the source and is consulted only after the local cache and the pre-mirrors fail. MIRRORS is the last resort when the original location is also unavailable.
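
The lookup order can be sketched as a small Python model. This is not BitBake code, only an illustration of the sequence, and it assumes SOURCE_MIRROR_URL is applied through the own-mirrors class, which places it in PREMIRRORS. The server URLs, file names, and the fake_fetch and where helpers are hypothetical.

```python
from typing import Callable, Iterable, Optional, Set


def resolve_source(
    filename: str,
    dl_dir: Set[str],
    premirrors: Iterable[str],
    upstream: str,
    mirrors: Iterable[str],
    fetch: Callable[[str, str], bool],
) -> Optional[str]:
    """Return a label for the first stage that can supply `filename`, or None."""
    # Stage 1: the local download cache, no network access needed.
    if filename in dl_dir:
        return "DL_DIR"
    # Stage 2: pre-mirrors (own-mirrors puts SOURCE_MIRROR_URL here).
    for base in premirrors:
        if fetch(base, filename):
            return "PREMIRRORS " + base
    # Stage 3: the upstream location from the recipe's SRC_URI.
    if fetch(upstream, filename):
        return "SRC_URI " + upstream
    # Stage 4: fallback mirrors, tried only after upstream fails.
    for base in mirrors:
        if fetch(base, filename):
            return "MIRRORS " + base
    return None


# Hypothetical servers and file names, used only to exercise the function.
INTERNAL = "https://mirror.example.internal/sources/"
UPSTREAM = "https://upstream.example.org/releases/"
FALLBACK = "https://fallback.example.org/mirror/"
AVAILABLE = {
    INTERNAL: {"foo-1.0.tar.gz"},
    UPSTREAM: {"foo-1.0.tar.gz", "bar-2.0.tar.gz"},
    FALLBACK: {"baz-3.0.tar.gz"},
}


def fake_fetch(base: str, filename: str) -> bool:
    """Stand-in for a real download: succeeds if the server lists the file."""
    return filename in AVAILABLE.get(base, set())


def where(filename: str, cached: Set[str]) -> Optional[str]:
    """Resolve `filename` against the hypothetical servers above."""
    return resolve_source(
        filename, cached, [INTERNAL], UPSTREAM, [FALLBACK], fake_fetch
    )
```

Calling where("bar-2.0.tar.gz", set()) falls through the internal mirror to the upstream server, while a file already present in the cached set never touches the network.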

Storage Options for Source Code in AWS (EFS, S3, EBS)

Ref: https://gocloudtech.medium.com/aws-storage-ebs-vs-s3-vs-efs-explained-6b760a1466ed

AWS offers three main services for storing data: EFS (Elastic File System), S3 (Simple Storage Service), and EBS (Elastic Block Store). EBS provides block storage for a single server and is not intended for sharing across multiple AWS instances. Therefore, EFS and S3 are the candidates for storing shared source code.

AWS EFS (Elastic File System)

AWS EFS is a managed network file system (NFS) that can be mounted and used simultaneously by multiple EC2 instances.

PROS

1. POSIX Compatibility: EFS is a POSIX-compliant file system, allowing integration without application changes by using existing file system APIs.

2. Concurrent Access: It can be mounted and used simultaneously by multiple EC2 instances, making it advantageous for collaborative work and concurrent data processing.

3. Scalability: It automatically scales up and down as needed, simplifying management.

4. File System Interface: It supports file and directory structures, enabling data management in a manner similar to traditional file systems.

CONS

1. Cost: EFS is billed per GB stored, at a noticeably higher per-GB rate than S3 Standard, so data that is rarely accessed can be expensive to keep. Additionally, using higher throughput options can incur higher costs.

2. Latency: As a network file system, it may have higher latency compared to local disks.

3. Availability Limitations: An EFS file system is a regional resource, and additional configuration is required to use it from another region.

AWS S3 (Simple Storage Service)

AWS S3 is an object storage service primarily used for storing and backing up large-scale data.

PROS

1. Low Cost: S3 allows for economical storage of large volumes of data, making it ideal for long-term storage and backup.

2. Durability and Availability: S3 is designed for 99.999999999% (11 nines) durability and high availability.

3. Global Accessibility: S3 can be accessed from anywhere in the world, and cross-region replication enhances the geographical redundancy of data.

4. Various Storage Classes: You can choose from various storage classes to optimize costs based on your needs (e.g., S3 Standard, S3 Intelligent-Tiering, S3 Glacier, etc.).

CONS

1. Object Storage Limitations: S3 exposes an object API rather than a file system interface, so applications written against traditional file system APIs cannot use it directly.

2. Latency: There can be network latency when accessing data, which might be unsuitable for workloads requiring frequent data reads and writes.

3. Implementation Complexity: Additional code or tools (such as S3FS) are needed for file-based access.

Current Project Setup and Plan

We are currently using multiple AWS EFS file systems in our project. They are configured for high throughput, and because EFS storage grows without limit as data accumulates, we are incurring significant costs.

The plan is to configure SOURCE_MIRROR_URL to use AWS S3 instead. Initial local tests show that it works well, and a detailed explanation of the actual setup will be provided in a subsequent post. However, since CI runs a substantial number of processes simultaneously, we need to test whether the S3-backed mirror can handle the load before fully adopting it. The results of these tests will be shared later.
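
One way to approach such a load test is a small concurrent probe. The following is a minimal Python sketch, not the setup we will actually use. It assumes the download step is supplied by the caller as a function that raises on failure, so no particular S3 client is implied. The names ProbeResult, probe_one, run_probe, summarize, and fake_download are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class ProbeResult:
    key: str
    ok: bool
    seconds: float


def probe_one(download: Callable[[str], None], key: str) -> ProbeResult:
    """Time a single download and record whether it raised."""
    start = time.perf_counter()
    try:
        download(key)
        ok = True
    except Exception:
        ok = False
    return ProbeResult(key, ok, time.perf_counter() - start)


def run_probe(
    download: Callable[[str], None], keys: Sequence[str], workers: int
) -> List[ProbeResult]:
    """Request every key using up to `workers` concurrent threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda k: probe_one(download, k), keys))


def summarize(results: Sequence[ProbeResult]) -> Dict[str, float]:
    """Reduce the per-request results to a few headline numbers."""
    return {
        "requests": len(results),
        "failures": sum(1 for r in results if not r.ok),
        "slowest_seconds": max((r.seconds for r in results), default=0.0),
    }


def fake_download(key: str) -> None:
    """Stand-in downloader: any key ending in '.missing' fails."""
    if key.endswith(".missing"):
        raise FileNotFoundError(key)


if __name__ == "__main__":
    keys = ["a.tar.gz", "b.missing", "c.tar.gz"]
    print(summarize(run_probe(fake_download, keys, workers=2)))
```

Replacing fake_download with a real fetch against the mirror and raising workers toward the number of parallel CI jobs gives a rough view of failure counts and worst-case latency under load.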