City Vs. Rural Population: A Regional Analysis

Nov 1, 2025 by Admin

Understanding population distribution is vital for policymakers, researchers, and anyone interested in global trends. This analysis looks at how a region's population splits between people living in cities and everyone else. Here, the "rural" side simply means everyone not counted in a city. We'll walk through the requirement, the REST endpoint that serves the data, how the key metrics are computed, and how the behavior is tested. So, if you've ever wondered about the balance between city dwellers and rural residents, you're in the right place!

Understanding the Data Requirements (R24)

This analysis stems from a specific requirement (R24) within a larger project focused on urbanization. The goal is to provide a REST endpoint that delivers crucial population data for selected regions. This data will empower data analysts, policymakers, researchers, API consumers, and data journalists to analyze urbanization trends effectively. Here’s a breakdown of the requirement:

Epic: Urbanisation (% in cities)
As a: data analyst / policy maker / researcher / API consumer / data journalist
I want: a REST endpoint that returns, for a selected region, the total population, the population living in cities, the population not living in cities, and the corresponding percentages
So that: I can analyse urbanisation within a region and the assessor can verify SET09803 requirement R24
Dataset scope: region

To fulfill this requirement, we need a system that can efficiently retrieve and present data on total population, urban population, rural population, and their respective percentages for any given region. This involves designing an API endpoint and ensuring accurate data computation.

Data Points and Output

The system should provide a single-row output with the following fields:

name (STRING): Region name
total_population (INT): Total population in the region
in_cities_population (INT): Population living in cities within the region
in_cities_percent (DECIMAL, 0.0–100.0): Percentage of the population living in cities
not_in_cities_population (INT): Population not living in cities within the region
not_in_cities_percent (DECIMAL, 0.0–100.0): Percentage of the population not living in cities
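To make the row shape concrete, here is a minimal Python sketch that models the single-row response as a dataclass and serializes it to UTF-8 JSON. The class name `RegionUrbanSummary` and the choice of Python are assumptions for illustration. Only the field names come from the specification above. The DECIMAL percentages are represented as floats already rounded to one decimal place.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RegionUrbanSummary:
    """One response row; class name is hypothetical, field names follow R24."""
    name: str
    total_population: int
    in_cities_population: int
    in_cities_percent: float        # 0.0-100.0, one decimal place
    not_in_cities_population: int
    not_in_cities_percent: float    # 0.0-100.0, one decimal place

    def to_json(self) -> bytes:
        # asdict() keeps declaration order; encode the JSON text as UTF-8
        return json.dumps(asdict(self), ensure_ascii=False).encode("utf-8")


if __name__ == "__main__":
    # Made-up numbers, not values from the world database
    row = RegionUrbanSummary("Example Region", 1000, 400, 40.0, 600, 60.0)
    print(row.to_json().decode("utf-8"))
```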

This comprehensive data set will allow for a detailed understanding of urbanization levels across different regions.

API Endpoint and Parameters

To access this data, we'll use the following API endpoint:

GET /api/v1/population/region/{region}/cities-vs-noncities
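On the client side, the region has to travel as a single percent-encoded path segment. Here is a minimal Python sketch using the standard-library `urllib.parse.quote`. The base URL `http://localhost:8080` and the helper name `cities_vs_noncities_url` are placeholders, not part of the specification.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8080"  # hypothetical host and port


def cities_vs_noncities_url(region: str, base_url: str = BASE_URL) -> str:
    """Build the R24 endpoint URL for one region (illustrative helper)."""
    # Spaces become %20; safe="" also escapes "/", keeping the region one path segment
    segment = quote(region.strip(), safe="")
    return f"{base_url}/api/v1/population/region/{segment}/cities-vs-noncities"
```

For "Southeast Asia" this produces the same path used in the first acceptance scenario below: /api/v1/population/region/Southeast%20Asia/cities-vs-noncities.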

Parameters
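- region (STRING, case-insensitive): A valid region name, as stored in the Region column of the country table. Names containing spaces are sent URL-encoded (for example, Southeast%20Asia). Case-insensitive matching means users don't have to reproduce the exact capitalization.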
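One way to meet the trimming, case-insensitivity, and validation rules is to resolve the path segment against the list of known regions before running any aggregate query. The sketch below is a minimal Python illustration. `resolve_region`, `InvalidRegionInput`, and treating blank input as the 400 case are all assumptions. The known-region list could come from something like `SELECT DISTINCT Region FROM country`.

```python
from typing import Iterable, Optional
from urllib.parse import unquote


class InvalidRegionInput(ValueError):
    """Hypothetical error for blank input, to be mapped to HTTP 400."""


def resolve_region(raw_segment: str, known_regions: Iterable[str]) -> Optional[str]:
    """Return the stored spelling of the region, or None if unknown (HTTP 404)."""
    # Decode escapes such as %20; drop this if the framework already decodes path params
    candidate = unquote(raw_segment).strip()
    if not candidate:
        raise InvalidRegionInput("region must not be blank")
    # casefold() gives a case-insensitive comparison
    wanted = candidate.casefold()
    for region in known_regions:
        if region.casefold() == wanted:
            return region
    return None
```

With known regions ["Southeast Asia", "Western Europe"], resolve_region("wEsTeRn%20EuRoPe", ...) returns "Western Europe", while "Unknownland" returns None. Returning the stored spelling means the query can then compare against an exact value.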

Computation Logic (SQL Reference)

The data is computed using the following SQL queries as a reference:

Total population in the region:

SELECT COALESCE(SUM(co.Population), 0) AS total_pop
FROM country co
WHERE co.Region = :region;

Population living in cities in that region:

SELECT COALESCE(SUM(ci.Population), 0) AS city_pop
FROM country co
JOIN city ci ON ci.CountryCode = co.Code
WHERE co.Region = :region;

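The remaining fields are derived from these two sums: not_in_cities_population = total_pop - city_pop, in_cities_percent = city_pop / total_pop × 100, and not_in_cities_percent = not_in_cities_population / total_pop × 100. The total query alone cannot tell an unknown region apart from one whose population is 0, so the region name has to be validated separately (see the rules below). Knowing this logic makes it straightforward to check API responses against the database.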
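To see the two reference queries in action, here is a minimal Python sketch using the standard-library sqlite3 module against a tiny in-memory database. The fixture rows, region names, and the helper names `make_demo_db` and `region_totals` are hypothetical. SQLite's `COLLATE NOCASE` stands in for whatever case-insensitive comparison the production database uses. `COALESCE` makes a match with no rows return 0 instead of NULL.

```python
import sqlite3
from typing import Tuple

# Reference queries; :region is a bound named parameter, never string-formatted in
TOTAL_SQL = """
SELECT COALESCE(SUM(co.Population), 0) AS total_pop
FROM country co
WHERE co.Region COLLATE NOCASE = :region
"""

CITY_SQL = """
SELECT COALESCE(SUM(ci.Population), 0) AS city_pop
FROM country co
JOIN city ci ON ci.CountryCode = co.Code
WHERE co.Region COLLATE NOCASE = :region
"""


def make_demo_db() -> sqlite3.Connection:
    """Made-up fixture that mirrors the world DB column names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE country (Code TEXT PRIMARY KEY, Region TEXT, Population INTEGER);
        CREATE TABLE city (ID INTEGER PRIMARY KEY, CountryCode TEXT, Population INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO country VALUES (?, ?, ?)",
        [("AAA", "Test Region", 1000), ("BBB", "Test Region", 500), ("CCC", "Other Region", 700)],
    )
    conn.executemany(
        "INSERT INTO city VALUES (?, ?, ?)",
        [(1, "AAA", 300), (2, "AAA", 100), (3, "BBB", 200), (4, "CCC", 50)],
    )
    return conn


def region_totals(conn: sqlite3.Connection, region: str) -> Tuple[int, int]:
    """Return (total_pop, city_pop) for a region, trimming the input first."""
    params = {"region": region.strip()}
    total_pop = conn.execute(TOTAL_SQL, params).fetchone()[0]
    city_pop = conn.execute(CITY_SQL, params).fetchone()[0]
    return total_pop, city_pop


if __name__ == "__main__":
    db = make_demo_db()
    total, in_cities = region_totals(db, "  test REGION ")
    print(total, in_cities, total - in_cities)
```

An unknown region such as "Nowhere" also yields (0, 0). This is why the 404 existence check has to happen separately from the aggregate.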

Rules and Assumptions

To ensure the data is consistent and reliable, several assumptions and rules are in place:

Inputs are trimmed, and comparisons are case-insensitive, ensuring user-friendly input handling.
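The system validates the region input. Invalid input (for example, a region that is blank after trimming) is rejected with 400, while a well-formed name that matches no known region returns 404.
If total_pop is 0 for a known region, the API responds with 204 No Content (empty but valid).
Percentages are rounded to 1 decimal place. Each one is rounded independently, so each can be off by up to 0.05, and their sum may come out as 99.9 or 100.1. The check therefore allows 100.0 ± 0.1.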
Parameterized queries are used to prevent SQL injection vulnerabilities.
Appropriate status codes are returned for different scenarios:
- Invalid input: 400
- Unknown item: 404
- Empty but valid: 204

These rules are critical for maintaining data integrity and providing a robust API.
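The status-code and rounding rules can be sketched as one small function that turns the two sums into a response. This is a minimal Python illustration under stated assumptions. The names `percent` and `build_response` are hypothetical, and half-up rounding is assumed because the rules don't specify a rounding mode. Invalid input (400) is assumed to be rejected before this point.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional, Tuple

ONE_DP = Decimal("0.1")


def percent(part: int, whole: int) -> Decimal:
    """part/whole as a percentage, rounded half-up to one decimal place."""
    return (Decimal(part) * 100 / Decimal(whole)).quantize(ONE_DP, rounding=ROUND_HALF_UP)


def build_response(name: str, region_known: bool, total_pop: int,
                   city_pop: Optional[int]) -> Tuple[int, Optional[dict]]:
    """Map query results to (HTTP status, JSON-ready body)."""
    if not region_known:
        return 404, None            # unknown item
    if total_pop == 0:
        return 204, None            # empty but valid
    in_cities = city_pop or 0       # SUM over no rows may come back as NULL/None
    not_in_cities = total_pop - in_cities
    return 200, {
        "name": name,
        "total_population": total_pop,
        "in_cities_population": in_cities,
        "in_cities_percent": float(percent(in_cities, total_pop)),
        "not_in_cities_population": not_in_cities,
        "not_in_cities_percent": float(percent(not_in_cities, total_pop)),
    }
```

Consider a made-up region with total_pop 400 and city_pop 49. The exact shares are 12.25% and 87.75%, which round to 12.3 and 87.8. Their sum is 100.1, right at the edge of the allowed tolerance.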

Acceptance Criteria

The acceptance criteria, defined using Gherkin syntax, outline the expected behavior of the API.

Scenario: Region urbanisation summary
- Given the world DB is loaded
- When I GET /api/v1/population/region/Southeast%20Asia/cities-vs-noncities
- Then the status is 200
- And the JSON has fields name,total_population,in_cities_population,in_cities_percent,not_in_cities_population,not_in_cities_percent
- And in_cities_population + not_in_cities_population = total_population
- And in_cities_percent + not_in_cities_percent is within 0.1 of 100.0

This scenario checks the basic functionality of the API and the correctness of the returned data.
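The Then/And steps of this scenario translate directly into assertions on the response body. Here is a minimal Python sketch; the function name is illustrative, and how the body is fetched (HTTP client, test framework) is left open. The small epsilon added to the 0.1 tolerance absorbs floating-point representation error in the percentage sum.

```python
import json

EXPECTED_FIELDS = {
    "name", "total_population", "in_cities_population", "in_cities_percent",
    "not_in_cities_population", "not_in_cities_percent",
}


def check_urbanisation_summary(body_text: str) -> None:
    """Assert the scenario's invariants on a 200 response body."""
    body = json.loads(body_text)
    missing = EXPECTED_FIELDS - set(body)
    assert not missing, f"missing fields: {sorted(missing)}"
    # The population split must add up exactly
    total = body["total_population"]
    assert body["in_cities_population"] + body["not_in_cities_population"] == total
    # Percentages only need to land within the rounding tolerance of 100.0
    pct_sum = body["in_cities_percent"] + body["not_in_cities_percent"]
    assert abs(pct_sum - 100.0) <= 0.1 + 1e-9, f"percentages sum to {pct_sum}"
```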

Scenario: Case-insensitive region
- When I GET /api/v1/population/region/wEsTeRn%20EuRoPe/cities-vs-noncities
- Then the status is 200

This ensures the API correctly handles case-insensitive region names.

Scenario: Unknown region
- When I GET /api/v1/population/region/Unknownland/cities-vs-noncities
- Then the status is 404

This validates the API's ability to return a 404 status code for unknown regions.

Scenario: Empty but valid region
- Given a region with total population = 0
- When I GET /api/v1/population/region/Antarctica/cities-vs-noncities
- Then the status is 204

This confirms the API's behavior when a region has a total population of 0.

These four scenarios cover the success path, case-insensitive input, an unknown region, and an empty region. Remember, rigorous testing is key to a reliable API.

Non-Functional Constraints

Non-functional constraints are crucial for ensuring the performance and reliability of the API.

Completes ≤ 1.5s on a typical laptop/CI runner
Memory ≤ 256 MB; UTF-8 JSON responses
Checkstyle/PMD/CodeQL: no critical findings
Endpoint + example documented in README

These constraints keep the API fast and light on resources, hold the code to the project's static-analysis standards, and make sure the endpoint is documented. Performance matters! A fast and efficient API is crucial for a good user experience.
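A simple way to smoke-test the 1.5 s limit is to time a single call against a wall-clock budget. The Python sketch below is a minimal illustration; `run_within_budget` is a hypothetical helper. It does not cover the 256 MB memory limit, which would need process-level measurement.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def run_within_budget(call: Callable[[], T], budget_s: float = 1.5) -> Tuple[T, float]:
    """Run call(), fail if it exceeds the budget, and return (result, seconds)."""
    start = time.perf_counter()
    result = call()
    elapsed = time.perf_counter() - start
    if elapsed > budget_s:
        raise AssertionError(f"took {elapsed:.3f}s, budget is {budget_s}s")
    return result, elapsed
```

In practice `call` might be something like `lambda: urllib.request.urlopen(url).read()` against a locally running instance. Here `url` is a placeholder. Timings on shared CI runners are noisy, so a single run is a smoke check rather than a benchmark.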

INVEST Self-Check

The INVEST criteria are used to ensure the task is well-defined and manageable:

[x] Independent
[x] Negotiable
[x] Valuable
[x] Estimable
[x] Small
[x] Testable

This self-check confirms that the task meets the necessary criteria for effective development and implementation. Always remember to INVEST in your tasks for better outcomes!

Priority and Estimate

The priority for this task is P0 - Critical, indicating its importance. The estimated effort is 2 story points, suggesting a relatively small but crucial piece of work. Prioritizing tasks helps in effective project management.

Definition of Done (DoD)

The Definition of Done (DoD) outlines the criteria that must be met before the task is considered complete:

[x] Unit tests added/updated
[x] Integration tests added/updated
[x] CI green (build, tests, coverage)
[x] CodeQL/PMD no critical findings
[x] README endpoints and example updated
[x] Evidence table row + screenshot added
[x] PR created (base: develop) and linked to this story
[x] Linked to parent Epic / Requirement label
[x] Reviewer approval obtained

The DoD ensures that all aspects of the task are completed to a high standard before it is considered finished. Quality is key! A clear DoD helps ensure a high-quality deliverable.

Useful Links

PR: # (add when opened)
Endpoint: GET /api/v1/population/region/{region}/cities-vs-noncities
Evidence screenshot: docs/evidence/r24_population_region_cities_vs_noncities.png

Conclusion

Analyzing the distribution of population between urban and rural areas is crucial for understanding regional dynamics and urbanization trends. This detailed exploration of the requirements, API design, and testing criteria provides a solid foundation for developing a reliable and valuable tool. By focusing on data accuracy, performance, and usability, we can empower policymakers and researchers to make informed decisions based on population data. Remember, understanding urbanization is key to planning for the future.