```cpp
int nums[]{4, 8, 15, 16, 23, 42};
int odd = 0;
auto nums_size = sizeof(nums) / sizeof(int);
for (int i = 0; i < nums_size; ++i) {
    odd += nums[i] & 1;
}
cout << "# of odd nums: " << odd << endl;
```
The same loop can also be written with the range-based for syntax introduced in C++11:
```cpp
int nums[]{4, 8, 15, 16, 23, 42};
int odd = 0;
for (auto& num : nums) {
    odd += num & 1;
}
cout << "# of odd nums: " << odd << endl;
```
Another approach is to build the loop with the for_each function from the STL:
```cpp
int nums[]{4, 8, 15, 16, 23, 42};
int odd = 0;
auto nums_size = sizeof(nums) / sizeof(int);
for_each(nums, nums + nums_size, [&odd](int& num) {
    odd += num & 1;
});
cout << "# of odd nums: " << odd << endl;
```
The same result can also be obtained with the STL's count_if function:
```cpp
int nums[]{4, 8, 15, 16, 23, 42};
int odd = 0;
auto nums_size = sizeof(nums) / sizeof(int);
odd = count_if(nums, nums + nums_size, [](int& num) {
    return num & 1;
});
cout << "# of odd nums: " << odd << endl;
```
One advantage of solving the problem with count_if is that the solution can be parallelized with almost no effort:
```cpp
int nums[]{4, 8, 15, 16, 23, 42};
int odd = 0;
auto nums_size = sizeof(nums) / sizeof(int);
odd = __gnu_parallel::count_if(nums, nums + nums_size, [](int &value) {
    return value & 1;
});
cout << "# of odd nums: " << odd << endl;
```
Now let's gather the snippets given piece by piece above into a single program:
```cpp
#include <iostream>
#include <parallel/algorithm>

using namespace std;

int main() {
    int nums[]{4, 8, 15, 16, 23, 42};
    int odd = 0;
    auto nums_size = sizeof(nums) / sizeof(int);

    for (int i = 0; i < nums_size; ++i) {
        odd += nums[i] & 1;
    }
    cout << "# of odd nums: " << odd << endl;

    odd = 0;
    for (auto& num : nums) {
        odd += num & 1;
    }
    cout << "# of odd nums: " << odd << endl;

    odd = 0;
    for_each(nums, nums + nums_size, [&odd](int& num) {
        odd += num & 1;
    });
    cout << "# of odd nums: " << odd << endl;

    odd = count_if(nums, nums + nums_size, [](int& num) {
        return num & 1;
    });
    cout << "# of odd nums: " << odd << endl;

    odd = __gnu_parallel::count_if(nums, nums + nums_size, [](int &value) {
        return value & 1;
    });
    cout << "# of odd nums: " << odd << endl;
    cout << endl;

    return 0;
}
```
We can compile and run the code above with the GNU C++ compiler as shown below:
```shell
[student@host1]$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)
[student@host1]$ g++ -o run -std=c++11 for_loop.cpp -lpthread -fopenmp
[student@host1]$ ./run
# of odd nums: 2
# of odd nums: 2
# of odd nums: 2
# of odd nums: 2
# of odd nums: 2
```
Of course, for a six-element array there is no practical point in parallelizing the odd-counting problem. As the number of elements grows, however, parallelizing with as many threads as there are cores will yield a speedup. Let's examine how large that speedup is. First, the sequential solution:
```cpp
#include <iostream>
#include <vector>
#include <random>
#include <algorithm>
#include <chrono>

using namespace std;

const int size = 100000000;

vector<int> numbers;
random_device rd;
mt19937_64 rng(rd()); // Mersenne Twister 19937 generator (64-bit)

typedef vector<int>::iterator vect_iter; // alias

void fill(vector<int> &data, int length) {
    for (int i = 0; i < length; ++i) {
        data.push_back(rng());
    }
}

int main() {
    fill(numbers, size);
    auto strt = chrono::steady_clock::now();
    auto finalResult = count_if(numbers.begin(), numbers.end(), [](int &value) {
        return value & 1;
    });
    auto stp = chrono::steady_clock::now();
    cout << "Final result (sequential): " << finalResult
         << " @ " << (stp - strt).count() << endl;
    return 0;
}
```
Let's run the sequential code above and look at its performance:
```shell
[student@host1]$ g++ -o run -std=c++11 parallel_for_loop1.cpp
[student@host1]$ ./run
Final result (sequential): 50007237 @ 1782937391
```
Now let's try to solve the problem using the thread library introduced in C++11:
```cpp
#include <iostream>
#include <thread>
#include <vector>
#include <mutex>
#include <random>
#include <chrono>
#include <algorithm>
#include <numeric> // accumulate

using namespace std;

const int cores = thread::hardware_concurrency();
const int size = 100000000;
const int size_per_core = size / cores; // note: assumes size divides evenly by cores

vector<int> numbers;
random_device rd;
mt19937_64 rng(rd()); // Mersenne Twister 19937 generator (64-bit)
vector<thread> threads;
vector<int> partialSolution;
mutex m;

typedef vector<int>::iterator vect_iter; // alias

void fill(vector<int> &data, int length) {
    for (int i = 0; i < length; ++i) {
        data.push_back(rng());
    }
}

void count_odd(vect_iter begin, vect_iter end, vector<int> &result) {
    int counter = count_if(begin, end, [](int &value) {
        return value & 1;
    });
    m.lock();
    result.push_back(counter);
    m.unlock();
}

int main() {
    fill(numbers, size);
    std::chrono::steady_clock::time_point strt, stp;

    strt = chrono::steady_clock::now();
    auto begin = numbers.begin();
    for (int i = 0; i < cores; ++i, begin += size_per_core) {
        auto end = begin + size_per_core;
        threads.push_back(thread(count_odd, begin, end, ref(partialSolution)));
    }
    for (auto &t : threads) t.join();
    auto finalResult = accumulate(partialSolution.begin(), partialSolution.end(), int());
    stp = chrono::steady_clock::now();
    cout << "Final result (parallel): "
         << finalResult
         << " @ "
         << (stp - strt).count()
         << endl;

    return 0;
}
```
Let's compile and run the solution built with the thread library:
```shell
[student@host1]$ g++ -o run -std=c++11 parallel_for_loop2.cpp -lpthread
[student@localhost parallel-programming-example]$ ./run
Final result (parallel): 49995511 @ 1079533408
```
Finally, let's solve the problem using the parallel algorithms in the STL:
```cpp
#include <iostream>
#include <thread>
#include <vector>
#include <mutex>
#include <random>
#include <chrono> // steady_clock
#include <parallel/algorithm>

using namespace std;

const int cores = thread::hardware_concurrency();
const int size = 100000000;
const int size_per_core = size / cores;

vector<int> numbers;
random_device rd;
mt19937_64 rng(rd()); // Mersenne Twister 19937 generator (64-bit)
vector<int> partialSolution;

typedef vector<int>::iterator vect_iter; // alias

void fill(vector<int> &data, int length) {
    for (int i = 0; i < length; ++i) {
        data.push_back(rng());
    }
}

int main() {
    fill(numbers, size);
    std::chrono::steady_clock::time_point strt, stp;

    // parallel solution
    strt = chrono::steady_clock::now();
    auto finalResult = __gnu_parallel::count_if(numbers.begin(), numbers.end(), [](int &value) {
        return value & 1;
    });
    stp = chrono::steady_clock::now();
    cout << "Final result (parallel [STL] algorithm): "
         << finalResult
         << " @ "
         << (stp - strt).count()
         << endl;
    return 0;
}
```
Let's compile and run the solution built with the STL's parallel count_if function:
```shell
[student@host1]$ g++ -o run -std=c++11 parallel_for_loop3.cpp -lpthread -fopenmp
[student@host1]$ ./run
Final result (parallel [STL] algorithm): 49994371 @ 1391989217
```
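Since GCC's parallel mode is implemented on top of OpenMP, the number of threads it uses can be influenced through the standard OpenMP mechanisms. A quick way to experiment, assuming the `run` binary built above, is the `OMP_NUM_THREADS` environment variable:

```shell
# Restrict the parallel-mode count_if to two OpenMP threads for one run
[student@host1]$ OMP_NUM_THREADS=2 ./run
```

This makes it easy to measure how the runtime scales as the thread count varies, without recompiling.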
To make a comparison possible, let's combine all three solutions in a single source file:
```cpp
#include <iostream>
#include <thread>
#include <vector>
#include <mutex>
#include <random>
#include <chrono>  // steady_clock
#include <numeric> // accumulate
#include <parallel/algorithm>

using namespace std;

const int cores = thread::hardware_concurrency();
const int size = 100000000;
const int size_per_core = size / cores;

vector<int> numbers;
random_device rd;
mt19937_64 rng(rd()); // Mersenne Twister 19937 generator (64-bit)
vector<thread> threads;
vector<int> partialSolution;
mutex m;

typedef vector<int>::iterator vect_iter; // alias

void fill(vector<int> &data, int length) {
    for (int i = 0; i < length; ++i) {
        data.push_back(rng());
    }
}

void count_odd(vect_iter begin, vect_iter end, vector<int> &result) {
    int counter = count_if(begin, end, [](int &value) {
        return value & 1;
    });
    m.lock();
    result.push_back(counter);
    m.unlock();
}

int main() {
    fill(numbers, size);
    std::chrono::steady_clock::time_point strt, stp;

    // thread-based parallel solution
    strt = chrono::steady_clock::now();
    auto begin = numbers.begin();
    for (int i = 0; i < cores; ++i, begin += size_per_core) {
        auto end = begin + size_per_core;
        threads.push_back(thread(count_odd, begin, end, ref(partialSolution)));
    }
    for (auto &t : threads) t.join();
    auto finalResult = accumulate(partialSolution.begin(), partialSolution.end(), int());
    stp = chrono::steady_clock::now();
    cout << "Final result (parallel) :"
         << finalResult
         << " @ "
         << (stp - strt).count()
         << endl;

    // sequential solution
    strt = chrono::steady_clock::now();
    finalResult = count_if(numbers.begin(), numbers.end(), [](int &value) {
        return value & 1;
    });
    stp = chrono::steady_clock::now();
    cout << "Final result (sequential) :"
         << finalResult
         << " @ "
         << (stp - strt).count()
         << endl;

    // parallel STL algorithm solution
    strt = chrono::steady_clock::now();
    finalResult = __gnu_parallel::count_if(numbers.begin(), numbers.end(), [](int &value) {
        return value & 1;
    });
    stp = chrono::steady_clock::now();
    cout << "Final result (stl parallel):"
         << finalResult
         << " @ "
         << (stp - strt).count()
         << endl;
    return 0;
}
```
Let's compile it, run it ten times, and compare the runtime performance:
```shell
[student@host1]$ for i in $(seq 1 10) ; do ./run; done
Final result (parallel) :50003830 @ 391770880
Final result (sequential) :50003830 @ 1722770927
Final result (stl parallel):50003830 @ 651193996
Final result (parallel) :49995592 @ 399623007
Final result (sequential) :49995592 @ 1745905026
Final result (stl parallel):49995592 @ 656114655
Final result (parallel) :49996624 @ 403331645
Final result (sequential) :49996624 @ 1751347528
Final result (stl parallel):49996624 @ 640525779
Final result (parallel) :49998633 @ 382341755
Final result (sequential) :49998633 @ 1731618656
Final result (stl parallel):49998633 @ 597300980
Final result (parallel) :49999674 @ 380100468
Final result (sequential) :49999674 @ 1737255926
Final result (stl parallel):49999674 @ 612496887
Final result (parallel) :50000214 @ 394109006
Final result (sequential) :50000214 @ 1728814510
Final result (stl parallel):50000214 @ 598540879
Final result (parallel) :50000307 @ 436324428
Final result (sequential) :50000307 @ 1767703757
Final result (stl parallel):50000307 @ 592917336
Final result (parallel) :50001977 @ 421724960
Final result (sequential) :50001977 @ 1729831445
Final result (stl parallel):50001977 @ 620272259
Final result (parallel) :49998127 @ 410899174
Final result (sequential) :49998127 @ 1813701819
Final result (stl parallel):49998127 @ 591852057
Final result (parallel) :50000586 @ 415081631
Final result (sequential) :50000586 @ 1790710476
Final result (stl parallel):50000586 @ 554117088
```
Let's try to interpret the result graphically:

On a 4-core Intel i7 processor, the speedup came out as shown in the chart below:
Now let's turn the compiler's optimization settings all the way up and run the application again:
```shell
[student@host1]$ g++ -O3 -o run -std=c++11 parallel_for_loop.cpp -lpthread -fopenmp
[student@host1]$ for i in $(seq 1 10) ; do ./run; done
Final result (parallel) :49993240 @ 85592097
Final result (sequential) :49993240 @ 304494099
Final result (stl parallel):49993240 @ 87166009
Final result (parallel) :49998753 @ 66552277
Final result (sequential) :49998753 @ 312764432
Final result (stl parallel):49998753 @ 86458888
Final result (parallel) :49988432 @ 67755317
Final result (sequential) :49988432 @ 285352242
Final result (stl parallel):49988432 @ 87107690
Final result (parallel) :49999577 @ 64545883
Final result (sequential) :49999577 @ 289695428
Final result (stl parallel):49999577 @ 91366775
Final result (parallel) :50000906 @ 72690986
Final result (sequential) :50000906 @ 283258569
Final result (stl parallel):50000906 @ 81706558
Final result (parallel) :49995889 @ 65718162
Final result (sequential) :49995889 @ 298945194
Final result (stl parallel):49995889 @ 81954451
Final result (parallel) :49988992 @ 66934218
Final result (sequential) :49988992 @ 299720421
Final result (stl parallel):49988992 @ 113914656
Final result (parallel) :50000404 @ 62499660
Final result (sequential) :50000404 @ 311449204
Final result (stl parallel):50000404 @ 79420706
Final result (parallel) :50005821 @ 72953287
Final result (sequential) :50005821 @ 290275079
Final result (stl parallel):50005821 @ 87208042
Final result (parallel) :49995294 @ 65187374
Final result (sequential) :49995294 @ 310689078
Final result (stl parallel):49995294 @ 72715961
```
With optimization the times dropped substantially; the speedup ratios did not improve markedly, but the two parallel approaches converged toward each other:
Comparison of the speedup over the sequential solution achieved by the thread-based solution with that achieved by the OpenMP-based parallel count_if function
Finally, let's examine the timings showing all runtimes, with and without optimization: